UDiscoverIt Downloads — Materials Discovery

Materials Discovery

Credit: NIST

The materials discovery project is a collaboration between the Institute for Computational Sustainability (ICS), the Cornell Materials Science & Engineering Department, and the Joint Center for Artificial Photosynthesis (JCAP).

This project contributes to a larger effort in high-throughput materials science, which is part of the Materials Genome Initiative (MGI).

Density of States Prediction for Materials Discovery via Contrastive Learning from Probabilistic Embeddings

Machine learning for materials discovery has largely focused on predicting individual scalar properties rather than multiple related properties, of which spectral properties are an important example. Fundamental spectral properties include the phonon density of states (phDOS) and the electronic density of states (eDOS), which, individually or collectively, give rise to a broad range of materials observables and functions. Building upon the success of graph attention networks for encoding crystalline materials, we introduce a probabilistic embedding generator specifically tailored to the prediction of spectral properties. Coupled with supervised contrastive learning, our materials-to-spectrum (Mat2Spec) model outperforms state-of-the-art methods for predicting ab initio phDOS and eDOS for crystalline materials. We demonstrate Mat2Spec's ability to identify eDOS gaps below the Fermi energy, validating predictions with ab initio calculations and thereby discovering candidate thermoelectrics and transparent conductors. Mat2Spec is an exemplar framework for predicting spectral properties of materials via strategically incorporated machine learning techniques.
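Mat2Spec itself is a learned model, but the downstream step of reading a gap below the Fermi energy off a predicted eDOS can be sketched as a simple scan. The sketch below is an illustration under stated assumptions, not Mat2Spec code: the energy grid is sorted, the DOS is sampled on that grid, and the hypothetical `threshold` and `min_width` parameters decide what counts as a gap.

```python
import numpy as np


def gaps_below_fermi(energies, dos, e_fermi=0.0, threshold=1e-3, min_width=0.1):
    """Find energy intervals below e_fermi where the DOS is (near) zero.

    energies: sorted 1-D grid (e.g. eV); dos: DOS values on that grid.
    threshold and min_width are hypothetical tuning parameters.
    Runs of grid points with dos < threshold form candidate gaps. Runs that
    start at the bottom of the grid (below the lowest band) or are narrower
    than min_width are discarded. A gap straddling e_fermi is clipped at the
    last grid point below e_fermi.
    """
    energies = np.asarray(energies, dtype=float)
    dos = np.asarray(dos, dtype=float)
    empty = (dos < threshold) & (energies < e_fermi)

    gaps, start = [], None
    for i, is_empty in enumerate(empty):
        if is_empty and start is None:
            start = i  # a new run of empty grid points begins
        elif not is_empty and start is not None:
            lo, hi = energies[start], energies[i - 1]
            if start > 0 and hi - lo >= min_width:
                gaps.append((lo, hi))
            start = None
    # A run still open here reaches the end of the grid; it is dropped.
    return gaps


E = np.linspace(-5.0, 5.0, 1001)
dos = np.ones_like(E)
dos[(E > -3) & (E < -2)] = 0.0  # gap below E_F: reported
dos[(E > 1) & (E < 2)] = 0.0    # gap above E_F: ignored
print(gaps_below_fermi(E, dos))
```

Fixing the threshold in absolute DOS units is the simplest choice; a predicted spectrum with small nonzero tails may need a threshold relative to the spectrum's maximum instead.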


Related Publications

DRNets for Crystal-Structure Phase Mapping

Crystal-structure phase mapping is a core, long-standing challenge in materials science that requires identifying crystal phases, or mixtures thereof, in X-ray diffraction measurements of synthesized materials. Phase mapping algorithms have been developed that excel at solving systems with up to several unique phase mixtures, wherein each phase has a readily distinguishable diffraction pattern. However, complexities such as dozens of phase mixtures, alloy-dependent variation in diffraction patterns, and multiple compositional degrees of freedom pose challenges for materials science experts and state-of-the-art algorithms, creating a major bottleneck in high-throughput materials discovery. Herein we show how to automate crystal-structure phase mapping. We formulate phase mapping as an unsupervised pattern demixing problem and describe how to solve it using Deep Reasoning Networks (DRNets). Given the scientific complexity of crystal-structure phase mapping, we also provide an intuitive explanation of the DRNets framework based on Multi-MNIST-Sudoku, a variant of the Sudoku game that involves demixing two overlapping, completed, hand-written Sudoku puzzles. DRNets combine deep learning with constraint reasoning to incorporate prior scientific knowledge and consequently require only a modest amount of (unlabeled) data. DRNets compensate for the limited data by exploiting and magnifying the rich prior knowledge about the thermodynamic rules governing the mixtures of crystals. DRNets are designed with an interpretable latent space for encoding domain constraints from prior knowledge, and they seamlessly integrate constraint reasoning into neural network optimization. DRNets surpass previous approaches on crystal-structure phase mapping, unraveling the Bi-Cu-V oxide phase diagram and aiding the discovery of solar-fuels materials.
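DRNets encode thermodynamic rules as constraints on the latent phase fractions. As one hypothetical example of such a rule, the Gibbs phase rule at fixed temperature and pressure allows at most as many coexisting phases as there are components. The check below is a plain NumPy sketch of that rule applied after the fact; it is not how DRNets integrate constraints into training, and the function name and `tol` cutoff are assumptions.

```python
import numpy as np


def gibbs_violations(phase_fractions, n_components, tol=1e-2):
    """Return indices of samples with more active phases than n_components.

    phase_fractions: array of shape (n_samples, n_phases); each row holds the
    estimated fraction of every candidate phase in one sample.
    A phase counts as active in a sample if its fraction exceeds tol
    (a hypothetical cutoff). At fixed temperature and pressure, the Gibbs
    phase rule allows at most n_components coexisting phases at equilibrium.
    """
    fractions = np.asarray(phase_fractions, dtype=float)
    active = (fractions > tol).sum(axis=1)  # active phases per sample
    return np.flatnonzero(active > n_components)


fractions = np.array([
    [0.50, 0.50, 0.00, 0.000],  # 2 phases: allowed
    [0.25, 0.25, 0.25, 0.250],  # 4 phases: violates the rule for 3 components
    [1.00, 0.00, 0.00, 0.005],  # 0.005 is below tol, so 1 active phase
])
print(gibbs_violations(fractions, n_components=3))
```

A hard count like this is not differentiable; a method that builds such rules into neural network optimization needs a relaxed form of the constraint.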


Related Publications

Active Learning for Autonomous Synthesis of Metastable Materials

Autonomous experimentation enabled by artificial intelligence (AI) offers a new paradigm for accelerating scientific discovery. Non-equilibrium materials synthesis is emblematic of complex, resource-intensive experimentation whose acceleration would be a watershed for materials discovery and development. The mapping of non-equilibrium synthesis phase diagrams has recently been accelerated via high-throughput experimentation but still limits materials research because the parameter space is too vast to be exhaustively explored. We demonstrate accelerated synthesis and exploration of metastable materials through hierarchical autonomous experimentation governed by the Scientific Autonomous Reasoning Agent (SARA). SARA integrates robotic materials synthesis and characterization with a hierarchy of AI methods that efficiently reveal the structure of processing phase diagrams. SARA designs lateral gradient laser spike annealing (lg-LSA) experiments for parallel materials synthesis and employs optical spectroscopy to rapidly identify phase transitions. Efficient exploration of the multi-dimensional parameter space is achieved with nested active learning (AL) cycles built upon advanced machine learning models that incorporate the underlying physics of the experiments as well as end-to-end uncertainty quantification. With these models and the coordination of AL at multiple scales, SARA embodies AI that can harness complex scientific tasks. We demonstrate its performance by autonomously mapping synthesis phase boundaries for the Bi2O3 system, leading to orders-of-magnitude acceleration in the establishment of a synthesis phase diagram that includes conditions for kinetically stabilizing δ-Bi2O3 at room temperature, a critical development for electrochemical technologies such as solid oxide fuel cells.
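SARA's active-learning cycles rely on models with quantified uncertainty. A generic version of one such cycle is uncertainty sampling with a Gaussian process: fit the measurements so far, then run the next experiment where the model is least certain. The sketch assumes a 1-D processing parameter (for example, a hypothetical annealing condition), an RBF kernel with a fixed length scale, and plain variance maximization; SARA's physics-informed models and nested loops are not reproduced here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, length_scale=1.0, noise=1e-6):
    """GP posterior mean and variance (zero prior mean, unit signal variance)."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_query, x_train, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance k(x, x) = 1 for RBF
    return mean, np.maximum(var, 0.0)


def next_experiment(x_train, y_train, candidates, length_scale=1.0):
    """Uncertainty sampling: return the candidate with the largest posterior variance."""
    _, var = gp_posterior(x_train, y_train, candidates, length_scale)
    return candidates[np.argmax(var)]


x = np.array([0.0, 1.0, 2.0])  # conditions already measured (hypothetical)
y = np.sin(x)                  # stand-in for a measured response
print(next_experiment(x, y, np.array([0.5, 1.5, 5.0])))
```

With fixed kernel hyperparameters, the posterior variance depends only on where measurements were taken, not on their values; acquisition functions that also use the predicted mean trade exploration against exploitation.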


Related Publications

Hierarchical Correlation Learning for Multi-property Prediction (H-CLMP)

We introduce the Hierarchical Correlation Learning for Multi-property Prediction framework (H-CLMP, pronounced H-CLAMP) that seamlessly integrates (i) prediction using only a material's composition, (ii) learning and exploitation of correlations among target properties in multi-target regression, and (iii) leveraging training data from tangential domains via generative transfer learning. The model is demonstrated for prediction of spectral optical absorption of complex metal oxides spanning 69 three-cation metal oxide composition spaces. H-CLMP accurately predicts non-linear composition-property relationships in composition spaces for which no training data is available, which broadens the purview of machine learning to the discovery of materials with exceptional properties. This achievement results from the principled integration of latent embedding learning, property correlation learning, generative transfer learning, and attention models. The best performance is obtained using H-CLMP with Transfer learning (H-CLMP(T)), wherein a generative adversarial network is trained on computational density of states data and deployed in the target domain to augment prediction of optical absorption from composition. H-CLMP(T) aggregates multiple knowledge sources with a framework that is well-suited for multi-target regression across the physical sciences.
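Predicting composition spaces with no training data implies an evaluation in which entire composition spaces are withheld. A minimal sketch of such a split follows, assuming each sample is a dictionary of cation fractions. The helper names and the element symbols in the usage are hypothetical, and treating the edge and corner compositions of the held-out space as unseen is a design choice of this sketch, not necessarily the published protocol.

```python
import numpy as np


def composition_vector(comp, elements):
    """Map {element: amount} to a fixed-length vector of normalized fractions."""
    v = np.array([comp.get(el, 0.0) for el in elements], dtype=float)
    return v / v.sum()


def hold_out_space(samples, held_out):
    """Split samples so that no training point lies inside the held-out space.

    samples: list of {element: fraction} dicts.
    held_out: set of elements defining the composition space to test on.
    A sample goes to the test set if its elements are a subset of held_out,
    so binary and unary points on the boundary of that space are held out too.
    """
    held_out = frozenset(held_out)
    train, test = [], []
    for comp in samples:
        present = frozenset(el for el, x in comp.items() if x > 0)
        if present <= held_out:
            test.append(comp)
        else:
            train.append(comp)
    return train, test


samples = [
    {"Bi": 0.5, "V": 0.5},              # edge of the held-out space
    {"Bi": 0.2, "V": 0.3, "Mn": 0.5},   # interior of the held-out space
    {"Bi": 0.4, "Cu": 0.6},
    {"Bi": 0.3, "Cu": 0.3, "V": 0.4},
]
train, test = hold_out_space(samples, {"Bi", "V", "Mn"})
print(len(train), len(test))
```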


Related Publications

Multi-Component Background Learning (MCBL)

The Multi-Component Background Learning (MCBL) model is an unsupervised probabilistic learning approach that analyzes large spectroscopic data collections to identify multiple background sources and establish the probability that any given data point contains a signal of interest. The approach is suitable for any type of data where the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require knowledge of the signals, since the shapes of the background signals, the noise levels, and the signal of interest are simultaneously learned via a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets.
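MCBL's full probabilistic model is beyond a short example. As a deliberately simplified stand-in, the sketch below swaps probabilistic matrix factorization for ordinary nonnegative matrix factorization (Lee-Seung multiplicative updates) to fit a low-rank background, then flags entries whose positive residual exceeds a robust noise estimate, mirroring the assumption that the signal of interest adds positively to the background. The rank, the `k` multiplier, and the function names are assumptions.

```python
import numpy as np


def nmf(X, rank, n_iter=500, seed=0, eps=1e-12):
    """Nonnegative matrix factorization X ~ W @ H via Lee-Seung multiplicative updates.

    X must be nonnegative; W and H stay nonnegative because every update
    multiplies them by a nonnegative ratio.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def flag_signal(X, rank=2, k=3.0, **nmf_kwargs):
    """Boolean mask of entries that sit well above the low-rank background fit."""
    W, H = nmf(X, rank, **nmf_kwargs)
    residual = X - W @ H
    # Robust noise scale from the median absolute deviation of the residuals.
    sigma = 1.4826 * np.median(np.abs(residual - np.median(residual)))
    return residual > k * sigma


rng = np.random.default_rng(1)
channels = np.linspace(0.0, 1.0, 100)
background = np.vstack([np.ones_like(channels), channels])  # two smooth shapes
X = rng.uniform(0.5, 1.5, size=(20, 2)) @ background
X = np.clip(X + 0.01 * rng.standard_normal(X.shape), 0.0, None)
spikes = [(2, 10), (5, 40), (9, 70), (14, 55), (18, 90)]
for i, j in spikes:
    X[i, j] += 1.0  # positive "signal" on top of the background
mask = flag_signal(X, rank=2, n_iter=2000)
print(int(mask.sum()), "entries flagged")
```

Because plain NMF does not separate background from signal explicitly, a rank that is too high lets the factorization absorb the signal; MCBL instead learns the background shapes, noise levels, and signal jointly.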


Related Publications

IAFD

A C++ source code package for IAFD is available. IAFD is a solver for the phase map identification problem, based on convolutive nonnegative matrix factorization. Compared with AgileFD, it handles constraints more efficiently and supports additional constraints.
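In XRD phase mapping, alloying shifts a phase's diffraction peaks from sample to sample, so a single fixed basis pattern per phase does not fit every sample. Convolutive NMF addresses this by letting each basis pattern appear at several shifts. The sketch below shows only the forward (reconstruction) model under simplifying assumptions: shifts in one direction, a fixed number of integer bin shifts, and zero-filling at the edge. The names are hypothetical, and IAFD's fitting and constraint handling are not shown.

```python
import numpy as np


def convolutive_reconstruct(W, H):
    """Reconstruct patterns from a convolutive NMF model.

    W: (n_bins, n_phases) nonnegative basis patterns, one per phase.
    H: (n_shifts, n_phases, n_samples) nonnegative activations; H[s, k, j] is
       the amount of phase k, shifted by s bins, present in sample j.
    Returns V: (n_bins, n_samples) with
       V[:, j] = sum_s sum_k H[s, k, j] * shift(W[:, k], s).
    """
    n_bins, _ = W.shape
    n_shifts, _, n_samples = H.shape
    V = np.zeros((n_bins, n_samples))
    for s in range(n_shifts):
        W_s = np.zeros_like(W)
        W_s[s:, :] = W[: n_bins - s, :]  # every basis pattern shifted by s bins
        V += W_s @ H[s]
    return V


W = np.zeros((50, 2))
W[10, 0] = 1.0  # phase 0: one peak at bin 10
W[30, 1] = 1.0  # phase 1: one peak at bin 30
H = np.zeros((4, 2, 3))
H[0, 0, 0] = 1.0  # sample 0: unshifted phase 0
H[3, 0, 1] = 0.5  # sample 1: phase 0 shifted by 3 bins
H[1, 1, 1] = 2.0  #           plus phase 1 shifted by 1 bin
V = convolutive_reconstruct(W, H)
print(np.nonzero(V[:, 1])[0])
```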


Related Publications

AgileFD

A C++ source code package for AgileFD is available. AgileFD is a solver for the phase map identification problem, based on convolutive nonnegative matrix factorization, with extensions that address additional challenges in this problem, including physical constraints. An example of the use of AgileFD and Phase Mapper is shown in the video "Using Phase Mapper to discover a new light absorber material at JCAP."
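Convolution models a shift, whereas a uniform change in lattice parameters scales every d-spacing, and hence every q = 2π/d, by the same factor. Sampling patterns on a grid uniform in log q turns that scaling into a constant shift, which is a natural reason to pair convolutive factorization of XRD data with log-spaced axes. Treat this as background on the technique rather than a description of AgileFD's preprocessing; the peak positions and scale factor below are hypothetical.

```python
import numpy as np


def nearest_bin(grid, x):
    """Index of the grid point closest to x."""
    return int(np.argmin(np.abs(grid - x)))


def bin_offsets(grid, peaks, scale):
    """Number of bins each peak moves when every peak position is multiplied by scale."""
    return [nearest_bin(grid, scale * p) - nearest_bin(grid, p) for p in peaks]


q_linear = np.linspace(1.0, 5.0, 4001)  # uniform spacing in q
q_log = np.geomspace(1.0, 5.0, 4001)    # uniform spacing in log(q)
peaks = [1.5, 2.5, 3.5, 4.5]            # hypothetical peak positions
scale = 0.98                            # hypothetical: all d-spacings grow by about 2%

print(bin_offsets(q_linear, peaks, scale))  # offsets grow with q
print(bin_offsets(q_log, peaks, scale))     # offsets nearly identical
```

Anisotropic lattice changes move different peaks by different amounts in log q, so a pure shift is only an approximation in that case.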


Related Publications

Data Instances

We provide a synthetic instance generator for the phase map identification problem, as well as a sample synthetic dataset. A dataset of real instances contains the X-ray diffraction (XRD) patterns for various binary, ternary, and quaternary materials systems. The synthetic generator requires Python and is cross-platform. For more information about how to use the generator, please read the README file contained in the package, as well as the associated publication.
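The packaged generator and its README define the actual instance format. Purely as an illustration of what a synthetic phase-mapping instance contains, the sketch below builds toy basis patterns from Gaussian peaks, draws phase fractions in which each sample mixes at most a few phases, and adds noise. Every function name, parameter, and default here is hypothetical, and the output does not follow the package's file format.

```python
import numpy as np


def gaussian_peaks(q, centers, width=0.02):
    """Toy phase pattern: unit-height Gaussian peaks at the given centers."""
    centers = np.asarray(centers)
    return np.sum(np.exp(-0.5 * ((q[:, None] - centers[None, :]) / width) ** 2), axis=1)


def synthetic_instance(n_samples=30, n_phases=4, max_phases=3, n_bins=500,
                       noise=0.01, seed=0):
    """Generate a toy phase-mapping instance X = W @ A + noise.

    Returns q (n_bins,), W (n_bins, n_phases) basis patterns,
    A (n_phases, n_samples) phase fractions (each column sums to 1 and has at
    most max_phases nonzero entries), and X (n_bins, n_samples), clipped at 0.
    """
    rng = np.random.default_rng(seed)
    q = np.linspace(1.0, 5.0, n_bins)
    W = np.column_stack([gaussian_peaks(q, rng.uniform(1.2, 4.8, size=5))
                         for _ in range(n_phases)])
    A = np.zeros((n_phases, n_samples))
    for j in range(n_samples):
        k = rng.integers(1, max_phases + 1)                 # coexisting phases in sample j
        chosen = rng.choice(n_phases, size=k, replace=False)
        A[chosen, j] = rng.dirichlet(np.ones(k))            # fractions summing to 1
    X = W @ A + noise * rng.standard_normal((n_bins, n_samples))
    return q, W, A, np.clip(X, 0.0, None)


q, W, A, X = synthetic_instance()
print(X.shape, (A > 0).sum(axis=0).max())
```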


Related Publications

Publications — Materials Discovery