# UDiscoverIt Downloads — Materials Discovery

The materials discovery project is a collaboration between the Institute for Computational Sustainability (ICS), the Cornell Materials Science & Engineering Department, and the Joint Center for Artificial Photosynthesis (JCAP).

The project contributes to a larger effort in high-throughput materials science, which is part of the Materials Genome Initiative (MGI).

Machine learning for materials discovery has largely focused on predicting an individual scalar rather than multiple related properties, where spectral properties are an important example. Fundamental spectral properties include the phonon density of states (phDOS) and the electronic density of states (eDOS), which individually or collectively are the origins of a breadth of materials observables and functions. Building upon the success of graph attention networks for encoding crystalline materials, we introduce a probabilistic embedding generator specifically tailored to the prediction of spectral properties. Coupled with supervised contrastive learning, our materials-to-spectrum (Mat2Spec) model outperforms state-of-the-art methods for predicting ab initio phDOS and eDOS for crystalline materials. We demonstrate Mat2Spec's ability to identify eDOS gaps below the Fermi energy, validating predictions with ab initio calculations and thereby discovering candidate thermoelectrics and transparent conductors. Mat2Spec is an exemplar framework for predicting spectral properties of materials via strategically incorporated machine learning techniques.

### Related Publications

- Shufeng Kong, Francesco Ricci, Dan Guevarra, Jeffrey B. Neaton, Carla P. Gomes, John M. Gregoire. Density of states prediction for materials discovery via contrastive learning from probabilistic embeddings. *Nature Communications* (2022). [pdf]

Crystal-structure phase mapping is a core, long-standing challenge in materials science that requires identifying crystal phases, or mixtures thereof, in X-ray diffraction measurements of synthesized materials. Phase mapping algorithms have been developed that excel at solving systems with up to several unique phase mixtures, wherein each phase has a readily distinguishable diffraction pattern. However, complexities such as dozens of phase mixtures, alloy-dependent variation in diffraction patterns, and multiple compositional degrees of freedom pose challenges for materials science experts and state-of-the-art algorithms, creating a major bottleneck in high-throughput materials discovery. Herein we show how to automate crystal-structure phase mapping. We formulate phase mapping as an unsupervised pattern demixing problem and describe how to solve it using Deep Reasoning Networks (DRNets). Given the scientific complexity of crystal-structure phase mapping, we also provide an intuitive explanation of the DRNets framework based on Multi-MNIST-Sudoku, a variant of the Sudoku game that involves demixing two completed overlapping hand-written Sudokus. DRNets combine deep learning with constraint reasoning for incorporating prior scientific knowledge and consequently require only a modest amount of (unlabeled) data. DRNets compensate for the limited data by exploiting and magnifying the rich prior knowledge about the thermodynamic rules governing the mixtures of crystals. DRNets are designed with an interpretable latent space for encoding prior-knowledge domain constraints and seamlessly integrate constraint reasoning into neural network optimization. DRNets surpass previous approaches on crystal-structure phase mapping, unraveling the Bi-Cu-V oxide phase diagram, and aiding the discovery of solar-fuels materials.

### Files

- DRNets Git repository (source code and data)
- [Video] DRNets can solve Sudoku, speed scientific discovery
- [Video] Deep Reasoning Networks: Combining deep learning with reasoning for discovery

### Related Publications

- Di Chen, Yiwei Bai, Sebastian Ament, Wenting Zhao, Dan Guevarra, Lan Zhou, Bart Selman, R. Bruce van Dover, John M. Gregoire, Carla P. Gomes. Automating crystal-structure phase mapping by combining deep learning with constraint reasoning. *Nat. Mach. Intell. 3(9)*: 812-822 (2021)

Autonomous experimentation enabled by artificial intelligence (AI) offers a new paradigm for accelerating scientific discovery. Non-equilibrium materials synthesis is emblematic of complex, resource-intensive experimentation whose acceleration would be a watershed for materials discovery and development. The mapping of non-equilibrium synthesis phase diagrams has recently been accelerated via high throughput experimentation but still limits materials research because the parameter space is too vast to be exhaustively explored. We demonstrate accelerated synthesis and exploration of metastable materials through hierarchical autonomous experimentation governed by the Scientific Autonomous Reasoning Agent (SARA). SARA integrates robotic materials synthesis and characterization along with a hierarchy of AI methods that efficiently reveal the structure of processing phase diagrams. SARA designs lateral gradient laser spike annealing (lg-LSA) experiments for parallel materials synthesis and employs optical spectroscopy to rapidly identify phase transitions. Efficient exploration of the multi-dimensional parameter space is achieved with nested active learning (AL) cycles built upon advanced machine learning models that incorporate the underlying physics of the experiments as well as end-to-end uncertainty quantification. With this, and the coordination of AL at multiple scales, SARA embodies AI harnessing complex scientific tasks. We demonstrate its performance by autonomously mapping synthesis phase boundaries for the Bi_{2}O_{3} system, leading to orders-of-magnitude acceleration in the establishment of a synthesis phase diagram that includes conditions for kinetically stabilizing δ-Bi_{2}O_{3} at room temperature, a critical development for electrochemical technologies such as solid oxide fuel cells.
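The active learning cycles described above rest on a simple idea: fit a probabilistic surrogate to the measurements so far, then measure next where the surrogate is least certain. The sketch below illustrates one such loop with a hand-written Gaussian process in NumPy. It is not SARA's code: the RBF kernel, its length scale, the `measure` function, and the candidate grid are all assumptions chosen for illustration, and the real system nests several such loops and uses physics-informed models.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    # Prior variance is 1 for this kernel; subtract the part explained by data.
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.clip(var, 0.0, None)

def measure(x):
    """Hypothetical stand-in for an experiment: a smooth step near x = 0.6."""
    return np.tanh(20.0 * (x - 0.6))

def active_learning(n_steps=8):
    """Uncertainty sampling: always measure where posterior variance is largest."""
    candidates = np.linspace(0.0, 1.0, 101)
    x_train = np.array([0.0, 1.0])
    y_train = measure(x_train)
    max_vars = []
    for _ in range(n_steps):
        _, var = gp_posterior(x_train, y_train, candidates)
        max_vars.append(var.max())
        x_next = candidates[np.argmax(var)]
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, measure(x_next))
    _, var = gp_posterior(x_train, y_train, candidates)
    max_vars.append(var.max())
    return x_train, np.array(max_vars)

x_sampled, max_var_history = active_learning()
```

Pure uncertainty sampling spreads measurements evenly over the grid. An acquisition function that also rewards large predicted gradients would concentrate samples near the transition instead, which is closer in spirit to locating phase boundaries.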

### Files

- Software to process the microscope images and reflectance data, and to perform the GP based active learning
- Raw microscope images and reflectance data for the lg-LSA annealed Bi_{2}O_{3} thin film
- [Video] SARA - the Scientific Autonomous Reasoning Agent

### Related Publications

- Sebastian Ament, Maximilian Amsler, Duncan R. Sutherland, Ming-Chiang Chang, Dan Guevarra, Aine B. Connolly, John M. Gregoire, Michael O. Thompson, Carla P. Gomes, R. Bruce van Dover. Autonomous materials synthesis via hierarchical active learning of nonequilibrium phase diagrams. *Science Advances* (2021). [pdf]

We introduce the Hierarchical Correlation Learning for Multi-property Prediction framework (H-CLMP, pronounced H-CLAMP) that seamlessly integrates (i) prediction using only a material's composition, (ii) learning and exploitation of correlations among target properties in multi-target regression, and (iii) leveraging training data from tangential domains via generative transfer learning. The model is demonstrated for prediction of spectral optical absorption of complex metal oxides spanning 69 3-cation metal oxide composition spaces. H-CLMP accurately predicts non-linear composition-property relationships in composition spaces for which no training data is available, which broadens the purview of machine learning to the discovery of materials with exceptional properties. This achievement results from the principled integration of latent embedding learning, property correlation learning, generative transfer learning, and attention models. The best performance is obtained using H-CLMP with Transfer learning (H-CLMP(T)) wherein a generative adversarial network is trained on computational density of states data and deployed in the target domain to augment prediction of optical absorption from composition. H-CLMP(T) aggregates multiple knowledge sources with a framework that is well-suited for multi-target regression across the physical sciences.

### Related Publications

- Shufeng Kong, Dan Guevarra, Carla P. Gomes, John M. Gregoire. Materials representation and transfer learning for multi-property prediction. *Applied Physics Reviews* (2021). [pdf]

The Multi-Component Background Learning (MCBL) model is an unsupervised probabilistic learning approach that analyzes large spectroscopic data collections to identify multiple background sources and establish the probability that any given data point contains a signal of interest. The approach is suitable to any type of data where the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require knowledge of the signals since the shapes of the background signals, the noise levels, and the signal of interest are simultaneously learned via a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets.
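To make the idea of a signal as "a positive addition to the background" concrete, here is a much-simplified sketch with a single background source. It is not the MCBL model: it replaces probabilistic matrix factorization with a robust rank-1 least-squares fit, and it reports a one-sided Gaussian score rather than a calibrated posterior probability. The function names, the clipping threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from math import erf, sqrt

def fit_single_background(data, n_iter=10, clip=3.0):
    """Fit data[i, j] ~ scale[i] * shape[j], ignoring large positive residuals."""
    weights = np.ones_like(data)
    shape = data.mean(axis=0)
    for _ in range(n_iter):
        # Weighted least-squares update of each factor with the other held fixed.
        scale = (weights * data * shape).sum(axis=1) / (weights * shape**2).sum(axis=1)
        shape = (weights * data * scale[:, None]).sum(axis=0) / (
            weights * scale[:, None] ** 2
        ).sum(axis=0)
        residual = data - np.outer(scale, shape)
        # Robust noise level from the median absolute deviation of the residuals.
        sigma = 1.4826 * np.median(np.abs(residual - np.median(residual)))
        # Points far above the background are excluded from the next fit.
        weights = (residual < clip * sigma).astype(float)
    return scale, shape, residual, sigma

def signal_score(residual, sigma):
    """Standard normal CDF of the residual: near 1 where data sit well above background."""
    z = residual / (sigma * sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(erf)(z))

# Synthetic spectra (assumed): one smooth background, random scales, Gaussian noise.
rng = np.random.default_rng(0)
channels = np.linspace(0.0, 1.0, 100)
true_shape = 1.0 + 0.5 * np.sin(2 * np.pi * channels)
true_scale = rng.uniform(0.5, 1.5, size=30)
data = np.outer(true_scale, true_shape) + rng.normal(0.0, 0.01, size=(30, 100))

# Add a positive signal to a few channels of the first five spectra.
has_signal = np.zeros(data.shape, dtype=bool)
has_signal[:5, 40:46] = True
data[has_signal] += 0.5

scale, shape, residual, sigma = fit_single_background(data)
score = signal_score(residual, sigma)
```

Excluding only large positive residuals is what encodes the positivity assumption: the background fit is allowed to be pulled down by noise but not up by signal.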

### Related Publications

- Sebastian E. Ament, Helge S. Stein, Dan Guevarra, Lan Zhou, Joel A. Haber, David A. Boyd, Mitsutaro Umehara, John M. Gregoire, Carla P. Gomes. Multi-component background learning automates signal detection for spectroscopic data. *npj Computational Materials* (2019). [pdf]

A C++ source code package for IAFD is available. IAFD is a solver for the phase map identification problem, based on convolutive nonnegative matrix factorization. It includes performance improvements in the handling of constraints in comparison to AgileFD, and also supports additional constraints.

### Related Publications

- Carla P. Gomes, Junwen Bai, Yexiang Xue, Johan Björck, Brendan Rappazzo, Sebastian Ament, Richard Bernstein, Shufeng Kong, Santosh K. Suram, R. Bruce van Dover, John M. Gregoire. CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures. *MRS Communications* (2019). [pdf]
- Junwen Bai, Johan Bjorck, Yexiang Xue, Santosh K. Suram, John M. Gregoire, Carla P. Gomes. Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery. *CPAIOR 2017*: 104-112 [pdf]

A C++ source code package for AgileFD is available. AgileFD is a solver for the phase map identification problem, based on convolutional nonnegative matrix factorization, with extensions to address additional challenges in this problem, including physical constraints. An example of the use of AgileFD and Phase Mapper is described in the video: Using Phase Mapper to discover a new light absorber material at JCAP.
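For readers unfamiliar with the underlying factorization, the sketch below demixes synthetic diffraction-like patterns with plain nonnegative matrix factorization using multiplicative updates. It is a Python illustration only, not the C++ AgileFD package: it omits the convolution that models peak shifting and the physical constraints mentioned above, and the peak positions, widths, and function names are assumptions.

```python
import numpy as np

def nmf_multiplicative(data, n_components, n_iter=300, seed=0, eps=1e-9):
    """Factor data ~ activations @ basis with both factors nonnegative."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape
    activations = rng.random((n_samples, n_components)) + eps
    basis = rng.random((n_components, n_features)) + eps
    errors = [np.linalg.norm(data - activations @ basis)]
    for _ in range(n_iter):
        # Multiplicative updates rescale entries by nonnegative ratios,
        # so neither factor can become negative.
        basis *= (activations.T @ data) / (activations.T @ activations @ basis + eps)
        activations *= (data @ basis.T) / (activations @ basis @ basis.T + eps)
        errors.append(np.linalg.norm(data - activations @ basis))
    return activations, basis, np.array(errors)

def gaussian_peaks(two_theta, centers, width=0.4):
    """Toy diffraction pattern: a sum of Gaussian peaks at the given angles."""
    return sum(np.exp(-0.5 * ((two_theta - c) / width) ** 2) for c in centers)

# Two hypothetical phases and 15 samples that mix them in varying fractions.
two_theta = np.linspace(20.0, 60.0, 200)
phase_a = gaussian_peaks(two_theta, [25.0, 38.0, 52.0])
phase_b = gaussian_peaks(two_theta, [30.0, 44.0, 57.0])
fractions = np.linspace(0.0, 1.0, 15)
patterns = np.outer(1.0 - fractions, phase_a) + np.outer(fractions, phase_b)

activations, basis, errors = nmf_multiplicative(patterns, n_components=2)
```

Each row of `basis` plays the role of a recovered phase pattern and each row of `activations` gives the amount of each phase in one sample. The factorization is only determined up to scaling and ordering of the components, which is one reason additional constraints are useful in practice.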

### Related Publications

- Junwen Bai, Yexiang Xue, Johan Bjorck, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase Mapper: Accelerating Materials Discovery with AI. *AI Mag. 39(1)*: 15-26 (2018) [pdf]
- Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, R. Bruce van Dover, Carla P. Gomes, John M. Gregoire. Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V-Mn-Nb Oxide System. *ACS Combinatorial Science* (2017). [pdf]

We provide a synthetic instance generator for the phase map identification problem, as well as a sample synthetic dataset. A dataset of real instances contains X-ray diffraction (XRD) patterns for various real binary, ternary, and quaternary systems. The synthetic generator requires Python and is cross-platform. For more information about how to use the generator, please read the README file contained in the package, as well as the associated publication.

### Files

- Synthetic instance generator
- Synthetic generator README
- Synthetic instances dataset
- Real instances dataset

### Related Publications

- Ronan Le Bras, Richard Bernstein, John M. Gregoire, Santosh K. Suram, Carla P. Gomes, Bart Selman, R. Bruce van Dover. Challenges in Materials Discovery - Synthetic Generator and Real Datasets. *AAAI 2014*: 438-443 [pdf]

### 2024

- Yingheng Wang, Shufeng Kong, John M. Gregoire, Carla P. Gomes. Conformal Crystal Graph Transformer with Robust Encoding of Periodic Invariance. *AAAI 2024*: 283-291 [pdf]

### 2023

- Yimeng Min, Ming-Chiang Chang, Shufeng Kong, John M. Gregoire, R. Bruce van Dover, Michael O. Thompson, Carla P. Gomes. Physically Informed Graph-Based Deep Reasoning Net for Efficient Combinatorial Phase Mapping. *ICMLA 2023*: 392-399 [pdf]
- Yuanqi Du, Yingheng Wang, Yining Huang, Jianan Canal Li, Yanqiao Zhu, Tian Xie, Chenru Duan, John M. Gregoire, Carla Pedro Gomes. M^{2}Hub: Unlocking the Potential of Machine Learning for Materials Discovery. *NeurIPS 2023* [pdf]
- Junwen Bai, Yuanqi Du, Yingheng Wang, Shufeng Kong, John M. Gregoire, Carla Gomes. Xtal2DoS: Attention-based Crystal to Sequence Learning for Density of States Prediction. *CoRR abs/2302.01486* (2023) [pdf]
- Yuanqi Du, Yingheng Wang, Yining Huang, Jianan Canal Li, Yanqiao Zhu, Tian Xie, Chenru Duan, John M. Gregoire, Carla P. Gomes. M^{2}Hub: Unlocking the Potential of Machine Learning for Materials Discovery. *CoRR abs/2307.05378* (2023) [pdf]
- Ming-Chiang Chang, Sebastian Ament, Maximilian Amsler, Duncan R. Sutherland, Lan Zhou, John M. Gregoire, Carla P. Gomes, R. Bruce van Dover, Michael O. Thompson. Probabilistic Phase Labeling and Lattice Refinement for Autonomous Material Research. *CoRR abs/2308.07897* (2023) [pdf]

### 2022

- Shufeng Kong, Francesco Ricci, Dan Guevarra, Jeffrey B. Neaton, Carla P. Gomes, John M. Gregoire. Density of states prediction for materials discovery via contrastive learning from probabilistic embeddings. *Nature Communications* (2022). [pdf]
- Junwen Bai, Yuanqi Du, Yingheng Wang, Shufeng Kong, John Gregoire, Carla P Gomes. Xtal2DoS: Attention-based Crystal to Sequence Learning for Density of States Prediction. *NeurIPS 2022 AI for Science: Progress and Promises* (2022). [pdf]

### 2021

- Di Chen, Yiwei Bai, Sebastian Ament, Wenting Zhao, Dan Guevarra, Lan Zhou, Bart Selman, R. Bruce van Dover, John M. Gregoire, Carla P. Gomes. Automating crystal-structure phase mapping by combining deep learning with constraint reasoning. *Nat. Mach. Intell. 3(9)*: 812-822 (2021)
- Sebastian Ament, Maximilian Amsler, Duncan R. Sutherland, Ming-Chiang Chang, Dan Guevarra, Aine B. Connolly, John M. Gregoire, Michael O. Thompson, Carla P. Gomes, R. Bruce van Dover. Autonomous synthesis of metastable materials. *CoRR abs/2101.07385* (2021) [pdf]
- Shufeng Kong, Dan Guevarra, Carla P. Gomes, John M. Gregoire. Materials Representation and Transfer Learning for Multi-Property Prediction. *CoRR abs/2106.02225* (2021) [pdf]
- Di Chen, Yiwei Bai, Sebastian Ament, Wenting Zhao, Dan Guevarra, Lan Zhou, Bart Selman, R. Bruce van Dover, John M. Gregoire, Carla P. Gomes. Automating Crystal-Structure Phase Mapping: Combining Deep Learning with Constraint Reasoning. *CoRR abs/2108.09523* (2021) [pdf]
- Shufeng Kong, Francesco Ricci, Dan Guevarra, Jeffrey B. Neaton, Carla P. Gomes, John M. Gregoire. Density of States Prediction for Materials Discovery via Contrastive Learning from Probabilistic Embeddings. *CoRR abs/2110.11444* (2021). [pdf]
- Sebastian Ament, Maximilian Amsler, Duncan R. Sutherland, Ming-Chiang Chang, Dan Guevarra, Aine B. Connolly, John M. Gregoire, Michael O. Thompson, Carla P. Gomes, R. Bruce van Dover. Autonomous materials synthesis via hierarchical active learning of nonequilibrium phase diagrams. *Science Advances* (2021). [pdf]
- Carla P. Gomes, Daniel Fink, R. Bruce van Dover, John M. Gregoire. Computational sustainability meets materials science. *Nature Reviews Materials* (2021). [pdf]
- Shufeng Kong, Dan Guevarra, Carla P. Gomes, John M. Gregoire. Materials representation and transfer learning for multi-property prediction. *Applied Physics Reviews* (2021). [pdf]

### 2020

- Di Chen, Yiwei Bai, Wenting Zhao, Sebastian Ament, John M. Gregoire, Carla P. Gomes. Deep Reasoning Networks for Unsupervised Pattern De-mixing with Constraint Reasoning. *ICML 2020*: 1500-1509 [pdf]
- Duncan R. Sutherland, Aine Boyer Connolly, Maximilian Amsler, Ming-Chiang Chang, Katie Rose Gann, Vidit Gupta, Sebastian Ament, Dan Guevarra, John M. Gregoire, Carla P. Gomes, R. Bruce van Dover, Michael O. Thompson. Optical Identification of Materials Transformations in Oxide Thin Films. *ACS Combinatorial Science* (2020). [pdf]

### 2019

- Junwen Bai, Zihang Lai, Runzhe Yang, Yexiang Xue, John M. Gregoire, Carla P. Gomes. Imitation Refinement for X-ray Diffraction Signal Processing. *ICASSP 2019*: 3337-3341 [pdf]
- Sebastian Ament, John M. Gregoire, Carla P. Gomes. Exponentially-Modified Gaussian Mixture Model: Applications in Spectroscopy. *CoRR abs/1902.05601* (2019) [pdf]
- Di Chen, Yiwei Bai, Wenting Zhao, Sebastian Ament, John M. Gregoire, Carla P. Gomes. Deep Reasoning Networks: Thinking Fast and Slow. *CoRR abs/1906.00855* (2019) [pdf]
- Carla P. Gomes, Bart Selman, John M. Gregoire. Artificial intelligence for materials discovery. *MRS Bulletin* (2019). [pdf]
- Sebastian E. Ament, Helge S. Stein, Dan Guevarra, Lan Zhou, Joel A. Haber, David A. Boyd, Mitsutaro Umehara, John M. Gregoire, Carla P. Gomes. Multi-component background learning automates signal detection for spectroscopic data. *npj Computational Materials* (2019). [pdf]
- Carla P. Gomes, Junwen Bai, Yexiang Xue, Johan Björck, Brendan Rappazzo, Sebastian Ament, Richard Bernstein, Shufeng Kong, Santosh K. Suram, R. Bruce van Dover, John M. Gregoire. CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures. *MRS Communications* (2019). [pdf]

### 2018

- Junwen Bai, Yexiang Xue, Johan Bjorck, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase Mapper: Accelerating Materials Discovery with AI. *AI Mag. 39(1)*: 15-26 (2018) [pdf]
- Junwen Bai, Sebastian Ament, Guillaume Perez, John M. Gregoire, Carla P. Gomes. An Efficient Relaxed Projection Method for Constrained Non-negative Matrix Factorization with Application to the Phase-Mapping Problem in Materials Science. *CPAIOR 2018*: 52-62 [pdf]
- Junwen Bai, Runzhe Yang, Yexiang Xue, John M. Gregoire, Carla P. Gomes. Imitation Refinement. *CoRR abs/1805.08698* (2018) [pdf]

### 2017

- Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery. *AAAI 2017*: 4635-4643 [pdf]
- Junwen Bai, Johan Bjorck, Yexiang Xue, Santosh K. Suram, John M. Gregoire, Carla P. Gomes. Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery. *CPAIOR 2017*: 104-112 [pdf]
- Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, R. Bruce van Dover, Carla P. Gomes, John M. Gregoire. Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V-Mn-Nb Oxide System. *ACS Combinatorial Science* (2017). [pdf]

### 2016

- Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery. *CoRR abs/1610.00689* (2016) [pdf]

### 2015

- Stefano Ermon, Ronan Le Bras, Santosh K. Suram, John M. Gregoire, Carla P. Gomes, Bart Selman, Robert Bruce van Dover. Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery. *AAAI 2015*: 636-643 [pdf]
- Yexiang Xue, Stefano Ermon, Carla P. Gomes, Bart Selman. Uncovering Hidden Structure through Parallel Problem Decomposition for the Set Basis Problem. *AAAI Workshop: Computational Sustainability 2015* [pdf]
- Yexiang Xue, Stefano Ermon, Carla P. Gomes, Bart Selman. Uncovering Hidden Structure through Parallel Problem Decomposition for the Set Basis Problem: Application to Materials Discovery. *IJCAI 2015*: 146-155 [pdf]

### 2014

- Ronan Le Bras, Richard Bernstein, John M. Gregoire, Santosh K. Suram, Carla P. Gomes, Bart Selman, R. Bruce van Dover. Challenges in Materials Discovery - Synthetic Generator and Real Datasets. *AAAI 2014*: 438-443 [pdf]
- Yexiang Xue, Stefano Ermon, Carla P. Gomes, Bart Selman. Uncovering Hidden Structure through Parallel Problem Decomposition. *AAAI 2014*: 3144-3145 [pdf]
- Ronan Le Bras, Carla P. Gomes, Bart Selman. On the Erdős Discrepancy Problem. *CP 2014*: 440-448 [pdf]
- Ronan Le Bras, Yexiang Xue, Richard Bernstein, Carla P. Gomes, Bart Selman. A Human Computation Framework for Boosting Combinatorial Solvers. *HCOMP 2014*: 121-132 [pdf]
- Stefano Ermon, Ronan Le Bras, Santosh K. Suram, John M. Gregoire, Carla P. Gomes, Bart Selman, Robert Bruce van Dover. Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery. *CoRR abs/1411.7441* (2014) [pdf]

### 2013

- Ronan LeBras, Richard Bernstein, Carla P. Gomes, Bart Selman, R. Bruce van Dover. Crowdsourcing Backdoor Identification for Combinatorial Optimization. *IJCAI 2013*: 2840-2847 [pdf]
- Marcelo Finger, Ronan LeBras, Carla P. Gomes, Bart Selman. Solutions for Hard and Soft Constraints Using Optimized Probabilistic Satisfiability. *SAT 2013*: 233-249 [pdf]

### 2012

- Stefano Ermon, Ronan LeBras, Carla P. Gomes, Bart Selman, R. Bruce van Dover. SMT-Aided Combinatorial Materials Discovery. *SAT 2012*: 172-185 [pdf]

### 2011

- Ronan LeBras, Theodoros Damoulas, John M. Gregoire, Ashish Sabharwal, Carla P. Gomes, R. Bruce van Dover. Constraint Reasoning and Kernel Clustering for Pattern Decomposition with Scaling. *CP 2011*: 508-522 [pdf]