# UDiscoverIt Materials Discovery Downloads

The materials discovery project is a collaboration between the Institute for Computational Sustainability (ICS), the van Dover Group (Materials Science & Engineering Department), the Energy Materials Center at Cornell (emc²), and the Joint Center for Artificial Photosynthesis (JCAP).

This project contributes to a larger effort in high-throughput materials science, which is part of the Materials Genome Initiative (MGI).

We introduce the Hierarchical Correlation Learning for Multi-property Prediction framework (H-CLMP, pronounced H-CLAMP) that seamlessly integrates (i) prediction using only a material's composition, (ii) learning and exploitation of correlations among target properties in multi-target regression, and (iii) leveraging training data from tangential domains via generative transfer learning. The model is demonstrated for prediction of spectral optical absorption of complex metal oxides spanning 69 3-cation metal oxide composition spaces. H-CLMP accurately predicts non-linear composition-property relationships in composition spaces for which no training data is available, which broadens the purview of machine learning to the discovery of materials with exceptional properties. This achievement results from the principled integration of latent embedding learning, property correlation learning, generative transfer learning, and attention models. The best performance is obtained using H-CLMP with Transfer learning (H-CLMP(T)) wherein a generative adversarial network is trained on computational density of states data and deployed in the target domain to augment prediction of optical absorption from composition. H-CLMP(T) aggregates multiple knowledge sources with a framework that is well-suited for multi-target regression across the physical sciences.

### Files

### Related Publications

- Shufeng Kong, Dan Guevarra, Carla P. Gomes, John M. Gregoire. Materials Representation and Transfer Learning for Multi-Property Prediction.
*CoRR abs/2106.02225* (2021) [pdf]

The Multi-Component Background Learning (MCBL) model is an unsupervised probabilistic learning approach that analyzes large spectroscopic data collections to identify multiple background sources and establish the probability that any given data point contains a signal of interest. The approach is suitable for any type of data where the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require prior knowledge of the signals, because the shapes of the background signals, the noise levels, and the signal of interest are learned simultaneously via a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets.
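The assumption that the signal of interest is a positive addition to the background can be illustrated with a much simpler stand-in than MCBL itself. The sketch below is not the MCBL model: it assumes a single background shape that is already known, fits its scale by least squares, and flags points whose residual sits well above a robust noise estimate. The function names and the `n_sigma` threshold are hypothetical choices made for this illustration.

```python
from statistics import median


def fit_background_scale(spectrum, background):
    """Least-squares scale a such that a * background best matches spectrum."""
    num = sum(s * b for s, b in zip(spectrum, background))
    den = sum(b * b for b in background)
    return num / den if den > 0 else 0.0


def flag_signal(spectrum, background, n_sigma=3.0):
    """Flag points that rise above the fitted background.

    Simplified illustration only: one known background shape, and a noise
    level estimated from the median absolute deviation of the residuals.
    Only positive excursions are flagged, mirroring the assumption that the
    signal adds to the background.
    """
    scale = fit_background_scale(spectrum, background)
    residual = [s - scale * b for s, b in zip(spectrum, background)]
    center = median(residual)
    mad = median([abs(r - center) for r in residual])
    sigma = 1.4826 * mad  # MAD-to-standard-deviation factor for Gaussian noise
    return [r > center + n_sigma * sigma for r in residual]


# Usage: a flat background with one positive peak at index 7.
background = [1.0] * 10
spectrum = [2.0, 2.1, 1.9, 2.0, 2.1, 1.9, 2.0, 5.0, 2.0, 2.0]
flags = flag_signal(spectrum, background)
```

Unlike this sketch, the model described above learns the background shapes, the noise levels, and the signal jointly, and reports a probability rather than a hard threshold.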

### Files

### Related Publications

- Sebastian E. Ament, Helge S. Stein, Dan Guevarra, Lan Zhou, Joel A. Haber, David A. Boyd, Mitsutaro Umehara, John M. Gregoire, Carla P. Gomes. Multi-component background learning automates signal detection for spectroscopic data.
*npj Computational Materials* (2019). [pdf]

A C++ source code package for IAFD is available. IAFD is a solver for the phase map identification problem, based on convolutive nonnegative matrix factorization. It includes performance improvements in the handling of constraints in comparison to AgileFD, and also supports additional constraints.
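As a rough illustration of the convolutive nonnegative matrix factorization idea, the sketch below implements only the forward model: a measured pattern is approximated as a nonnegative sum of basis patterns, each of which may appear at several shifted positions. It is not taken from the IAFD or AgileFD packages. The function names, the data layout, and the use of integer right-shifts are assumptions made for this example, and no fitting step or constraint handling is included.

```python
def shift_right(pattern, m):
    """Return pattern shifted m bins to the right, zero-padded on the left."""
    m = min(m, len(pattern))
    if m == 0:
        return list(pattern)
    return [0.0] * m + list(pattern[:-m])


def reconstruct(basis, activations):
    """Forward model of a convolutive NMF for a single measured pattern.

    basis[k]          : nonnegative pattern of basis phase k
    activations[k][m] : nonnegative weight of phase k at shift m
    """
    length = len(basis[0])
    out = [0.0] * length
    for k, pattern in enumerate(basis):
        for m, weight in enumerate(activations[k]):
            shifted = shift_right(pattern, m)
            for i in range(length):
                out[i] += weight * shifted[i]
    return out


def squared_error(observed, modelled):
    """Sum of squared differences, the quantity a solver would try to reduce."""
    return sum((o - p) ** 2 for o, p in zip(observed, modelled))


# Usage: two basis phases, each allowed shifts of 0 or 1 bin.
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
activations = [[0.5, 0.5], [0.0, 1.0]]
modelled = reconstruct(basis, activations)
```

A solver for the phase map identification problem would search for the basis patterns and activations that minimize this error over all measured patterns, subject to the additional constraints mentioned above.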

### Files

### Related Publications

- Carla P. Gomes, Junwen Bai, Yexiang Xue, Johan Björck, Brendan Rappazzo, Sebastian Ament, Richard Bernstein, Shufeng Kong, Santosh K. Suram, R. Bruce van Dover, John M. Gregoire. CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures.
*MRS Communications* (2019). [pdf]
- Junwen Bai, Johan Bjorck, Yexiang Xue, Santosh K. Suram, John M. Gregoire, Carla P. Gomes. Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery.
*CPAIOR 2017*: 104-112 [pdf]

A C++ source code package for AgileFD is available. AgileFD is a solver for the phase map identification problem, based on convolutional nonnegative matrix factorization, with extensions to address additional challenges in this problem, including physical constraints. An example of the use of AgileFD and Phase Mapper is described in the video: Using Phase Mapper to discover a new light absorber material at JCAP.

### Files

### Related Publications

- Junwen Bai, Yexiang Xue, Johan Bjorck, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase Mapper: Accelerating Materials Discovery with AI.
*AI Mag. 39(1)*: 15-26 (2018) [pdf]
- Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery.
*AAAI 2017*: 4635-4643 [pdf]
- Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, R. Bruce van Dover, Carla P. Gomes, John M. Gregoire. Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V-Mn-Nb Oxide System.
*ACS Combinatorial Science* (2017). [pdf]

We provide a synthetic instance generator for the phase map identification problem, along with a sample synthetic dataset. A separate real-instances dataset contains X-ray diffraction (XRD) patterns for several real binary, ternary, and quaternary systems. The synthetic generator requires Python and is cross-platform. For usage details, see the README file included in the package and the associated publication.
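To give a sense of what a synthetic phase-mapping instance can look like, the sketch below builds each sample pattern as a random convex mixture of a few basis phases made of Gaussian peaks. This is not the packaged generator, whose interface and physical model are documented in its README. The function names, the peak width, and the uniform random mixing weights are all hypothetical choices for this illustration.

```python
import math
import random


def gaussian_peak_pattern(peaks, grid, width=0.05):
    """Pattern on grid from (center, height) peaks with a common Gaussian width."""
    return [
        sum(h * math.exp(-0.5 * ((q - c) / width) ** 2) for c, h in peaks)
        for q in grid
    ]


def synthetic_instance(phases, n_samples, grid, seed=0):
    """Return a list of (weights, pattern) pairs.

    Each sample mixes the basis phases with random nonnegative weights that
    sum to one. Real generators also model effects such as peak shifting
    and noise, which are omitted in this sketch.
    """
    rng = random.Random(seed)  # seeded so the instance is reproducible
    basis = [gaussian_peak_pattern(p, grid) for p in phases]
    samples = []
    for _ in range(n_samples):
        raw = [rng.random() for _ in basis]
        total = sum(raw)
        weights = [r / total for r in raw]
        pattern = [
            sum(w * b[i] for w, b in zip(weights, basis))
            for i in range(len(grid))
        ]
        samples.append((weights, pattern))
    return samples


# Usage: two hypothetical phases on a 50-point grid, three samples.
grid = [i * 0.1 for i in range(50)]
phases = [[(1.0, 1.0), (3.0, 0.5)], [(2.0, 0.8)]]
samples = synthetic_instance(phases, 3, grid, seed=1)
```

Because the mixing weights are stored alongside each pattern, a solver's recovered phase fractions can be compared against the ground truth.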

### Files

- Synthetic instance generator
- Synthetic generator README
- Synthetic instances dataset
- Real instances dataset

### Related Publications

### 2021

- Sebastian Ament, Maximilian Amsler, Duncan R. Sutherland, Ming-Chiang Chang, Dan Guevarra, Aine B. Connolly, John M. Gregoire, Michael O. Thompson, Carla P. Gomes, R. Bruce van Dover. Autonomous synthesis of metastable materials.
*CoRR abs/2101.07385* (2021) [pdf]
- Shufeng Kong, Dan Guevarra, Carla P. Gomes, John M. Gregoire. Materials Representation and Transfer Learning for Multi-Property Prediction.
*CoRR abs/2106.02225* (2021) [pdf]

### 2020

- Di Chen, Yiwei Bai, Wenting Zhao, Sebastian Ament, John M. Gregoire, Carla P. Gomes. Deep Reasoning Networks for Unsupervised Pattern De-mixing with Constraint Reasoning.
*ICML 2020*: 1500-1509 [pdf]

### 2019

- Junwen Bai, Zihang Lai, Runzhe Yang, Yexiang Xue, John M. Gregoire, Carla P. Gomes. Imitation Refinement for X-ray Diffraction Signal Processing.
*ICASSP 2019*: 3337-3341 [pdf]
- Sebastian Ament, John M. Gregoire, Carla P. Gomes. Exponentially-Modified Gaussian Mixture Model: Applications in Spectroscopy.
*CoRR abs/1902.05601* (2019) [pdf]
- Di Chen, Yiwei Bai, Wenting Zhao, Sebastian Ament, John M. Gregoire, Carla P. Gomes. Deep Reasoning Networks: Thinking Fast and Slow.
*CoRR abs/1906.00855* (2019) [pdf]
- Carla P. Gomes, Bart Selman, John M. Gregoire. Artificial intelligence for materials discovery.
*MRS Bulletin* (2019). [pdf]
- Sebastian E. Ament, Helge S. Stein, Dan Guevarra, Lan Zhou, Joel A. Haber, David A. Boyd, Mitsutaro Umehara, John M. Gregoire, Carla P. Gomes. Multi-component background learning automates signal detection for spectroscopic data.
*npj Computational Materials* (2019). [pdf]
- Carla P. Gomes, Junwen Bai, Yexiang Xue, Johan Björck, Brendan Rappazzo, Sebastian Ament, Richard Bernstein, Shufeng Kong, Santosh K. Suram, R. Bruce van Dover, John M. Gregoire. CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures.
*MRS Communications* (2019). [pdf]

### 2018

- Junwen Bai, Yexiang Xue, Johan Bjorck, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase Mapper: Accelerating Materials Discovery with AI.
*AI Mag. 39(1)*: 15-26 (2018) [pdf]
- Junwen Bai, Sebastian Ament, Guillaume Perez, John M. Gregoire, Carla P. Gomes. An Efficient Relaxed Projection Method for Constrained Non-negative Matrix Factorization with Application to the Phase-Mapping Problem in Materials Science.
*CPAIOR 2018*: 52-62 [pdf]
- Junwen Bai, Runzhe Yang, Yexiang Xue, John M. Gregoire, Carla P. Gomes. Imitation Refinement.
*CoRR abs/1805.08698* (2018) [pdf]

### 2017

- Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery.
*AAAI 2017*: 4635-4643 [pdf]
- Junwen Bai, Johan Bjorck, Yexiang Xue, Santosh K. Suram, John M. Gregoire, Carla P. Gomes. Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery.
*CPAIOR 2017*: 104-112 [pdf]
- Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, R. Bruce van Dover, Carla P. Gomes, John M. Gregoire. Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V-Mn-Nb Oxide System.
*ACS Combinatorial Science* (2017). [pdf]

### 2016

- Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery.
*CoRR abs/1610.00689* (2016) [pdf]

### 2015

- Stefano Ermon, Ronan Le Bras, Santosh K. Suram, John M. Gregoire, Carla P. Gomes, Bart Selman, Robert Bruce van Dover. Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery.
*AAAI 2015*: 636-643 [pdf]
- Yexiang Xue, Stefano Ermon, Carla P. Gomes, Bart Selman. Uncovering Hidden Structure through Parallel Problem Decomposition for the Set Basis Problem.
*AAAI Workshop: Computational Sustainability 2015* [pdf]
- Yexiang Xue, Stefano Ermon, Carla P. Gomes, Bart Selman. Uncovering Hidden Structure through Parallel Problem Decomposition for the Set Basis Problem: Application to Materials Discovery.
*IJCAI 2015*: 146-155 [pdf]

### 2014

- Ronan Le Bras, Richard Bernstein, John M. Gregoire, Santosh K. Suram, Carla P. Gomes, Bart Selman, R. Bruce van Dover. Challenges in Materials Discovery - Synthetic Generator and Real Datasets.
*AAAI 2014*: 438-443 [pdf]
- Yexiang Xue, Stefano Ermon, Carla P. Gomes, Bart Selman. Uncovering Hidden Structure through Parallel Problem Decomposition.
*AAAI 2014*: 3144-3145 [pdf]
- Ronan Le Bras, Carla P. Gomes, Bart Selman. On the Erdős Discrepancy Problem.
*CP 2014*: 440-448 [pdf]
- Ronan Le Bras, Yexiang Xue, Richard Bernstein, Carla P. Gomes, Bart Selman. A Human Computation Framework for Boosting Combinatorial Solvers.
*HCOMP 2014* [pdf]
- Stefano Ermon, Ronan Le Bras, Santosh K. Suram, John M. Gregoire, Carla P. Gomes, Bart Selman, Robert Bruce van Dover. Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery.
*CoRR abs/1411.7441* (2014) [pdf]

### 2013

- Ronan LeBras, Richard Bernstein, Carla P. Gomes, Bart Selman, R. Bruce van Dover. Crowdsourcing Backdoor Identification for Combinatorial Optimization.
*IJCAI 2013*: 2840-2847 [pdf]
- Marcelo Finger, Ronan LeBras, Carla P. Gomes, Bart Selman. Solutions for Hard and Soft Constraints Using Optimized Probabilistic Satisfiability.
*SAT 2013*: 233-249 [pdf]

### 2012

- Stefano Ermon, Ronan LeBras, Carla P. Gomes, Bart Selman, R. Bruce van Dover. SMT-Aided Combinatorial Materials Discovery.
*SAT 2012*: 172-185 [pdf]

### 2011

### Contact Information

Dept. Computer Science

353 Gates Hall

Cornell University

Ithaca, NY 14853 USA

Faculty of Computing and Information Science

Dept. Information Science

Dyson School of Applied Economics and Management

607-255-9189 (voice); 607-255-4428 (fax)

**gomes at cs.cornell.edu**

http://www.cs.cornell.edu/gomes