Carla Gomes

Carla P. Gomes

gomes at
Professor and Director
Institute for Computational Sustainability


Materials Discovery

Credit: NIST

The materials discovery project is a collaboration between the Institute for Computational Sustainability (ICS), the van Dover Group (Materials Science & Engineering Department), the Energy Materials Center at Cornell (emc²), and the Joint Center for Artificial Photosynthesis (JCAP).

This project contributes to a larger effort in high-throughput materials science that is part of the Materials Genome Initiative (MGI).


We introduce the Hierarchical Correlation Learning for Multi-property Prediction framework (H-CLMP, pronounced "H-CLAMP"), which integrates (i) prediction using only a material's composition, (ii) learning and exploitation of correlations among target properties in multi-target regression, and (iii) use of training data from tangential domains via generative transfer learning. The model is demonstrated on the prediction of spectral optical absorption of complex metal oxides spanning 69 three-cation metal oxide composition spaces. H-CLMP accurately predicts non-linear composition-property relationships in composition spaces for which no training data are available, which extends machine learning to the discovery of materials with exceptional properties. This result stems from the principled integration of latent embedding learning, property correlation learning, generative transfer learning, and attention models. The best performance is obtained with H-CLMP with transfer learning (H-CLMP(T)), in which a generative adversarial network is trained on computational density of states data and deployed in the target domain to augment the prediction of optical absorption from composition. H-CLMP(T) aggregates multiple knowledge sources in a framework that is well suited to multi-target regression across the physical sciences.
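Predicting properties in composition spaces with no training data implies an evaluation in which entire composition spaces are held out. The minimal Python sketch below shows one way such a split could be set up, assuming each three-cation system is sampled on a regular ternary grid of cation fractions. The helper names (`ternary_grid`, `build_samples`, `leave_system_out_split`), the cation list, and the grid spacing are hypothetical, and the H-CLMP model itself is not shown.

```python
from itertools import combinations


def ternary_grid(step):
    """Cation fractions (a, b, c) with a + b + c = 1, spaced 1/step apart."""
    return [
        (i / step, j / step, (step - i - j) / step)
        for i in range(step + 1)
        for j in range(step + 1 - i)
    ]


def build_samples(cations, step):
    """One sample per (three-cation system, grid point); property targets omitted."""
    samples = []
    for system in combinations(sorted(cations), 3):
        for fractions in ternary_grid(step):
            samples.append({
                "system": frozenset(system),
                "composition": dict(zip(system, fractions)),
            })
    return samples


def leave_system_out_split(samples, held_out):
    """Send every sample from a held-out composition space to the test set only."""
    train = [s for s in samples if s["system"] not in held_out]
    test = [s for s in samples if s["system"] in held_out]
    return train, test


# Hypothetical example: 4 cations give C(4, 3) = 4 three-cation systems.
cations = ["Bi", "Fe", "Mn", "V"]
samples = build_samples(cations, step=10)       # 66 grid points per system
held_out = {frozenset({"Bi", "Mn", "V"})}
train, test = leave_system_out_split(samples, held_out)
print(len(samples), len(train), len(test))      # 264 198 66
```

Splitting by system rather than by individual sample keeps nearby compositions from the same ternary space out of training, so test accuracy reflects extrapolation to an unseen composition space.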


Related Publications


The Multi-Component Background Learning (MCBL) model is an unsupervised probabilistic learning approach that analyzes large spectroscopic data collections to identify multiple background sources and to estimate the probability that any given data point contains a signal of interest. The approach is suitable for any type of data in which the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require knowledge of the signals, since the shapes of the background signals, the noise levels, and the signal of interest are learned simultaneously within a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets.
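MCBL learns backgrounds, noise levels, and signals jointly. The sketch below isolates only the final idea, turning a residual above the background into a probability that a positive signal is present, under simplifying assumptions chosen for illustration (not necessarily MCBL's): the background is already known, noise is Gaussian with known standard deviation `sigma`, and a present signal is an exponentially distributed positive addition with rate `lam` and prior probability `prior`. Under these assumptions, signal plus noise follows an exponentially modified Gaussian distribution. All function names are hypothetical.

```python
import math


def gaussian_pdf(r, sigma):
    """Density of residual r under zero-mean Gaussian noise (background only)."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def signal_plus_noise_pdf(r, sigma, lam):
    """Density of r = s + noise, s ~ Exponential(lam) >= 0 (exponentially modified Gaussian)."""
    a = lam * sigma ** 2
    return (0.5 * lam * math.exp(0.5 * lam * (a - 2.0 * r))
            * math.erfc((a - r) / (math.sqrt(2.0) * sigma)))


def signal_probability(y, background, sigma, lam, prior):
    """Posterior probability (Bayes' rule) that each observation contains a positive signal."""
    probs = []
    for obs, bg in zip(y, background):
        r = obs - bg
        p_sig = prior * signal_plus_noise_pdf(r, sigma, lam)
        p_bg = (1.0 - prior) * gaussian_pdf(r, sigma)
        probs.append(p_sig / (p_sig + p_bg))
    return probs


background = [1.0, 1.0, 1.0, 1.0]   # assumed already estimated
y = [1.0, 1.2, 2.0, 4.0]            # later points sit increasingly far above background
probs = signal_probability(y, background, sigma=0.5, lam=1.0, prior=0.1)
print([round(p, 3) for p in probs])
```

Because the signal can only add intensity, residuals near zero or below it are explained by noise alone, while large positive residuals drive the probability toward one.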


Related Publications


A C++ source code package for IAFD is available. IAFD is a solver for the phase map identification problem based on convolutive nonnegative matrix factorization. Compared with AgileFD, it handles constraints more efficiently and supports additional constraints.
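A minimal pure-Python sketch of the forward model behind convolutive nonnegative matrix factorization, assuming each measured XRD pattern is a nonnegative combination of phase basis patterns that may appear shifted. Lattice expansion scales peak positions by a common factor, so on a logarithmic q axis it becomes (approximately, for isotropic expansion) a uniform shift, which is what `shift` models here. The names `shift` and `reconstruct` and the toy data are hypothetical, and only the reconstruction step is shown, not the IAFD solver that fits `W` and `H` under constraints.

```python
def shift(pattern, m):
    """Shift a 1-D pattern m bins to the right (toward larger log q), padding with zeros."""
    n = len(pattern)
    return [pattern[i - m] if 0 <= i - m < n else 0.0 for i in range(n)]


def reconstruct(W, H):
    """Convolutive NMF forward model.

    W[k]       : nonnegative basis pattern of phase k (length n)
    H[k][m][j] : nonnegative activation of phase k at shift m in sample j
    Returns V with V[j] = sum over k and m of H[k][m][j] * shift(W[k], m).
    """
    n = len(W[0])
    num_samples = len(H[0][0])
    V = [[0.0] * n for _ in range(num_samples)]
    for k, basis in enumerate(W):
        for m, row in enumerate(H[k]):
            shifted = shift(basis, m)
            for j, h in enumerate(row):
                for i in range(n):
                    V[j][i] += h * shifted[i]
    return V


# Two hypothetical single-peak phases, two samples, shifts 0 and 1.
W = [[0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
H = [
    [[1.0, 0.0], [0.0, 0.5]],   # phase 0: unshifted in sample 0, shifted by 1 in sample 1
    [[0.0, 0.5], [0.0, 0.0]],   # phase 1: unshifted, present in sample 1 only
]
V = reconstruct(W, H)
print(V[0])  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(V[1])  # [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
```

A solver alternates between updating `W` and `H` to make `V` match the measured patterns while keeping both nonnegative; the shift dimension in `H` is what lets one basis pattern explain the same phase at different lattice parameters.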


Related Publications


A C++ source code package for AgileFD is available. AgileFD is a solver for the phase map identification problem based on convolutive nonnegative matrix factorization, with extensions that address additional challenges in this problem, including physical constraints. An example of using AgileFD and Phase Mapper is shown in the video "Using Phase Mapper to discover a new light absorber material at JCAP."
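One example of a physical constraint relevant to phase mapping is the Gibbs phase rule: with temperature and pressure fixed, a system of C components can have at most C coexisting phases in a single sample. The sketch below enforces such a limit by a simple projection that keeps the `max_phases` most strongly activated phases per sample and zeroes the rest. The function names, the projection approach, the value `max_phases=3`, and the data are hypothetical illustrations, not AgileFD's actual constraint handling.

```python
def enforce_max_phases(activations, max_phases):
    """Keep the max_phases largest phase activations in each sample; zero the rest.

    activations[j][k] is the total (nonnegative) activation of phase k in sample j.
    """
    projected = []
    for row in activations:
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        keep = set(ranked[:max_phases])
        projected.append([a if k in keep else 0.0 for k, a in enumerate(row)])
    return projected


def violating_samples(activations, max_phases, tol=1e-6):
    """Indices of samples in which more than max_phases phases exceed tol."""
    return [j for j, row in enumerate(activations)
            if sum(a > tol for a in row) > max_phases]


# Hypothetical activations for 3 samples and 4 candidate phases.
A = [[0.9, 0.4, 0.0, 0.0],
     [0.5, 0.3, 0.2, 0.05],
     [0.1, 0.6, 0.7, 0.0]]
print(violating_samples(A, max_phases=3))          # [1]
A_fixed = enforce_max_phases(A, max_phases=3)
print(A_fixed[1])                                  # [0.5, 0.3, 0.2, 0.0]
print(violating_samples(A_fixed, max_phases=3))    # []
```

A hard projection like this is the simplest option; a solver can instead build the limit into its optimization so that the remaining activations are refit after a phase is removed.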


Related Publications

Data Instances

We provide a synthetic instance generator for the phase map identification problem, along with a sample synthetic dataset. A dataset of real instances contains the X-ray diffraction (XRD) patterns for various real binary, ternary, and quaternary systems. The synthetic generator requires Python and is cross-platform. For details on how to use the generator, please read the README file included in the package and the associated publication.
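The README and the associated publication describe the actual generator. The sketch below only illustrates the general shape of a synthetic phase-mapping instance under simple assumptions of our own: each phase is a list of Gaussian peaks in q, each sample's pattern mixes the phase patterns by its phase fractions, and Gaussian noise is added and clipped at zero. All names, peak positions, and parameters are hypothetical, and peak shifting due to alloying is omitted.

```python
import math
import random


def peak_pattern(q_axis, peaks, width):
    """Sum of Gaussian peaks; peaks is a list of (position, height) pairs."""
    return [sum(h * math.exp(-0.5 * ((q - p) / width) ** 2) for p, h in peaks)
            for q in q_axis]


def synthetic_pattern(q_axis, phases, fractions, width=0.05, noise=0.01, seed=0):
    """Mix phase patterns by phase fraction, add Gaussian noise, clip at zero."""
    rng = random.Random(seed)
    mixed = [0.0] * len(q_axis)
    for peaks, frac in zip(phases, fractions):
        for i, value in enumerate(peak_pattern(q_axis, peaks, width)):
            mixed[i] += frac * value
    return [max(0.0, v + rng.gauss(0.0, noise)) for v in mixed]


q_axis = [1.0 + 0.01 * i for i in range(300)]   # hypothetical q grid (inverse angstroms)
phase_a = [(1.5, 1.0), (2.5, 0.6)]              # hypothetical peak lists
phase_b = [(2.0, 1.0), (3.2, 0.8)]
# A line of samples moving from pure phase A to pure phase B.
instance = [synthetic_pattern(q_axis, [phase_a, phase_b], (1.0 - x, x), seed=s)
            for s, x in enumerate([0.0, 0.25, 0.5, 0.75, 1.0])]
```

Seeding each sample separately keeps the instance reproducible while still giving every pattern independent noise.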


Related Publications

Materials Discovery Publications


Contact Information

Dept. Computer Science
353 Gates Hall
Cornell University
Ithaca, NY 14853 USA

Faculty of Computing and Information Science
Dept. Information Science
Dyson School of Applied Economics and Management

607-255-9189 (voice); 607-255-4428 (fax)
gomes at