# Datasets

These datasets accompany the following publication:

Shufeng Kong, Dan Guevarra, Carla P. Gomes, John M. Gregoire. "[Materials Representation and Transfer Learning for Multi-Property Prediction](https://arxiv.org/abs/2106.02225)." arXiv:2106.02225 (2021).

We introduce the Hierarchical Correlation Learning for Multi-property Prediction framework (H-CLMP, pronounced H-CLAMP) that seamlessly integrates (i) prediction using only a material's composition, (ii) learning and exploitation of correlations among target properties in multi-target regression, and (iii) leveraging training data from tangential domains via generative transfer learning. The model is demonstrated for prediction of spectral optical absorption of complex metal oxides spanning 69 3-cation metal oxide composition spaces. H-CLMP accurately predicts non-linear composition-property relationships in composition spaces for which no training data is available, which broadens the purview of machine learning to the discovery of materials with exceptional properties. This achievement results from the principled integration of latent embedding learning, property correlation learning, generative transfer learning, and attention models. The best performance is obtained using H-CLMP with Transfer learning (H-CLMP(T)) wherein a generative adversarial network is trained on computational density of states data and deployed in the target domain to augment prediction of optical absorption from composition. H-CLMP(T) aggregates multiple knowledge sources with a framework that is well-suited for multi-target regression across the physical sciences.

## DOS dataset

The density of states (DOS) dataset was collected from the [Materials Project](https://materialsproject.org/).

## Spectral optical absorption dataset

Our spectral optical absorption dataset is available in the [CaltechDATA repository](https://data.caltech.edu/records/1878), DOI: <https://doi.org/10.22002/D1.1878>.

## Preprocessed dataset

This preprocessed dataset is the input to H-CLMP. It is a dict file that combines the generated DOS features with the spectral optical absorption properties. The two are matched by each compound's elemental composition.
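The sketch below shows one way to join two per-composition data sources into a single dict keyed by composition. It does not describe the actual layout of the released file. The canonical key format (sorted elements with fractions normalized to sum to 1), the rounding precision, and the field names `composition`, `dos`, and `absorption` are all hypothetical choices made for this example.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical types: a composition maps element symbols to amounts,
# and the canonical key is a sorted tuple of (element, fraction) pairs.
Composition = Dict[str, float]
CompKey = Tuple[Tuple[str, float], ...]


def composition_key(comp: Composition, ndigits: int = 4) -> CompKey:
    """Build a hashable key: drop zero amounts, normalize to fractions summing to 1,
    round to `ndigits` to absorb float noise, and sort by element symbol."""
    total = sum(comp.values())
    if total <= 0:
        raise ValueError("composition must have a positive total amount")
    return tuple(
        sorted((el, round(amt / total, ndigits)) for el, amt in comp.items() if amt > 0)
    )


def combine_by_composition(
    dos_features: List[Tuple[Composition, Any]],
    absorption: List[Tuple[Composition, Any]],
) -> Dict[CompKey, Dict[str, Any]]:
    """Pair DOS features with absorption spectra that share the same composition.

    Compositions present in only one source are skipped.
    """
    dos_by_key = {composition_key(comp): feats for comp, feats in dos_features}
    combined: Dict[CompKey, Dict[str, Any]] = {}
    for comp, spectrum in absorption:
        key = composition_key(comp)
        if key in dos_by_key:
            combined[key] = {
                "composition": dict(key),  # fractional composition
                "dos": dos_by_key[key],
                "absorption": spectrum,
            }
    return combined
```

For example, DOS features given for `{"Bi": 1, "V": 1}` are matched to an absorption entry given for `{"Bi": 0.5, "V": 0.5}`, because both normalize to the same key. An absorption entry with no DOS counterpart is dropped. Normalizing before matching keeps the join independent of whether a source stores atom counts or fractions.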
