The Multi-Component Background Learning (MCBL) model is an unsupervised probabilistic learning approach that analyzes large spectroscopic data collections to identify multiple background sources and establish the probability that any given data point contains a signal of interest. The approach is suitable for any type of data where the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require knowledge of the signals, since the shapes of the background signals, the noise levels, and the signal of interest are learned simultaneously via a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets.
- Sebastian E. Ament, Helge S. Stein, Dan Guevarra, Lan Zhou, Joel A. Haber, David A. Boyd, Mitsutaro Umehara, John M. Gregoire, Carla P. Gomes. Multi-component background learning automates signal detection for spectroscopic data. npj Computational Materials (2019). [pdf]
A C++ source code package for IAFD is available. IAFD is a solver for the phase map identification problem, based on convolutive nonnegative matrix factorization. It includes performance improvements in the handling of constraints in comparison to AgileFD, and also supports additional constraints.
- Carla P. Gomes, Junwen Bai, Yexiang Xue, Johan Björck, Brendan Rappazzo, Sebastian Ament, Richard Bernstein, Shufeng Kong, Santosh K. Suram, R. Bruce van Dover, John M. Gregoire. CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures. MRS Communications (2019). [pdf]
- Junwen Bai, Johan Bjorck, Yexiang Xue, Santosh K. Suram, John M. Gregoire, Carla P. Gomes. Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery. CPAIOR 2017: 104-112 [pdf]
A C++ source code package for AgileFD is available. AgileFD is a solver for the phase map identification problem, based on convolutional nonnegative matrix factorization, with extensions to address additional challenges in this problem, including physical constraints. An example of the use of AgileFD and Phase Mapper is described in the video: Using Phase Mapper to discover a new light absorber material at JCAP.
- Junwen Bai, Yexiang Xue, Johan Bjorck, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase Mapper: Accelerating Materials Discovery with AI. AI Magazine 39(1): 15-26 (2018) [pdf]
- Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes. Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery. AAAI 2017: 4635-4643 [pdf]
- Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, R. Bruce van Dover, Carla P. Gomes, John M. Gregoire. Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V-Mn-Nb Oxide System. ACS Combinatorial Science (2017). [pdf]
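Both IAFD and AgileFD are described above as building on nonnegative matrix factorization. As a rough illustration of that underlying idea only, the following Python sketch factors a nonnegative data matrix with the standard multiplicative update rules for the squared-error objective. It is plain NMF, not the convolutive variant, and it does not include the physical constraints handled by the C++ packages. The function names `nmf` and `relative_error` and all parameters are hypothetical and are not taken from either package.

```python
import numpy as np


def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate nonnegative X (features x samples) as W @ H with W, H >= 0.

    Plain NMF with multiplicative updates for the squared-error objective.
    Illustrative only: no shifts (convolution) and no physical constraints.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a nonnegative ratio,
        # so W and H stay nonnegative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(X, W, H):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

In a phase mapping reading of this sketch, the columns of `W` would play the role of basis patterns and the columns of `H` the per-sample mixing weights. That interpretation is an assumption made for illustration; the actual solvers model peak shifting and further constraints that this sketch omits.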
We provide a synthetic instance generator for the phase map identification problem, as well as a sample synthetic dataset. The real instances dataset contains X-ray diffraction (XRD) patterns for various real binary, ternary, and quaternary systems. The synthetic generator requires Python and is cross-platform. For more information on how to use the generator, please read the README file contained in the package, as well as the associated publication.
- Synthetic instance generator
- Synthetic generator README
- Synthetic instances dataset
- Real instances dataset
- Ronan Le Bras, Richard Bernstein, John M. Gregoire, Santosh K. Suram, Carla P. Gomes, Bart Selman, R. Bruce van Dover. Challenges in Materials Discovery - Synthetic Generator and Real Datasets. AAAI 2014: 438-443 [pdf]
Pareto Frontier Tree Structured Networks
This is a C++ source code package implementing a dynamic programming algorithm for computing the Pareto frontier (for two criteria) on tree structured networks. A dataset for hydroelectric dams, using the energy and GHG emissions criteria, is provided.
- Dynamic Programming for Computing the Pareto Frontier (for Energy and GHGs) on Tree Structured Networks
- Rafael M. Almeida, Qinru Shi, Jonathan M. Gomes-Selman, Xiaojian Wu, Yexiang Xue, Hector Angarita, Nathan Barros, Bruce R. Forsberg, Roosevelt García-Villacorta, Stephen K. Hamilton, John M. Melack, Mariana Montoya, Guillaume Perez, Suresh A. Sethi, Carla P. Gomes, Alexander S. Flecker. Reducing greenhouse gas emissions of Amazon hydropower with strategic dam planning. Nature Communications (2019). [pdf]
- Jonathan M. Gomes-Selman, Qinru Shi, Yexiang Xue, Roosevelt García-Villacorta, Alexander S. Flecker, Carla P. Gomes. Boosting Efficiency for Computing the Pareto Frontier on Tree Structured Networks. CPAIOR 2018: 263-279 [pdf]
- Xiaojian Wu, Jonathan Gomes-Selman, Qinru Shi, Yexiang Xue, Roosevelt García-Villacorta, Elizabeth Anderson, Suresh Sethi, Scott Steinschneider, Alexander Flecker, Carla P. Gomes. Efficiently Approximating the Pareto Frontier: Hydropower Dam Placement in the Amazon Basin. AAAI 2018: 849-859 [pdf]
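The dynamic programming idea described above for tree structured networks can be illustrated with a small Python sketch. It assumes a simplified setting in which every candidate dam sits at a tree node and contributes additively to two criteria: energy (to maximize) and GHG emissions (to minimize). Each node combines the frontiers of its children and prunes dominated partial solutions before passing the result up the tree. The data layout and the names `prune`, `combine`, and `pareto_frontier` are hypothetical and do not reflect the C++ package, which may handle criteria that depend on the network structure.

```python
from typing import Dict, Hashable, List, Tuple

Point = Tuple[float, float]  # (energy, ghg)


def prune(points: List[Point]) -> List[Point]:
    """Keep only non-dominated points (higher energy and lower GHG are better)."""
    frontier: List[Point] = []
    best_ghg = float("inf")
    # Scan from highest energy down; a point survives only if its GHG is
    # strictly lower than that of every point with at least as much energy.
    for energy, ghg in sorted(set(points), key=lambda p: (-p[0], p[1])):
        if ghg < best_ghg:
            frontier.append((energy, ghg))
            best_ghg = ghg
    return frontier


def combine(a: List[Point], b: List[Point]) -> List[Point]:
    """Merge the frontiers of two independent subtrees (additive criteria)."""
    return prune([(ea + eb, ga + gb) for ea, ga in a for eb, gb in b])


def pareto_frontier(
    node: Hashable,
    children: Dict[Hashable, List[Hashable]],
    dams: Dict[Hashable, Point],
) -> List[Point]:
    """Pareto frontier over all subsets of dams in the subtree rooted at node."""
    frontier: List[Point] = [(0.0, 0.0)]  # building nothing
    for child in children.get(node, []):
        frontier = combine(frontier, pareto_frontier(child, children, dams))
    if node in dams:
        energy, ghg = dams[node]
        # Either skip the dam at this node or add it to each partial solution.
        frontier = prune(frontier + [(e + energy, g + ghg) for e, g in frontier])
    return frontier
```

For a toy tree with `children = {"root": ["a", "b"], "a": ["c"]}` and `dams = {"a": (10.0, 4.0), "b": (6.0, 1.0), "c": (5.0, 3.0)}`, calling `pareto_frontier("root", children, dams)` returns five non-dominated trade-offs out of the eight possible dam subsets. Pruning at every node is what keeps the intermediate frontiers small compared with enumerating all subsets.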
607-255-9189 (voice); 607-255-4428 (fax)
gomes at cs.cornell.edu