Interleaved Agile Combinatorial Factor Decomposition (IAFD)
Modified 4/24/2019
Version 1.2

================================================

Copyright 2017 Institute for Computational Sustainability

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================

*** Citation ***

This software accompanies the following publications:

Gomes, C. P., Bai, J., Xue, Y., Bjorck, J., Rappazzo, B., Ament, S., Bernstein, R., Kong, S., Suram, S. K., van Dover, R. B., Gregoire, J. M. (2019). CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures. MRS Communications, 1-9.
DOI: 10.1557/mrc.2019.50

Bai, J., Bjorck, J., Xue, Y., Suram, S. K., Gregoire, J., & Gomes, C. (2017). Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery. Fourteenth International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming (CPAIOR), 104-112.
DOI: 10.1007/978-3-319-59776-8_9

Bibtex:

@article{Gomes2019,
  author = {Gomes, Carla P. and Bai, Junwen and Xue, Yexiang and Bj{\"{o}}rck, Johan and Rappazzo, Brendan and Ament, Sebastian and Bernstein, Richard and Kong, Shufeng and Suram, Santosh K. and van Dover, R. Bruce and Gregoire, John M.},
  doi = {10.1557/mrc.2019.50},
  issn = {2159-6859},
  journal = {MRS Communications},
  month = {apr},
  pages = {1--9},
  title = {{CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures}},
  url = {https://www.cambridge.org/core/product/identifier/S2159685919000508/type/journal{\_}article},
  year = {2019}
}

@article{Bai2017,
  author = {Bai, Junwen and Bjorck, Johan and Xue, Yexiang and Suram, Santosh K. and Gregoire, John and Gomes, Carla},
  doi = {10.1007/978-3-319-59776-8_9},
  journal = {Fourteenth International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming (CPAIOR)},
  pages = {104--112},
  title = {{Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery}},
  url = {http://link.springer.com/10.1007/978-3-319-59776-8{\_}9},
  year = {2017}
}

*** Introduction and Prerequisites ***

IAFD has two external library dependencies which must be installed:

1) Armadillo, see: http://arma.sourceforge.net
2) ILOG/CPLEX Optimization Studio: https://www.ibm.com/developerworks/downloads/ws/ilogcplex/

IAFD also uses TCLAP (http://tclap.sourceforge.net/) to process command-line arguments and Tino Kluge's spline implementation (http://kluge.in-chemnitz.de/opensource/spline/). These are header-only includes and are provided in the source distribution.

*** Compilation ***

An example Makefile for GNU make and g++ is included, but it will need to be modified to reflect the installation location of CPLEX and of the BLAS library (such as OpenBLAS) used with Armadillo. Additional changes will be needed for alternate compilers.
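
Once the binary is built, runs can be scripted. The following Python sketch assembles an argument list from the flags marked (required) under Usage. The function build_iafd_command and its parameter names are hypothetical helpers for illustration and are not part of IAFD; the binary path ./iafd and the file names are taken from the Command Example.

```python
import shlex
from typing import List, Optional, Sequence


def build_iafd_command(
    inst: str,
    m: int,
    k: int,
    sol: str,
    neighbors: str,
    mipgap: float,
    extra: Optional[Sequence[str]] = None,
    binary: str = "./iafd",
) -> List[str]:
    """Return an argv list for IAFD (hypothetical helper, not shipped with IAFD)."""
    cmd = [
        binary,
        "--inst", inst,            # input instance file
        "--m", str(m),             # number of possible different shifts
        "--k", str(k),             # number of phases
        "--sol", sol,              # output solution file
        "--neighbors", neighbors,  # neighbors file
        "--mipgap", str(mipgap),   # MIP gap
    ]
    if extra:
        # Optional flags are passed through unchanged, e.g. ["--Gibbs"].
        cmd.extend(extra)
    return cmd


if __name__ == "__main__":
    cmd = build_iafd_command(
        inst="input/Ta-Rh-Pd/Ta-Rh-Pd_inst.txt",
        m=30,
        k=6,
        sol="output/Ta-Rh-Pd_output.txt",
        neighbors="input/Ta-Rh-Pd/Ta-Rh-Pd_edges.txt",
        mipgap=0.1,
        extra=["--time", "10", "--Gibbs"],
    )
    # Print the command for inspection; to launch it, one could call
    # subprocess.run(cmd, check=True) from the standard library.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.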

*** Usage ***

./iafd [-h] [OPTIONS] --inst INSTANCE_FILENAME --m M --k K --sol SOLUTION_FILENAME

Options:

  --pointCorrect     Correct or further refine a single point without changing anything else
  --initH            The file for H initialization
  --MatchSigmaTol    Sigma boundary when matching with ICSD patterns
  --MatchShiftTol    Shift tolerance when matching with ICSD patterns
  --ConnectAgent     Whether to enforce the Connectivity rule
  --AlloyAgent       Whether to enforce the Alloy rule
  --AGrounds         The number of rounds of the AgileFD-Gibbs loop
  --sticks           The file containing all the stick patterns
  --AlloyTol         The tolerance for calculating shifts
  --neighbors        (required) The file listing the neighbors of every sample point
  --oneVersion       Whether to enforce the one-used-shifted-version constraint
  --slice            Specify the slice constraint
  --Gibbs            Whether to enforce the Gibbs phase rule
  --mipgap           (required) mipgap for MIP
  --sparsity         The overall sparsity coefficient
  --stepsize         Initial stepsize. The default value 0 means the user just wants to use the standard stepsize
  --sampleInit       The filename containing initialization from single-phase sample points
  --valueInit        Initialization file containing seeds for phases and phase freezing
  --rec              Whether to output reconstructed signals
  --humanInput       Human input txt file
  --shiftInfo        Whether to create a text file that contains the shift information of each sample point
  --beta             The weighting coefficient of the sparsity term
  --seed             Random seed for the random number generator (the default value -1 means time(0))
  --c                Related to the termination criterion: in one iteration, if (old_cost-new_cost)
  --time             The maximum acceptable time (in seconds) spent training the model
  --m                (required) The number of possible different shifts
  --k                (required) The number of phases
  --sol              (required) The output file name of the solution
  --inst             (required) Input instance file
  --addNoise         Whether to add noise
  --noiseStd         The standard deviation of the noise, if noise is added
  --, --ignore_rest  Ignores the rest of the labeled arguments
                     following this flag.
  --version          Displays version information and exits.
  -h, --help         Displays usage information and exits.

*** Command Example ***

./iafd --inst input/Ta-Rh-Pd/Ta-Rh-Pd_inst.txt --m 30 --k 6 --time 10 --sol output/Ta-Rh-Pd_output.txt --c 1e-5 --beta 1.0 --mipgap 0.1 --Gibbs --sparsity 0.1 --neighbors input/Ta-Rh-Pd/Ta-Rh-Pd_edges.txt --AGrounds 3 --MatchShiftTol 0.1 --MatchSigmaTol 2.0 --ConnectAgent --AlloyAgent --AlloyTol 0.003 --sticks input/Ta-Rh-Pd/sticks/sticks.txt

*** Instance File Format ***

For more information, see this publication and associated datasets:

Le Bras, R., Bernstein, R., Gregoire, J. M., Suram, S. K., Gomes, C. P., Selman, B., & van Dover, R. B. (2014). A Computational Challenge Problem in Materials Discovery: Synthetic Problem Generator and Real-World Datasets. In Twenty-Eighth International Conference on Artificial Intelligence (AAAI'14).

Example:

Description=Human readable description of the instance, including origin and any preprocessing
UUID=unique-identifier
Format_Version=1.0
//Number of elements
M=3
//Element labels
Elements=Ta,Rh,Pd
//Sample count
N=197
//Coordinate labels for coordinate systems, e.g. substrate deposition and composition
Deposition=X,Y
Composition=Ta,Rh,Pd
// Coordinate values data: lists of length N
X=-4.7857,-6.81E-05,4.7857,......
Y=-33.5,-33.5,-33.5,.......
Ta=0.8072182,0.71555525,0.5822375,......
Rh=0.10984136,0.11910693,0.1169005,......
Pd=0.08294043,0.16533777,0.30086198,......
//Q values: data domain: length L
Q=16.000000,16.150000,16.300000,16.450000,16.600000,......
//Intensity measurements: length L for each of N samples
I1=3.508038,6.712338,5.096438,......
......
I197=0.000000,18.186840,19.199840,......

*** Value Initialization File Format ***

FLAG: --valueInit

A value initialization file specifies the basis patterns to use for initialization, as well as related configuration options.
The format is as follows:

// Comments can be included as complete lines beginning with two slashes

// Q values corresponding to the basis vectors, in the same units as in the instance file (e.g. nm^-1 or A^-1).
// Basis vectors are resampled, so these values do not need to exactly match the ones in the instance file.
Q=1,1.1,1.2,...

// Basis patterns are only shifted to the right (positive shift) in IAFD,
// so initial or frozen basis patterns should be specified as far
// to the left as expected. The V parameter is a multiplicative shift which
// is applied to the Q vector and affects all basis patterns in the same way.
// The following effectively shifts all basis patterns 1% to the left.
V=0.99

// B1...BK specify the basis patterns you wish to initialize. The indices must be
// sequential, but fewer than K can be specified if desired.
B1=0.3123,0.545234,......
B2=0.4324,0.454243,......
B3=0.42345,1.42344,......
...

// Whether to freeze each basis pattern: 0=seed, 1=freeze
F1=1
F2=0
F3=1
...

// Lists sample indices in which each phase is allowed to appear. For example,
// phase 1 could appear at sample points 1,2,54,65,76......, etc. If S is not
// specified for an initialized basis, it can appear at any sample by default.
S1=1,2,54,65,76,......
S2=1,2,4,7,111,......
S3=67,89,111,......
...

*** Sample Initialization File Format ***

FLAG: --sampleInit

Sample initialization is an alternative to value initialization, where the desired basis patterns are taken from samples in the instance file. To initialize phase 1 using sample point 123 and phase 2 using sample point 56:

B1=123
B2=56

Q should not be specified because the patterns come from the instance file. However, V, F, and S are set the same way as in a value initialization file.

*** H tensor Initialization File Format ***

FLAG: --initH

Usually, a solution file generated by IAFD can be used directly as the initialization file for the H tensor.
To initialize the H tensor from an independent file instead, use the following format:

H[*](1,1)=0.000000,0.000000,0.000000,0.321602,0.357336,0.168312,0.000000,0.000000,0.000000,0.000000
......
H[*](9,1)=0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
H[*](1,2)=0.000000,0.000000,0.000000,0.299825,0.333138,0.110080,0.000000,0.000000,0.000000,0.000000
......
H[*](9,2)=0.000000,0.000000,0.000000,0.056948,0.063276,0.023566,0.000000,0.000000,0.000000,0.000000
......

The m-th value in "H[*](k,n)=......" denotes the activation of the m-th shifted version of phase k at sample n.

*** ICDD sticks ***

FLAG: --sticks

Matching basis patterns against ICDD sticks is built into the code, but the sticks must be specified in advance. Use the "--sticks" flag to give the path of the sticks file. The file should follow this format:

Q0=6.467400,8.588200,9.204500,12.101500,12.934900,13.392000,16.587100,17.176300,17.400900,17.765100,...
P0=0.126100,0.200200,0.035000,0.032000,0.102100,0.068100,0.010000,0.006000,0.121100,0.211200,1.00000,...
Q1=17.020800,17.102500,21.050000,21.116100,21.590100,24.770400,24.890200,26.424900,29.183400,30.0547,...
P1=0.203200,0.202200,1.000000,0.983000,0.191200,0.264300,0.329300,0.119100,0.001000,0.084100,0.08410,...
...
...

Each pair (Qi,Pi) specifies one ICDD pattern (q values and corresponding intensities).

*** Neighbors ***

FLAG: --neighbors

This flag specifies the file that lists the neighbors of each sample:

0,1,7,8,9,18
1,0,9,2
2,9,10,3,1
3,2,4,10,11,12,21
4,13,3,12,5,23
5,36,6,13,24,4
6,5,14,15,24,25
7,16,17,28,0,8
......

Each line lists the neighbors of one sample. For example, "0,1,7,8,9,18" means sample 0 has neighbors 1,7,8,9,18. Note that all edges are bidirectional, so sample 0 also appears among the neighbors of sample 1.

*** Solution File Format ***

For more information, see this publication and associated datasets:

Le Bras, R., Bernstein, R., Gregoire, J. M., Suram, S. K., Gomes, C.
P., Selman, B., & van Dover, R. B. (2014). A Computational Challenge Problem in Materials Discovery: Synthetic Problem Generator and Real-World Datasets. In Twenty-Eighth International Conference on Artificial Intelligence (AAAI'14).

Example:

Description=Human-readable description of solution method, parameters, and corresponding instance
UUID=unique-identifier-of-instance
Format_Version=1.0
// Number of phases
K=5
//List of solution models: each entry lists the variable prefixes associated with a particular model
//[Q,R,C] or [Q,R] is required in order to provide an algorithm-agnostic representation (Q assumed to
//match instance file if not listed).
Params=[Q,R,C],[Q,B,C,S],[Q,B,H]
//Values for the listed parameters at each sample; representations for most parameters can be algorithm-specific
//Q values typically match the instance file
Q=......
//Basis vector representation
B1=......
......
B5=......
// Phase concentrations at each sample
C1=......
......
C197=......
//Representation of each phase as reconstructed at each sample
R1_1=......
......
R1_5=......
......
R197_5=......
//Per-phase multiplicative (scalar) shift at each sample
S1=......
......
S197=......
//H[*](k,n) Tensor (specific to IAFD - S is computed as a weighted average using these as weights)
H[*](1,1)=......
......
H[*](5,197)=......

*** Contact me ***

If you have any questions or suggestions, please feel free to contact me at jb2467@cornell.edu.
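
The instance and solution files described above use one KEY=value entry per line, with comment lines starting with two slashes. A minimal Python sketch of a reader for this layout follows. The names read_key_value_text and as_floats are hypothetical and not part of IAFD, and the sketch assumes that each entry sits on a single line and that a key never contains "=".

```python
from typing import Dict, List


def read_key_value_text(text: str) -> Dict[str, str]:
    """Collect KEY=value entries, skipping blank lines and // comment lines."""
    entries: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("//"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line (e.g. a "......" placeholder)
        entries[key.strip()] = value.strip()
    return entries


def as_floats(value: str) -> List[float]:
    """Convert a comma-separated value such as "16.0,16.15" to a list of floats."""
    return [float(item) for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    # Tiny made-up fragment in the solution-file layout, for illustration only.
    example = "\n".join([
        "// Number of phases",
        "K=5",
        "Q=16.0,16.15,16.3",
        "H[*](1,1)=0.0,0.321602,0.357336",
    ])
    entries = read_key_value_text(example)
    print(int(entries["K"]))
    print(as_floats(entries["Q"]))
    print(as_floats(entries["H[*](1,1)"]))
```

Values are kept as strings until requested because some entries, such as Description, Elements or Params, are not numeric lists.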