ChIMES Active Learning Driver Configuration File Options
Optional config.py Variables:
Assorted General Options
| Input variable | Variable type | Required | Default | Value/Options/Notes |
|---|---|---|---|---|
| | str | N | “” | E-mail address for the driver to send status updates to. If blank (“”), no e-mails are sent. |
| | int | N | 1 | Seed for the random number generator. Only used when active learning strategies are selected. |
| | list of str | Y | None | List of atom types in the system of interest, e.g. ["C","H","O"]. |
| | int | Y | None | Number of different state points at which to conduct iterative learning. |
| | list of str | Y | [] | List of species to track in molanal output, e.g. ["C1 O1 1(O-C)", "C1 O2 2(O-C)"]. |
| | int | N | 0 | Cycle at which to start including stress tensors from ALC-generated configurations. |
| | str | N | “ALL” | How stress tensors should be included in the fit. Options are: “DIAG” or “ALL”. |
| | int | N | float | Thermal smearing temperature in K; if "None", different values are used for each case, set in the ALL_BASE_FILES traj_list.dat. |
General HPC Options
| Input variable | Variable type | Required | Default | Value/Options/Notes |
|---|---|---|---|---|
| | int | N | 36 | Number of processors per node on HPC platform. |
| | str | N | pbronze | Charge bank/account name on HPC platform. |
| | str | N | slurm | HPC platform type (only “slurm” is currently supported). |
| | str | N | /usr/tce/bin/python | Full path to python2.X executable on HPC platform. |
| | bool | N | True | Controls whether driver status updates are e-mailed to user. |
ChIMES LSQ Options
| Input variable | Variable type | Required | Default | Value/Options/Notes |
|---|---|---|---|---|
| | str | N | | Path to base files required by the driver (e.g. ChIMES input files, VASP input files, etc.). |
| | str | N | | Absolute path to ChIMES_lsq executable. |
| | str | N | | Absolute path to ChIMES_lsq.py (formerly lsq2.py). |
| | str | N | | Absolute path to post_proc_lsq2.py. |
| | bool | N | False | Should ALC-0 (or 1 if no clustering) weights be read directly from a user-specified file? |
| | str | N | None | Set if |
| | special | N | 1.0 | Weights to apply to full-frame forces - many options, see note below. |
| | special | N | 5.0 | Weights to apply to gas phase forces - many options, see note below. |
| | special | N | 0.1 | Weights to apply to full-frame energies - many options, see note below. |
| | special | N | 0.1 | Weights to apply to gas phase energies - many options, see note below. |
| | special | N | 250.0 | Weights to apply to full-frame stress tensor components - many options, see note below. |
| | str | N | dlasso | Regression algorithm to use for fitting; only dlasso is supported for now. |
| | float | N | 1e-5 | Regression regularization variable. |
| | bool | N | True | Controls whether the A-matrix is normalized prior to solution. |
| | int | N | 4 | Number of nodes to use when running chimes_lsq. |
| | str | N | pbatch | Queue to submit the chimes_lsq job to. |
| | str | N | “04:00:00” | Walltime for the chimes_lsq job. |
| | int | N | 8 | Number of nodes to use when running dlasso. |
| | int | N | | Number of procs per node to use when running dlasso. |
| | str | N | pbatch | Queue to submit the dlasso job to. |
| | str | N | “04:00:00” | Walltime for the dlasso job. |
Note
There are numerous options available for weighting, and weights are applied separately to full-frame forces, gas phase forces, full-frame energies, gas phase energies, and full-frame stress.
If a WEIGHTS_* option is set to a single floating point value, that value is applied to all candidate data of that type, e.g., if WEIGHTS_FORCE = 1.0, all full-frame forces will be assigned a weight of 1.0.
Additional weighting styles can be selected by letter:
A. w = a0
B. w = a0*(this_cycle-1)^a1 # NOTE: treats this_cycle = 0 as this_cycle = 1
C. w = a0*exp(a1*|X|/a2)
D. w = a0*exp(a1[X-a2]/a3)
E. w = n_atoms^a0
where “X” is the value being weighted.
WEIGHTS_FORCE = [["B"],[1.0,-1.0]]
would select weighting style B and apply a weight of 1.0 to each full-frame force component in the first ALD cycle; weighting would decrease by a factor (this_cycle)^(-1.0) each cycle.
Multiple weighting schemes can be combined as well. For example WEIGHTS_FORCE = [ ["A","B"], [[100.0 ],[1.0,-1.0]]]
would add an additional multiplicative factor of 100 to the previous example.
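The three accepted shapes of a WEIGHTS_* value (a bare float, one style, several combined styles) can be brought into one common form before use. The sketch below is only an illustration: `normalize_weight_option` is a hypothetical helper, not part of the driver, and it assumes that a bare float is equivalent to the constant style A (w = a0), which is how the combined example above reads.

```python
def normalize_weight_option(option):
    """Return a list of (style_letter, parameter_list) pairs.

    Hypothetical helper for illustration; the driver's own parsing may differ.
    """
    # A bare number is one constant weight, treated here as style A (w = a0).
    if isinstance(option, (int, float)):
        return [("A", [float(option)])]

    letters, params = option
    # With a single style the parameters are a flat list, e.g. [1.0, -1.0];
    # with several styles there is one parameter list per letter.
    if len(letters) == 1 and not isinstance(params[0], (list, tuple)):
        params = [params]
    if len(letters) != len(params):
        raise ValueError("need one parameter list per weighting style")
    return [(letter, [float(p) for p in plist])
            for letter, plist in zip(letters, params)]


# The three forms described in the note above.
constant = normalize_weight_option(1.0)
single = normalize_weight_option([["B"], [1.0, -1.0]])
combined = normalize_weight_option([["A", "B"], [[100.0], [1.0, -1.0]]])
```

Each pair then names one factor of the final weight; when several styles are given, the factors are multiplied together as described above.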
Molecular Dynamics Options
| Input variable | Variable type | Required | Default | Value/Options/Notes |
|---|---|---|---|---|
| | str | Y | None | Iterative MD method. Options are “CHIMES” (used for ChIMES model development) or “DFTB” (used when generating ChIMES corrections to DFTB). |
| | str | N | None | Only used when |
| | str | N | | Only used when |
| | str | N | | Used when |
| | list of int | N | [4] * | Number of nodes to use for MD jobs at each case. Number can be different for each case (e.g., [2,2,4,8] for four cases). |
| | list of str | N | [“pbatch”] * | Queue type to use for MD jobs at each case. Can be different for each case. |
| | list of str | N | [“4:00:00”] * | Walltime to use for MD jobs at each case. Can be different for each case. |
| | str | N | | Absolute path to MD input files like case-0.indep-0.run_md.in. |
| | float | N | 1.0E6 | ChIMES penalty function prefactor. |
| | float | N | 0.02 | ChIMES penalty function kick-in distance. |
| | str | N | None | Absolute path to molanal executable. |
CHIMES_MD_SER is used for the old I/O-based ChIMES/DFTB linking. An update is required, but it needs bad_cfg printing in DFTB+ (which requires a change to the interface).
Correction Fitting Options
| Input variable | Variable type | Required | Default | Value/Options/Notes |
|---|---|---|---|---|
| | bool | N | False | Is this ChIMES model being fit as a correction to another method? |
| | str | N | None | Method type being corrected. Currently only “DFTB” is supported. |
| | str | N | None | Files needed to run simulations/single points with the method to be corrected (whether a path or a filename is expected is not specified). |
| | str | N | None | Executable to use when subtracting existing forces/energies/stresses from the method to be corrected. |
| | bool | N | None | Should electron temperatures be set to the values in traj_list.dat (False) or read from a specified file location for the correction calculation? Only needed if the correction method is QM-based (whether this takes a string or a path is not specified). |
Note
If corrections are used, ChIMES_MD_{NODES,QUEUE,TIME} are all used to specify DFTB runs. These should be renamed to simulation_{...} for the generalized MD block (which should become the SIM block). If FIT_CORRECTION is True, temperatures in traj_list.dat are ignored by the correction FES subtraction. Instead, the driver searches CORRECTED_TYPE_FILES for <filename>.temps, where .temps replaces whatever the last extension was.
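The ".temps replaces the last extension" rule can be illustrated with the standard library. This is a minimal sketch, not the driver's code: `temps_file_for` and the example file names are hypothetical.

```python
from pathlib import Path


def temps_file_for(trajectory_file):
    """Return the companion .temps path for a trajectory file.

    Only the last extension is swapped, so "run.0.xyz" becomes "run.0.temps".
    Hypothetical helper for illustration only.
    """
    return Path(trajectory_file).with_suffix(".temps")


# Example with made-up file names.
simple = temps_file_for("training.xyzf")
dotted = temps_file_for("CORRECTED_TYPE_FILES/run.0.xyz")
```

`Path.with_suffix` replaces only the final suffix, which matches the rule stated above for names containing several dots.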
Hierarchical Fitting Options
| Input variable | Variable type | Default | Value/Options/Notes |
|---|---|---|---|
| | bool | False | Is this a hierarchical fit (i.e., building on existing parameters)? |
| | list of str | None | List of parameter files to build on, which should be in ALL_BASE_FILES/HIERARCH_PARAMS. |
| | str | None | Executable to use when subtracting existing parameter contributions. |
Note
Consider the case of fitting 2+3+4-body C/N parameters on top of existing C- and N- parameter sets.
Users must create a new folder, HIERARCH_PARAMS, in their ALL_BASE_FILES directory and place in it the pure-C and pure-N parameter files, i.e.:
$: ls -l <my_fit>/ALL_BASE_FILES/HIERARCH_PARAMS
-rw------- 1 rlindsey rlindsey 169630 May 1 10:55 C.params.txt.reduced
-rw------- 1 rlindsey rlindsey 160015 May 1 10:55 N.params.txt.reduced
Hierarchical fitting also requires special options in ALL_BASE_FILES/ALC-0_BASEFILES/fm_setup.in to ensure the base parameter types (e.g., in {C,N}.params.txt.reduced) are properly excluded from the fit. First, one must ensure that the requested polynomial orders are greater than or equal to those in the reference ALL_BASE_FILES/HIERARCH_PARAMS parameter files. Next, add the highlighted lines to fm_setup.in:
# Snippet from ALL_BASE_FILES/ALC-0_BASEFILES/fm_setup.in
# PAIRTYP #
CHEBYSHEV 25 10 4 -1 1
# CHBTYPE #
MORSE
# SPLITFI #
false
# HIERARC #
true
Users must also specify which interactions to exclude from the fit (i.e., interactions fully described by the ALL_BASE_FILES/HIERARCH_PARAMS files). For the present C/N fitting example, those lines would look like:
####### TOPOLOGY VARIABLES #######
EXCLUDE 1B INTERACTION: 2
C
N
EXCLUDE 2B INTERACTION: 2
C C
N N
EXCLUDE 3B INTERACTION: 2
C C C
N N N
EXCLUDE 4B INTERACTION: 2
C C C C
N N N N
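For more than a couple of elements, the exclusion lines above become tedious to type by hand. The following sketch regenerates them for the C/N example; `exclude_block` is a hypothetical helper, and it assumes every base parameter file describes a single element, so that only pure-element interactions are excluded.

```python
def exclude_block(pure_types, max_body=4):
    """Build EXCLUDE lines for pure-element 1- to max_body-body interactions.

    Hypothetical helper; assumes each base parameter set covers one element.
    """
    lines = []
    for n_body in range(1, max_body + 1):
        # One header per body order, followed by one line per excluded type.
        lines.append(f"EXCLUDE {n_body}B INTERACTION: {len(pure_types)}")
        for atom_type in pure_types:
            lines.append(" ".join([atom_type] * n_body))
    return "\n".join(lines)


block = exclude_block(["C", "N"])
print(block)
```

The printed text is intended to match the EXCLUDE lines shown above and would still need to be placed under the topology variables section of fm_setup.in by hand.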
Users must also ensure that the fm_setup.in topology contents are consistent with those in the ALL_BASE_FILES/HIERARCH_PARAMS files. For the present C/N fitting example, those would be the highlighted lines below:
# NATMTYP #
2
# TYPEIDX # # ATM_TYP # # ATMCHRG # # ATMMASS #
1 C 0.0 12.0107
2 N 0.0 14.0067
# PAIRIDX # # ATM_TY1 # # ATM_TY1 # # S_MINIM # # S_MAXIM # # S_DELTA # # MORSE_LAMBDA # # USEOVRP # # NIJBINS # # NIKBINS # # NJKBINS #
1 C C 0.98 5.0 0.01 1.4 false 0 0 0
2 N N 0.86 8.0 0.01 1.09 false 0 0 0
3 C N 1.0 5.0 0.01 1.34 false 0 0 0
Users must explicitly define how many (and which) many-body interactions will be fit, and the corresponding outer cutoffs to use. Note that the option ALL cannot be used when performing hierarchical fits.
SPECIAL 3B S_MAXIM: SPECIFIC 2
CCCNCN CC CN CN 5.0 5.0 5.0
CNCNNN CN CN NN 5.0 5.0 5.0
SPECIAL 4B S_MAXIM: SPECIFIC 3
CCCCCNCCCNCN CC CC CN CC CN CN 4.5 4.5 4.5 4.5 4.5 4.5
CCCNCNCNCNNN CC CN CN CN CN NN 4.5 4.5 4.5 4.5 4.5 4.5
CNCNCNNNNNNN CN CN CN NN NN NN 4.5 4.5 4.5 4.5 4.5 4.5
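The naming pattern in these lines can be read off the example: each mixed-element cluster is listed by its constituent pair types, taken in order over all atom pairs. The sketch below reproduces the C/N lines under that reading. `special_cutoff_block` is a hypothetical helper, and it assumes a single outer cutoff for every pair within a given body order, as in the example.

```python
from itertools import combinations, combinations_with_replacement


def special_cutoff_block(atom_types, n_body, s_maxim):
    """Build SPECIAL nB S_MAXIM lines for all mixed-element n-body clusters.

    Hypothetical helper; atom_types must be in the same order as the
    TYPEIDX table so that pair names come out as "CN" rather than "NC".
    """
    lines = []
    for cluster in combinations_with_replacement(atom_types, n_body):
        if len(set(cluster)) == 1:
            continue  # pure-element clusters come from the base parameter files
        # Pair types for atom pairs (1,2), (1,3), ..., in that order.
        pairs = ["".join(pair) for pair in combinations(cluster, 2)]
        cutoffs = [str(s_maxim)] * len(pairs)
        lines.append(" ".join(["".join(pairs)] + pairs + cutoffs))
    header = f"SPECIAL {n_body}B S_MAXIM: SPECIFIC {len(lines)}"
    return "\n".join([header] + lines)


three_body = special_cutoff_block(["C", "N"], 3, 5.0)
four_body = special_cutoff_block(["C", "N"], 4, 4.5)
print(three_body)
print(four_body)
```

Column spacing in the generated text is a single space; whether fm_setup.in is sensitive to alignment is not stated here, so the output should be checked against a working input file.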
Note
Each training trajectory file in ALL_BASE_FILES/ALC-0_BASEFILES needs a corresponding .temps file that gives the temperature for each frame.
To do: add VASP modules to code.
Reference QM Method Options
| Input variable | Variable type | Default | Value/Options/Notes |
|---|---|---|---|
| | str | WORKING_DIR + “ALL_BASE_FILES/VASP_BASEFILES” | Absolute path to QM input files generic to all QM methods. Can be specified separately if multiple methods are being used (see code-specific options below). |
| | str | VASP | Specifies which nominal QM code to use for bulk configurations; options are “VASP” or “DFTB+”. |
| | str | VASP | Specifies which nominal QM code to use for gas configurations; options are “VASP”, “DFTB+”, and “Gaussian”. |
VASP-Specific Options
| Input variable | Variable type | Default | Value/Options/Notes |
|---|---|---|---|
| | str | | Absolute path to VASP input files. |
| | int | 6 | Number of nodes to use for VASP jobs. |
| | int | | Number of processors to use per node for VASP jobs. |
| | str | “04:00:00” | Walltime for VASP calculations (HH:MM:SS). |
| | str | “pbatch” | Queue to submit VASP jobs to. |
| | str | None | A path to a VASP executable must be specified if |
| | str | “mkl” | Modules to load during VASP run. |
DFTB+ -Specific Options
| Input variable | Variable type | Default | Value/Options/Notes |
|---|---|---|---|
| | str | | Absolute path to DFTB+ input files. |
| | int | 1 | Number of nodes to use for DFTB+ jobs. |
| | int | 1 | Number of processors to use per node for DFTB+ jobs. |
| | str | “04:00:00” | Walltime for DFTB+ calculations (HH:MM:SS). |
| | str | “pbatch” | Queue to submit DFTB+ jobs to. |
| | str | None | A path to a DFTB+ executable must be specified if |
| | str | “mkl” | Modules to load during DFTB+ run. |
Gaussian-Specific Options
| Input variable | Variable type | Default | Value/Options/Notes |
|---|---|---|---|
| | int | 4 | Number of nodes to use for Gaussian jobs. |
| | int | | Number of processors to use per node for Gaussian jobs. |
| | str | “04:00:00” | Walltime for Gaussian calculations (HH:MM:SS). |
| | str | “pbatch” | Queue to submit Gaussian jobs to. |
| | str | None | A path to a Gaussian executable must be specified if |
| | str | None | Absolute path to Gaussian scratch directory. |
| | str | None | Name of file containing single atom energies from Gaussian and the target planewave method. |
Note
The file specified for GAUS_REF is structured like:
<chemical symbol> <Gaussian energy> <planewave code energy>
<chemical symbol> <Gaussian energy> <planewave code energy>
<chemical symbol> <Gaussian energy> <planewave code energy>
...
<chemical symbol> <Gaussian energy> <planewave code energy>
Energies are expected in kcal/mol and there should be an entry for each atom type of interest.
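A quick way to sanity-check a GAUS_REF file before a run is to read it into a dictionary and confirm that every atom type has an entry. The sketch below assumes whitespace-separated columns as shown above; `parse_gaus_ref` is a hypothetical helper, and the numbers in the example are placeholders, not real reference energies.

```python
def parse_gaus_ref(text):
    """Map chemical symbol -> (Gaussian energy, planewave energy) in kcal/mol.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    energies = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        symbol = fields[0]
        energies[symbol] = (float(fields[1]), float(fields[2]))
    return energies


# Placeholder values only, chosen to be obviously artificial.
example_text = "C -10.0 -20.0\nN -30.0 -40.0\n"
reference = parse_gaus_ref(example_text)

# Every atom type of interest should have an entry.
missing = [t for t in ["C", "N"] if t not in reference]
```

An empty `missing` list indicates that the file covers all atom types named in the check.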