Cluster AL Fitting Mode

_images/ALD_workflow.png

Fig. 1: The ChIMES Active Learning Driver Workflow.

When developing models for molecular reacting systems, our cluster-based active learning (AL) strategy can be advantageous. This strategy attempts to improve the description of conformational energetics and nominal reaction barriers. It supplements the basic mode by carving out candidate molecules and nominal transition states from DFT- and ChIMES-generated simulation trajectories, down-selecting a maximally informative subset, and adding them to the training set. Details of this strategy are outlined in R.K. Lindsey et al., JCP 2020.

Warning

This capability will only run on Slurm systems, and may require a specific Slurm specification. See the ``utilities/new*sh`` files for details.

Example Fit: Water

Note

Files for this example are located in ./<al_driver base folder>/examples/cluster_based_active_learning_single_statepoint-VASP/

This section gives an overview of an example 1-iteration fit for water at 1000 K and 1.25 g/cc. The model will include up to three-body interactions with the following hyperparameters. Note: This example is intended to run quickly and will yield neither a high-quality nor a stable model.

====================  =====
Hyperparameter        Value
====================  =====
2-body order          12
2-body outer cutoff   5
3-body order          3
3-body outer cutoff   3.5
OO inner cutoff       2.31
HH inner cutoff       1.18
OH inner cutoff       0.82
Morse lambda          1.00
Tersoff parameter     0.75
====================  =====

Input Files

The necessary input files and directory tree structure are provided in the example folder, i.e.:

$: tree

.
├── ALL_BASE_FILES
│   ├── ALC-0_BASEFILES
│   │   ├── fm_setup.in
│   │   ├── reactive_water.temps
│   │   ├── reactive_water.xyzf
│   │   └── traj_list.dat
│   ├── CHIMESMD_BASEFILES
│   │   ├── bonds.dat
│   │   ├── case-0.indep-0.input.xyz
│   │   ├── case-0.indep-0.run_md.in
│   │   └── run_molanal.sh
│   ├── QM_BASEFILES
│   │   ├── 1000.INCAR
│   │   ├── H.POTCAR
│   │   ├── KPOINTS
│   │   └── O.POTCAR
│   ├── run_md.cluster
│   ├── loose_bond_crit.dat
│   └── tight_bond_crit.dat

Beginning with the contents of the ALC-0_BASEFILES folder: fm_setup.in, traj_list.dat, and the training trajectory (reactive_water.xyzf) require no special treatment for cluster-based AL. However, an additional file (reactive_water.temps) is now required. This file must have the same name as the training .xyzf file and end with a “.temps” extension. For each frame in the .xyzf file, the .temps file contains the corresponding target system temperature.
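As a minimal sketch of how a .temps file could be generated for a constant-temperature trajectory: the code below assumes the .temps file holds one temperature per line (one line per frame) and that each frame in the .xyzf file consists of an atom-count line, a box/comment line, and then one line per atom. The helper names count_frames and write_temps are hypothetical and not part of the driver; compare against the provided reactive_water.temps file to confirm the exact layout.

```python
from pathlib import Path


def count_frames(xyzf_path):
    """Count frames, assuming each frame is: natoms line, comment/box line, natoms atom lines."""
    lines = Path(xyzf_path).read_text().splitlines()
    frames = 0
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between frames
            i += 1
            continue
        natoms = int(lines[i].split()[0])
        i += natoms + 2  # skip the atom-count line, the comment line, and the atom lines
        frames += 1
    return frames


def write_temps(xyzf_path, temperature):
    """Write <name>.temps next to <name>.xyzf with one temperature per frame (assumed layout)."""
    xyzf_path = Path(xyzf_path)
    temps_path = xyzf_path.with_suffix(".temps")  # same base name, ".temps" extension
    nframes = count_frames(xyzf_path)
    temps_path.write_text("".join(f"{temperature}\n" for _ in range(nframes)))
    return temps_path
```

For the water example, calling write_temps("reactive_water.xyzf", 1000) would produce reactive_water.temps with the value 1000 repeated once per frame.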

The contents of the CHIMESMD_BASEFILES and QM_BASEFILES folders also require no special treatment.

Three new files are required, which sit directly in the ALL_BASE_FILES folder: run_md.cluster, tight_bond_crit.dat, and loose_bond_crit.dat. The run_md.cluster file can be taken exactly as provided in the example folder. The tight* and loose* files provide the bonding distance criteria used to identify molecules and nominal transition state species, respectively. The format of each file is as follows: The first line gives a space-separated list of each element present in the system (e.g., “O H”). The second line gives the number of unique atom pair types formed by those elements, e.g., O and H can form 3 pairs: O O, O H, and H H. Then, one line is given for each pair, listing the two atom types and the corresponding distance criterion (e.g., “H H 1.4”).
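The file layout described above can be sketched in a few lines of Python. This is an illustration only: format_bond_crit is a hypothetical helper, not part of the driver, and the O O and O H distances below are placeholders rather than the values shipped with the example.

```python
from itertools import combinations_with_replacement


def format_bond_crit(elements, cutoffs):
    """Return the text of a bond-criteria file in the three-part layout described above.

    elements: list of element symbols, e.g. ["O", "H"]
    cutoffs:  dict mapping an (a, b) element pair to a distance criterion
    """
    # n elements give n*(n+1)/2 unique unordered pairs, including same-element pairs
    pairs = list(combinations_with_replacement(elements, 2))
    lines = [" ".join(elements), str(len(pairs))]
    for a, b in pairs:
        # accept the pair keyed in either order
        dist = cutoffs[(a, b)] if (a, b) in cutoffs else cutoffs[(b, a)]
        lines.append(f"{a} {b} {dist}")
    return "\n".join(lines) + "\n"


# Placeholder distances for illustration only.
example = format_bond_crit(
    ["O", "H"], {("O", "O"): 1.7, ("O", "H"): 1.2, ("H", "H"): 1.4}
)
print(example)
```

Generating the pair list with combinations_with_replacement keeps the pair count on the second line consistent with the number of pair lines that follow.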

Contents of the config.py file must be modified to reflect your HPC system and absolute paths prior to running this example. File contents specific to/required for cluster-based AL are highlighted below:

################################
##### General options
################################

ATOM_TYPES = ['O', 'H']
NO_CASES = 1

DRIVER_DIR     = "/p/lustre3/lindsey11/al_driver-myLLfork/"
WORKING_DIR    = "/p/lustre3/lindsey11/al_driver-myLLfork/examples/cluster_based_active_learning_single_statepoint-VASP/"
CHIMES_SRCDIR  = "/p/lustre3/lindsey11/test_chimes_lsq-for-LL_to_ext_PR/chimes_lsq-LLfork/src/"

################################
##### General HPC options
################################

HPC_ACCOUNT = "iap"
HPC_PYTHON  = "/usr/tce/bin/python3"
HPC_SYSTEM  = "slurm"
HPC_PPN     = 56

HPC_EMAIL   = False

################################
##### ChIMES LSQ
################################

ALC0_FILES    = WORKING_DIR + "ALL_BASE_FILES/ALC-0_BASEFILES/"
CHIMES_LSQ    = CHIMES_SRCDIR + "../build/chimes_lsq"
CHIMES_SOLVER = CHIMES_SRCDIR + "../build/chimes_lsq.py"
CHIMES_POSTPRC= CHIMES_SRCDIR + "../build/post_proc_chimes_lsq.py"

# Generic weight settings

WEIGHTS_FORCE = [ ["A"], [[1.0  ]] ]
WEIGHTS_FGAS  = [ ["A"], [[1.0  ]] ]
WEIGHTS_ENER  = [ ["A"], [[0.3  ]] ]
WEIGHTS_EGAS  = [ ["A"], [[1.0  ]] ]
WEIGHTS_STRES = [ ["A"], [[100.0]] ]

REGRESS_ALG   = "dlasso"
REGRESS_VAR   = "1.0E-5"
REGRESS_NRM = True

# Stress tensor settings

STRS_STYLE    = "ALL" # Options: "DIAG" or "ALL"

CHIMES_BUILD_NODES = 1
CHIMES_BUILD_QUEUE = "pdebug"
CHIMES_BUILD_TIME  = "01:00:00"

CHIMES_SOLVE_NODES = 2
CHIMES_SOLVE_QUEUE = "pdebug"
CHIMES_SOLVE_TIME  = "01:00:00"

################################
##### Do Cluster-based active learning
################################

DO_CLUSTER = True
MAX_CLUATM = 10
TIGHT_CRIT = WORKING_DIR + "ALL_BASE_FILES/tight_bond_crit.dat"
LOOSE_CRIT = WORKING_DIR + "ALL_BASE_FILES/loose_bond_crit.dat"
CLU_CODE   = DRIVER_DIR + "/utilities/new_ts_clu.cpp"

MEM_BINS = 40
MEM_CYCL = MEM_BINS/10
MEM_NSEL = 100
MEM_ECUT = 4000.0

CALC_REPO_ENER_CENT_QUEUE = "pdebug"
CALC_REPO_ENER_CENT_TIME = "1:00:00"

CALC_REPO_ENER_QUEUE =  "pdebug"
CALC_REPO_ENER_TIME  =  "1:00:00"


################################
##### Molecular Dynamics
################################

MD_STYLE = "CHIMES"
CHIMES_MD_MPI = CHIMES_SRCDIR + "../build/chimes_md"
CHIMES_MD_SER = CHIMES_SRCDIR + "../build/chimes_md-serial"

MOLANAL  = CHIMES_SRCDIR + "../contrib/molanal/src/"
MOLANAL_SPECIES = ["H2O","H3O", "OH"]

MD_NODES = [1] * NO_CASES
MD_QUEUE = ['pdebug'] * NO_CASES
MD_TIME  = ['00:05:00'] * NO_CASES

################################
##### QM-Specific variables (Single point calculations)
################################

QM_FILES = WORKING_DIR + "ALL_BASE_FILES/QM_BASEFILES"

VASP_EXE      = "/p/lustre3/lindsey11/vasp_std.5.4.4"
VASP_TIME    = "01:00:00"
VASP_NODES   = 1
VASP_PPN     = 56
VASP_QUEUE   = "pdebug"
VASP_MODULES = "intel-classic/19.1.2 mvapich2/2.3.6 mkl"

The variable DO_CLUSTER controls whether cluster-based AL is used. This variable is False by default; when it is False, none of the variables in the “Do Cluster-based active learning” block above need to be specified. MAX_CLUATM sets the maximum number of atoms that a molecule can contain. TIGHT_CRIT and LOOSE_CRIT are the full paths to the tight and loose bond criteria files in the ALL_BASE_FILES folder. CLU_CODE is the path to the cluster-extraction code.
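It can help to check these settings before launching the driver. The following is a minimal sketch under stated assumptions: check_cluster_settings is a hypothetical helper, not part of the driver, and it assumes the relevant settings have been collected into a plain dictionary keyed by variable name.

```python
import os


def check_cluster_settings(cfg):
    """Return a list of problems found in the cluster-AL settings of cfg (a dict of config values)."""
    problems = []
    if not cfg.get("DO_CLUSTER", False):
        return problems  # cluster-based AL is off, so nothing else is required
    for key in ("TIGHT_CRIT", "LOOSE_CRIT", "CLU_CODE"):
        path = cfg.get(key)
        if path is None:
            problems.append(f"{key} is not set")
        elif not os.path.isfile(path):
            problems.append(f"{key} does not point to a file: {path}")
    if not isinstance(cfg.get("MAX_CLUATM"), int) or cfg["MAX_CLUATM"] < 1:
        problems.append("MAX_CLUATM should be a positive integer")
    return problems
```

An empty list means the checked settings look consistent; any returned strings describe what to fix in config.py.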

The next group of variables controls the cluster down-selection process. MEM_BINS is the number of bins in the cluster energy histogram, MEM_CYCL is the number of Monte Carlo cycles to perform during the down-selection process, MEM_NSEL is the number of molecules to select each AL cycle, and MEM_ECUT is a cutoff that excludes any molecules whose absolute energy is greater than MEM_ECUT.
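To illustrate what MEM_ECUT and MEM_BINS mean in practice, the sketch below filters a list of cluster energies by the cutoff and counts the survivors in equal-width bins. It is not the driver's Monte Carlo down-selection procedure (see the reference above for that); energy_histogram is a hypothetical helper, and equal-width binning between the minimum and maximum kept energy is an assumption.

```python
def energy_histogram(energies, mem_bins, mem_ecut):
    """Drop clusters with |E| > mem_ecut, then count the rest in mem_bins equal-width bins."""
    kept = [e for e in energies if abs(e) <= mem_ecut]
    if not kept:
        return kept, [0] * mem_bins
    lo, hi = min(kept), max(kept)
    width = (hi - lo) / mem_bins or 1.0  # avoid zero width when all energies are equal
    counts = [0] * mem_bins
    for e in kept:
        idx = min(int((e - lo) / width), mem_bins - 1)  # the maximum falls in the last bin
        counts[idx] += 1
    return kept, counts


# Toy energies for illustration; 5000.0 exceeds the cutoff and is discarded.
kept, counts = energy_histogram([-10.0, 0.0, 10.0, 5000.0], mem_bins=4, mem_ecut=4000.0)
```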

CALC_REPO_ENER_CENT* and CALC_REPO_ENER* specify computational resources for assigning ChIMES energies to each candidate cluster.