A python optimizer for developing molecular simulation forcefield.
Author: MP, WD
To use this code, an input text file is needed to configure all stuffs to control the optimizing process. You also need your own post-processing script to extract the values of the properties you chose as the optimization target.
To run:
$ opt.py YOUR_CONFIG_FILE
The input configuration file is organized in the standard .INI file format, which is also similar to the .mdp format in Gromacs. All options are grouped in four sections: simulation, properties, parameters, and optimization. See config.sample as a example to create your config file, and below is a brief explanation for options in each section.
[ simulation ]
mode = "test" for test function or "simulation" to run optimization
execute = Bash command that will be used to run your simulation. E.g. "mpirun -np 4 lammps -in in.lmp"
path = The path under which you run your simulation, relative to the
current executing directory.
processScript = A script to post-process your simulation output to obtain
targeted property values.Note: the values of options "inFileName" and "processScript" indicate a
relative path to the value of option "path". E.g., "inFileName = in.lammps"
points to the input file under the simulation folder, and "processScript = ../process.sh" points to a script file in the parent directory.
[ optimization ]
optMethod = The optimization algorithm. Note: the value of "optMethod" could be one of the following:
- "
Nelder-Mead"
[ properties ]
totalProperties = The number of targeted properties, which should match the
output by your post-processing script.
propertyNName = The name you use to identify the Nth targeted property.
propertyNRef = The reference value for the Nth property.
propertyNSpecial = The special handling for the target-function of the Nth
property. ;(optional)
propertyNSpecialArg = The argument for the special handling, which is only
useful when "propertyNSpecial = scaled". ;(optional)Note: If "totalProperties = M", in total M bunches of "propertyN*"
options should be given, where N = 1, 2, ..., M.
"propertyNName" can be left blank, and if so, it will be automatically
assigned as "q_N".
The value of "propertyNSpecial" could be one of the following, and then
"propertyNSpecialArg" is only useful for value "scaled", indicating the
scaling coefficient:
- "
log": calculate the target-function for the logarithmized value; - "
scaled": calculate the scaled target-function.
[ parameters ]
initParaTableFile = A tabulated text file you prepared, listing the initial
values of the forcefield parameters to be optimized (see
below for further explanation about the required format).
paraTableFile = The tabulated file that will be generated during every
optimizing step, saving the output parameter values.
ffTemplate = A template of the forcefield file you prepared for running the
simulation (see rules below).
ffForSimulation = The forcefield file that will be read for your simulation,
written based on the "ffTemplate".The "initParaTableFile" should contain only 2 lines. The 1st line starts
with "#" and then lists the parameter "names" (or "tags"), and the 2nd
line lists the corresponding values from which you want to start the
optimization. Now assume you have such a set of parameters to optimize:
# epsilon_1 sigma_1 epsilon_2 sigma_2 epsilon_3 sigma_3
0.1 10 0.2 20 0.3 30
The parameter names are important because they are used in the "ffTemplate"
so that the code finds the correct positions in the template and replace them
with corresponding numbers and runs the simulation. Your template forcefield
file looks something like this:
pair_coeff 1 1 @epsilon_1 @sigma_1
pair_coeff 1 2 @epsilon_2 @sigma_2
pair_coeff 1 3 @epsilon_3 @sigma_3 Note the "@" placeholder that indicates here the following string should be
replaced, and the code tries to match it from the "initParaTableFile" (1st
step) or "paraTableFile". So, a real force field file ("ffForSimulation")
you would expect to be generated in the 1st step will be:
pair_coeff 1 1 0.1 10
pair_coeff 1 2 0.2 20
pair_coeff 1 3 0.3 30 Then after each step, the freshly optimized parameter values are saved into
"paraTableFile" --thus, it will contain the same number of columns as
"initParaTableFile" and N+2 rows after N iteration steps -- and the
"ffForSimulation" are updated.
Note: all files identified in this section imply a path relative to the executing directory, or an absolute path is also accepted.
You can use whatever script you like to post-process your data. The only
requirement is: the final numbers of all target property should be saved in
one line, separated by blank space, in a file named "res.postprocess" under
your simulation path.
For example, it can be a shell script like:
q1=$(SOME_COMMANDS_TO_OBTAIN_PROPERTY1)
q2=$(SOME_COMMANDS_TO_OBTAIN_PROPERTY2)
q3=$(SOME_COMMANDS_TO_OBTAIN_PROPERTY3)
q4=$(SOME_COMMANDS_TO_OBTAIN_PROPERTY4)
echo ${q1} ${q2} ${q3} ${q4} > res.postprocess