qcalibrate
qcalibrate is a python script written in the Paton lab to assist with performing conformational sampling on transition state structures. This conformational sampling will help ensure that you have the lowest energy structures and more accurate pathways, so it's an important part of research.
This script uses Gaussian and CREST and currently is only in place on ACME. My tutorial will go through the different steps that qcalibrate runs through in such a way that you can repeat this process manually if you are on another computer.
qcalibrate Processes
This module will go through the steps of the qcalibrate script, as well as the location of each part in the event you will need to troubleshoot the scripts. This will also serve as a step-by-step guide for the event that you want to run the processes individually or on another computer without qcalibrate.
Step 1: Modredundant Optimization
As the set-up for qcalibrate, you must create a Gaussian input file for a modredundant optimization of your transition state structure (more details in the Set-up section). This optimization is first run, with the bond constraints that you have set up for the transition state. This step should not take very long, since you will be starting with an optimized transiton state.
The modredundant optimization is initially run to ensure that
you have found a transiton state before you move along with
the conformation search and following optimizations. There is
a script which checks for imaginary vibrational modes in the
Gaussian output files (/opt/apps/bin/calibrate/check_im_freq
).
This is to ensure that you have a good starting point before
spending more computing power on something that isn't a
transition state after all.
This step of the process is called in the script
/opt/apps/bin/qcalibrate
.
Step 2: Constrained CREST Search
Once your starting structure has been verified as a
transition state structure, the constraints from the input
modredundant optimization are used in a CREST conformation
search. The script /opt/apps/bin/CALIBRATE
will then
convert the Gaussian output file into xyz coordinates and
start a CREST search.
CREST constraints are put into a file named constrain.inp
using a script opt/apps/bin/calibrate/constraints.py
.
With these constraints, the conformational sampling is done
(called on line 66 of opt/apps/bin/CALIBRATE
).
Once the CREST search has finished, the structures will
be clustered into 10 (or a number of your choosing) representative
structures using the CREGEN module. Following the clustering
(on line 76 of opt/apps/bin/CALIBRATE
), the output xyz files
are converted into more Gaussian input files to proceed with
the qcalibrate process.
Warning
When this search is finished, qcalibrate also deletes all the
files that are unnecessary moving forward, so it might be
harder if you want to troubleshoot. To get around this, you
can create your own copy of the files and run them,
commenting out the line of code that cleans up
(line 114 of /opt/apps/bin/CALIBRATE
).
Step 3: Constrained Optimization
Once you have the clustered conformations of your transition
states, the same constraints/frozen bonds that were included
in your initial qcalibrate input are applied to each structure
and a constrained geometry optimization is started. These
will all run in serial, so conf_01
will run first, and when
it has finished conf_02
will start. This is part of why the
process can take so long and it might be beneficial to increase
the total wall time for your project.
Note
These constrained optimizations also use the same level of theory that you started with in your initial input file, so be mindful of that when you create the input.
Once the optimizations are completed, they will be put into
a folder called constrained_opts/
and further analyzed.
At this stage, the output files are checked for imaginary frequencies (signifying a transition state), as well as filtered based on energy. GoodVibes is run on all the conformers that had a normal termination to get energy information about the ensemble. This is done as a way of reducing computational cost if the conformations are too high in energy, as anything more than 5 kcal/mol higher in energy than the lowest energy conformer is discarded at this step.
Step 4: Unrestrained Transition State Optimization
After the modredundant optimizations have completed and high
energy conformers filtered out, those geometries are used
as starting points for an unrestrained TS optimization.
The keywords used in the Gaussian input files are
opt=(ts,calcfc,noeigentest)
as opposed to the opt=modredundant
that the process was started with. These optimizations also
take place in serial, so it can take some time to complete.
Note
If you run out of time at this point, you should just
re-submit all of the input files that end
in *ts_conf_XX.com
(XX represents conformer number), as
this is one of the last steps in the process. At this
point, it is just an ensemble of TS optimizations.
Once these jobs are complete, the files are checked for
imaginary frequencies (those without are filtered out and
moved to ts_fail/
) and GoodVibes is started again.
At this stage, conformers are still filtered out if they
are more than 5 kcal/mol higher in energy than the lowest
energy conformer, but at this point the geometries are
fully relaxed to the transtion states. You will have an
output labeled "best" that is the lowest energy TS
conformer to come out of this protocol, but if you want the
full ensemble structures are in the ts_opt/
folder.
Note
You should at this point visualize all of your conformers that you plan to continue with to ensure that they represent the transition state that you want. This protocol just checks to see that they have at least one imaginary frequency that has an intensity greater than the cutoff, not that it's the TS you were searching for or started with.
This is the end of the qcalibrate protocol. The output for each
step will be in the individual folders created with each step.
Another file called calibrate.out
will also be created which
will give an overview of the whole process. This is a good place
to look if anything seems wrong to start the troubleshooting
process.
Set-up
If you are on ACME, then qcalibrate is already installed and ready to
be called on the computer. If you are curious about any errors/troubleshooting,
the code exists in the directory /opt/apps/bin/
as CALIBRATE
,
qcalibrate
, and all files in calibrate/
.
You want to start with a Gaussian input file from an optimized transition state structure. So once you have found a transition state, create an input file using that geometry and set up for a modredundant optimization. This includes putting these keywords in the route line for your calculation:
functional/basis opt=modredundant freq
For the optimization, you also need to specify bonds to constrain at the bottom of the file. Here, you will want to freeze the bonds that are involved in the transition state that you are trying to find. For example, if your transiton state is for a bond formation between C1 and O4, the bottom of your input file should include this line:
B 1 4 F
This will ensure that the correct constraints are imposed during the conformation search so you don't lose the transition state that you have.
Another key feature of this input is that you must specify a checkpoint file. You can do this with a simple line at the start of the file that looks something like this:
%chk=file_name_ts_conf_search.chk
Warning
The checkpoint file must be defined in the first line of the input file, or the script will not work properly.
With all of these pieces you are ready to run qcalibrate and start performing a conformation search of your transition state. In case these instructions were confusing, here is an example input for a proton transfer transition state between atoms 2 and 23 (with the proton being atom 27). This file includes an extra constraint between atoms 3 and 21 to ensure these atoms do not bond prematurely.
%chk=mnah_ts1_conf_search.chk
%nprocshared=16
%mem=64GB
# m062x/6-31+G(d) emp=gd3 opt(modredundant) scrf(cpcm,solvent=water) freq
mnah_ts1_conf_search
-1 1
C -1.59684100 -0.47303900 1.52918800
C -0.22857300 0.14916100 1.80214900
C 0.04361500 1.41083200 1.20889400
N -0.58941700 1.83851900 0.13565700
C -1.59564800 1.04046000 -0.44130900
C -2.08561900 -0.04818600 0.16274100
H -2.33375600 -0.18827900 2.29427800
H -1.49470300 -1.56354200 1.57382800
H 0.85067300 2.03804600 1.56705400
H -1.95798800 1.37619200 -1.40661200
C -3.19120000 -0.76252600 -0.54804800
N -3.80415300 -1.73833800 0.15624600
O -3.52400400 -0.48022700 -1.70101400
C -0.13906300 3.01499100 -0.60859300
H 0.59183400 3.55301300 -0.00724600
H -0.99444200 3.65955900 -0.81740100
H 0.32883200 2.69124800 -1.54040800
H -3.58796000 -1.94293800 1.12129600
H -4.58510400 -2.22517000 -0.26420700
H 0.10903600 0.08312900 2.83618600
O 2.24957000 0.91384800 -0.20996100
P 2.49177900 -0.58867800 -0.35956100
O 1.38892000 -1.42317600 0.36391500
O 3.83491600 -0.96187800 0.57321100
O 2.79887500 -1.07387800 -1.77022500
H 4.63226600 -0.59717800 0.16010100
H 0.59336900 -0.54284100 1.15568300
B 2 27 F
B 27 23 F
B 3 21 F
Running the Code
This is an automated workflow, so it only takes one line to run all parts! If you don't want to run all parts of the code yet, I recommend creating your own copy of the code and running it individually. We are planning to add this kind of functionality to the existing script, but that isn't finished yet.
Once you have your input file created, you can easily run the code and start your conformer search. This code and be run with only one line, but there are a few arguments that you can implement to customize how the script runs. Here is the basic command you can use to run the script:
qcalibrate -i name_of_input.com
Running qcalibrate creates a new directory in the working directory where all of the jobs will be run. The new directory is named after the initial input file, and inside more directories will be made for the constrained CREST search, the constrained optimization, and the successful/failed TS optimizations.
This will start a slurm job with all default settings. Now, I will go through the different ways that you can alter the script.
Note
All add-ons and ways to change the qcalibrate defaults
are also described if you run the command
qcalibrate -h
.
Frequency Cutoff
With the -f
flag, you can change the frequency cutoff value.
By default, when qcalibrate checks to make sure that Gaussian
optimized (constrained or not) structures include at least one
imaginary frequency, it also ensures that the frequency has a
magnitude of at least 100 cm-1. This is a helpful feature,
since it filters out any jobs that had insignificant imaginary
frequencies that are not likely to be your transition state.
If you know that the frequency of the transition state you're looking for is close to -1000 cm-1, then you can adjust the cutoff to make sure that you only search for large frequencies. Similarly, if the frequency of the transition state that you want is less intense than -100 cm-1, you can change the cutoff so that qcalibrate will keep your transition state structures.
Here is an example of implementing the frequency cutoff:
qcalibrate -i name_of_input.com -f 1000
This command will only keep optimizations (constrained or not) if the output file includes an imaginary frequency of -1000 cm-1 or higher in intensity.
Wall Time
If you run the qcalibrate command with no flags, your job will
be submitted to the slurm queue with a maximum wall time of one
day (24 hours). This is, however, the maximum time for all steps
of the qcalibrate process, so 24 hours might not be long enough for
your conformer search to finish completely. If you want to increase
your wall time, you can use the -t
flag. With this, you can change
your job to be able to run for any length of time (while it fits
on ACME's time limits).
An example of a command which changes the maximum wall time of the job is:
qcalibrate -i name_of_input.com -t 1-12:00:00
This command will submit the qcalibrate job to the queue for a maximum time of 36 hours (or 1 day and 12 hours).
Note
If you are requesting times that are longer than one day, you can either include how many days and extra hours (1-12:00:00) or the total number of hours (36:00:00) to achieve the same effect.
Number of Processors
Another customizable feature of the qcalibrate script is to
change the number of processors that your job is running on.
The default value for processors to use is 16, but you can
change this to use more or less, depending on your needs.
You can alter the number of processors with the -p
flag:
qcalibrate -i name_of_input.com -p 8
This command will submit to 8 processors on ACME instead of the default 16. On ACME, you can submit to a maximum of 32 processors per core.
Number of Conformers
As part of the qcalibrate command, you can adjust the final number of conformers that you want to proceed with for the coformer search. After the initial search, the CREGEN module of CREST is used to cluster the ensemble into a specified number of representative structures.
By default, the number of conformers to be considered each
time you run qcalibrate is 10. However, you can adjust this
number to suit your needs with the -n
flag:
qcalibrate -i name_of_input.com -n 5
This command will cluster to 5 representative structures after the initial CREST conformer search. These 5 structures will be used for DFT constrained optimizaitons and continue through the workflow.