17 September 2012

238. Calculating pKa, part 2: CBS extrapolation basics

I'm typing this up as I'm learning. Be prepared that there may be outright errors, fuzzy thinking and misunderstood concepts in this post. Use it as an inspiration for learning about Complete Basis Set extrapolation -- not as an authoritative guide. What follows is just what I think I've understood. It's a short post since I'll just put it here in order that I can refer to it later.

The general idea
The basic idea behind Complete Basis Set extrapolation seems to be that as you make your basis sets larger and larger by adding Zeta functions to your valence orbitals (making your valence orbitals more 'flexible') the energies you get start to approach those you would get with an infinitely large or complete (and thus 'true') basis set. Exactly how quickly this approach (exponential?) occurs is a matter for debate, so there's a number of ways of extrapolating it.

In practical terms, what seems to be done is:
1. The structure of a molecule is first optimised using a specific level of theory (e.g. MP2/cc-aug-pvdz) in the gas phase. This structure is used for all subsequent calculations.
2. A single point calculation at MP2/aug-cc-pVDZ is done. The electronic energy is recorded.
3. A single point calculation at MP2/aug-cc-pVTZ is done. The electronic energy is recorded.
4. A single point calculation at MP2/aug-cc-pVQZ is done. The electronic energy is recorded.
5. A single point calculation at MP2/aug-cc-pV5Z is done. The electronic energy is recorded.
etc.

where D=double(2), T=triple (3), Q=Quadruple (4)

Using MP2 and the Dunning-Knoll series of correlation consistent basis sets, for formate I got
2   -188.772348824917 (MP2/aug-cc-pVDZ)
3   -188.930859478806 (MP2/aug-cc-pVTZ)
4   -188.984671946038 (MP2/aug-cc-pVQZ)


Which looks like this:
Not everyone will appreciate the default choice of green and red by gnuplot...
The fitted line is MP2energy=1*exp(-0.599028936687644*Zetafunctions)-189.082170648532

Plotting an exponential with three unknown based only on three points is obviously poor form, but the point is fairly clear.

The extrapolated energy is -189.082170648532 hartree. For practical use you will need to obtain enthalpy/entropy corrections and solvation energies using a set method. Also, there are normally corrections for unpaired electrons etc.

The Peterson Scheme:
 (Peterson, K. A.; Woon, D. E.; Dunning, T. H., Jr. J. Chem. Phys. 1994100, 7410, link)

Instead of a straight A*exp(-B*x)+C fit you do E(x)=A+B*exp(-(x-1))+C*exp(-(x-1)**2), where A is the CBS energy.

So, using gnuplot we can put our energies in 'cbs.dat':
2 -188.772348824917
3 -188.930859478806
4 -188.984671946038

and fit it in gnuplot using:

set xrange [0:5]
f(x)=A+B*exp(-(x-1))+C*exp(-(x-1)**2)
fit f(x) 'cbs.dat' u 1:2 via A,B,C
plot 'cbs.dat' u 1:2, f(x) lc 3



which gives us a CBS energy of -189.016 Hartree (c.f. -189.082 using a simple exponential).

The basic underlying principle behind CBS is thus pretty clear even to someone who's not skilled in the art of computational chemistry.

The difficulties seem to be:
1. What basis set family to use (cc-pVXZ ?)
2. Whether to use diffuse/polarisation/extra orbital functions (aug, p, d+)?
3.  What level to do solvation calc on (DFT, MP2; COSMO, CPCM)?
4. What level to do structural optimisation at?
5. What level to do frequency calculation at?

Remember that a great many basis sets haven't been parametrised for elements beyond Ar.

Also, as far as solvation calculations are concerned an issue seems to be that the correct way to calculate solvation energy seems to be using gas phase structures -- and NOT structures optimised using a solvation model. This can cause issues if the gas phase and solution phase conformations are considerably different.

2 comments:

  1. Thank you for posting this tutorial. It really helped. Gnuplot works pretty well than the usual MS Excel. But I only have one question. How do you increase the digit size of the return values of the fitting parameters? (i.e 189.016 to something like 189.016xxxxx)

    ReplyDelete
    Replies
    1. Something like this will work:
      gnuplot> g=5.39
      gnuplot> print sprintf("%2.10f", g)
      5.3900000000

      Also use with "set fit errorvariables" so that you can print them

      Delete