14 September 2012

236. Calculating pKa, part 1:Example (attempt) of an isodesmic reactions in NWChem

Back to learning about computational approaches to chemistry. The usual warnings apply: why would you trust anything that I say about anything? I'm writing anonymously, and I may misunderstand things at times. So make sure that you compare what I write with that of other sources and make up your own mind.

Anyway, I found a fairly detailed presentation in which they were using Gaussian 98 here: https://www.uow.edu.au/~adamt/Trevitt_Research/Links_files/pKa%20workshop%20slides.pdf


While that should be ok to reproduce, it's not that straightforward to do even with Gaussian, since G09 and G03 don't report solvation parameters in the same way (or detail) as G98.

I'm also a lot keener on NWChem than Gaussian for various reasons, not least that it's 'free' (both libre and gratis) while Gaussian inc. has been accused of doing somewhat unfriendly things in the name of protecting their business interests.

See here and here for an example, and then here for a rebuttal from Gaussian inc. I know that EMSL/Pacific Northwest National Lab that develop NWChem and ECCE are prohibited from using Gaussian since they are considered as being competitors.


Back to science.

Our test example will be acetate, and we'll use formic acid to correct our results.
The fact that this post is very long is due to the amount of detail supplied -- I prefer to show some of the more obvious things so that people can learn from what I post -- and I learn by writing the post.

But first let's just do everything using direct methods.

We work with a thermochemical cycle:

IF we can't calculate the DG_solution directly (i.e. too expensive) we can optimise our structures in the gas phase, and then calculate the solvation energy for those structures.

Then DG_sol=DG_gas+DG_solvation(B)-DGsolvation(A).
(more generally sum of DG_solv(prod) - sum of DG_solv(reactants)).


1. pKa of Acetic acid using direct methods

We can either do
H3CCOOH -> H3COO- + H+
or
H3CCOOH + H2O -> H3COO- + H3O+

Steps:
Optimise acetic acid and acetate in the gas phase and do frequency calculations to get the enthalpy and entropy. Then use the gas phase structures and do single point calculations using COSMO to get the electrostatic solvation energies. Finally, use standard state corrections.

A. Optimise acetic acid in the gas phase and do frequency calculation

Title "aceticacid_gas"
Start  aceticacid_gas
echo
charge 0

geometry autosym units angstrom
 C     0.0402340     0.0308110     0.0402340
 H     -0.600803     -0.611482     0.679201
 H     0.679201     -0.611482     -0.600803
 H     -0.607055     0.659335     -0.607055
 C     0.903928     0.890481     0.903928
 O     0.814831     2.23989     0.814831
 O     1.78275     0.299052     1.78275
 H     2.25438     1.03276     2.25438
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

driver
  default
end

task dft optimize
task dft freq

which gives

         Total DFT energy =     -229.102415550663
      One electron energy =     -550.833155571059
           Coulomb energy =      230.555870626149
    Exchange-Corr. energy =      -29.497734561879
 Nuclear repulsion energy =      120.672603956126

 Numeric. integr. density =       31.999999816803

     Total iterative time =      3.6s
and
 Temperature                      =   298.15K
 frequency scaling parameter      =   1.0000

 Zero-Point correction to Energy  =   38.683 kcal/mol  (  0.061646 au)
 Thermal correction to Energy     =   41.568 kcal/mol  (  0.066243 au)
 Thermal correction to Enthalpy   =   42.160 kcal/mol  (  0.067186 au)

 Total Entropy                    =   69.198 cal/mol-K
   - Translational                =   38.179 cal/mol-K (mol. weight =  60.0211)
   - Rotational                   =   23.855 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    7.164 cal/mol-K

 Cv (constant volume heat capacity) =   14.327 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    8.368 cal/mol-K

So that G=-229.102415550663*(627.503 kcal/Hartree)+(42.160 kcal/mol-298.15*(69.198 cal/molK)/1000)=-1.4372e+05 kcal/mol

B. Optimise acetate in the gas phase and do frequency calculation

Title "acetate_gas"

Start  acetate_gas

echo

charge -1

geometry autosym units angstrom
 C     0.0405721     0.0285481     0.0405721
 H     -0.601438     -0.613690     0.678620
 H     0.678620     -0.613690     -0.601438
 H     -0.605857     0.658809     -0.605857
 C     0.904975     0.886806     0.904975
 O     0.825186     2.23364     0.825186
 O     1.77103     0.316179     1.77103
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken 
end

driver
  default
end

task dft optimize
task dft freq                     

which gives

         Total DFT energy =     -228.540046314754
      One electron energy =     -539.474204198295
           Coulomb energy =      229.063209931484
    Exchange-Corr. energy =      -29.339040486703
 Nuclear repulsion energy =      111.209988438759

 Numeric. integr. density =       32.000000595562

     Total iterative time =      2.9s

and

 Temperature                      =   298.15K
 frequency scaling parameter      =   1.0000

 Zero-Point correction to Energy  =   30.023 kcal/mol  (  0.047845 au)
 Thermal correction to Energy     =   32.271 kcal/mol  (  0.051427 au)
 Thermal correction to Enthalpy   =   32.863 kcal/mol  (  0.052371 au)

 Total Entropy                    =   64.022 cal/mol-K
   - Translational                =   38.129 cal/mol-K (mol. weight =  59.0133)
   - Rotational                   =   23.766 cal/mol-K (symmetry #  =        1)
   - Vibrational                  =    2.127 cal/mol-K

 Cv (constant volume heat capacity) =   11.112 cal/mol-K
   - Translational                  =    2.979 cal/mol-K
   - Rotational                     =    2.979 cal/mol-K
   - Vibrational                    =    5.153 cal/mol-K
so that G=-228.540046314754*(627.503 kcal/Hartree)+(32.271 kcal/mol-298.15*(64.022 cal/molK)/1000)=-1.4338e+05 kcal/mol

Putting A and B together: (-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))=344.54 kcal/mol

We haven't accounted for solvation or the proton yet.

C. Solvation of acetic acid

Title "aceticacid_solvation"
Start  aceticacid_solvation

echo

charge 0

geometry autosym units angstrom
 C     -0.313400     -1.37257     0.00000
 H     -0.932151     -1.56367     -0.882188
 H     -0.932151     -1.56367     0.882188
 H     0.551887     -2.03461     0.00000
 C     0.149260     0.0607660     0.00000
 O     1.30165     0.439500     0.00000
 O     -0.897630     0.927778     0.00000
 H     -0.523912     1.82535     0.00000
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

cosmo
end

task dft energy

which gives
                  COSMO solvation results
                  -----------------------

                 gas phase energy =      -229.1024156124
                 sol phase energy =      -229.1172649438
 (electrostatic) solvation energy =         0.0148493315 (    9.32 kcal/mol)

D. Solvation of acetate

Title "acetate_solvation"
Start  acetate_solvation

echo

charge -1

geometry autosym units angstrom
 C     -0.0308736     -1.36399     0.00000
 H     0.503418     -1.74042     0.882388
 H     0.503418     -1.74042     -0.882388
 H     -1.05531     -1.75261     0.00000
 C     -0.00485953     0.199667     0.00000
 O     -1.12642     0.778065     0.00000
 O     1.14855     0.713544     0.00000
end

basis "ao basis" spherical print
  H library "6-31++G**"
  O library "6-31++G**"
  C library "6-31++G**"
END

dft
  mult 1
  direct
  noio
  XC b3lyp
  grid fine
  mulliken
end

cosmo
end

task dft energy

which gives
                  COSMO solvation results
                  -----------------------
  
                 gas phase energy =      -228.5400463128
                 sol phase energy =      -228.6567490452
 (electrostatic) solvation energy =         0.1167027324 (   73.23 kcal/mol)


Putting A, B and solvation energies together: 
[(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))]-[73.23-9.32]=344.54-63.910=280.63 kcal/mol

E. The Proton
If you try to do any calculations on an isolated proton you get and SCFE of zero, and you won't do much better in terms of thermochemical data. Yet, monoatomic gases obviously still posses entropy and enthalpy. Instead, the document I cite above uses the ideal gas partition function for ideal monoatomic gases which gives a value of 6.28 kcal/mol for the free energy of a proton.  The last reference states that the free energy for a proton in the gas phase is experimentally determined to be -6.28 kcal/mol and that the free energy of hydration is -264.61 kcal/mol (experimental).

[(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-6.28-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))]-[73.23+264.61-9.32]=338.26-328.52=9.74 kcal/mol =40.752 kJ/mol
We're not done yet though -- make sure to continue to 'F. Standard States'

F. Standard States

We know that

DG=DG^0+RT ln (Q).

We have a bit of a problem. We're doing calculations in the gas phase (pressure) but looking at predicting solution values (concentration). Also, I don't fully get it yet, so my explanation is probably a bit fuzzy.

 So if A+B-> C+D we get Q=(C*D)/(A*B) but for A -> C+D we get Q=(C*D)/(A) or (1 bar x1 bar/1 bar)= 1 bar = 101350 Pa => nRT/P=1*8.314*298.15/101350=0.024458 m3= 24.46 L. The concentration of each species is 1 mol/bar which in volume terms means 1/24.46 L.

And the A -> C + D situation is what we have if we look at
HA -> H+ + A-

So,
Q=((1/24.46)*(1/24.46)/(1/24.46))=1/2.4.46 which gives
DG=DG^0-RTln(24.46)=DG^0-7924.9 J/mol

Thus, we need to correct for the standard state: 40.752 -7924.9/1000=32.827 kJ/mol

G. Calculating pKa

Since (in solution so that concentrations are 1 M)
DG=DG^0+RTln(K), where K=([H3COO][H+])/([H3COOH])
DG=DG^0+(RT ln([HCOO]/[H3COOH])+RTln (10^(-pH)), where RT ln([HCOO]/[H3COOH])=RTln(1/1)=0 so
DG=DG^0+RTln (10^(-pH))=DG^0+RT log (10^(-pH)/log(e)=DG^0-(RT/log(e)) * pH
Which for equilbrium, where pH=pKa and DG=0, turns into
pKa=DG^0*log(e)/RT


pKa= DG*log(e)/(RT)=(32.827*1000)*log(e)/(8.314*298.15)=5.75


Not that great as predictions go (exp: pKa=4.75). Looking at some of the literature one error lies in the size of the solvation energies. Possibly one should tune the parameters used in the COSMO.

2. Isodemic reaction/correction

Using formic acid
This approach is based on 1) the similarity between two compounds and 2) us knowing the DG_solution parameter for one of them.

Assuming that we know that formic acid has a pKa of 3.75, then DG_solution=pKa*RT/log(e)=3.75*8.314*298.15/log10(e)/1000=21.404 kJ/mol. The reverse reaction is -21.404 kJ/mol.

We skip a few steps.
Here are the calculated parameters for formic acid (using the same method as above):

Formic acid


SCFE: -189.772804709496 Hartree
Enthalpy correction: 23.773 kcal/mol
Entropy correction: 59.339 cal/mol
Solvation energy: 9.99 kcal/mol




Formate
SCFE:  -189.217943605798 Hartree
Enthalpy correction: 15.016 kcal/mol
Entropy correction: 56.992 cal/mol
Solvation energy: 72.47 kcal/mol

[Just for kicks we quickly look at what the prediction is:
accounting for everything (solvation, proton etc.)
((-189.217943605798*627.503+(15.016-298.15*56.992/1000)-72.47) +(-6.28-264.61))-(-189.772804709496*627.503+(23.773-298.15*59.339/1000)-9.99)=6.7498 kcal/mol
6.7498*4.184-7924.9/1000=20.316 kJ/mol <=> pKa=1000*20.316*log10(e)/(8.314*298.15)=3.56]

The isodesmic approach:

Here we look at
H3CCOOH + -OOCH -> H3CCOO- +HOOCH

This combined reaction has a
DG_solution=DG_solution(acetic acid/acetate)+DG_solution(formate/formic acid)
<=>
DG_solution(acetic acid/acetate)= DG_solution-DG_solution(formic acid/formate)
=DG_solution-(-21.404 kJ/mol)
=DG_gas+[DG_sol(acetate)+DG_sol(formic acid)-DG_sol(acetic acid)-DG_sol(formate)]+21.404 kJ/mol
=
(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))+(-189.772804709496*627.503+(23.773-298.15*(59.339/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))-(-189.217943605798*627.503+(15.016-298.15*(56.992/1000)))+(-73.23-9.99+9.32+72.47)+5.116 kcal/mol= 8.11 kcal/mol= 33.93 kJ/mol

Here we don't need to fiddle with standard states or experimental values for solvation of the proton.

pKa= DG*log(e)/(RT)=(33.93*1000)*log(e)/(8.314*298.15)=5.94
which is even worse than before...
We want ca 27 kJ/mol = 6.48 kcal/mol. Paradoxically this may be due to the ab initio approach to the pKa of formic acid actually giving a very reasonable value.

Using Propanoic acid
So let's try the isodesmic approach using propanoic acid as our reference instead.


Propanoic acid


SCFE:  -268.419515785389 Hartree
Enthalpy correction:  60.894 kcal/mol
Entropy correction:  75.184 cal/mol
Solvation energy:  8.15 kcal/mol




Propanoate
SCFE:   -267.857478200414 Hartree
Enthalpy correction: 52.096 kcal/mol
Entropy correction:  75.392 cal/mol
Solvation energy:  72.14 kcal/mol

pKa=4.86 <=> 27.739 kJ/mol= 6.63 kcal/mol

(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))+(-268.419515785389*627.503+(60.894-298.15*(75.184/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))-(-267.857478200414*627.503+(52.096-298.15*(75.392/1000)))+(-73.23-8.15+9.32+72.14)+6.63 kcal/mol=7.4324 kcal/mol=31.097

pKa= DG*log(e)/(RT)=(31.097*1000)*log(e)/(8.314*298.15)=5.45

It's a bit better, but still a bit off.

3. Conclusion:
The isodesmic approach is not magic and it relies on the similarity of two compounds, one for which there are experimental data, causing similar computational issues. Under the right conditions it's a useful approach, whereas under other conditions -- where a body of experimental data exists -- it might just be easier to determine the correlation between experimental and calculated data via fitting.

The approach worked better for the acetate/propanoate pair than the formate/acetate pair -- and one would consider acetic acid and propanoic acid to be more similar than formic acid and the higher acids. We're still far off from obtaining a perfect result though.

An additional problem is obviously the sensitivity of pKa to the DG -- one pH unit is about 1.36 kcal/mol, which is very small given the usual errors in DFT level calculations. I've seen indications online (google!) that the accuracy of b3lyp is about 3 kcal/mol, and one can always debate the accuracy of a highly empirical method like COSMO.






9 comments:

  1. First of all, thank you for sharing this information. I have taken a look on your workflow and I would like to ask you some questions, all of them related to the solvation calculations: (1) Why do you not directly consider the “sol phase energy” as solvation energy?; (2) Why are you only considering the electrostatic component? What about the terms of cavitation and dispersion? (3) For the protonated and deprotonated forms, the electrostatic component is positive. However, in order to calculate DDG(solvation) you have used the DG(A-) with a changed sign. Why? Thank you in advance and looking forward to hearing from you!

    ReplyDelete
    Replies
    1. Thank you for your comment. It's been a long time since I posted this, and back then I knew less than I do now (or rather, back then I didn't know how little I knew), so I might well have made a few mistakes. However, let's see if I can address your questions.

      Question 3 is the most straightforward one to answer:
      we're working with a thermodynamic cycle (see figure above). We start with the solution state for A, remove it from solution (the cost of which is that we lose the solvation energy i.e. -DG(A-)), transform A to B, then solvate B (so we gain +DG(B)).

      DG is a state function, so it doesn't matter which path we take to get DGsolution -- we can either get DGsolution directly (which used to be prohibitively expensive computationally), or indirectly by desolvating A (-DG(A)), doing the reaction in the gas phase (DG(A->B)), and solvation the product (+DG(B)).

      Delete
    2. Answer to Question 1: the 'sol phase energy' is the gas phase energy plus (well, minus) the solvation energy.
      DGsolv=Egas-Esolv=-228.5400463128-(-228.6567490452)=.1167027324
      Using DG vs E is bit unfortunate, but I stuck to the notation that the literature used.

      Note that the structure is optimised in the gas phase, so we are just looking at the lowering in energy that we get from solvating the molecule.

      Delete
    3. Questions 2 is the most difficult one. I actually don't know whether the cavitation energy should be included or not. Looking at http://www.nwchem-sw.org/index.php/Release62:COSMO_Solvation_Model it seems that it should be:
      "It should be noted that one must in general take into account the standard state correction besides the electrostatic and cavitation/dispersion contribution to the solvation free energy, when a comparison to experimental data is made"

      Delete
    4. If you're using Gaussian, the following might be of interest:
      "SMD
      Do an IEFPCM calculation with radii and non-electrostatic terms for Truhlar and coworkers’ SMD solvation model [Marenich09]. This is the recommended choice for computing ΔG of solvation, which accomplished by performing gas phase and SCRF=SMD calculations for the system of interest and taking the difference the resulting energies. "
      See http://www.gaussian.com/g_tech/g_ur/k_scrf.htm

      Delete
  2. Thank you, your post help me to calculate enthalpies and Gibbs energies at different temperatures. However, for considering the effect of solvatation I used SMD which calculates the free energy of solvatation at 298K. How can I calculate it at others temperatures?
    Congratulation for your blog, I frequently read it.

    ReplyDelete
    Replies
    1. This answer is very late reply. You can easily set the temperature by adding keyword temp in freq block. For example,

      freq
      temp 1 298.15 ## default.
      end

      freq
      temp 1 398.15
      end

      freq
      temp 2 298.15 398.15
      end

      Delete
  3. I realise it's been a long time since you published this but I only read this very interesting post today.

    I'm not sure about the correction for the standard state.

    As most if not all QM solvent models were parametrised with reference to experimental values (usually given as molar hydration energies) the corresponding energies are corrected. In the gas phase it's of course a different story, but in reality the dilution will be very high (there want be partial pressures in the order of 1 bar).

    ReplyDelete
  4. I do see that it's a standard to include it though, e.g., https://onlinelibrary.wiley.com/doi/epdf/10.1002/qua.24699

    ReplyDelete