I've written a python script that can:
1. do automatic conversion of nwchem basis set files to .BAS and .POT
2. generate entries that can be added to the category file
What it currently can't do is generate a .pag file.
The python script is not in this post. I'll release it soon though.
The structure:
ECCE stores basis sets in server/data/Ecce/system/GaussianBasisSetLibrary/.
The number of files associated with a basis set varies, and the way a basis set is set up seems to vary as well depending on who added it.
Each basis set needs at least the following files:
basis.BAS
basis.BAS.meta
.DAV/basis.BAS.pag
.DAV/basis.BAS.dir
In addition, the basis set needs to be added to the correct category by being added to one of the following files:
Charge
correlation_consistent
DFTOrbital
diffuse
ecp
ECPOrbital
Exchange
other_generally_contracted
other_segmented
polarization
pople
rydberg
e.g. 6-31G goes to pople, while LANL2DZ/ECP goes to ECPOrbital.
Looking at the basis set tool in ECCE you have the following categories/subcategories:
Orbital: Pople Shared, Other Segmented, Corr. Consistent, Other Gen. Contr., ECP Orbital, DFT Orbital.
Auxiliary: Polarization, Diffuse, Rydberg.
ECP:
DFT: Charge Fitting, Exchange Fitting.
What it means is that you can 'mix and match' by adding your .BAS or .POT files to different category files (e.g. you can have LANL2DZ dp in ECPOrbital, ecp and polarization, all at the same time). See below for how basis sets can be broken up.
Example: The simple cases: 3-21G, 3-21G*, 3-21++G*
For a basis set like 3-21G there are two files: 3-21G.BAS and 3-21G.BAS.meta.
In addition, grep shows that there's an entry for 3-21G in the file pople.
The .BAS file:
The entry for C in 3-21G.BAS looks like this:
atom=C
contraction shell=S num_primitives=3 num_coefficients=1
172.2560 0.0617669
25.91090 0.358794
5.533350 0.700713
contraction shell=SP num_primitives=2 num_coefficients=2
3.664980 -0.395897 0.236460
0.770545 1.215840 0.860619
contraction shell=SP num_primitives=1 num_coefficients=2
0.195857 1.000000 1.000000
Nothing too strange. For example, the nwchem format for C in 3-21g is:
basis "C_3-21G" CARTESIAN
C S
172.2560000 0.0617669
25.9109000 0.3587940
5.5333500 0.7007130
C SP
3.6649800 -0.3958970 0.2364600
0.7705450 1.2158400 0.8606190
C SP
0.1958570 1.0000000 1.0000000
end
Writing a python script that translates between the two is simple.
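A minimal sketch of such a translator, going from the nwchem block to the .BAS layout shown above. This is not the script mentioned at the top of the post. The function name nwchem_to_bas is made up for illustration, and the sketch assumes that every shell header is "element shell-label" and that every other non-blank line is a row of numbers. The numbers are copied through as written, so trailing zeros are kept.

```python
def nwchem_to_bas(text):
    """Convert nwchem basis blocks to ECCE-style .BAS text (illustrative only)."""
    out = []
    current_atom = None
    shell_label, rows = None, []

    def flush():
        # Write out the shell collected so far, if any.
        nonlocal shell_label, rows
        if shell_label is not None:
            ncoef = len(rows[0]) - 1 if rows else 0
            out.append(
                f"contraction shell={shell_label} "
                f"num_primitives={len(rows)} num_coefficients={ncoef}"
            )
            out.extend(" ".join(row) for row in rows)
        shell_label, rows = None, []

    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = tokens[0].lower()
        if keyword == "basis":
            continue  # block header, e.g. basis "C_3-21G" CARTESIAN
        if keyword == "end":
            flush()
            continue
        if tokens[0][0].isalpha():
            # Shell header such as "C SP": element symbol, then shell label.
            flush()
            atom = tokens[0].capitalize()
            if atom != current_atom:
                out.append(f"atom={atom}")
                current_atom = atom
            shell_label = tokens[1].upper()
        else:
            # Data row: exponent followed by one or more coefficients.
            rows.append(tokens)
    flush()
    return "\n".join(out) + "\n"
```

Feeding it the C block above should give an atom=C line followed by three contraction blocks with 3, 2 and 1 primitives.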
The .BAS.meta file:
The 3-21G.BAS.meta file looks like this:
references
Elements References
-------- ----------
H - Ne: J.S. Binkley, J.A. Pople, W.J. Hehre, J. Am. Chem. Soc 102 939 (1980)
Na - Ar: M.S. Gordon, J.S. Binkley, J.A. Pople, W.J. Pietro and W.J. Hehre,
J. Am. Chem. Soc. 104, 2797 (1983).
K - Ca: K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7, 359 (1986).
Ga - Kr: K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7, 359 (1986).
Sc - Zn: K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8, 861 (1987).
Y - Cd: K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8, 880 (1987).
Cs : A 3-21G quality set derived from the Huzinage MIDI basis sets.
E.D. Glendening and D. Feller, J. Phys. Chem. 99, 3060 (1995)
references
info
3-21G Split Valence Basis
-------------------------
Elements Contraction References
H - He: (3s) -> [2s] J.S. Binkley, J.A. Pople and W.J. Hehre,
Li - Ne: (6s,3p) -> [3s,2p] J. Am. Chem. Soc. 102, 939 (1980).
Na - Ar: (9s,6p) -> [4s,3p] M.S. Gordon, J.S. Binkley, J.A. Pople, W.J.
Pietro and W.J. Hehre, J. Am. Chem. Soc.
104, 2797 (1983)
K - Ca: (12s,9p) -> [5s,4p] K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7,
Ga - Kr: (12s,9p,3d) -> [5s,4p,1d] 359 (1986).
Sc - Zn: (12s,9p,3d) -> [5s,4p,2d] K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8,
861 (1987).
Rb - Sr: (15s,12p,3d)-> [6s,5p,1d]
Y - Cd: (15s,12p,6d)-> [6s,5p,3d] K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8,
In - I: (15s,12p,6d)-> [6s,5p,2d] 880 (1987).
Cs : (18s,12p,6d)-> [6s,5p,2d] A 3-21G quality set derived from the
Huzinage MIDI basis sets.
E.D. Glendening and D. Feller, J. Phys.
Chem. 99, 3060 (1995).
The 3-21G basis set contains the same number of Gaussian primitives as the
STO-3G basis, but the valence electrons are described with two functions per
AO instead of one. In most cases the 3-21G basis set gives results which are
as good as the more expensive 4-31G and 6-31G sets.
3-21G Atomic Energies
ROHF
State UHF (noneq) ROHF (noneq) ROHF(equiv) HF Limit (equiv)
----- ---------- ----------- ----------- ---------
H 2-S -0.496199 -0.496199 -0.496199 -0.50000
He 1-S -2.835680 -2.835680 -2.835680 -2.86168
Li 2-S -7.381513 -7.381513 -7.381513 -7.43273
Be 1-S -14.486820 -14.486820 -14.486820 -14.57302
B 2-P -24.389762 -24.389634 -24.148989 -24.52906
C 3-P -37.481070 -37.480389 -37.480389 -37.68862
N 4-S -54.105390 -54.103658 -54.103658 -54.40094
O 3-P -74.393657 -74.392512 -74.391782 -74.80940
F 2-P -98.845009 -98.844645 -98.844230 -99.40935
Ne 1-S -127.132546 -127.803824 -127.803824 -128.54710
Na 2-S -160.854064 -160.854041 -160.854041 -161.85891
Mg 1-S -198.468103 -198.468103 -198.468103 -199.61463
Al 2-P -240.551046 -240.551024 -240.551010 -241.87671
Si 3-P -287.344431 -287.344419 -287.344393 -288.85436
P 4-S -339.000079 -339.000027 -339.000027 -340.71878
S 3-P -395.551336 -395.551083 -395.550591 -397.50490
Cl 2-P -457.276552 -457.276414 -457.276096 -459.48207
Ar 1-S -524.342962 -524.342962 -524.342962 -526.81751
K 2-S -596.152980 -596.152923 -596.152923 -599.16479
info
comments
2/16/95 - DFF - Modify the format of the literature citation.
12/07/93 - SJB - Add Nb to Xe.
8/4/93 - DFF - Add Y and Zr.
12/2/92 - DFF - Add Rb and Sr.
7/13/90 - DFF - Original creation of this file from MIA basis set library.
comments
Again, most of this can be extracted using a shell/python/perl script from the corresponding 3-21g nwchem basis set file.
The entry for 3-21G in pople:
name= 3-21G
files= 3-21G.BAS
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs
This simply seems to be a list of the files that describe the basis set and the elements supported. It can be autogenerated using a script.
Intermission: polarization and diffuse orbitals, and ECP.
At this stage it's pretty simple. We now have a rough idea of what's needed. We just need to understand how to expand our basis sets.
For 3-21G* and 3-21++G* the polarisation and diffuse orbitals are kept in separate files: 3-21G* uses 3-21GS-AGG.BAS and 3-21GS.BAS, while 3-21++G* uses 3-21PPGS-AGG.BAS, 3-21GS.BAS and POPLDIFF.BAS. All -AGG.BAS files are empty, so I'm not sure why they are there.
Anyway, this might make it a bit clearer:
3-21G = 3-21G.BAS
3-21G* = 3-21G.BAS + 3-21GS.BAS
3-21++G* = 3-21G.BAS + 3-21GS.BAS + POPLDIFF.BAS
What happens to e.g. pople is this:
name= 3-21G*
files= 3-21GS-AGG.BAS 3-21G.BAS 3-21GS.BAS
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
atoms= Na Mg Al Si P S Cl Ar
name= 3-21++G*
files= 3-21PPGS-AGG.BAS 3-21G.BAS POPLDIFF.BAS 3-21GS.BAS
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
atoms= Na Mg Al Si P S Cl
Since the -AGG.BAS files are empty, they don't seem to get an atoms line of their own. The first atoms line corresponds to entries in 3-21G.BAS, while for 3-21G* the second one corresponds to entries in 3-21GS.BAS. Likewise,
atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
are entries in POPLDIFF.BAS.
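The category entries shown so far can be generated along these lines. This is a sketch under my reading of the examples: one atoms= line per non-empty file, in the same order as the files= line, with empty files (the -AGG.BAS ones) contributing nothing. The names atoms_in and category_entry are hypothetical.

```python
import re


def atoms_in(text):
    """Return the element symbols of all atom= lines, in order, without repeats."""
    seen = []
    for symbol in re.findall(r"^\s*atom=(\S+)", text, flags=re.MULTILINE):
        if symbol not in seen:
            seen.append(symbol)
    return seen


def category_entry(name, files):
    """Build a category-file entry.

    files is a list of (filename, file_contents) tuples, in the order
    they should appear on the files= line.
    """
    lines = [f"name= {name}", "files= " + " ".join(fn for fn, _ in files)]
    for _, text in files:
        atoms = atoms_in(text)
        if atoms:  # empty files get no atoms= line
            lines.append("atoms= " + " ".join(atoms))
    return "\n".join(lines)
```

For a single-file basis set such as 3-21G, files has one element and the result is the three-line entry shown earlier.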
The good news: it's almost identical when it comes to ECP. Here's the ECPOrbital entry for LANL2DZ:
name= LANL2DZ ECP
files= LANL2DZ.BAS LANL2DZ.POT
atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu
atoms= Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu
and the ecp entry:
name= LANL2DZ ECP
files= LANL2DZ.POT
atoms= Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu
The .POT file is a little bit different from the .BAS file:
atom=Na ncore=10 lmax=2
ecp_potential%l=2%shell=d potential%num_exponents=5
1 175.5502590 -10.0000000
2 35.0516791 -47.4902024
2 7.9060270 -17.2283007
2 2.3365719 -6.0637782
2 0.7799867 -0.7299393
ecp_potential%l=0%shell=s-d potential%num_exponents=5
0 243.3605846 3.0000000
1 41.5764759 36.2847626
2 13.2649167 72.9304880
2 3.6797165 23.8401151
2 0.9764209 6.0123861
ecp_potential%l=1%shell=p-d potential%num_exponents=6
0 1257.2650682 5.0000000
1 189.6248810 117.4495683
2 54.5247759 423.3986704
2 13.7449955 109.3247297
2 3.6813579 31.3701656
2 0.9461106 7.1241813
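To make the layout explicit, here is a small reader for the .POT text above. It is a sketch built only on what the Na example shows. It assumes that the header fields are separated by % (so that the shell label is e.g. "s-d potential") and that each data row is: power of r, exponent, coefficient. The function name parse_pot is made up.

```python
def parse_pot(text):
    """Parse ECCE-style .POT text into a list of per-atom dictionaries."""
    atoms = []
    block = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("atom="):
            # e.g. atom=Na ncore=10 lmax=2
            fields = dict(f.split("=", 1) for f in line.split())
            atoms.append({
                "atom": fields["atom"],
                "ncore": int(fields["ncore"]),
                "lmax": int(fields["lmax"]),
                "potentials": [],
            })
        elif line.startswith("ecp_potential"):
            # e.g. ecp_potential%l=0%shell=s-d potential%num_exponents=5
            fields = dict(f.split("=", 1) for f in line.split("%")[1:])
            block = {
                "l": int(fields["l"]),
                "shell": fields["shell"],
                "num_exponents": int(fields["num_exponents"]),
                "terms": [],
            }
            atoms[-1]["potentials"].append(block)
        else:
            power, exponent, coefficient = line.split()
            block["terms"].append((int(power), float(exponent), float(coefficient)))
    return atoms


def counts_match(atoms):
    """True if every block has as many rows as num_exponents claims."""
    return all(
        len(p["terms"]) == p["num_exponents"]
        for atom in atoms
        for p in atom["potentials"]
    )
```

A check like counts_match is handy when writing .POT files by hand or by script, since a wrong num_exponents is easy to miss.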
.DAV files
The good news: the .DAV/basis.BAS.dir file is empty.
The bad news: .DAV/basis.BAS.pag is a binary file.
I haven't yet figured out the exact structure of it nor the best way to auto-generate it.
I think the best illustration is to show the od -c output for a few .POT.pag files:
LANL2DZ.POT.pag:
0000000 \b \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
0000020 233 003 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000040 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0001620 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 004 \0 \0 002 D
0001640 A V : \0 h t t p : / / w w w . e
0001660 m s l . p n l . g o v / e c c e
0001700 : \0 M E T A D A T A \0 A U X I L
0001720 I A R Y \0 1 : c a t e g o r y \0
0001740 \0 e c p \0 1 : t y p e \0 \0 L A N
0001760 L 2 D Z E C P \0 1 : n a m e \0
0002000
SBKJC.POT.pag:
0000000 \b \0 371 003 356 003 347 003 342 003 327 003 314 003 304 003
0000020 235 003 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000040 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0001620 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 004 \0 \0
0001640 002 D A V : \0 h t t p : / / w w w
0001660 . e m s l . p n l . g o v / e c
0001700 c e : \0 M E T A D A T A \0 A U X
0001720 I L I A R Y \0 1 : c a t e g o r
0001740 y \0 \0 e c p \0 1 : t y p e \0 \0 S
0001760 B K J C E C P \0 1 : n a m e \0
0002000
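A tentative reading of the two dumps: the numbers at the start look like a table of 16-bit little-endian integers. The first is a count (8), and the rest are offsets that walk backwards from the end of the 1024-byte file, alternating key and value. That resembles the page layout of sdbm-style dbm files, but I have not confirmed that this is what ECCE actually uses. The sketch below rebuilds the LANL2DZ.POT.pag bytes from the dump and splits them according to that table. The name parse_page is hypothetical.

```python
import struct

PAGE_SIZE = 1024  # 02000 octal, the final offset in the dumps


def parse_page(page):
    """Split one page into (key, value) byte-string pairs using its offset table."""
    (count,) = struct.unpack_from("<H", page, 0)
    offsets = struct.unpack_from(f"<{count}H", page, 2)
    items, end = [], len(page)
    for start in offsets:
        # Each item runs from its own offset up to the start of the previous one.
        items.append(page[start:end])
        end = start
    return list(zip(items[0::2], items[1::2]))


# The LANL2DZ.POT.pag content as read from the od output:
# offset table, zero fill, then the data packed against the end of the file.
header = struct.pack("<9H", 8, 1017, 1004, 997, 992, 981, 970, 962, 923)
data = (
    b"\x04\x00\x00\x02DAV:\x00http://www.emsl.pnl.gov/ecce:\x00"
    b"METADATA"
    b"\x00AUXILIARY\x00" b"1:category\x00"
    b"\x00ecp\x00" b"1:type\x00"
    b"\x00LANL2DZ ECP\x00" b"1:name\x00"
)
page = header + bytes(PAGE_SIZE - len(header) - len(data)) + data

for key, value in parse_page(page):
    print(key, value)
```

Under this reading, the SBKJC file differs only in the name and in the offsets, which are all shifted by two because "SBKJC ECP" is two characters shorter than "LANL2DZ ECP".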
Trial and error in making files for def2-svp has shown me that you can copy e.g. LANL2DZ.POT.pag to DEF2_ECP.POT.pag and edit it with vim (use binary mode, -b), but you'll need to pad the new name with enough spaces that it is as long as the old one (11 characters for LANL2DZ ECP), so that both files end at the same place. The padding spaces are hard to spot in the od output. E.g. this works:
0000000 \b \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
0000020 233 003 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000040 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0001620 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 004 \0 \0 002 D
0001640 A V : \0 h t t p : / / w w w . e
0001660 m s l . p n l . g o v / e c c e
0001700 : \0 M E T A D A T A \0 A U X I L
0001720 I A R Y \0 1 : c a t e g o r y \0
0001740 \0 e c p \0 1 : t y p e \0 \0 d e f
0001760 2 - e c p \0 1 : n a m e \0
0002000
but this doesn't (a single space removed from the end of def2-ecp, so the file ends one byte early):
0000000 \b \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
0000020 233 003 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000040 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0001620 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 004 \0 \0 002 D
0001640 A V : \0 h t t p : / / w w w . e
0001660 m s l . p n l . g o v / e c c e
0001700 : \0 M E T A D A T A \0 A U X I L
0001720 I A R Y \0 1 : c a t e g o r y \0
0001740 \0 e c p \0 1 : t y p e \0 \0 d e f
0001760 2 - e c p \0 1 : n a m e \0
0001777
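A possible explanation for the padding requirement, and a way around it: if the numbers at the start of the file are 16-bit little-endian offsets counted from the start of a 1024-byte page (a count of 8, then key and value offsets walking backwards from the end), then changing the length of the name without updating them leaves them pointing at the wrong bytes. The sketch below writes a page with the offsets recomputed for any name. It is untested against ECCE, build_page and ecp_pag are made-up names, and it assumes that a single page is enough because the .dir file is empty.

```python
import struct

PAGE_SIZE = 1024  # 02000 octal, the length of the working files


def build_page(pairs, page_size=PAGE_SIZE):
    """Pack (key, value) byte strings against the end of a zero-filled page."""
    page = bytearray(page_size)
    offsets, end = [], page_size
    for key, value in pairs:
        for item in (key, value):
            start = end - len(item)
            page[start:end] = item
            offsets.append(start)
            end = start
    # Offset table: item count first, then one offset per key and value.
    header = struct.pack(f"<{len(offsets) + 1}H", len(offsets), *offsets)
    if len(header) > end:
        raise ValueError("entries do not fit in one page")
    page[: len(header)] = header
    return bytes(page)


def ecp_pag(name):
    """Page contents for a .POT.pag file, with keys and values copied from the dumps."""
    metadata = b"\x04\x00\x00\x02DAV:\x00http://www.emsl.pnl.gov/ecce:\x00"
    return build_page([
        (b"1:name\x00", b"\x00" + name.encode("ascii") + b"\x00"),
        (b"1:type\x00", b"\x00ecp\x00"),
        (b"1:category\x00", b"\x00AUXILIARY\x00"),
        (b"METADATA", metadata),
    ])
```

With this, ecp_pag("LANL2DZ ECP") and ecp_pag("SBKJC ECP") reproduce the offset tables seen in the two original dumps, which is at least encouraging. Whether ECCE accepts ecp_pag("def2-ecp") without padding is something I'd still have to try.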
Note that the names should correspond to the names of the nwchem basis sets and/or files, e.g. either 3-21gs or 3-21G*, and either LANL2DZ ECP or lanl2dz_ecp.
As far as I understand, the solution will lie in how WebDAV uses .pag files. I don't know anything about that just yet though.
Anyway, that's it for now. There's now enough information to write your own scripts.