1. do automatic conversion of nwchem basis set files to .BAS and .POT
2. generate entries that can be added to the category file
What it currently can't do is generate a .pag file.
The python script is not in this post. I'll release it soon though.
The structure:
ECCE stores basis sets in server/data/Ecce/system/GaussianBasisSetLibrary/.
The number of files associated with a basis set varies, and the way a basis set is set up seems to vary as well depending on who added it.
Each basis set needs at least the following files:
    basis.BAS
    basis.BAS.meta
    .DAV/basis.BAS.pag
    .DAV/basis.BAS.dir
In addition, the basis set needs to be added to the correct category by being added to one of the following files:
    Charge
    correlation_consistent
    DFTOrbital
    diffuse
    ecp
    ECPOrbital
    Exchange
    other_generally_contracted
    other_segmented
    polarization
    pople
    rydberg
e.g. 6-31G goes to pople, while LANL2DZ/ECP goes to ECPOrbital.
Looking at the basis set tool in ECCE you have the following categories/subcategories:
    Orbital: Pople Shared, Other Segmented, Corr. Consistent, Other Gen. Contr., ECP Orbital, DFT Orbital
    Auxiliary: Polarization, Diffuse, Rydberg
    ECP:
    DFT: Charge Fitting, Exchange Fitting
What it means is that you can 'mix and match' by adding your .BAS or .POT files to different category files (e.g. you can have LANL2DZ dp in ECPOrbital, ecp and polarization, all at the same time). See below for how basis sets can be broken up.
Example: The simple cases: 3-21G, 3-21G*, 3-21++G*
For a basis set like 3-21G there are two files: 3-21G.BAS and 3-21G.BAS.meta.
In addition grep shows that there's an entry in the file pople for 3-21G.
The .BAS file:
The entry for C in 3-21G.BAS looks like this:
    atom=C
    contraction shell=S num_primitives=3 num_coefficients=1
    172.2560 0.0617669
    25.91090 0.358794
    5.533350 0.700713
    contraction shell=SP num_primitives=2 num_coefficients=2
    3.664980 -0.395897 0.236460
    0.770545 1.215840 0.860619
    contraction shell=SP num_primitives=1 num_coefficients=2
    0.195857 1.000000 1.000000
Nothing too strange. For example, the nwchem format for C in 3-21g is:
    basis "C_3-21G" CARTESIAN
    C S
    172.2560000 0.0617669
    25.9109000 0.3587940
    5.5333500 0.7007130
    C SP
    3.6649800 -0.3958970 0.2364600
    0.7705450 1.2158400 0.8606190
    C SP
    0.1958570 1.0000000 1.0000000
    end
Writing a python script that translates between the two is simple.
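As an illustration of how little is involved, here is a minimal sketch of the nwchem-to-.BAS direction. It is not the script mentioned at the top of this post. The function names are made up for this example, and the exact line layout of the .BAS output (one `atom=` line per element, one `contraction ...` line per shell, one line per primitive) is an assumption based on the single C entry shown above. Numbers are passed through as text rather than reformatted.

```python
def parse_nwchem_basis(text):
    """Parse an nwchem basis block into a list of (atom, shell, rows).

    Each row is a list of strings: the exponent followed by its coefficients.
    """
    shells = []
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # blank line or comment
        if tokens[0].lower() in ("basis", "end"):
            continue  # block delimiters carry no shell data
        if tokens[0][0].isalpha():
            # Shell header such as "C SP": element symbol, then shell type
            shells.append((tokens[0], tokens[1].upper(), []))
        else:
            # Numeric row belonging to the most recent shell header
            shells[-1][2].append(tokens)
    return shells


def to_ecce_bas(shells):
    """Format parsed shells in the (assumed) ECCE .BAS layout."""
    lines = []
    last_atom = None
    for atom, shell, rows in shells:
        if atom != last_atom:
            lines.append("atom=" + atom)
            last_atom = atom
        lines.append(
            "contraction shell=%s num_primitives=%d num_coefficients=%d"
            % (shell, len(rows), len(rows[0]) - 1)
        )
        lines.extend(" ".join(row) for row in rows)
    return "\n".join(lines)
```

Calling `to_ecce_bas(parse_nwchem_basis(text))` on the nwchem block for C gives an `atom=C` entry with three `contraction` lines, matching the structure of the .BAS entry apart from the number of trailing zeros.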
The .BAS.meta file:
The 3-21G.BAS.meta file looks like this:
    references
    Elements   References
    --------   ----------
    H - Ne:    J.S. Binkley, J.A. Pople, W.J. Hehre, J. Am. Chem. Soc 102 939 (1980)
    Na - Ar:   M.S. Gordon, J.S. Binkley, J.A. Pople, W.J. Pietro and W.J. Hehre,
               J. Am. Chem. Soc. 104, 2797 (1983).
    K - Ca:    K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7, 359 (1986).
    Ga - Kr:   K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7, 359 (1986).
    Sc - Zn:   K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8, 861 (1987).
    Y - Cd:    K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8, 880 (1987).
    Cs :       A 3-21G quality set derived from the Huzinage MIDI basis sets.
               E.D. Glendening and D. Feller, J. Phys. Chem. 99, 3060 (1995)
    references
    info
    3-21G Split Valence Basis
    -------------------------
    Elements   Contraction                 References
    H - He:    (3s) -> [2s]                J.S. Binkley, J.A. Pople and W.J. Hehre,
    Li - Ne:   (6s,3p) -> [3s,2p]          J. Am. Chem. Soc. 102, 939 (1980).
    Na - Ar:   (9s,6p) -> [4s,3p]          M.S. Gordon, J.S. Binkley, J.A. Pople,
                                           W.J. Pietro and W.J. Hehre, J. Am. Chem. Soc. 104, 2797 (1983)
    K - Ca:    (12s,9p) -> [5s,4p]         K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7,
    Ga - Kr:   (12s,9p,3d) -> [5s,4p,1d]   359 (1986).
    Sc - Zn:   (12s,9p,3d) -> [5s,4p,2d]   K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8,
                                           861 (1987).
    Rb - Sr:   (15s,12p,3d)-> [6s,5p,1d]
    Y - Cd:    (15s,12p,6d)-> [6s,5p,3d]   K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8,
    In - I:    (15s,12p,6d)-> [6s,5p,2d]   880 (1987).
    Cs :       (18s,12p,6d)-> [6s,5p,2d]   A 3-21G quality set derived from the Huzinage
                                           MIDI basis sets. E.D. Glendening and D. Feller,
                                           J. Phys. Chem. 99, 3060 (1995).
    The 3-21G basis set contains the same number of Gaussian primitives as the
    STO-3G basis, but the valence electrons are described with two functions per
    AO instead of one. In most cases the 3-21G basis set gives results which are
    as good as the more expensive 4-31G and 6-31G sets.

    3-21G Atomic Energies
                                              ROHF
        State  UHF (noneq)   ROHF (noneq)  ROHF(equiv)   HF Limit (equiv)
        -----  ----------    -----------   -----------   ---------
    H   2-S      -0.496199     -0.496199     -0.496199     -0.50000
    He  1-S      -2.835680     -2.835680     -2.835680     -2.86168
    Li  2-S      -7.381513     -7.381513     -7.381513     -7.43273
    Be  1-S     -14.486820    -14.486820    -14.486820    -14.57302
    B   2-P     -24.389762    -24.389634    -24.148989    -24.52906
    C   3-P     -37.481070    -37.480389    -37.480389    -37.68862
    N   4-S     -54.105390    -54.103658    -54.103658    -54.40094
    O   3-P     -74.393657    -74.392512    -74.391782    -74.80940
    F   2-P     -98.845009    -98.844645    -98.844230    -99.40935
    Ne  1-S    -127.132546   -127.803824   -127.803824   -128.54710
    Na  2-S    -160.854064   -160.854041   -160.854041   -161.85891
    Mg  1-S    -198.468103   -198.468103   -198.468103   -199.61463
    Al  2-P    -240.551046   -240.551024   -240.551010   -241.87671
    Si  3-P    -287.344431   -287.344419   -287.344393   -288.85436
    P   4-S    -339.000079   -339.000027   -339.000027   -340.71878
    S   3-P    -395.551336   -395.551083   -395.550591   -397.50490
    Cl  2-P    -457.276552   -457.276414   -457.276096   -459.48207
    Ar  1-S    -524.342962   -524.342962   -524.342962   -526.81751
    K   2-S    -596.152980   -596.152923   -596.152923   -599.16479
    info
    comments
    2/16/95 - DFF - Modify the format of the literature citation.
    12/07/93 - SJB - Add Nb to Xe.
    8/4/93 - DFF - Add Y and Zr.
    12/2/92 - DFF - Add Rb and Sr.
    7/13/90 - DFF - Original creation of this file from MIA basis set library.
    comments
Again, most of this can be extracted using a shell/python/perl script from the corresponding 3-21g nwchem basis set file.
The entry for 3-21G in 'pople':
    name= 3-21G
    files= 3-21G.BAS
    atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs
This simply seems to be a list of the files that describe the basis set and the elements supported. It can be autogenerated using a script.
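A category entry of this kind could be generated along the following lines. This is only a sketch: `atoms_in_bas` and `category_entry` are hypothetical names, the element list is taken from the `atom=` lines of the .BAS file, and the three-line `name=` / `files=` / `atoms=` layout is assumed from the 3-21G entry in pople.

```python
import re


def atoms_in_bas(bas_text):
    """Return the element symbols in a .BAS file, in order of first appearance."""
    atoms = []
    for match in re.finditer(r"^\s*atom=(\S+)", bas_text, flags=re.MULTILINE):
        symbol = match.group(1)
        if symbol not in atoms:
            atoms.append(symbol)
    return atoms


def category_entry(name, bas_filename, bas_text):
    """Build a single-file category entry (assumed three-line layout)."""
    return "\n".join([
        "name= " + name,
        "files= " + bas_filename,
        "atoms= " + " ".join(atoms_in_bas(bas_text)),
    ])
```

The returned text would then be appended to the relevant category file, e.g. pople for a Pople-style basis set.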
Intermission: polarization and diffuse orbitals, and ECP.
At this stage it's pretty simple. We now have a rough idea of what's needed. We just need to understand how to expand our basis sets.
For 3-21G* and 3-21++G* the polarisation and diffuse orbitals are kept in separate files: 3-21G* adds 3-21GS-AGG.BAS and 3-21GS.BAS, while 3-21++G* adds 3-21PPGS-AGG.BAS, 3-21GS.BAS and POPLDIFF.BAS. All -AGG.BAS files are empty, so I'm not sure why they are there.
Anyway, this might make it a bit clearer:
    3-21G = 3-21G.BAS
    3-21G* = 3-21G.BAS + 3-21GS.BAS
    3-21++G* = 3-21G.BAS + 3-21GS.BAS + POPLDIFF
What happens to e.g. pople is this:
    name= 3-21G*
    files= 3-21GS-AGG.BAS 3-21G.BAS 3-21GS.BAS
    atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
    atoms= Na Mg Al Si P S Cl Ar
    name= 3-21++G*
    files= 3-21PPGS-AGG.BAS 3-21G.BAS POPLDIFF.BAS 3-21GS.BAS
    atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
    atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
    atoms= Na Mg Al Si P S Cl
The -AGG.BAS files are empty. The first atoms line corresponds to entries in 3-21G.BAS, while for 3-21G* the second one corresponds to entries in 3-21GS.BAS. Likewise,
    atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
are entries in POPLDIFF.BAS.
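The composite entries follow a simple pattern: the empty -AGG.BAS file comes first in the files line, and there is then one atoms line per remaining file, in the same order. A small sketch of that pattern is below. `composite_entry` is a hypothetical helper, and the rule that the -AGG.BAS file gets no atoms line is inferred from the two pople entries above, not from any ECCE documentation.

```python
def composite_entry(name, parts, agg_file=None):
    """Build a multi-file category entry.

    parts is a list of (filename, [element symbols]) in the order the files
    should be listed; agg_file, if given, is listed first with no atoms line.
    """
    files = ([agg_file] if agg_file else []) + [fname for fname, _ in parts]
    lines = ["name= " + name, "files= " + " ".join(files)]
    lines += ["atoms= " + " ".join(atoms) for _, atoms in parts]
    return "\n".join(lines)
```

For 3-21G* this would be called with `agg_file="3-21GS-AGG.BAS"` and the two parts 3-21G.BAS and 3-21GS.BAS, each with its own element list.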
The good news: it's almost identical when it comes to ECP. Here's the ECPOrbital entry for LANL2DZ:
    name= LANL2DZ ECP
    files= LANL2DZ.BAS LANL2DZ.POT
    atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu
    atoms= Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu
and the ecp entry:
    name= LANL2DZ ECP
    files= LANL2DZ.POT
    atoms= Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu
The POT file is a little bit different from the .BAS file:
    atom=Na ncore=10 lmax=2
    ecp_potential%l=2%shell=d potential%num_exponents=5
    1 175.5502590 -10.0000000
    2 35.0516791 -47.4902024
    2 7.9060270 -17.2283007
    2 2.3365719 -6.0637782
    2 0.7799867 -0.7299393
    ecp_potential%l=0%shell=s-d potential%num_exponents=5
    0 243.3605846 3.0000000
    1 41.5764759 36.2847626
    2 13.2649167 72.9304880
    2 3.6797165 23.8401151
    2 0.9764209 6.0123861
    ecp_potential%l=1%shell=p-d potential%num_exponents=6
    0 1257.2650682 5.0000000
    1 189.6248810 117.4495683
    2 54.5247759 423.3986704
    2 13.7449955 109.3247297
    2 3.6813579 31.3701656
    2 0.9461106 7.1241813
.DAV files
The good news: the .DAV/basis.dir file is empty.
The bad news: .DAV/basis.pag is a binary file.
I haven't yet figured out the exact structure of it nor the best way to auto-generate it.
I think the best illustration is to show the od -c output for a few .POT.pag files:
LANL2DZ.POT.pag:
    0000000  \b  \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
    0000020 233 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    *
    0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0 002   D
    0001640   A   V   :  \0   h   t   t   p   :   /   /   w   w   w   .   e
    0001660   m   s   l   .   p   n   l   .   g   o   v   /   e   c   c   e
    0001700   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X   I   L
    0001720   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r   y  \0
    0001740  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   L   A   N
    0001760   L   2   D   Z       E   C   P  \0   1   :   n   a   m   e  \0
    0002000
SBKJC.POT.pag:
    0000000  \b  \0 371 003 356 003 347 003 342 003 327 003 314 003 304 003
    0000020 235 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    *
    0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0
    0001640 002   D   A   V   :  \0   h   t   t   p   :   /   /   w   w   w
    0001660   .   e   m   s   l   .   p   n   l   .   g   o   v   /   e   c
    0001700   c   e   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X
    0001720   I   L   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r
    0001740   y  \0  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   S
    0001760   B   K   J   C       E   C   P  \0   1   :   n   a   m   e  \0
    0002000
Trial and error in making files for def2-svp has shown me that you can copy e.g. LANL2DZ.POT.pag to DEF2_ECP.POT.pag and edit it with vim (use binary mode, -b), but you'll need to pad the name with enough spaces that both files end at the same place. E.g. this works:
    0000000  \b  \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
    0000020 233 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    *
    0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0 002   D
    0001640   A   V   :  \0   h   t   t   p   :   /   /   w   w   w   .   e
    0001660   m   s   l   .   p   n   l   .   g   o   v   /   e   c   c   e
    0001700   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X   I   L
    0001720   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r   y  \0
    0001740  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   d   e   f
    0001760   2   -   e   c   p              \0   1   :   n   a   m   e  \0
    0002000
but this doesn't (removed a single space at the end of def2-ecp):
    0000000  \b  \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
    0000020 233 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
    *
    0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0 002   D
    0001640   A   V   :  \0   h   t   t   p   :   /   /   w   w   w   .   e
    0001660   m   s   l   .   p   n   l   .   g   o   v   /   e   c   c   e
    0001700   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X   I   L
    0001720   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r   y  \0
    0001740  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   d   e   f
    0001760   2   -   e   c   p          \0   1   :   n   a   m   e  \0
    0001777
Note that the names should correspond to the names of the nwchem basis sets and/or files, e.g. either 3-21gs or 3-21G*, or LANL2DZ ECP or lanl2dz_ecp.
As far as I understand the solution will lie in how WebDAV uses .pag files. I don't know anything about that just yet though.
Anyway, that's it for now. There's now enough information to write your own scripts.