Initial support for parsing xTB output (WIP)#1129
Initial support for parsing xTB output (WIP)#1129Shualdon wants to merge 8 commits intocclib:masterfrom
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #1129 +/- ##
==========================================
- Coverage 87.79% 86.57% -1.22%
==========================================
Files 64 70 +6
Lines 13435 13699 +264
==========================================
+ Hits 11795 11860 +65
- Misses 1640 1839 +199
☔ View full report in Codecov by Sentry. |
|
I guess the next step would be to add some test coverage? Is there a list of the basic systems to run to add to the data files repo? |
| # H 5.25469642314449 -2.38603632440806 0.38612072928186 | ||
| # H 3.38728187535945 -4.19429919486669 -0.44692022446750 | ||
| # | ||
| # For sdf\mol input file - gives sdf\mol output coords |
There was a problem hiding this comment.
We will always print the final structure in the same format as the input, meaning the format can be any of those listed here.
There was a problem hiding this comment.
I guess the parser has to guess based on format after the final structure: line?
There was a problem hiding this comment.
Guessing formats can be difficult, therefore I stopped guessing the user input format but only rely on the name of the input file instead in xtb. If it is helpful xtb could write the detected input format as well.
There was a problem hiding this comment.
This is how I did it here - guessing the format from the input file (as the output prints it). I only did xyz and mol\sdf formats so far, as those are the formats I mostly use, but I plan to add more in the future.
Having xtb print the format can be useful, but I guess it's going to work similarly to how it is done now.
There was a problem hiding this comment.
An easy way to match the detection logic in xtb is to collect the coordinate file from the initial printout and reproduce the suffix / basename selection rules for the file format from the IO library:
There was a problem hiding this comment.
That's basically how I did it. I just need to add support to more formats.
There was a problem hiding this comment.
@Shualdon Im going to work on a more extensive review, but can the parsing of formats be abstracted into a function. I would eventually prefer that they are moved out of this parser to a more general area for reuse later. (Perhaps to the baseclass or something).
There was a problem hiding this comment.
That can be done. I will work on it.
| # * total energy : -37.3373741 Eh change -0.1278963E-08 Eh | ||
| # gradient norm : 0.0004128 Eh/α predicted 0.0000000E+00 (-100.00%) | ||
| # displ. norm : 0.0058476 α lambda -0.2852929E-06 | ||
| # maximum displ.: 0.0054052 α in ANC's #1, #7, #3, ... |
There was a problem hiding this comment.
This output is dependent on the optimizer used, xtb currently supports two variants of ANCopt and the RF based one is exchanged for the LBFGS based at 500 atoms. Also, the user can configure this in the input file always use the RF or LBFGS based version.
| # number of atoms : 18 | ||
| # number of electrons : 66 | ||
| # charge : 0 | ||
| # spin : 0.0 | ||
| # first test random number : 0.87181443679343 |
There was a problem hiding this comment.
This output will be is gone in the next latest xtb release.
| # Get Molecular Orbitals energies and HOMO index | ||
| # xTB trunctaes the MO list so we need to take care of that. | ||
| # Unkown energies will be given NaN as a value |
There was a problem hiding this comment.
Try running this part of the parser with higher electronic temperature --etemp 5000 or so.
|
Hi Omri, is there anything we can do to move this forward? It looks like a lot of work is already done and it shouldn't go to waste. |
|
Hi, sorry it's been long since I updated. My research is keeping me busy. I would continue to work on this and will have a new PR as soon as I can. |
|
There's no rush. I haven't looked at the code in detail, but if it's in a state that is already useful to you and is mostly tidy, I'm inclined to merge it as-is and updates can be made in follow-up PRs. |
|
Bump. @Shualdon can we merge since it's already somewhat useful? |
|
I will hand-check the output, since there are no tests, and if all is good I'll merge what's here for 1.8. |
|
Only the LMO examples are parsing right now, giving the following attributes: |
shivupa
left a comment
There was a problem hiding this comment.
This seems good for now. Future efforts should include
- better description of atomprop/bondprop in ccdata
- abstract out geometry parsing
- based on @awvwgk's review (thank you btw!) this might fail geom opts with different optimizers.
We can open an issue for future xtb parsing development
|
Hi everyone, thanks for the feedback and work on getting this PR merged! I'll work on improving it and submit another PR when it's ready. |
|
Hi Omri! I think the plan will be to merge this (as long as there are no objections). The list I included is for future PRs. |
|
I will fix the merge conflict and try to get everything here parsing then merge for 1.8. |
|
@berquist and friends: What's the current status of this PR? Looks like all tests pass but just merge conflicts have to be resolved? If this is in a fairly respectable state as-is, I'm happy to take a stab at expanding on it (+ adding tests). That might be best suited as a set of follow-up PRs though because I haven't quite decided yet when I'm going to start this effort. Hopefully it'd be soon, but we know how that goes. 😉 |
|
I think having a interested user indicates to me that this should be merged asap + add improvements later. I am ok with that to move things along on your end @Andrew-S-Rosen Sorry I realize this exactly what I said in this thread months ago, but this slipped through the cracks with version 2 development going on.. (also coming soon™️) |
|
@shivupa: No need to apologize. We've all been there! That's just how it goes. I'd certainly be more inclined to dig in a bit if there's interest in getting this merged 😄 I'm going to take a stab at making an ASE-based xTB calculator using cclib for the output parsing and jinja for the input setup, mostly as an excuse to learn jinja and find areas for improvement in the cclib parser. Will probably give that a spin starting sometime next week, if all goes according to plan. I'll add to cclib as appropriate based on my efforts and will prioritize test coverage. |
|
@shivupa: Alright, I'm committed to this know! 😉 I installed the PR and ran the most recent version of xTB, which produced an First, the good news: no errors were raised! 🥳 Next the not great news: the resulting Reproduction: from cclib.io import ccread
cclib_obj = ccread("xtb.out")
print(cclib_obj.getattributes())Returns the following: {'atomprop': {'convcn': [0.908, 0.908],
'q': [0.0, -0.0],
'c6aa': [3.062, 3.062],
'alpha': [2.737, 2.737]},
'bondprop': {'wbo': [[1.0, [1, 1.0]], [1.0, [0, 1.0]]]},
'homos': array([0], dtype=int32),
'metadata': {'package': 'XTB',
'methods': [],
'success': True,
'legacy_package_version': '6.6.1',
'coord_type': 'xyz'},
'moenergies': [array([-12.8334, 5.2022])]}So, there's certainly some work to be done! But I'm hopeful it won't be too painful. Perhaps an update to the xTB output file caused a parsing mismatch. I haven't looked at this PR's code yet. I'll keep you posted. |
| if 'CYCLE' == line.replace('.','').strip()[:5]: | ||
| scfenergies = [] | ||
| while line.strip()[4:41] != 'GEOMETRY OPTIMIZATION CONVERGED AFTER' or line.strip()[4:44] !='FAILED TO CONVERGE GEOMETRY OPTIMIZATION': | ||
| line = next(inputfile) | ||
| if line.strip()[2:14] == 'total energy': | ||
| energy = float(line.split()[4]) | ||
| scfenergies.append(energy) | ||
| if line.strip()[4:41] == 'GEOMETRY OPTIMIZATION CONVERGED AFTER': | ||
| break | ||
| self.set_attribute('scfenergies', scfenergies) |
There was a problem hiding this comment.
This will only populate the scfenergies in case a geometry optimization is done, not for any other runmode like single point (default) or hessian.
|
This is superseded by #1296. Andrew has made sure to keep Omri's commits and we will not squash the full branch. |
Hi,
As discussed in #789, I have added initial support for reading xTB output files.
Currently it only supports GFN2 calculations using the latest xTB version (6.4.1). It can probably read older versions, but there are some changes in the text it won't be able to parse, for now.
I added the "atomprop" and "bondprop" attributes (#789), and tried to use as much of the existing attributes as possible.
I tried as much as I can to follow the same style as the other parsers.
I also added some example output files.
This is a WIP, so I would be happy to get feedback.
Omri.