Initial support for parsing xTB output (WIP) by Shualdon · Pull Request #1129 · cclib/cclib

Shualdon · 2022-05-03T18:28:25Z

Hi,

As discussed in #789, I have added initial support for reading xTB output files.
Currently it only supports GFN2 calculations using the latest xTB version (6.4.1). It can probably read older versions, but there are some changes in the text it won't be able to parse, for now.
I added the "atomprop" and "bondprop" attributes (#789), and tried to use as much of the existing attributes as possible.
I tried as much as I can to follow the same style as the other parsers.
I also added some example output files.
This is a WIP, so I would be happy to get feedback.

Omri.

codecov · 2022-05-03T20:17:43Z

Codecov Report

Attention: 199 lines in your changes are missing coverage. Please review.

Comparison is base (d533158) 87.79% compared to head (2cb1669) 86.57%.
Report is 642 commits behind head on master.

❗ Current head 2cb1669 differs from pull request most recent head 02245be. Consider uploading reports for the commit 02245be to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1129      +/-   ##
==========================================
- Coverage   87.79%   86.57%   -1.22%     
==========================================
  Files          64       70       +6     
  Lines       13435    13699     +264     
==========================================
+ Hits        11795    11860      +65     
- Misses       1640     1839     +199

Files	Coverage Δ
cclib/io/ccio.py	`72.28% <100.00%> (+0.11%)`	⬆️
cclib/parser/__init__.py	`100.00% <100.00%> (ø)`
cclib/parser/data.py	`80.80% <ø> (ø)`
cclib/parser/xtbparser.py	`4.78% <4.78%> (ø)`

... and 8 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ghutchis · 2022-05-05T22:35:30Z

I guess the next step would be to add some test coverage? Is there a list of the basic systems to run to add to the data files repo?

awvwgk · 2022-05-14T11:59:45Z

cclib/parser/xtbparser.py

+        # H         5.25469642314449   -2.38603632440806    0.38612072928186
+        # H         3.38728187535945   -4.19429919486669   -0.44692022446750
+        #
+        # For sdf\mol input file - gives sdf\mol output coords


We will always print the final structure in the same format as the input, meaning the format can be any of those listed here.

I guess the parser has to guess based on format after the final structure: line?

Guessing formats can be difficult, therefore I stopped guessing the user input format but only rely on the name of the input file instead in xtb. If it is helpful xtb could write the detected input format as well.

This is how I did it here - guessing the format from the input file (as the output prints it). I only did xyz and mol\sdf formats so far, as those are the formats I mostly use, but I plan to add more in the future.
Having xtb print the format can be useful, but I guess it's going to work similarly to how it is done now.

An easy way to match the detection logic in xtb is to collect the coordinate file from the initial printout and reproduce the suffix / basename selection rules for the file format from the IO library:

https://github.com/grimme-lab/mctc-lib/blob/4e88c5239344dad805c91ae021694891ff0dd372/src/mctc/io/filetype.f90#L71-L128

That's basically how I did it. I just need to add support to more formats.

@Shualdon Im going to work on a more extensive review, but can the parsing of formats be abstracted into a function. I would eventually prefer that they are moved out of this parser to a more general area for reuse later. (Perhaps to the baseclass or something).

That can be done. I will work on it.

awvwgk · 2022-05-14T16:02:56Z

cclib/parser/xtbparser.py

+        #  * total energy  :   -37.3373741 Eh     change       -0.1278963E-08 Eh
+        #    gradient norm :     0.0004128 Eh/α   predicted     0.0000000E+00 (-100.00%)
+        #    displ. norm   :     0.0058476 α      lambda       -0.2852929E-06
+        #    maximum displ.:     0.0054052 α      in ANC's #1, #7, #3, ...


This output is dependent on the optimizer used, xtb currently supports two variants of ANCopt and the RF based one is exchanged for the LBFGS based at 500 atoms. Also, the user can configure this in the input file always use the RF or LBFGS based version.

awvwgk · 2022-05-14T16:05:55Z

cclib/parser/xtbparser.py

+        #   number of atoms            :                    18
+        #   number of electrons        :                    66
+        #   charge                     :                     0
+        #   spin                       :                   0.0
+        #   first test random number   :      0.87181443679343


This output ~~will be~~ is gone in the ~~next~~ latest xtb release.

awvwgk · 2022-05-14T16:08:17Z

cclib/parser/xtbparser.py

+        # Get Molecular Orbitals energies and HOMO index
+        # xTB trunctaes the MO list so we need to take care of that.
+        # Unkown energies will be given NaN as a value


Try running this part of the parser with higher electronic temperature --etemp 5000 or so.

ghutchis · 2022-05-30T23:06:18Z

FYI - https://github.com/nomad-coe/nomad-parser-xtb

berquist · 2023-03-27T19:39:30Z

Hi Omri, is there anything we can do to move this forward? It looks like a lot of work is already done and it shouldn't go to waste.

Shualdon · 2023-03-28T17:32:27Z

Hi, sorry it's been long since I updated. My research is keeping me busy. I would continue to work on this and will have a new PR as soon as I can.

berquist · 2023-03-28T17:57:01Z

There's no rush. I haven't looked at the code in detail, but if it's in a state that is already useful to you and is mostly tidy, I'm inclined to merge it as-is and updates can be made in follow-up PRs.

ghutchis · 2023-05-17T15:58:58Z

Bump. @Shualdon can we merge since it's already somewhat useful?

berquist · 2023-05-18T01:30:36Z

I will hand-check the output, since there are no tests, and if all is good I'll merge what's here for 1.8.

berquist · 2023-05-18T02:19:37Z

Only the LMO examples are parsing right now, giving the following attributes:

  atomprop
  bondprop
  charge
  homos
  metadata
  moenergies
  mult

shivupa

This seems good for now. Future efforts should include

better description of atomprop/bondprop in ccdata
abstract out geometry parsing
based on @awvwgk's review (thank you btw!) this might fail geom opts with different optimizers.
We can open an issue for future xtb parsing development

Shualdon · 2023-05-23T15:01:44Z

Hi everyone, thanks for the feedback and work on getting this PR merged! I'll work on improving it and submit another PR when it's ready.

shivupa · 2023-05-23T15:05:43Z

Hi Omri! I think the plan will be to merge this (as long as there are no objections). The list I included is for future PRs.

berquist · 2023-07-30T16:23:03Z

I will fix the merge conflict and try to get everything here parsing then merge for 1.8.

Andrew-S-Rosen · 2023-10-22T20:54:48Z

@berquist and friends: What's the current status of this PR? Looks like all tests pass but just merge conflicts have to be resolved?

If this is in a fairly respectable state as-is, I'm happy to take a stab at expanding on it (+ adding tests). That might be best suited as a set of follow-up PRs though because I haven't quite decided yet when I'm going to start this effort. Hopefully it'd be soon, but we know how that goes. 😉

shivupa · 2023-10-22T21:24:28Z

I think having a interested user indicates to me that this should be merged asap + add improvements later.

I am ok with that to move things along on your end @Andrew-S-Rosen

Sorry I realize this exactly what I said in this thread months ago, but this slipped through the cracks with version 2 development going on.. (also coming soon™️)

Andrew-S-Rosen · 2023-10-22T21:27:19Z

@shivupa: No need to apologize. We've all been there! That's just how it goes.

I'd certainly be more inclined to dig in a bit if there's interest in getting this merged 😄 I'm going to take a stab at making an ASE-based xTB calculator using cclib for the output parsing and jinja for the input setup, mostly as an excuse to learn jinja and find areas for improvement in the cclib parser. Will probably give that a spin starting sometime next week, if all goes according to plan. I'll add to cclib as appropriate based on my efforts and will prioritize test coverage.

Andrew-S-Rosen · 2023-10-26T06:46:51Z

@shivupa: Alright, I'm committed to this know! 😉

I installed the PR and ran the most recent version of xTB, which produced an xtb.out file that I read with ccread.

First, the good news: no errors were raised! 🥳

Next the not great news: the resulting ccData object is missing many goodies! For instance, I don't see any scfenergies or grads.

Reproduction:

from cclib.io import ccread
cclib_obj = ccread("xtb.out")
print(cclib_obj.getattributes())

xtb.out.txt

Returns the following:

{'atomprop': {'convcn': [0.908, 0.908],
  'q': [0.0, -0.0],
  'c6aa': [3.062, 3.062],
  'alpha': [2.737, 2.737]},
 'bondprop': {'wbo': [[1.0, [1, 1.0]], [1.0, [0, 1.0]]]},
 'homos': array([0], dtype=int32),
 'metadata': {'package': 'XTB',
  'methods': [],
  'success': True,
  'legacy_package_version': '6.6.1',
  'coord_type': 'xyz'},
 'moenergies': [array([-12.8334,   5.2022])]}

So, there's certainly some work to be done! But I'm hopeful it won't be too painful. Perhaps an update to the xTB output file caused a parsing mismatch. I haven't looked at this PR's code yet. I'll keep you posted.

awvwgk · 2023-10-26T11:59:06Z

cclib/parser/xtbparser.py

+        if 'CYCLE' == line.replace('.','').strip()[:5]:
+            scfenergies = []
+            while line.strip()[4:41] != 'GEOMETRY OPTIMIZATION CONVERGED AFTER' or line.strip()[4:44] !='FAILED TO CONVERGE GEOMETRY OPTIMIZATION':
+                line = next(inputfile)
+                if line.strip()[2:14] == 'total energy':
+                    energy = float(line.split()[4])
+                    scfenergies.append(energy)
+                if line.strip()[4:41] == 'GEOMETRY OPTIMIZATION CONVERGED AFTER':
+                    break
+            self.set_attribute('scfenergies', scfenergies)


This will only populate the scfenergies in case a geometry optimization is done, not for any other runmode like single point (default) or hessian.

berquist · 2023-11-15T05:17:53Z

This is superseded by #1296. Andrew has made sure to keep Omri's commits and we will not squash the full branch.

Shualdon added 2 commits May 3, 2022 14:12

adding xtb parser and example files

de3049d

Merge branch 'master' of https://github.com/Shualdon/cclib

2cb1669

shivupa added the parsers label May 3, 2022

shivupa added this to the v1.7.2 milestone May 3, 2022

awvwgk reviewed May 14, 2022

View reviewed changes

langner modified the milestones: v1.7.2, v1.8 May 17, 2022

berquist linked an issue Mar 27, 2023 that may be closed by this pull request

Support for xTB #789

Closed

Andrew-S-Rosen mentioned this pull request Apr 8, 2023

More complete xtb output parsing Quantum-Accelerators/quacc#217

Closed

langner modified the milestones: v1.8, v1.8.1 May 16, 2023

berquist added 2 commits May 17, 2023 22:11

xTB: normalisesym is now a (required) abstract method

0bb3b51

xTB: ensure atomprop and bondprop attributes are available

08942b6

shivupa approved these changes May 23, 2023

View reviewed changes

berquist added the xTB label Jul 30, 2023

berquist removed this from the v1.8.1 milestone Jul 30, 2023

berquist added this to the v1.8 milestone Jul 30, 2023

berquist self-assigned this Jul 30, 2023

berquist added 2 commits July 30, 2023 15:22

Merge remote-tracking branch 'upstream/master' into Shualdon/master

5c61c2b

xTB: tidying

2173605

berquist modified the milestones: v1.8, v1.8.1 Aug 2, 2023

berquist added 2 commits October 24, 2023 22:33

Merge remote-tracking branch 'upstream/master' into Shualdon/master

12af700

xTB: rename class from XTB to xTB

02245be

Andrew-S-Rosen mentioned this pull request Oct 26, 2023

Support for parsing xTB outputs #1296

Merged

awvwgk reviewed Oct 26, 2023

View reviewed changes

berquist closed this Nov 15, 2023

Conversation

Shualdon commented May 3, 2022

Uh oh!

codecov bot commented May 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ghutchis commented May 5, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

awvwgk May 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ghutchis commented May 30, 2022

Uh oh!

berquist commented Mar 27, 2023

Uh oh!

Shualdon commented Mar 28, 2023

Uh oh!

berquist commented Mar 28, 2023

Uh oh!

ghutchis commented May 17, 2023

Uh oh!

berquist commented May 18, 2023

Uh oh!

berquist commented May 18, 2023

Uh oh!

shivupa left a comment

Choose a reason for hiding this comment

Uh oh!

Shualdon commented May 23, 2023

Uh oh!

shivupa commented May 23, 2023

Uh oh!

berquist commented Jul 30, 2023

Uh oh!

Andrew-S-Rosen commented Oct 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shivupa commented Oct 22, 2023

Uh oh!

Andrew-S-Rosen commented Oct 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Andrew-S-Rosen commented Oct 26, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

berquist commented Nov 15, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

codecov bot commented May 3, 2022 •

edited

Loading

awvwgk May 14, 2022 •

edited

Loading

Andrew-S-Rosen commented Oct 22, 2023 •

edited

Loading

Andrew-S-Rosen commented Oct 22, 2023 •

edited

Loading