Skip to content

Initial support for parsing xTB output (WIP)#1129

Closed
Shualdon wants to merge 8 commits intocclib:masterfrom
Shualdon:master
Closed

Initial support for parsing xTB output (WIP)#1129
Shualdon wants to merge 8 commits intocclib:masterfrom
Shualdon:master

Conversation

@Shualdon
Copy link
Contributor

@Shualdon Shualdon commented May 3, 2022

Hi,

As discussed in #789, I have added initial support for reading xTB output files.
Currently it only supports GFN2 calculations using the latest xTB version (6.4.1). It can probably read older versions, but there are some changes in the text it won't be able to parse, for now.
I added the "atomprop" and "bondprop" attributes (#789), and tried to use as much of the existing attributes as possible.
I tried as much as I can to follow the same style as the other parsers.
I also added some example output files.
This is a WIP, so I would be happy to get feedback.

Omri.

@codecov
Copy link

codecov bot commented May 3, 2022

Codecov Report

Attention: 199 lines in your changes are missing coverage. Please review.

Comparison is base (d533158) 87.79% compared to head (2cb1669) 86.57%.
Report is 642 commits behind head on master.

❗ Current head 2cb1669 differs from pull request most recent head 02245be. Consider uploading reports for the commit 02245be to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1129      +/-   ##
==========================================
- Coverage   87.79%   86.57%   -1.22%     
==========================================
  Files          64       70       +6     
  Lines       13435    13699     +264     
==========================================
+ Hits        11795    11860      +65     
- Misses       1640     1839     +199     
Files Coverage Δ
cclib/io/ccio.py 72.28% <100.00%> (+0.11%) ⬆️
cclib/parser/__init__.py 100.00% <100.00%> (ø)
cclib/parser/data.py 80.80% <ø> (ø)
cclib/parser/xtbparser.py 4.78% <4.78%> (ø)

... and 8 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shivupa shivupa added the parsers label May 3, 2022
@shivupa shivupa added this to the v1.7.2 milestone May 3, 2022
@ghutchis
Copy link
Contributor

ghutchis commented May 5, 2022

I guess the next step would be to add some test coverage? Is there a list of the basic systems to run to add to the data files repo?

# H 5.25469642314449 -2.38603632440806 0.38612072928186
# H 3.38728187535945 -4.19429919486669 -0.44692022446750
#
# For sdf\mol input file - gives sdf\mol output coords
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will always print the final structure in the same format as the input, meaning the format can be any of those listed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the parser has to guess based on format after the final structure: line?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guessing formats can be difficult, therefore I stopped guessing the user input format but only rely on the name of the input file instead in xtb. If it is helpful xtb could write the detected input format as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how I did it here - guessing the format from the input file (as the output prints it). I only did xyz and mol\sdf formats so far, as those are the formats I mostly use, but I plan to add more in the future.
Having xtb print the format can be useful, but I guess it's going to work similarly to how it is done now.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An easy way to match the detection logic in xtb is to collect the coordinate file from the initial printout and reproduce the suffix / basename selection rules for the file format from the IO library:

https://github.com/grimme-lab/mctc-lib/blob/4e88c5239344dad805c91ae021694891ff0dd372/src/mctc/io/filetype.f90#L71-L128

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's basically how I did it. I just need to add support to more formats.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Shualdon Im going to work on a more extensive review, but can the parsing of formats be abstracted into a function. I would eventually prefer that they are moved out of this parser to a more general area for reuse later. (Perhaps to the baseclass or something).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be done. I will work on it.

Comment on lines +66 to +69
# * total energy : -37.3373741 Eh change -0.1278963E-08 Eh
# gradient norm : 0.0004128 Eh/α predicted 0.0000000E+00 (-100.00%)
# displ. norm : 0.0058476 α lambda -0.2852929E-06
# maximum displ.: 0.0054052 α in ANC's #1, #7, #3, ...
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This output is dependent on the optimizer used, xtb currently supports two variants of ANCopt and the RF based one is exchanged for the LBFGS based at 500 atoms. Also, the user can configure this in the input file always use the RF or LBFGS based version.

Comment on lines +37 to +41
# number of atoms : 18
# number of electrons : 66
# charge : 0
# spin : 0.0
# first test random number : 0.87181443679343
Copy link

@awvwgk awvwgk May 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This output will be is gone in the next latest xtb release.

Comment on lines +164 to +166
# Get Molecular Orbitals energies and HOMO index
# xTB trunctaes the MO list so we need to take care of that.
# Unkown energies will be given NaN as a value
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try running this part of the parser with higher electronic temperature --etemp 5000 or so.

@langner langner modified the milestones: v1.7.2, v1.8 May 17, 2022
@ghutchis
Copy link
Contributor

@berquist berquist linked an issue Mar 27, 2023 that may be closed by this pull request
@berquist
Copy link
Member

Hi Omri, is there anything we can do to move this forward? It looks like a lot of work is already done and it shouldn't go to waste.

@Shualdon
Copy link
Contributor Author

Hi, sorry it's been long since I updated. My research is keeping me busy. I would continue to work on this and will have a new PR as soon as I can.

@berquist
Copy link
Member

There's no rush. I haven't looked at the code in detail, but if it's in a state that is already useful to you and is mostly tidy, I'm inclined to merge it as-is and updates can be made in follow-up PRs.

@ghutchis
Copy link
Contributor

Bump. @Shualdon can we merge since it's already somewhat useful?

@berquist
Copy link
Member

I will hand-check the output, since there are no tests, and if all is good I'll merge what's here for 1.8.

@berquist
Copy link
Member

Only the LMO examples are parsing right now, giving the following attributes:

  atomprop
  bondprop
  charge
  homos
  metadata
  moenergies
  mult

Copy link
Member

@shivupa shivupa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good for now. Future efforts should include

  • better description of atomprop/bondprop in ccdata
  • abstract out geometry parsing
  • based on @awvwgk's review (thank you btw!) this might fail geom opts with different optimizers.
    We can open an issue for future xtb parsing development

@Shualdon
Copy link
Contributor Author

Hi everyone, thanks for the feedback and work on getting this PR merged! I'll work on improving it and submit another PR when it's ready.

@shivupa
Copy link
Member

shivupa commented May 23, 2023

Hi Omri! I think the plan will be to merge this (as long as there are no objections). The list I included is for future PRs.

@berquist berquist added the xTB label Jul 30, 2023
@berquist berquist removed this from the v1.8.1 milestone Jul 30, 2023
@berquist berquist added this to the v1.8 milestone Jul 30, 2023
@berquist
Copy link
Member

I will fix the merge conflict and try to get everything here parsing then merge for 1.8.

@berquist berquist self-assigned this Jul 30, 2023
@berquist berquist modified the milestones: v1.8, v1.8.1 Aug 2, 2023
@Andrew-S-Rosen
Copy link
Contributor

Andrew-S-Rosen commented Oct 22, 2023

@berquist and friends: What's the current status of this PR? Looks like all tests pass but just merge conflicts have to be resolved?

If this is in a fairly respectable state as-is, I'm happy to take a stab at expanding on it (+ adding tests). That might be best suited as a set of follow-up PRs though because I haven't quite decided yet when I'm going to start this effort. Hopefully it'd be soon, but we know how that goes. 😉

@shivupa
Copy link
Member

shivupa commented Oct 22, 2023

I think having a interested user indicates to me that this should be merged asap + add improvements later.

I am ok with that to move things along on your end @Andrew-S-Rosen

Sorry I realize this exactly what I said in this thread months ago, but this slipped through the cracks with version 2 development going on.. (also coming soon™️)

@Andrew-S-Rosen
Copy link
Contributor

Andrew-S-Rosen commented Oct 22, 2023

@shivupa: No need to apologize. We've all been there! That's just how it goes.

I'd certainly be more inclined to dig in a bit if there's interest in getting this merged 😄 I'm going to take a stab at making an ASE-based xTB calculator using cclib for the output parsing and jinja for the input setup, mostly as an excuse to learn jinja and find areas for improvement in the cclib parser. Will probably give that a spin starting sometime next week, if all goes according to plan. I'll add to cclib as appropriate based on my efforts and will prioritize test coverage.

@Andrew-S-Rosen
Copy link
Contributor

@shivupa: Alright, I'm committed to this know! 😉

I installed the PR and ran the most recent version of xTB, which produced an xtb.out file that I read with ccread.

First, the good news: no errors were raised! 🥳

Next the not great news: the resulting ccData object is missing many goodies! For instance, I don't see any scfenergies or grads.

Reproduction:

from cclib.io import ccread
cclib_obj = ccread("xtb.out")
print(cclib_obj.getattributes())

xtb.out.txt

Returns the following:

{'atomprop': {'convcn': [0.908, 0.908],
  'q': [0.0, -0.0],
  'c6aa': [3.062, 3.062],
  'alpha': [2.737, 2.737]},
 'bondprop': {'wbo': [[1.0, [1, 1.0]], [1.0, [0, 1.0]]]},
 'homos': array([0], dtype=int32),
 'metadata': {'package': 'XTB',
  'methods': [],
  'success': True,
  'legacy_package_version': '6.6.1',
  'coord_type': 'xyz'},
 'moenergies': [array([-12.8334,   5.2022])]}

So, there's certainly some work to be done! But I'm hopeful it won't be too painful. Perhaps an update to the xTB output file caused a parsing mismatch. I haven't looked at this PR's code yet. I'll keep you posted.

Comment on lines +87 to +96
if 'CYCLE' == line.replace('.','').strip()[:5]:
scfenergies = []
while line.strip()[4:41] != 'GEOMETRY OPTIMIZATION CONVERGED AFTER' or line.strip()[4:44] !='FAILED TO CONVERGE GEOMETRY OPTIMIZATION':
line = next(inputfile)
if line.strip()[2:14] == 'total energy':
energy = float(line.split()[4])
scfenergies.append(energy)
if line.strip()[4:41] == 'GEOMETRY OPTIMIZATION CONVERGED AFTER':
break
self.set_attribute('scfenergies', scfenergies)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only populate the scfenergies in case a geometry optimization is done, not for any other runmode like single point (default) or hessian.

@berquist
Copy link
Member

This is superseded by #1296. Andrew has made sure to keep Omri's commits and we will not squash the full branch.

@berquist berquist closed this Nov 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for xTB

7 participants