Support for parsing xTB outputs#1296
Support for parsing xTB outputs#1296berquist merged 78 commits intocclib:masterfrom Andrew-S-Rosen:rosen-xtb
Conversation
|
@berquist, @shivupa: Could you please let me know where (in the test suite) I should be adding the tests for the xTB logfile parsing? Thanks! The development instructions are a little vague about how precisely the logfiles are being tested and what assertions are being applied. Also, looks like we need to add an |
I'd focus on a single point test first. If you have a Line 318 in 91c8d60 That will let the SP test run (https://github.com/cclib/cclib/blob/master/test/data/testSP.py), and then decorate the functions that need skipped for now (or inherit the Generic_SP_test if needed).
Yes, cclib-data is for regression tests only. Unit and integration tests live in this repo. I don't expect you need integration tests yet. We mainly add things there if there was a bug report that leads to a fix. |
|
@awvwgk: I'm giving the preliminary xTB parser a full overhaul since it was a good start but not really in a place I'd feel confident for production. In giving it a refresh, I realized that I remain unsure where (if anywhere) the spin is consistently reported in the xTB logfile. If the user supplied it as an input, e.g. via Do you happen to have any suggestions? If my understanding is indeed correct, then it might just have to be an optional return. |
|
@shivupa: I am pretty much done with the refactoring, updating it for the current version of xTB, bugfixing, and adding features. Mostly just have the automated tests to set up and run now. However, I have one remaining question in the meantime. Namely, what is the ideal approach if the gradients (for the |
|
We should be able to handle multiple files (for example, turbomole requires this). I can provide some details later today about this! |
|
@shivupa: Thanks! A bit of insight would be great when you have a moment 😄 In the meantime, I'll sort out adding some of these tests. Because xTB is not a DFT method, there are a fair number of fields that will need to be skipped, but that's okay. |
|
Sorry traveling. So this works for turbomole I think this should just work straight away for the turbomole parser but you might need to write an xtb specific cclib/cclib/parser/turbomoleparser.py Line 73 in 91c8d60 @oliver-s-lee does that sound right? |
|
Hi folks, the mechanism for parsing from multiple files is relatively simple, it just uses some magic to make all the files appear as one long file. From the parser's point of view, all you need is to define a If calling from the code, you can just specify a list of files as @shivupa demonstrated. From the command line, you need to give the |
|
Thanks!! That's very helpful! I'll give that a go. |
|
Thanks for merging in the RTD fix, @berquist! Haven't forgotten about this PR btw. Had to take a short break after getting a little overwhelmed trying to sort out the logic behind the test suite. 😅 Will be back at it shortly. Edit: Hoping to get back to this in the beginning of December. The main limiting factor is that I don't have a good handle on how the test suite is running. I know how to write tests of course, but things are somewhat optimized in the cclib test suite such that I don't have an immediate idea where to start. I need time to look into it. |
Two questionsThis is at the point where the only thing needed before merging is a set of tests. There are some additional features to add based on parsing multiple files, but that'll be saved for follow-up work. With that said, I have two remaining questions outlined below. Multiple File Parsing@oliver-s-lee: It sounds like cclib operates by concatenating everything in memory to one big file. On the surface, this seems simple, but in practice it leads to some complexities. For instance, xTB outputs a -0.56476049
0.28238024
0.28238024Note that there are no strings in the file that one can use to identify that this is from the With that said, what is the recommended approach? For simplicity, I will probably push this off to a follow-up PR, but it'd still be useful to know. Tests@shivupa, @berquist: I, unfortunately, remain quite confused about how to properly implement tests for the parser. Of course, I understand how to write a test suite on my own that would evaluate the various attributes; however, I don't really have a good grasp on how to do things "the cclib" way since this is not documented anywhere (unless I have missed it). This is especially true because xTB doesn't share many attributes that other codes have (e.g. no basis, for instance). As one point of initial confusion, I can specify I'm assuming y'all would not appreciate if I wrote my own |
|
Hi @Andrew-S-Rosen, great to see the xTB support is progressing so well! With regards to multi-file parsing, yes you are largely correct. A minor (but important) technical difference is that the files are not actually all loaded into memory at once, because this would require a large continuous block of memory that would be very wasteful. Rather, each line is read from each file one-by-one, but there's some magic as you say to make it appear like one long-file. Regardless, the issue of detecting which file is which is an interesting one, and not something we've come up against before I don't think. You can access the name of the current file like this: There might be some fringe cases where this doesn't work, like if we're reading files from a URL where the filename might not be available, but I'd recommend not worrying about those for now :) For testing, I'll let the other guys confirm as they have more experience with the testing suite, but yes creating a custom XTBSPTest class inheriting from |
Yes. I will make an example for you and commit it, then you can expand on that.
Suffer through this for now, but the test suite is being incrementally rewritten to properly follow pytest expectations. One important thing is that |
|
@berquist, @oliver-s-lee: Thank you both for the very helpful comments! I should have a clear path forward with this :) |
|
Should've said this earlier, but I will comment again when I'm done committing to the branch. |
|
Done; my last few commits show how to write customized test cases and skip those test methods whose implementation is missing or doesn't make sense for the program. The strategy is to incrementally remove each decorator as functionality is added or fixed. I'd start with |
|
@berquist: Thanks a ton!! I really appreciate you setting the stage for this. I'll take it from here and patch things up as needed. I'll let you know when it's ready for review. |
|
@berquist: How am I meant to run the tests locally? Doing |
|
The principal driver of tests is https://github.com/cclib/cclib/blob/master/.github/scripts/run_pytest.bash. The key is, from the root directory, python -m pytest -v -s --terse test -k "not test_method"where An alternative that might work is python -m test.test_data --terse XTBwhich uses the old |
|
Since you aren't on Slack, I'll bug you here...I would like to start (and possibly finish) the 1.8.1 release process this coming weekend. Probably both of us would like to see this in but I don't want to merge until you're satisfied with tests. Do you need any additional pointers or help? |
|
@berquist: Thanks for the ping. Admittedly, I have been a bit overwhelmed by the gnarly merge conflicts (see below) and the test suite. I just tried to resolve some of the conflicts, but I ended up making things worse so reverted it. I'm going to put the merge conflicts aside for a moment and try to focus on increasing the test coverage today. If you have a spare moment, getting some help with the conflicts would be huge... some of the files have very large and obscure conflicts e.g. see test/regression.py where the whole file is marked as a conflict, making it hard to tell what to keep/change. |
|
It slipped my mind that you would have to deal with the rebase/merge over the reformatted code. Just looking at the filenames, the reason for most (all?) of these conflicts is actually line endings: these files were CRLF, I converted them all to LF, and Git is incredibly stupid about attempting to resolve this for reasons I don't know. I can't get to this during the week, so hack on tests for now, and ~Saturday I will rebase the whole thing (unless you want to...). Then I will review, since I don't think anyone's done that yet. |
|
@berquist: Ah, thanks! Good to know about the line endings. With that in mind, I might be able to get the conflict resolved. I'll take a stab but won't spend too much time if it's still a mess. Will work on more tests! |
|
If you do want to try tackling it yourself (you probably don't...), you will want to (For the pre-commit command, you can ignore |
This is a follow-up to the unmerged PR from #1129 and expands upon it substantially, using it as a starting point. Most of the content can be found in the xtbparser.py file.
Some improvements made here compared to the prior PR:
gradientsfileThe only aspects I have not carried over are the catchall attributes
atompropandbondpropproposed in #789. If they are of interest, I recommend porting them over from #1129 after this gets merged. I only wanted to deal with official cclib-supported attributes for now.Closes #789
Closes #1129.