Adding sorting to rst output of list_toolchains to match txt and none#3246
Adding sorting to rst output of list_toolchains to match txt and none#3246boegel merged 10 commits intoeasybuilders:developfrom justbennet:sort-list-toolchains
Conversation
When requesting --output-format=rst, the list of available toolchains was not being sorted. That output will be used to create a table of available toolchains for the documentation and needs to be sorted. I wrapped the list of keys in `sorted()` to match the ouput of `eb --list-toolchains` with no options and when `txt` is requested.
|
This fix only sorts the first column but leaves the other columns unchanged (and, on a side note you can only see the problem in Python2). I can't figure out a good way of doing this though... |
|
Ok, this seems to work: |
|
@ocaisa Thanks for finding that it didn't work in Python 2 (I think). It seems like this shouldn't take two sorts, somehow. It also seems like a common approach for both Having just stared at it a bit, it looks like it's producing two colums with the same compiler information, perhaps that redundancy should be removed, too? I will not have time to look today, but I should be able to return to this tomorrow. |
|
The other option is to turn I should clarify that in Python3 the dictionary was already ordered so there isn't the same problem. It was only with Python 2 that I could reproduce the problem. |
|
You can get to it with |
|
I've been looking at this more closely, and there seem to be multiple issues. It appears that the data for the rst table comes out as a bunch of columns, which makes sorting all of them together a key. I think this code might be more complex than it needs to be, too. It looked to me as though your sorting of tcs.items() does not work because the data is extracted into lists in what seems to me to be a complex way. If you look at the current output, does it have two columns for compiler information? I am seeing one for 'compiler' and one for 'COMPILER', both containing the same information. That redundancy seems like it could be eliminated. I also question the value of have a 'CUDA' column that only lists 'CUDA' as its only possible value when that information is also included in the compiler/COMPILER column. I think I would advocate for dropping the CUDA column from the rst output as redundant, as well. I think I have two excerpts here https://gist.github.com/justbennet/916fad2b2c62a02fc063cd96a25fded3 to illustrate this. I propose that trying to refactor the code to make it simpler and more easily read and to eliminate the second compiler column and the CUDA column. I've spent several hours looking at this, and I have an inefficient way to get this done. I think that the scale this might attain will never make that inefficiency matter particularly, but I think that taking an alternate approach to constructing the table might end up with more readable and more easily maintained code and provide the same information. Thoughts? |
|
Actually, I just looked at https://easybuild.readthedocs.io/en/latest/eb_list_toolchains.html again, and the format of the included table is quite different from the output of Should the output of That seems to have the same information as in the current output but seems more cleanly formatted and easier to read. |
|
@boegel I don't really know how the docs get generated, and I don't think I'd be the one to make decisions on how that table should look but there what comes out of this function does indeed have redundancy and the documentation page is markedly different. |
|
I'll pick up on this again. The current proposed change only sorts the first column, so it definitely needs a bot more work. We can just sort the toolchain names once, and then walk through the The page in the documentation hasn't been updated for ages, because it's not getting updated by the |
|
Some thoughts on this:
|
|
To get a good idea of what the internal data structure looks like that holds all the toolchain details, making this temporary change in the code this is useful: diff --git a/easybuild/tools/docs.py b/easybuild/tools/docs.py
index 1d1342e2c..a5d6a8ae1 100644
--- a/easybuild/tools/docs.py
+++ b/easybuild/tools/docs.py
@@ -744,6 +744,9 @@ def list_toolchains_rst(tcs):
""" Returns overview of all toolchains in rst format """
title = "List of known toolchains"
+ import pprint
+ pprint.pprint(tcs)
+
# figure out column names
table_titles = ['name', 'compiler', 'MPI']
for tc in tcs.values():With that in place, It would be nice if it could produce output like this: |
|
Ace! I'll try to make that so. |
|
Please correct me if I am wrong, but if there is no BLAS, then there is no LAPACK, and if there is no LAPACK, then there can be no ScaLAPACK. Those are all linear algebra, so I took the liberty of combining them into a single column. I have something now that is modified to produce a table that begins with this. So, you get no duplicate compiler column, no redundant CUDA column, sorted case-insensitive on name. I do not know what is up with the Cray entries. I am using and that works as expected with In any event, I have something that I believe is close to what you want, and seems to mostly work. I tested with python 2.7.5 and 3.6.8. If this looks good enough, I can update the PR with this version of |
Per the conversation in PR, this commit revises the .rst output that results in the following changes. Output is sorted case-insensitively on toolchain name The duplicate compiler column has been removed. The BLAS/LAPACK/ScaLAPACK entries have been consolidated into a single column Missing entries have been replaced with '(*none*)'. Blank entries remain blank (maybe a change to the toolchain to be consistent with the other toolchains?)
|
Full examples of two alternate formats are at. https://gist.github.com/justbennet/f5a405a61665feef7bde1df6b39ae5fc |
Using modified code suggested by Kenneth Hoste in the PR request. Added exceptions for empty Cray toolchain entries.
boegel
left a comment
There was a problem hiding this comment.
Some more minor suggestions, close to being done!
Ideally, we also ensure that the new format produced is covered by the tests as well...
Let me know if you need help with that.
| # Create text placeholder to use for missing entries. | ||
| # Not good documentation style to italicize the parentheses enclosing | ||
| # italic text unless surrouning text is also italic. | ||
| none_txt = '(*none*)' |
There was a problem hiding this comment.
also making the parentheses italic was deliberate actually, I think it looks awkward otherwise...
compare '(none)' (fully italic) with '(none)' (partial italic)
anyway, this is bikeshedding, definitely not a blocker for this PR :)
There was a problem hiding this comment.
I was trained as a typesetter and spent years as a proofreader and (English) copy editor, and I have the opposite reaction to (none) and (none)
https://english.stackexchange.com/questions/9878/should-punctuation-surrounding-italicised-words-be-italicised
But, this is your baby, so I will do it your way.
There was a problem hiding this comment.
Oh boy, please don't agree with me if you're better trained than me in this stuff! :)
I was just going by feeling, if there are clear arguments against using *(none)*, then please don't!
There was a problem hiding this comment.
If it would lead to inconsistencies elsewhere, it's fine. It's just a convention. ;-) Just trying to explain why my instincts are what they are. And that it was not entirely idiosyncratic. ;-)
I suspect that most everyone else on the project would do what you did and put the *s around the parentheses, and it's probably more important to be consistent throughout. I changed them, so let's not change back and get this out the door. :-)
| for tc in tcs.values(): | ||
| table_titles.extend(tc.keys()) | ||
| # Specify the column names for the table | ||
| table_titles = ['name', 'COMPILER', 'MPI', 'linalg', 'FFT'] |
There was a problem hiding this comment.
Should be 'compiler' to ensure correct column name is used in the end, no?
Now you get COMPILER as column name, I think...
There was a problem hiding this comment.
Ah, you are right. I think table_titles is OK, but the change should be to
col_names = {
'name': 'Name',
'COMPILER': 'Compiler(s)',
'linalg': "Linear algebra",
}
to match the change you requested to table_titles before. Sorry. Will do that.
There was a problem hiding this comment.
I think it looks awkward to have name vs COMPILER in table_titles.
For MPI & FFT it's OK, since those are abbreviations.
There was a problem hiding this comment.
They are just placeholders internal to the code. How about all of the entries in table_titles go upper, and the col_names are used to set the headings in the display? I tested that now, and that works, and that will make things consistent?
There was a problem hiding this comment.
Same with linalg, then? So,
table_titles = ['NAME', 'COMPILER', 'MPI', 'LINALG', 'FFT']
. . .
col_names = {
'NAME': 'Name',
'COMPILER': 'Compiler(s)',
'LINALG': "Linear algebra",
}
and all references in the code to the text contents of table_titles would have the upper version.
There was a problem hiding this comment.
+1 for consistency, so going all upper is fine by me
|
@justbennet I've added a test for the output produced by It fails now, but with good reason imho, for the following cases:
To run the test yourself locally, try using this command: If the tests for If the test still fails, you'll get a Let me know if you need any help! |
Needed a nub() to eliminated duplicate '*(none)*'s Needed to add an 'or none_txt' to deal with incorrect values being returned from the system toolchain.
|
Thanks a lot for your efforts on this @justbennet! |
When requesting
--output-format=rst, the list of available toolchains wasnot being sorted. That output will be used to create a table of available
toolchains for the documentation and needs to be sorted. I wrapped the
list of keys in
sorted()to match the ouput ofeb --list-toolchainswith no options and when
txtis requested.