Adding sorting to rst output of list_toolchains to match txt and none by justbennet · Pull Request #3246 · easybuilders/easybuild-framework

justbennet · 2020-03-14T19:11:28Z

When requesting --output-format=rst, the list of available toolchains was
not being sorted. That output will be used to create a table of available
toolchains for the documentation and needs to be sorted. I wrapped the
list of keys in sorted() to match the ouput of eb --list-toolchains
with no options and when txt is requested.

When requesting --output-format=rst, the list of available toolchains was not being sorted. That output will be used to create a table of available toolchains for the documentation and needs to be sorted. I wrapped the list of keys in `sorted()` to match the ouput of `eb --list-toolchains` with no options and when `txt` is requested.

ocaisa · 2020-03-17T09:25:27Z

This fix only sorts the first column but leaves the other columns unchanged (and, on a side note you can only see the problem in Python2). I can't figure out a good way of doing this though...

ocaisa · 2020-03-17T09:33:36Z

Ok, this seems to work:

     table_values[0] = ['**%s**' % tcname for tcname in sorted(tcs.keys())]      

     for idx in range(1, len(table_titles)):                                     
         for _, tc in sorted(tcs.items()):

justbennet · 2020-03-17T11:21:05Z

@ocaisa Thanks for finding that it didn't work in Python 2 (I think).

It seems like this shouldn't take two sorts, somehow. It also seems like a common approach for both rst and txt would be desirable, at least for sorting the entries, as it's the same data. Maybe that's not feasible because of the extra columns.

Having just stared at it a bit, it looks like it's producing two colums with the same compiler information, perhaps that redundancy should be removed, too?

I will not have time to look today, but I should be able to return to this tomorrow.

ocaisa · 2020-03-17T12:36:51Z

The other option is to turn tcs into an OrderedDict type at the beginning. I agree that is probably a cleaner solution.

I should clarify that in Python3 the dictionary was already ordered so there isn't the same problem. It was only with Python 2 that I could reproduce the problem.

ocaisa · 2020-03-18T07:44:41Z

You can get to it with

from easybuild.tools.py2vs3 import OrderedDict

justbennet · 2020-03-18T22:39:18Z

Hi, @ocaisa and @boegel,

I've been looking at this more closely, and there seem to be multiple issues. It appears that the data for the rst table comes out as a bunch of columns, which makes sorting all of them together a key. I think this code might be more complex than it needs to be, too.

It looked to me as though your sorting of tcs.items() does not work because the data is extracted into lists in what seems to me to be a complex way.

If you look at the current output, does it have two columns for compiler information? I am seeing one for 'compiler' and one for 'COMPILER', both containing the same information. That redundancy seems like it could be eliminated. I also question the value of have a 'CUDA' column that only lists 'CUDA' as its only possible value when that information is also included in the compiler/COMPILER column. I think I would advocate for dropping the CUDA column from the rst output as redundant, as well.

I think I have two excerpts here

https://gist.github.com/justbennet/916fad2b2c62a02fc063cd96a25fded3

to illustrate this.

I propose that trying to refactor the code to make it simpler and more easily read and to eliminate the second compiler column and the CUDA column.

I've spent several hours looking at this, and I have an inefficient way to get this done. I think that the scale this might attain will never make that inefficiency matter particularly, but I think that taking an alternate approach to constructing the table might end up with more readable and more easily maintained code and provide the same information.

Thoughts?

justbennet · 2020-03-19T00:19:45Z

Actually, I just looked at https://easybuild.readthedocs.io/en/latest/eb_list_toolchains.html again, and the format of the included table is quite different from the output of eb --list-toolchains.

Toolchain     Components that comprise the toolchain
--------------------------------------------------------
Name          Compilers    MPI stack  Included Libraries
--------------------------------------------------------
ClangGCC      Clang/GCC
GCC           GCC
cgmpich       Clang/GCC    MPICH
cgmpolf       Clang/GCC    MPICH      OpenBLAS/LAPACK, ScaLAPACK(/BLACS), FFTW

Should the output of eb --list-toolchains --output-format match that in the old file currently being included, which is https://raw.githubusercontent.com/easybuilders/easybuild/develop/docs/version-specific/eb_list_toolchains.txt

That seems to have the same information as in the current output but seems more cleanly formatted and easier to read.

ocaisa · 2020-03-23T08:17:31Z

@boegel I don't really know how the docs get generated, and I don't think I'd be the one to make decisions on how that table should look but there what comes out of this function does indeed have redundancy and the documentation page is markedly different.

boegel · 2020-04-08T08:42:20Z

I'll pick up on this again. The current proposed change only sorts the first column, so it definitely needs a bot more work.

We can just sort the toolchain names once, and then walk through the tcs dict using the sorted keys (toolchain names), no need to sort twice.

The page in the documentation hasn't been updated for ages, because it's not getting updated by the update-all-docs.sh in https://github.com/easybuilders/easybuild/tree/develop/docs/scripts.

boegel · 2020-04-18T12:53:30Z

Some thoughts on this:

the table in https://easybuild.readthedocs.io/en/latest/eb_list_toolchains.html should be removed, and a pointer to https://easybuild.readthedocs.io/en/latest/version-specific/toolchains.html should be added instead;
the table in https://easybuild.readthedocs.io/en/latest/version-specific/toolchains.html should formatted better, indeed to sorted by toolchain names (case-insensitive), exclude duplicate columns, etc.;
w.r.t. the CUDA column, that could be made more generic to something like "Extra", where then mentioning CUDA for toolchains that include it could make sense;
maybe (none) should be mentioned in places where a toolchain has no BLAS or MPI for example, rather than just leaving it blank?

boegel · 2020-04-18T13:23:47Z

To get a good idea of what the internal data structure looks like that holds all the toolchain details, making this temporary change in the code this is useful:

diff --git a/easybuild/tools/docs.py b/easybuild/tools/docs.py
index 1d1342e2c..a5d6a8ae1 100644
--- a/easybuild/tools/docs.py
+++ b/easybuild/tools/docs.py
@@ -744,6 +744,9 @@ def list_toolchains_rst(tcs):
     """ Returns overview of all toolchains in rst format """
     title = "List of known toolchains"

+    import pprint
+    pprint.pprint(tcs)
+
     # figure out column names
     table_titles = ['name', 'compiler', 'MPI']
     for tc in tcs.values():

With that in place, eb --list-toolchains --output-format=rst produces something like:

$ eb --list-toolchains --output-format=rst
...
 'GCC': {'COMPILER': ['GCC']},
...
 'foss': {'BLACS': ['BLACS'],
          'BLAS': ['OpenBLAS'],
          'COMPILER': ['GCC'],
          'FFT': ['FFTW'],
          'LAPACK': ['OpenBLAS'],
          'MPI': ['OpenMPI'],
          'SCALAPACK': ['ScaLAPACK']},
...
 'gompi': {'COMPILER': ['GCC'], 'MPI': ['OpenMPI']},
...

It would be nice if it could produce output like this:

List of known toolchains
------------------------

============    =================    ===========    ========    =====================    ==========    
name            compilers            MPI            BLAS        LAPACK                   FFT
============    =================    ===========    ========    =====================    ==========    
**foss**        GCC                  OpenMPI        OpenBLAS    OpenBLAS, ScaLAPACK      FFTW
**fosscuda**    GCC, CUDA            OpenMPI        OpenBLAS    OpenBLAS, ScaLAPACK      FFTW
**GCC**         GCC                  *(none)*       *(none)*    *(none)*                 *(none)*        
**gompi**       GCC                  OpenMPI        *(none)*    *(none)*                 *(none)*        
**intel**       icc, ifort           impi           imkl        imkl                     imkl
**intel**       icc, ifort, CUDA     impi           imkl        imkl                     imkl
============    =================    ===========    ========    =====================    ==========

justbennet · 2020-04-18T13:42:18Z

Ace!

I'll try to make that so.

justbennet · 2020-04-18T21:01:06Z

Please correct me if I am wrong, but if there is no BLAS, then there is no LAPACK, and if there is no LAPACK, then there can be no ScaLAPACK. Those are all linear algebra, so I took the liberty of combining them into a single column. I have something now that is modified to produce a table that begins with this.

List of known toolchains
------------------------

================    ================    ===========    =======================    ========
Name                Compiler(s)         MPI            BLAS/LAPACK/ScaLAPACK      FFT     
================    ================    ===========    =======================    ========
**cgmpich**         Clang, GCC          MPICH          (*none*)                   (*none*)
**cgmpolf**         Clang, GCC          MPICH          OpenBLAS, ScaLAPACK        FFTW
**cgmvapich2**      Clang, GCC          MVAPICH2       (*none*)                   (*none*)
**cgmvolf**         Clang, GCC          MVAPICH2       OpenBLAS, ScaLAPACK        FFTW
**cgompi**          Clang, GCC          OpenMPI        (*none*)                   (*none*)
**cgoolf**          Clang, GCC          OpenMPI        OpenBLAS, ScaLAPACK        FFTW
**ClangGCC**        Clang, GCC          (*none*)       (*none*)                   (*none*)
**CrayCCE**         PrgEnv-cray                                                   (*none*)
**CrayGNU**         PrgEnv-gnu                                                    (*none*)
**CrayIntel**       PrgEnv-intel                                                  (*none*)
**CrayPGI**         PrgEnv-pgi                         (*none*)                   (*none*)
**foss**            GCC                 OpenMPI        OpenBLAS, ScaLAPACK        FFTW
. . . .

So, you get no duplicate compiler column, no redundant CUDA column, sorted case-insensitive on name.

I do not know what is up with the Cray entries. I am using

tc.get('BLAS', ['(*none*)'])

and that works as expected with ClangGCC in the entry above CrayCCE, but not with the CrayCCE entry. Perhaps it is because the ClangGCC does not have those keys defined, whereas the CrayCCE entry does, but they are empty? The pretty print dump for those two looks like this.

{'ClangGCC': {'COMPILER': ['Clang', 'GCC']},
 'CrayCCE': {'BLACS': [],
             'BLAS': [],
             'COMPILER': ['PrgEnv-cray'],
             'LAPACK': [],
             'MPI': [],
             'SCALAPACK': []},

In any event, I have something that I believe is close to what you want, and seems to mostly work. I tested with python 2.7.5 and 3.6.8.

[bennet@studentpc2 ~]$ eb --list-toolchains --output-format=rst
>> Considering 'python'...
>> 'python' version: 2.7.5, which matches Python 2 version requirement (>= 2.6)
>> Selected Python command: python (/usr/bin/python)
>> python -m easybuild.main --list-toolchains --output-format=rst
. . . .

[bennet@studentpc2 ~]$ eb --list-toolchains --output-format=rst
>> Considering 'python3'...
>> 'python3' version: 3.6.8, which matches Python 3 version requirement (>= 3.5)
>> Selected Python command: python3 (/usr/bin/python3)
>> python3 -m easybuild.main --list-toolchains --output-format=rst

If this looks good enough, I can update the PR with this version of docs.py.

Per the conversation in PR, this commit revises the .rst output that results in the following changes. Output is sorted case-insensitively on toolchain name The duplicate compiler column has been removed. The BLAS/LAPACK/ScaLAPACK entries have been consolidated into a single column Missing entries have been replaced with '(*none*)'. Blank entries remain blank (maybe a change to the toolchain to be consistent with the other toolchains?)

justbennet · 2020-04-19T11:27:23Z

Full examples of two alternate formats are at.

https://gist.github.com/justbennet/f5a405a61665feef7bde1df6b39ae5fc

Using modified code suggested by Kenneth Hoste in the PR request. Added exceptions for empty Cray toolchain entries.

boegel

Some more minor suggestions, close to being done!

Ideally, we also ensure that the new format produced is covered by the tests as well...
Let me know if you need help with that.

boegel · 2020-04-20T08:27:47Z

+    # Create text placeholder to use for missing entries.
+    # Not good documentation style to italicize the parentheses enclosing
+    # italic text unless surrouning text is also italic.
+    none_txt = '(*none*)'


also making the parentheses italic was deliberate actually, I think it looks awkward otherwise...

compare '(none)' (fully italic) with '(none)' (partial italic)

anyway, this is bikeshedding, definitely not a blocker for this PR :)

I was trained as a typesetter and spent years as a proofreader and (English) copy editor, and I have the opposite reaction to (none) and (none)
https://english.stackexchange.com/questions/9878/should-punctuation-surrounding-italicised-words-be-italicised
But, this is your baby, so I will do it your way.

Oh boy, please don't agree with me if you're better trained than me in this stuff! :)

I was just going by feeling, if there are clear arguments against using *(none)*, then please don't!

If it would lead to inconsistencies elsewhere, it's fine. It's just a convention. ;-) Just trying to explain why my instincts are what they are. And that it was not entirely idiosyncratic. ;-)

I suspect that most everyone else on the project would do what you did and put the *s around the parentheses, and it's probably more important to be consistent throughout. I changed them, so let's not change back and get this out the door. :-)

boegel · 2020-04-20T08:29:07Z

-    for tc in tcs.values():
-        table_titles.extend(tc.keys())
+    # Specify the column names for the table
+    table_titles = ['name', 'COMPILER', 'MPI', 'linalg', 'FFT']


Should be 'compiler' to ensure correct column name is used in the end, no?

Now you get COMPILER as column name, I think...

Ah, you are right. I think table_titles is OK, but the change should be to

col_names = { 'name': 'Name', 'COMPILER': 'Compiler(s)', 'linalg': "Linear algebra", }

to match the change you requested to table_titles before. Sorry. Will do that.

I think it looks awkward to have name vs COMPILER in table_titles.

For MPI & FFT it's OK, since those are abbreviations.

They are just placeholders internal to the code. How about all of the entries in table_titles go upper, and the col_names are used to set the headings in the display? I tested that now, and that works, and that will make things consistent?

Same with linalg, then? So,

table_titles = ['NAME', 'COMPILER', 'MPI', 'LINALG', 'FFT'] . . . col_names = { 'NAME': 'Name', 'COMPILER': 'Compiler(s)', 'LINALG': "Linear algebra", }

and all references in the code to the text contents of table_titles would have the upper version.

+1 for consistency, so going all upper is fine by me

boegel · 2020-04-20T17:57:41Z

@justbennet I've added a test for the output produced by --list-toolchains --output-format=rst.

It fails now, but with good reason imho, for the following cases:

for toolchains using OpenBLAS, OpenBLAS is listed twice in the linalg column, we should try and avoid that (missing a call nub somewhere?);
for toolchains using imkl, imkl is listed 3 times in the linalg column;
for the system toolchain, we're missing a (none) in the compiler column;

To run the test yourself locally, try using this command:

python -O -m test.framework.options list_toolchains

If the tests for --list-toolchains pass, you should see output like:

Ran 2 tests in 0.931s

ok

If the test still fails, you'll get a AssertionError: None is not true : Pattern '<pattern>' should be found in: , followed by the full output produced by --list-toolchains --output-format=rst.

Let me know if you need any help!

Needed a nub() to eliminated duplicate '*(none)*'s Needed to add an 'or none_txt' to deal with incorrect values being returned from the system toolchain.

…ains

boegel · 2020-05-08T05:22:59Z

Thanks a lot for your efforts on this @justbennet!

boegel added the bug fix label Mar 16, 2020

boegel added this to the next release (4.2.0) milestone Mar 16, 2020

boegel modified the milestones: next release (4.2.0), release after 4.2.0 (4.2.1?) Apr 8, 2020

boegel requested changes Apr 19, 2020

View reviewed changes

Comment thread easybuild/tools/docs.py Outdated

Comment thread easybuild/tools/docs.py Outdated

Comment thread easybuild/tools/docs.py Outdated

Comment thread easybuild/tools/docs.py Outdated

Comment thread easybuild/tools/docs.py

bennet-umich added 2 commits April 19, 2020 10:27

Processing toolchains in single for loop

8a98b69

Using modified code suggested by Kenneth Hoste in the PR request. Added exceptions for empty Cray toolchain entries.

Changing Cray MPI to cray-mpich

1063dac

boegel requested changes Apr 20, 2020

View reviewed changes

bennet-umich and others added 2 commits April 20, 2020 10:47

Making changes per comments in PR 3246

e5cec82

add test for output produced by --list-toolchains --output-format=rst

c287dd9

bennet-umich added 2 commits April 21, 2020 09:27

Fixing output from list_toolchains_rst

f474d20

Needed a nub() to eliminated duplicate '*(none)*'s Needed to add an 'or none_txt' to deal with incorrect values being returned from the system toolchain.

Fixing missing commas in options.py

92c9b90

easybuilders deleted a comment from boegelbot Apr 21, 2020

bennet-umich added 2 commits April 23, 2020 10:02

Merge remote-tracking branch 'upstream/develop' into sort-list-toolch…

c7b5906

…ains

Convert remaining column names to upper

2cef406

boegel approved these changes May 8, 2020

View reviewed changes

boegel merged commit 8d81c7a into easybuilders:develop May 8, 2020

boegel mentioned this pull request May 8, 2020

update list of known toolchains easybuilders/easybuild#623

Merged

Conversation

justbennet commented Mar 14, 2020

Uh oh!

ocaisa commented Mar 17, 2020

Uh oh!

ocaisa commented Mar 17, 2020

Uh oh!

justbennet commented Mar 17, 2020

Uh oh!

ocaisa commented Mar 17, 2020

Uh oh!

ocaisa commented Mar 18, 2020

Uh oh!

justbennet commented Mar 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

justbennet commented Mar 19, 2020

Uh oh!

ocaisa commented Mar 23, 2020

Uh oh!

boegel commented Apr 8, 2020

Uh oh!

boegel commented Apr 18, 2020

Uh oh!

boegel commented Apr 18, 2020

Uh oh!

justbennet commented Apr 18, 2020

Uh oh!

justbennet commented Apr 18, 2020

Uh oh!

justbennet commented Apr 19, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

boegel left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

justbennet Apr 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

boegel commented Apr 20, 2020

Uh oh!

boegel commented May 8, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

justbennet commented Mar 18, 2020 •

edited

Loading

justbennet Apr 20, 2020 •

edited

Loading