Skip to content

Adding sorting to rst output of list_toolchains to match txt and none#3246

Merged
boegel merged 10 commits intoeasybuilders:developfrom
justbennet:sort-list-toolchains
May 8, 2020
Merged

Adding sorting to rst output of list_toolchains to match txt and none#3246
boegel merged 10 commits intoeasybuilders:developfrom
justbennet:sort-list-toolchains

Conversation

@justbennet
Copy link
Copy Markdown
Contributor

When requesting --output-format=rst, the list of available toolchains was
not being sorted. That output will be used to create a table of available
toolchains for the documentation and needs to be sorted. I wrapped the
list of keys in sorted() to match the ouput of eb --list-toolchains
with no options and when txt is requested.

When requesting --output-format=rst, the list of available toolchains was
not being sorted.  That output will be used to create a table of available
toolchains for the documentation and needs to be sorted.  I wrapped the
list of keys in `sorted()` to match the ouput of `eb --list-toolchains`
with no options and when `txt` is requested.
@boegel boegel added the bug fix label Mar 16, 2020
@boegel boegel added this to the next release (4.2.0) milestone Mar 16, 2020
@ocaisa
Copy link
Copy Markdown
Member

ocaisa commented Mar 17, 2020

This fix only sorts the first column but leaves the other columns unchanged (and, on a side note you can only see the problem in Python2). I can't figure out a good way of doing this though...

@ocaisa
Copy link
Copy Markdown
Member

ocaisa commented Mar 17, 2020

Ok, this seems to work:

     table_values[0] = ['**%s**' % tcname for tcname in sorted(tcs.keys())]      

     for idx in range(1, len(table_titles)):                                     
         for _, tc in sorted(tcs.items()):     

@justbennet
Copy link
Copy Markdown
Contributor Author

@ocaisa Thanks for finding that it didn't work in Python 2 (I think).

It seems like this shouldn't take two sorts, somehow. It also seems like a common approach for both rst and txt would be desirable, at least for sorting the entries, as it's the same data. Maybe that's not feasible because of the extra columns.

Having just stared at it a bit, it looks like it's producing two colums with the same compiler information, perhaps that redundancy should be removed, too?

I will not have time to look today, but I should be able to return to this tomorrow.

@ocaisa
Copy link
Copy Markdown
Member

ocaisa commented Mar 17, 2020

The other option is to turn tcs into an OrderedDict type at the beginning. I agree that is probably a cleaner solution.

I should clarify that in Python3 the dictionary was already ordered so there isn't the same problem. It was only with Python 2 that I could reproduce the problem.

@ocaisa
Copy link
Copy Markdown
Member

ocaisa commented Mar 18, 2020

You can get to it with

from easybuild.tools.py2vs3 import OrderedDict

@justbennet
Copy link
Copy Markdown
Contributor Author

justbennet commented Mar 18, 2020

Hi, @ocaisa and @boegel,

I've been looking at this more closely, and there seem to be multiple issues. It appears that the data for the rst table comes out as a bunch of columns, which makes sorting all of them together a key. I think this code might be more complex than it needs to be, too.

It looked to me as though your sorting of tcs.items() does not work because the data is extracted into lists in what seems to me to be a complex way.

If you look at the current output, does it have two columns for compiler information? I am seeing one for 'compiler' and one for 'COMPILER', both containing the same information. That redundancy seems like it could be eliminated. I also question the value of have a 'CUDA' column that only lists 'CUDA' as its only possible value when that information is also included in the compiler/COMPILER column. I think I would advocate for dropping the CUDA column from the rst output as redundant, as well.

I think I have two excerpts here

https://gist.github.com/justbennet/916fad2b2c62a02fc063cd96a25fded3

to illustrate this.

I propose that trying to refactor the code to make it simpler and more easily read and to eliminate the second compiler column and the CUDA column.

I've spent several hours looking at this, and I have an inefficient way to get this done. I think that the scale this might attain will never make that inefficiency matter particularly, but I think that taking an alternate approach to constructing the table might end up with more readable and more easily maintained code and provide the same information.

Thoughts?

@justbennet
Copy link
Copy Markdown
Contributor Author

Actually, I just looked at https://easybuild.readthedocs.io/en/latest/eb_list_toolchains.html again, and the format of the included table is quite different from the output of eb --list-toolchains.

Toolchain     Components that comprise the toolchain
--------------------------------------------------------
Name          Compilers    MPI stack  Included Libraries
--------------------------------------------------------
ClangGCC      Clang/GCC
GCC           GCC
cgmpich       Clang/GCC    MPICH
cgmpolf       Clang/GCC    MPICH      OpenBLAS/LAPACK, ScaLAPACK(/BLACS), FFTW

Should the output of eb --list-toolchains --output-format match that in the old file currently being included, which is https://raw.githubusercontent.com/easybuilders/easybuild/develop/docs/version-specific/eb_list_toolchains.txt

That seems to have the same information as in the current output but seems more cleanly formatted and easier to read.

@ocaisa
Copy link
Copy Markdown
Member

ocaisa commented Mar 23, 2020

@boegel I don't really know how the docs get generated, and I don't think I'd be the one to make decisions on how that table should look but there what comes out of this function does indeed have redundancy and the documentation page is markedly different.

@boegel
Copy link
Copy Markdown
Member

boegel commented Apr 8, 2020

I'll pick up on this again. The current proposed change only sorts the first column, so it definitely needs a bot more work.

We can just sort the toolchain names once, and then walk through the tcs dict using the sorted keys (toolchain names), no need to sort twice.

The page in the documentation hasn't been updated for ages, because it's not getting updated by the update-all-docs.sh in https://github.com/easybuilders/easybuild/tree/develop/docs/scripts.

@boegel
Copy link
Copy Markdown
Member

boegel commented Apr 18, 2020

Some thoughts on this:

@boegel
Copy link
Copy Markdown
Member

boegel commented Apr 18, 2020

To get a good idea of what the internal data structure looks like that holds all the toolchain details, making this temporary change in the code this is useful:

diff --git a/easybuild/tools/docs.py b/easybuild/tools/docs.py
index 1d1342e2c..a5d6a8ae1 100644
--- a/easybuild/tools/docs.py
+++ b/easybuild/tools/docs.py
@@ -744,6 +744,9 @@ def list_toolchains_rst(tcs):
     """ Returns overview of all toolchains in rst format """
     title = "List of known toolchains"

+    import pprint
+    pprint.pprint(tcs)
+
     # figure out column names
     table_titles = ['name', 'compiler', 'MPI']
     for tc in tcs.values():

With that in place, eb --list-toolchains --output-format=rst produces something like:

$ eb --list-toolchains --output-format=rst
...
 'GCC': {'COMPILER': ['GCC']},
...
 'foss': {'BLACS': ['BLACS'],
          'BLAS': ['OpenBLAS'],
          'COMPILER': ['GCC'],
          'FFT': ['FFTW'],
          'LAPACK': ['OpenBLAS'],
          'MPI': ['OpenMPI'],
          'SCALAPACK': ['ScaLAPACK']},
...
 'gompi': {'COMPILER': ['GCC'], 'MPI': ['OpenMPI']},
...

It would be nice if it could produce output like this:

List of known toolchains
------------------------

============    =================    ===========    ========    =====================    ==========    
name            compilers            MPI            BLAS        LAPACK                   FFT
============    =================    ===========    ========    =====================    ==========    
**foss**        GCC                  OpenMPI        OpenBLAS    OpenBLAS, ScaLAPACK      FFTW
**fosscuda**    GCC, CUDA            OpenMPI        OpenBLAS    OpenBLAS, ScaLAPACK      FFTW
**GCC**         GCC                  *(none)*       *(none)*    *(none)*                 *(none)*        
**gompi**       GCC                  OpenMPI        *(none)*    *(none)*                 *(none)*        
**intel**       icc, ifort           impi           imkl        imkl                     imkl
**intel**       icc, ifort, CUDA     impi           imkl        imkl                     imkl
============    =================    ===========    ========    =====================    ==========

@justbennet
Copy link
Copy Markdown
Contributor Author

Ace!

I'll try to make that so.

@justbennet
Copy link
Copy Markdown
Contributor Author

Please correct me if I am wrong, but if there is no BLAS, then there is no LAPACK, and if there is no LAPACK, then there can be no ScaLAPACK. Those are all linear algebra, so I took the liberty of combining them into a single column. I have something now that is modified to produce a table that begins with this.

List of known toolchains
------------------------

================    ================    ===========    =======================    ========
Name                Compiler(s)         MPI            BLAS/LAPACK/ScaLAPACK      FFT     
================    ================    ===========    =======================    ========
**cgmpich**         Clang, GCC          MPICH          (*none*)                   (*none*)
**cgmpolf**         Clang, GCC          MPICH          OpenBLAS, ScaLAPACK        FFTW
**cgmvapich2**      Clang, GCC          MVAPICH2       (*none*)                   (*none*)
**cgmvolf**         Clang, GCC          MVAPICH2       OpenBLAS, ScaLAPACK        FFTW
**cgompi**          Clang, GCC          OpenMPI        (*none*)                   (*none*)
**cgoolf**          Clang, GCC          OpenMPI        OpenBLAS, ScaLAPACK        FFTW
**ClangGCC**        Clang, GCC          (*none*)       (*none*)                   (*none*)
**CrayCCE**         PrgEnv-cray                                                   (*none*)
**CrayGNU**         PrgEnv-gnu                                                    (*none*)
**CrayIntel**       PrgEnv-intel                                                  (*none*)
**CrayPGI**         PrgEnv-pgi                         (*none*)                   (*none*)
**foss**            GCC                 OpenMPI        OpenBLAS, ScaLAPACK        FFTW
. . . .

So, you get no duplicate compiler column, no redundant CUDA column, sorted case-insensitive on name.

I do not know what is up with the Cray entries. I am using

tc.get('BLAS', ['(*none*)'])

and that works as expected with ClangGCC in the entry above CrayCCE, but not with the CrayCCE entry. Perhaps it is because the ClangGCC does not have those keys defined, whereas the CrayCCE entry does, but they are empty? The pretty print dump for those two looks like this.

{'ClangGCC': {'COMPILER': ['Clang', 'GCC']},
 'CrayCCE': {'BLACS': [],
             'BLAS': [],
             'COMPILER': ['PrgEnv-cray'],
             'LAPACK': [],
             'MPI': [],
             'SCALAPACK': []},

In any event, I have something that I believe is close to what you want, and seems to mostly work. I tested with python 2.7.5 and 3.6.8.

[bennet@studentpc2 ~]$ eb --list-toolchains --output-format=rst
>> Considering 'python'...
>> 'python' version: 2.7.5, which matches Python 2 version requirement (>= 2.6)
>> Selected Python command: python (/usr/bin/python)
>> python -m easybuild.main --list-toolchains --output-format=rst
. . . .

[bennet@studentpc2 ~]$ eb --list-toolchains --output-format=rst
>> Considering 'python3'...
>> 'python3' version: 3.6.8, which matches Python 3 version requirement (>= 3.5)
>> Selected Python command: python3 (/usr/bin/python3)
>> python3 -m easybuild.main --list-toolchains --output-format=rst

If this looks good enough, I can update the PR with this version of docs.py.

Per the conversation in PR, this commit revises the .rst output
that results in the following changes.

Output is sorted case-insensitively on toolchain name

The duplicate compiler column has been removed.

The BLAS/LAPACK/ScaLAPACK entries have been consolidated into a
single column

Missing entries have been replaced with '(*none*)'.  Blank entries
remain blank (maybe a change to the toolchain to be consistent with
the other toolchains?)
@justbennet
Copy link
Copy Markdown
Contributor Author

Full examples of two alternate formats are at.

https://gist.github.com/justbennet/f5a405a61665feef7bde1df6b39ae5fc

Comment thread easybuild/tools/docs.py Outdated
Comment thread easybuild/tools/docs.py Outdated
Comment thread easybuild/tools/docs.py Outdated
Comment thread easybuild/tools/docs.py Outdated
Comment thread easybuild/tools/docs.py
Using modified code suggested by Kenneth Hoste in the PR request.

Added exceptions for empty Cray toolchain entries.
Copy link
Copy Markdown
Member

@boegel boegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more minor suggestions, close to being done!

Ideally, we also ensure that the new format produced is covered by the tests as well...
Let me know if you need help with that.

Comment thread easybuild/tools/docs.py Outdated
# Create text placeholder to use for missing entries.
# Not good documentation style to italicize the parentheses enclosing
# italic text unless surrouning text is also italic.
none_txt = '(*none*)'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also making the parentheses italic was deliberate actually, I think it looks awkward otherwise...

compare '(none)' (fully italic) with '(none)' (partial italic)

anyway, this is bikeshedding, definitely not a blocker for this PR :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trained as a typesetter and spent years as a proofreader and (English) copy editor, and I have the opposite reaction to (none) and (none)
https://english.stackexchange.com/questions/9878/should-punctuation-surrounding-italicised-words-be-italicised
But, this is your baby, so I will do it your way.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh boy, please don't agree with me if you're better trained than me in this stuff! :)

I was just going by feeling, if there are clear arguments against using *(none)*, then please don't!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it would lead to inconsistencies elsewhere, it's fine. It's just a convention. ;-) Just trying to explain why my instincts are what they are. And that it was not entirely idiosyncratic. ;-)

I suspect that most everyone else on the project would do what you did and put the *s around the parentheses, and it's probably more important to be consistent throughout. I changed them, so let's not change back and get this out the door. :-)

Comment thread easybuild/tools/docs.py Outdated
for tc in tcs.values():
table_titles.extend(tc.keys())
# Specify the column names for the table
table_titles = ['name', 'COMPILER', 'MPI', 'linalg', 'FFT']
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be 'compiler' to ensure correct column name is used in the end, no?

Now you get COMPILER as column name, I think...

Copy link
Copy Markdown
Contributor Author

@justbennet justbennet Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you are right. I think table_titles is OK, but the change should be to

    col_names = {
        'name': 'Name',
        'COMPILER': 'Compiler(s)',
        'linalg': "Linear algebra",
    }

to match the change you requested to table_titles before. Sorry. Will do that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks awkward to have name vs COMPILER in table_titles.

For MPI & FFT it's OK, since those are abbreviations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are just placeholders internal to the code. How about all of the entries in table_titles go upper, and the col_names are used to set the headings in the display? I tested that now, and that works, and that will make things consistent?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with linalg, then? So,

table_titles = ['NAME', 'COMPILER', 'MPI', 'LINALG', 'FFT']
. . . 
    col_names = {
        'NAME': 'Name',
        'COMPILER': 'Compiler(s)',
        'LINALG': "Linear algebra",
    }

and all references in the code to the text contents of table_titles would have the upper version.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for consistency, so going all upper is fine by me

Comment thread easybuild/tools/docs.py
Comment thread easybuild/tools/docs.py Outdated
Comment thread easybuild/tools/docs.py Outdated
Comment thread easybuild/tools/docs.py Outdated
Comment thread easybuild/tools/docs.py Outdated
@boegel
Copy link
Copy Markdown
Member

boegel commented Apr 20, 2020

@justbennet I've added a test for the output produced by --list-toolchains --output-format=rst.

It fails now, but with good reason imho, for the following cases:

  • for toolchains using OpenBLAS, OpenBLAS is listed twice in the linalg column, we should try and avoid that (missing a call nub somewhere?);
  • for toolchains using imkl, imkl is listed 3 times in the linalg column;
  • for the system toolchain, we're missing a (none) in the compiler column;

To run the test yourself locally, try using this command:

python -O -m test.framework.options list_toolchains

If the tests for --list-toolchains pass, you should see output like:

Ran 2 tests in 0.931s

ok

If the test still fails, you'll get a AssertionError: None is not true : Pattern '<pattern>' should be found in: , followed by the full output produced by --list-toolchains --output-format=rst.

Let me know if you need any help!

Needed a nub() to eliminated duplicate '*(none)*'s

Needed to add an 'or none_txt' to deal with incorrect values being
returned from the system toolchain.
@easybuilders easybuilders deleted a comment from boegelbot Apr 21, 2020
@boegel
Copy link
Copy Markdown
Member

boegel commented May 8, 2020

Thanks a lot for your efforts on this @justbennet!

@boegel boegel merged commit 8d81c7a into easybuilders:develop May 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants