Charlie Pauvert webpages

Microbial interactions, FAIR data and the birth of a collaboration

2022-08-26T09:43:17+02:00

Last year in May, when I discovered the effort of Alan (Pacheco) and Daniel (Segrè) to encode microbial interactions in multidimensional vectors (published in FEMS Microbiology Letters, 2019), I set out to create a searchable online representation of the data they had compiled, to enable future contributions (see the article on mi-atlas). Then, in June 2021, I reached out to Alan and Daniel to let them know about this visualisation. Little did I know at the time that this exchange would lead to us writing a Perspective article together.

Indeed, we first met over a Zoom call between the USA, Switzerland and Austria during the summer. Starting from technical points, such as how to make it easier for our peers to contribute to mi-atlas or the challenges of maintaining online scientific databases in the long term, we moved on to more general considerations about microbial interactions and how to define them, if that is at all possible. This mutual interest prompted us to meet more regularly (with Dileep Kishore joining us on board), and we soon had lively debates on ways to improve access to, and integration of, interaction data across the field.

Figure 1: The FAIR principles explained (Source: scibite.com)

By the end of 2021, a draft paper had started to take shape around the questions and suggestions we kept refining. We all felt that the topic was timely given the emergence of nation-wide initiatives towards Findable, Accessible, Interoperable and Re-usable data (FAIR; Figure 1) in the microbiome field. As a matter of fact, I was about to start a new position a couple of months later, in close collaboration with the NFDI4Microbiota consortium of the German National Research Data Infrastructure. We kept up a steady pace of meetings and discussions, and even dedicated Zoom sessions to writing and improving our draft.

After a first submission in mid-March, we were pleased to read a few weeks later that the three reviewers found our proposals interesting; however, they judged them insufficiently detailed, and our article was rejected. This turned out to be an excellent decision by them and the editor, because it pushed us to detail the required metadata for microbial interactions even further and to outline concrete steps that the community could take.

Finally satisfied with our progress, we resubmitted our Perspective, which was soon accepted and is now available in mSystems: “Toward FAIR representations of microbial interactions”. We hope that our Perspective will convince researchers and spark a community initiative to improve on our suggestions. Together, we can leverage the proposed steps to streamline the comparison, integration and storage of data on microbial interactions generated worldwide in various scientific investigations (Figure 2 and the rest of the article)!

Figure 2: Applying a FAIR system to the study of microbial interactions and correlations (Source: figure 1 from doi: 10.1128/msystems.00659-22)

PS: After months of online collaboration, Alan and I finally had the opportunity to meet in person during the 18th International Symposium on Microbial Ecology in Lausanne last week. There, we attended and discussed new and exciting research on microbial interactions from across the globe!

mi-atlas: an interactive and evolving catalogue of microbial interactions

2021-06-09T09:34:23+02:00

A couple of weeks ago, I read an article by Pacheco and Segrè (2019) in FEMS Microbiology Letters about microbial interactions and how to go beyond a classification based solely on ecological outcomes. Of course, no classification is perfect (just as standards will forever be developed to encompass exceptions, leading to even more standards).

They propose to encode interactions between microorganisms using several binary (0/1) or ternary (0/1/-1) attributes, to build a catalogue amenable to quantitative analyses. I thought this was a good idea. But I was frustrated that their (huge) initial effort to describe 74 interactions was “buried” in the Supplementary Material of their article. Such a multivariate table with 33 columns, while suitable for machines, is hard for humans to take in. So I started working on how to improve the visualisation of the catalogue and to provide means for other scientists to contribute easily.
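To make the encoding idea concrete, here is a minimal Python sketch of how one interaction could be turned into a fixed-order vector of binary and ternary attributes, with a check that each value is allowed. The three attribute names are hypothetical placeholders, not columns from the original catalogue.

```python
# Sketch of a binary/ternary attribute encoding for one microbial interaction.
# The attribute names below are hypothetical placeholders, NOT the 33 columns
# of the catalogue compiled by Pacheco and Segrè.
BINARY = {0, 1}
TERNARY = {-1, 0, 1}

SCHEMA = {
    "contact_dependent": BINARY,  # hypothetical: 1 if physical contact is required
    "effect_on_a": TERNARY,       # hypothetical: -1 harmful, 0 neutral, 1 beneficial
    "effect_on_b": TERNARY,       # hypothetical: same scale for the partner
}


def encode(interaction: dict) -> list:
    """Return the attribute values in SCHEMA order, rejecting invalid values."""
    vector = []
    for attribute, allowed in SCHEMA.items():
        value = interaction[attribute]  # raises KeyError if an attribute is missing
        if value not in allowed:
            raise ValueError(f"{attribute}={value!r} not in {sorted(allowed)}")
        vector.append(value)
    return vector


print(encode({"contact_dependent": 0, "effect_on_a": 1, "effect_on_b": -1}))
# prints [0, 1, -1]
```

Because every interaction ends up as a vector with the same attribute order, interactions can then be compared numerically, for instance by computing distances between them.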

In the end, I provided:

a website
a Shiny application
and one Github repository to bring them all (and in the darkness bind them)

The website presents their framework and the idea behind my project. The Shiny application displays the catalogue, lets users interactively focus on a single interaction and helps them encode a new interaction within the framework. Have a look and do not hesitate to open an issue if you feel like it!

Interactive visualisation of the influences of philosophers

2021-02-19T11:43:26+01:00

Last year, I started a data visualisation and aggregation project in my free time, which I recently updated. Feel free to explore it at cpauvert.shinyapps.io/in-phi-luence.

This project stemmed from a question my partner asked about schools of thought and the influences between philosophers. Next thing I knew, I had opened R and had an excuse for my first Shiny application.

The screenshot just above illustrates the Shiny application in-phi-luence displaying an interactive network of philosophers. Initially, the sole data source for influences between philosophers was the English-language Wikipedia, whose articles on philosophers of science were automatically scraped (R code available on Github).

A few months after the initial draft version of the application, I stumbled upon an interesting project, the Internet Philosophy Ontology Project (InPho), a scholarly resource that compiles ontologies on philosophers and makes them accessible through an API or as OWL files. Using the latter, I was able to fetch and concatenate monthly archives of the InPho ontologies to build a network. The R and Python code, as well as a Snakemake workflow, are available on Github.
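Concatenating several archives into one network mostly boils down to taking the union of their edges. The Python sketch below assumes that each monthly archive has already been parsed from OWL into a list of (influencer, influenced) pairs; the OWL parsing step is not shown, and the snapshot contents are made up for illustration.

```python
from collections import Counter


def merge_snapshots(snapshots):
    """Merge several edge lists into a single network.

    Each snapshot is an iterable of (influencer, influenced) pairs. Returns the
    set of distinct edges and a Counter giving, for each edge, the number of
    snapshots that contain it.
    """
    support = Counter()
    for edges in snapshots:
        support.update(set(edges))  # count an edge at most once per snapshot
    return set(support), support


# Hypothetical monthly snapshots, for illustration only
january = [("Hume", "Kant"), ("Kant", "Hegel")]
february = [("Hume", "Kant"), ("Kant", "Hegel"), ("Hegel", "Marx")]

edges, support = merge_snapshots([january, february])
print(sorted(edges))
print(support[("Hegel", "Marx")])  # prints 1: only present in the February snapshot
```

Keeping the per-edge count is one possible design choice: it could help spot influences that appeared or disappeared between archives.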

Another awesome resource, the Stanford Encyclopedia of Philosophy, could have been used as a curated source of influences. However, it provided neither an API to access its data nor flat-file databases.

In the future, I plan to compare the two networks (Wikipedia vs. InPho) and perhaps suggest missing influences to Wikipedia. Another interesting line of work would be to analyse the philosophers network with node-level metrics. The in- and out-degrees are already displayed in an interactive table, but so far I have unexpectedly struggled to find a relevant metric (among centrality measures or authority scores); there is room for improvement.
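For reference, in- and out-degrees can be computed directly from an edge list. This is a minimal Python sketch, assuming an edge (a, b) means "a influenced b"; the application itself is written in R and may compute them differently, and the edges below are illustrative, not taken from Wikipedia or InPho.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree) Counters for directed (influencer, influenced) edges.

    In-degree: number of philosophers who influenced a node.
    Out-degree: number of philosophers a node influenced.
    Nodes without incoming (or outgoing) edges get 0 from the Counter.
    """
    unique = set(edges)  # ignore duplicated edges
    out_degree = Counter(src for src, _ in unique)
    in_degree = Counter(dst for _, dst in unique)
    return in_degree, out_degree


# Illustrative edges only
edges = [("Hume", "Kant"), ("Leibniz", "Kant"), ("Kant", "Hegel"), ("Hegel", "Marx")]
in_deg, out_deg = degrees(edges)
print(in_deg["Kant"], out_deg["Kant"])  # prints 2 1
```

With both networks stored as sets of edges, a first comparison could be a plain set difference: with hypothetical names, `set(inpho_edges) - set(wikipedia_edges)` would list candidate influences missing from Wikipedia.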

R with Snakemake: a few hurdles to overcome

2020-10-21T15:20:34+02:00

When working on the wrappers for DADA2, I had to respect the grammar of both Python/Snakemake and R. Here are some of the hurdles I encountered.

Keeping the log

Keeping track of the log is quite easy in Snakemake when the tools run in the shell. Don’t get me wrong, R scripts can also be run from the shell using Rscript, and Deer and Langer showed that you can use the command Rscript > {log} to correctly keep track of your script. However, wrappers do not give you access to such a command: in the Snakemake grammar, the rule uses the wrapper: directive, so there is no command line on which to redirect the output.

I knew the R function sink(), which redirects the output of R commands to a file. But when I tried to redirect both messages and errors to the file while testing my wrappers, it failed. A post on Stack Overflow provided the solution, which in short requires two invocations of sink(): one for the standard output and one with type = "message" for messages, warnings and errors. It is now included in the wrappers I wrote.
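The same two-stream issue exists outside R. As a rough Python analogue (not the R solution from that post), a step run from a Snakemake Python script could redirect stdout and stderr separately, each redirection playing the role of one sink() call. The function names below are hypothetical; in a real Snakemake Python script, the log path could come from snakemake.log[0].

```python
import contextlib
import os
import sys
import tempfile


def run_logged(log_path, func, *args, **kwargs):
    """Call func(*args, **kwargs) with both stdout and stderr written to log_path.

    Each stream needs its own redirection, much like the two sink() calls in R.
    Only Python-level writes to sys.stdout / sys.stderr are captured; output
    written directly by subprocesses or C extensions is not.
    """
    with open(log_path, "w") as log, \
            contextlib.redirect_stdout(log), \
            contextlib.redirect_stderr(log):
        return func(*args, **kwargs)


def noisy_step():
    """Hypothetical analysis step that writes to both streams."""
    print("regular output")                         # stdout
    print("warning: low quality", file=sys.stderr)  # stderr
    return 42


log_path = os.path.join(tempfile.mkdtemp(), "step.log")
result = run_logged(log_path, noisy_step)
with open(log_path) as fh:
    logged = fh.read()
print(logged)  # both lines ended up in the same log file
```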

Passing parameters to R

Snakemake provides both reproducible and customizable workflows. But providing parameters to R wrappers was harder than I thought. Looking at other R wrappers, I saw two approaches: either be fully explicit (and redundant) by copying all the R arguments of the needed functions into the Snakemake params: directive, or be transparent and pass arguments in a character string that is then interpreted with the R functions parse() and eval().

I wanted something more flexible and decided to rely on the R function do.call(), which calls a function with arguments provided as a named list (see section 6.2.4 of Hadley Wickham’s Advanced R).
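For readers coming from Python, do.call(f, named_list) in R behaves much like f(**named_dict) in Python. The sketch below illustrates that idea with a hypothetical stand-in for dada2::filterAndTrim(): it accepts only a handful of argument names, its defaults are placeholders rather than DADA2's, and it simply returns what it receives.

```python
def filter_and_trim(fwd, filt, maxEE=None, truncLen=None, multithread=False):
    """Hypothetical stand-in for dada2::filterAndTrim(): it just returns its arguments."""
    return {"fwd": fwd, "filt": filt, "maxEE": maxEE,
            "truncLen": truncLen, "multithread": multithread}


# Named arguments gathered in a dict, like a named list in R
args = {"fwd": "sample_R1.fastq.gz", "filt": "filtered/sample_R1.fastq.gz",
        "maxEE": 1, "truncLen": 240}

result = filter_and_trim(**args)  # Python counterpart of do.call(filterAndTrim, args)

# In Python, an unknown keyword (here a typo) is rejected with a TypeError
try:
    filter_and_trim(**{**args, "maxEEE": 2})
except TypeError as err:
    print("rejected:", err)
```

Gathering arguments in a dictionary (or a named list in R) is what lets any subset of extra parameters be forwarded without listing each one in the wrapper.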

Using this structure, I could pass a Python dictionary to the Snakemake params: directive, which would then be interpreted as a named list in R. For instance:

rule dada2_filter_trim_pe:
    # [...]
    params:
        # extra arguments for dada2::filterAndTrim(), given as a Python dictionary
        {'maxEE': 1, 'truncLen': [240, 200]}
    # [...]

Snakemake converts the Python dictionary into a named list that can be used directly by the R function (here dada2::filterAndTrim()). This named list is part of the larger snakemake S4 object that we can access in R (more info in the Snakemake docs).

However, you cannot simply concatenate the provided lists (snakemake@input, snakemake@output and snakemake@params) and expect do.call() to do all the work, because Snakemake passes the input: entries both as an unnamed (positional) list and as a named list, so each file appears twice. Hence, for the input and output slots, I could not directly call do.call(filterAndTrim, snakemake@input). Instead, I needed to prepare the arguments as follows:

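# Take each named entry explicitly (the slots also hold unnamed copies)
args <- list(
        fwd = snakemake@input[["fwd"]],            # forward reads
        rev = snakemake@input[["rev"]],            # reverse reads
        filt = snakemake@output[["filt"]],         # filtered forward reads
        filt.rev = snakemake@output[["filt_rev"]], # filtered reverse reads
        multithread = snakemake@threads            # threads set in the rule
)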
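To see why blind concatenation fails, the Python sketch below models a slot as a list of (name, value) pairs in which every file appears once without a name and once with its name. This layout is a simplified, hypothetical model for illustration, not Snakemake's actual API. The sketch then follows the approach of picking named entries explicitly and appending the extra parameters, refusing any argument given twice.

```python
# Hypothetical model of a slot such as snakemake@input on the R side:
# each file listed once by position (name None) and once more by name.
input_slot = [(None, "R1.fastq.gz"), (None, "R2.fastq.gz"),
              ("fwd", "R1.fastq.gz"), ("rev", "R2.fastq.gz")]
output_slot = [(None, "filt_R1.fastq.gz"), (None, "filt_R2.fastq.gz"),
               ("filt", "filt_R1.fastq.gz"), ("filt_rev", "filt_R2.fastq.gz")]
params = {"maxEE": 1, "truncLen": [240, 200]}


def named(slot):
    """Keep only the named entries of a slot."""
    return {name: value for name, value in slot if name is not None}


def build_args(inputs, outputs, params, threads):
    """Map rule names to function argument names, then append the extra params."""
    args = {
        "fwd": inputs["fwd"],
        "rev": inputs["rev"],
        "filt": outputs["filt"],
        "filt.rev": outputs["filt_rev"],  # the R argument name differs from the rule's
        "multithread": threads,
    }
    duplicated = set(args) & set(params)
    if duplicated:
        raise ValueError(f"arguments given twice: {sorted(duplicated)}")
    return {**args, **params}


# Naive concatenation passes each of the four files twice
naive = [value for _, value in input_slot + output_slot]
args = build_args(named(input_slot), named(output_slot), params, threads=4)
print(len(naive), len(args))  # prints 8 7
```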

Most of the submitted Snakemake wrappers can now accept Python dictionaries to customize the underlying DADA2 R functions.

These wrappers are currently under review by the team behind the Snakemake wrappers repository. Meanwhile, I am trying to design a DADA2 meta-wrapper to nicely assemble these bricks.

LEGOlize DADA2: getting DADA2 into Snakemake

2020-10-08T12:27:00+02:00

I finally have the time to properly use the workflow management system Snakemake, which is great because so much has been developed since I first heard of this tool during my master’s degree. The authors and contributors of Snakemake recently published a preprint highlighting these features.

Among these features, one could be compared to LEGO® bricks: Snakemake wrappers. They are dedicated Snakemake rules that let you plug in common tools, much like bricks, to perform your analysis. These wrappers should pass automated tests before being integrated into the repository, which safeguards against typos made during development that would otherwise break your workflow.

Bricks can even be assembled into dedicated sets, which in the case of Snakemake means that wrappers can be combined into meta-wrappers, where a common analysis workflow is crafted from a selection of wrappers. This convenient idea lets users finely tune the level of modularity they want when designing their Snakemake workflow: from custom rules, to wrappers, to meta-wrappers¹.

Snakemake usage is rising, and a preprint describing a DADA2 workflow with Snakemake was even released recently. I was really excited about this huge contribution, which, in my opinion, filled a gap. However, I realised that some steps (such as taxonomic assignment) were not part of my personal workflow, and that instead of this one well-running Snakemake workflow (a huge LEGO set), I’d rather choose from several DADA2 wrappers (piles of bricks) to build a more flexible workflow.

It all started when I realised that there were no DADA2 wrappers, and very few metabarcoding-related wrappers, in the repository.

Last week, I started writing my first wrappers by mimicking existing R wrappers from the repository. I designed toy examples to test the wrappers and even managed to open a few pull requests on the Github repository. However, after going over the corrections suggested by the reviewer, I realised that I needed to plan my wrappers, and how they fit together, more carefully. Indeed, I first proposed wrappers for paired-end reads only, which is not truly flexible. Hence, some of my wrappers contained duplicated lines of code to handle each read orientation, which violates the “don’t repeat yourself” rule. Even worse, this sometimes added complexity to steps that could be processed regardless of read orientation.
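One way to avoid that duplication is a single code path in which the reverse reads are optional. The Python sketch below only illustrates the idea, assuming a rule declares a fwd input (plus rev for paired-end data); the function and file names are hypothetical, and this is not the code of the submitted wrappers.

```python
def reads_args(inputs: dict, outputs: dict) -> dict:
    """Build read-file arguments for either single-end or paired-end data.

    Forward reads are always required; reverse reads are added only when the
    rule declares them, so no step has to be written twice.
    """
    args = {"fwd": inputs["fwd"], "filt": outputs["filt"]}
    if "rev" in inputs:  # paired-end: add the reverse-read arguments
        args["rev"] = inputs["rev"]
        args["filt.rev"] = outputs["filt_rev"]
    return args


single = reads_args({"fwd": "s_R1.fq.gz"}, {"filt": "f_R1.fq.gz"})
paired = reads_args({"fwd": "s_R1.fq.gz", "rev": "s_R2.fq.gz"},
                    {"filt": "f_R1.fq.gz", "filt_rev": "f_R2.fq.gz"})
print(sorted(single))  # prints ['filt', 'fwd']
print(sorted(paired))  # prints ['filt', 'filt.rev', 'fwd', 'rev']
```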

Therefore, I switched my pull requests to draft status and will go back to writing them properly. I have already mapped out on paper which wrappers depend on read orientation, in order to optimise this workflow. I hope to soon propose these DADA2 wrappers, and eventually a DADA2 meta-wrapper as well.

¹ Check the documentation for more details on modularisation with Snakemake.

Hello (again) world!

2020-10-07T16:05:00+02:00

At last, I’ve updated my personal page using Pelican this time, but still relying on the awesome Github Pages.

I hope to document my personal coding projects here a bit more often, especially when they involve learning new skills. More importantly, I will try to document my coding errors, how I worked around them and what I learned in the process.