Charlie Pauvert webpageshttps://cpauvert.github.io/2022-08-26T09:43:17+02:00Microbial interactions, FAIR data and the birth of a collaboration2022-08-26T09:43:17+02:002022-08-26T09:43:17+02:00Charlie Pauverttag:cpauvert.github.io,2022-08-26:/microbial-interactions-birth-collaboration.html<p>From a visualisation project to a Perspective article</p><p>In May last year, when I discovered the effort of Alan (<a href="proxy.php?url=https://orcid.org/0000-0002-1128-3232">Pacheco</a>) and Daniel (<a href="proxy.php?url=https://orcid.org/0000-0003-4859-1914">Segrè</a>) to encode microbial interactions in multidimensional vectors (published in <a href="proxy.php?url=https://doi.org/10.1093/femsle/fnz125">FEMS Microbiology Letters, 2019</a>), I set out to create a searchable online representation of the data they had compiled, to enable future contributions (see <a href="proxy.php?url=https://cpauvert.github.io/mi-atlas-catalogue-of-microbial-interactions.html">the article on mi-atlas</a>).
Then, in June 2021, I reached out to Alan and Daniel to let them know about this visualisation. Little did I know at the time that this exchange would lead to writing a Perspective article together.</p>
<p>We first met on a Zoom call between the USA, Switzerland and Austria during the summer. Starting from technical points, such as how to make it easier for our peers to contribute to <a href="proxy.php?url=https://cpauvert.github.io/mi-atlas">mi-atlas</a> or the issues with the long-term maintenance of online scientific databases, we moved to more general considerations on microbial interactions and how to define them, if at all possible. This mutual interest prompted us to meet more regularly (and to bring Dileep <a href="proxy.php?url=https://orcid.org/0000-0003-4859-8681">Kishore</a> on board) and we soon had lively debates on ways to improve access to and integration of interaction data across the field.</p>
<p><img alt="Illustration of the FAIR principles" src="proxy.php?url=https://cpauvert.github.io/images/FAIR_scibite_com.png">
<em>Figure 1: The FAIR principles explained (Source: scibite.com)</em></p>
<p>By the end of 2021, a draft paper started to take shape around our questions and suggestions, which we tried to refine. We all had the feeling that the topic was timely given the emergence of nation-wide initiatives towards Findable, Accessible, Interoperable and Re-usable data (FAIR; Figure 1) in the microbiome field. As a matter of fact, I was about to start a new position a couple of months later and to collaborate closely with the <a href="proxy.php?url=https://nfdi4microbiota.de">NFDI4Microbiota</a> consortium from the German National Research Data Infrastructure. We tried to keep up the pace: we met, discussed and even dedicated Zoom sessions to writing and further improving our draft.</p>
<p>After a first submission in mid-March, we were excited a few weeks later to read that the three reviewers were interested in our proposals. They nevertheless deemed them not detailed enough and rejected our article.
This turned out to be an excellent decision by them and the editor, because it pushed us to detail the required metadata for microbial interactions even further and to outline concrete steps that could be taken by the community.</p>
<p>Finally satisfied with our own progress, we resubmitted our Perspective paper, which was soon accepted and is now available in mSystems: <a href="proxy.php?url=https://doi.org/10.1128/msystems.00659-22">“Toward FAIR representations of microbial interactions”</a>. We hope that our Perspective article will convince researchers and spark a community initiative to improve on our suggestions. Together, we can leverage the proposed steps to streamline the comparison, integration and storage of the data on microbial interactions generated worldwide in various scientific investigations (Figure 2 and the rest of the article)!</p>
<p><img alt="Applying a FAIR system to the study of microbial interactions and correlations" src="proxy.php?url=https://cpauvert.github.io/images/msystems.00659-22.jpg">
<em>Figure 2: Applying a FAIR system to the study of microbial interactions and correlations (Source: figure 1 from doi: <a href="proxy.php?url=https://doi.org/10.1128/msystems.00659-22">10.1128/msystems.00659-22</a>)</em></p>
<p>PS: After months of online collaboration, Alan and I finally had the opportunity to meet in person during the <a href="proxy.php?url=https://isme18.isme-microbes.org">18th International Symposium on Microbial Ecology</a> in Lausanne last week. There we could attend talks on and discuss new and exciting research on microbial interactions done across the globe!</p>mi-atlas: an interactive and evolving catalogue of microbial interactions2021-06-09T09:34:23+02:002021-06-09T09:34:23+02:00Charlie Pauverttag:cpauvert.github.io,2021-06-09:/mi-atlas-catalogue-of-microbial-interactions.html<p>Check out <a href="proxy.php?url=https://cpauvert.github.io/mi-atlas">https://cpauvert.github.io/mi-atlas</a></p><p><img style="float: left; border-radius: 5px; margin: 10px; padding: 0;" width="173" height="200" src="proxy.php?url=https://cpauvert.github.io/images/logo-mi-atlas.png"> A couple of weeks ago, I read an article by <a href="proxy.php?url=https://doi.org/10.1093/femsle/fnz125">Pacheco and Segrè, 2019</a> in <em>FEMS Microbiology Letters</em> on microbial interactions and how to go beyond a classification based on ecological outcomes only. Of course, no classification is perfect (just like standards will forever be developed to encompass exceptions, <a href="proxy.php?url=https://xkcd.com/927">leading to even more standards</a>).</p>
<p>They propose to encode interactions between microorganisms using several binary (<code>0/1</code>) or ternary (<code>0/1/-1</code>) <em>attributes</em> to build a catalogue amenable to quantitative analyses. I thought this was a good idea. But I was frustrated that their (huge) initial effort to describe 74 interactions was “buried” in the Supplementary Material section of their article. Such a multivariate table with 33 columns, while suitable for machines, is hard for humans to take in. So I started to work on how to improve the visualisation of the catalogue and provide means for other scientists to contribute easily.</p>
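<p>To give a feel for what such an encoding makes possible, here is a minimal R sketch. The interaction names, attribute names and values are all made up for illustration; they are not taken from the original table.</p>

```r
# Hypothetical toy catalogue: names and values are illustrative only.
catalogue <- data.frame(
  interaction = c("example_A", "example_B", "example_C"),
  contact_dependent = c(1L, 0L, 1L),     # binary attribute (0/1)
  effect_on_partner_1 = c(1L, -1L, 0L),  # ternary attribute (0/1/-1)
  effect_on_partner_2 = c(1L, 1L, -1L),  # ternary attribute (0/1/-1)
  stringsAsFactors = FALSE
)

# Keep only the numeric attributes and label rows by interaction.
attributes_only <- as.matrix(catalogue[, -1])
rownames(attributes_only) <- catalogue$interaction

# Numeric attributes allow quantitative comparisons,
# here a Manhattan distance between pairs of interactions.
distances <- dist(attributes_only, method = "manhattan")
print(distances)
```

<p>Any other distance or clustering method that accepts a numeric matrix could be used in the same way.</p>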
<p>In the end, I provided:</p>
<ul>
<li>a <a href="proxy.php?url=https://cpauvert.github.io/mi-atlas">website</a> </li>
<li>a <a href="proxy.php?url=https://cpauvert.shinyapps.io/mi-atlas">Shiny application</a></li>
<li>and one <a href="proxy.php?url=https://github.com/cpauvert/mi-atlas">GitHub</a> repository to bring them all (<em>and in the darkness bind them</em>)</li>
</ul>
<p>The <a href="proxy.php?url=https://cpauvert.github.io/mi-atlas">website</a> presents their framework and the idea behind my project. The <a href="proxy.php?url=https://cpauvert.shinyapps.io/mi-atlas">Shiny application</a> displays the catalogue, lets users focus interactively on one interaction and helps them encode a new interaction within the framework.
Have a look and do not hesitate to <a href="proxy.php?url=https://github.com/cpauvert/mi-atlas/blob/main/CONTRIBUTING.md">open an issue</a> if you feel like it!</p>Interactive visualisation of the influences of philosophers2021-02-19T11:43:26+01:002021-02-19T11:43:26+01:00Charlie Pauverttag:cpauvert.github.io,2021-02-19:/interactive-visualisation-of-the-influences-of-philosophers.html<p>Check out <a href="proxy.php?url=https://cpauvert.shinyapps.io/in-phi-luence">https://cpauvert.shinyapps.io/in-phi-luence</a></p><p>Last year, I started a data visualisation and aggregation project in my free time, which I recently updated. Feel free to explore it at <a href="proxy.php?url=https://cpauvert.shinyapps.io/in-phi-luence">cpauvert.shinyapps.io/in-phi-luence</a>.</p>
<p>This project stemmed from a question from my partner about the schools of thought and influences of philosophers. Next thing I knew, I had opened R and had an excuse for a first Shiny application.</p>
<p><img alt="Screenshot of the Shiny application" src="proxy.php?url=https://cpauvert.github.io/images/in-phi-luence_screenshot.png"></p>
<p>The screenshot just above shows the Shiny application <em>in-phi-luence</em> displaying an interactive network of philosophers. Initially, the sole data source for philosophers' influences was <a href="proxy.php?url=https://en.wikipedia.org">The Free Encyclopedia Wikipedia (en)</a>, whose articles on philosophers of science were automatically scraped (R code available on <a href="proxy.php?url=https://github.com/cpauvert/in-phi-luence">GitHub</a>).</p>
<p>A few months after the initial draft version of the application, I stumbled upon an interesting project, <a href="proxy.php?url=https://www.inphoproject.org">The Internet Philosophy Ontology Project (InPho)</a>, a scholarly resource that compiles ontologies on philosophers and makes them accessible through an API or OWL files. Using the latter, I was able to fetch and concatenate monthly archives of the InPho ontologies to build a network. The R and Python code, and a Snakemake workflow, are available on <a href="proxy.php?url=https://github.com/cpauvert/in-phi-luence/inpho">GitHub</a>.</p>
<p>Another awesome resource, <a href="proxy.php?url=https://plato.stanford.edu/">The Stanford Encyclopedia of Philosophy</a>, could have been used as a curated source of influences. However, it provided neither an API to access its data nor flat-file databases.</p>
<p>In the future, I plan to compare the two networks (Wikipedia vs. InPho) and perhaps suggest missing influences to Wikipedia. Another interesting line of work would be the analysis of the philosophers network using node-level metrics. The in- and out-degrees are already displayed in an interactive table, but so far I have struggled to find a relevant metric (among centrality measures or authority scores), so there is room for improvement.</p>R with Snakemake: a few hurdles to overcome2020-10-21T15:20:34+02:002020-10-21T15:20:34+02:00Charlie Pauverttag:cpauvert.github.io,2020-10-21:/r-with-snakemake-a-few-hurdles-to-overcome.html<p>Suggested solutions to issues I dealt with when wrapping R scripts for Snakemake.</p><p>When working on the <a href="proxy.php?url=https://cpauvert.github.io/legolize-dada2.html">wrappers for DADA2</a>, I had to respect the grammar of both Python/Snakemake and R. Here are some of the hurdles I encountered.</p>
<h2>Keeping the log</h2>
<p>Keeping track of the log is quite easy in Snakemake when the tools can run in the shell. Don’t get me wrong, R scripts can also be run from the shell using <code>Rscript</code>, and <a href="proxy.php?url=https://lachlandeer.github.io/snakemake-econ-r-tutorial/logging-output-and-errors.html">Deer and Langer</a> showed that you can use the command <code>Rscript > {log}</code> to correctly keep track of your script. However, when using wrappers you do not have access to this command because in the <a href="proxy.php?url=https://snakemake.readthedocs.io/en/stable/snakefiles/writing_snakefiles.html#grammar">Snakemake grammar</a> you have the <code>wrapper:</code> keyword instead of the <code>script:</code> keyword.</p>
<p>I knew of the <code>sink()</code> R function, which redirects the output of R commands to a file. But when I tried to redirect both messages and errors to the file while testing my wrappers, it failed. This <a href="proxy.php?url=https://stackoverflow.com/a/48173272">answer</a> on Stack Overflow provided the solution: in short, two invocations of <code>sink()</code> are needed, one for regular output and one for messages. It is now included in the wrappers I wrote.</p>
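<p>A minimal sketch of the double <code>sink()</code> idea looks like this. It uses a temporary file as a stand-in for the Snakemake log path, which is an assumption made so the snippet can run on its own; in a wrapper the path would come from the <code>snakemake</code> object.</p>

```r
# Stand-in for the log file that Snakemake would provide (assumption).
log_path <- tempfile(fileext = ".log")
log_con <- file(log_path, open = "wt")

sink(log_con)                    # first call: standard output (print, cat)
sink(log_con, type = "message")  # second call: messages, warnings and errors

cat("written with cat()\n")
message("written with message()")

# Restore the default streams before closing the connection.
sink(type = "message")
sink()
close(log_con)

# Both lines should now be in the log file.
log_lines <- readLines(log_path)
```

<p>Note that <code>sink(type = "message")</code> only accepts an open connection, which is why the file is opened explicitly instead of passing the path.</p>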
<h2>Passing parameters to R</h2>
<p>Snakemake provides both reproducible and customizable workflows. But providing parameters to R wrappers was harder than I thought. Looking at other R wrappers, I saw two approaches. Either be fully explicit (and redundant) by copying all the R arguments of the needed functions into the Snakemake <code>params:</code> keyword. Or be transparent and pass arguments in a character string that is then interpreted with the R functions <code>parse()</code> and <code>eval()</code>.</p>
<p>I wanted something more flexible and decided to rely on the R function <code>do.call()</code>, which executes a function with arguments provided as a named list (see 6.2.4 from <a href="proxy.php?url=https://adv-r.hadley.nz/functions.html#function-fundamentals">Hadley’s Advanced R</a>).</p>
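<p>As a quick illustration of <code>do.call()</code> on its own, outside of any Snakemake context, the base R function <code>seq()</code> can be called with its arguments stored in a named list (the values are arbitrary):</p>

```r
# Arguments collected in a named list instead of being typed in the call.
arguments <- list(from = 1, to = 10, by = 3)

# do.call() expands the list into the call seq(from = 1, to = 10, by = 3).
result <- do.call(seq, arguments)
print(result)
```

<p>The list names are matched against the argument names of the function, exactly as in a regular call.</p>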
<p>Using such a structure, I could pass a Python dictionary to Snakemake <code>params:</code> that would then be interpreted as a named list in R. For instance:</p>
<div class="highlight"><pre><span></span><code><span class="n">rule</span> <span class="n">dada2_filter_trim_pe</span><span class="p">:</span>
<span class="c1"># [...]</span>
<span class="n">params</span><span class="p">:</span>
<span class="p">{</span><span class="s1">'maxEE'</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'truncLen'</span><span class="p">:</span> <span class="p">[</span><span class="mi">240</span><span class="p">,</span><span class="mi">200</span><span class="p">]</span> <span class="p">}</span>
<span class="c1"># [...]</span>
</code></pre></div>
<p>Snakemake converts the Python dictionary into a named list which can be directly used for the R function (here <code>dada2::filterAndTrim()</code>). This named list is part of the larger <code>snakemake</code> S4 object that we can access in R (more info in the <a href="proxy.php?url=https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#external-scripts">Snakemake docs</a>).</p>
<p>However, you cannot concatenate the lists provided (<code>snakemake@input</code>, <code>snakemake@output</code> and <code>snakemake@params</code>) and expect <code>do.call()</code> to do all the work. This fails because Snakemake passes the <code>input:</code> entries both as an unnamed list <strong>and</strong> as a named list. So for the input and output slots I could not directly call <code>do.call(filterAndTrim, snakemake@input)</code>. Instead, I needed to prepare the arguments as follows:</p>
<div class="highlight"><pre><span></span><code><span class="n">args</span><span class="o"><-</span><span class="nf">list</span><span class="p">(</span>
<span class="n">fwd</span> <span class="o">=</span> <span class="n">snakemake</span><span class="o">@</span><span class="n">input</span><span class="p">[[</span><span class="s">"fwd"</span><span class="p">]],</span>
<span class="n">rev</span> <span class="o">=</span> <span class="n">snakemake</span><span class="o">@</span><span class="n">input</span><span class="p">[[</span><span class="s">"rev"</span><span class="p">]],</span>
<span class="n">filt</span> <span class="o">=</span> <span class="n">snakemake</span><span class="o">@</span><span class="n">output</span><span class="p">[[</span><span class="s">"filt"</span><span class="p">]],</span>
<span class="n">filt.rev</span> <span class="o">=</span> <span class="n">snakemake</span><span class="o">@</span><span class="n">output</span><span class="p">[[</span><span class="s">"filt_rev"</span><span class="p">]],</span>
<span class="n">multithread</span><span class="o">=</span><span class="n">snakemake</span><span class="o">@</span><span class="n">threads</span>
<span class="p">)</span>
</code></pre></div>
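<p>From there, one possible way to finish the call is to keep only the named entries of the parameters and append them to the explicit arguments. The sketch below is an assumption-laden illustration, not the wrapper itself: the <code>snakemake</code> object only exists when Snakemake runs the script, so plain lists with made-up file names replace it, and a hypothetical <code>filter_stub()</code> stands in for <code>dada2::filterAndTrim()</code>.</p>

```r
# Stand-in for the explicit arguments built from input/output (made-up paths).
args <- list(
  fwd = "sample_R1.fastq.gz",
  filt = "filtered/sample_R1.fastq.gz"
)

# Stand-in for the params slot, mimicking the mixed unnamed/named layout.
params <- list(1, c(240, 200), maxEE = 1, truncLen = c(240, 200))

# Keep only the named entries so that do.call() can match them by name.
extra <- params[names(params) != ""]

# Hypothetical placeholder reusing a few filterAndTrim() argument names.
filter_stub <- function(fwd, filt, maxEE = Inf, truncLen = 0) {
  sprintf("%s -> %s (maxEE=%s, truncLen=%s)",
          fwd, filt, maxEE, paste(truncLen, collapse = ","))
}

# c() concatenates the two named lists into a single argument list.
result <- do.call(filter_stub, c(args, extra))
print(result)
```

<p>Filtering on <code>names()</code> avoids passing the same value twice, once positionally and once by name.</p>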
<p>Most of the submitted Snakemake wrappers can now accept Python dictionaries to customize the underlying DADA2 R functions.</p>
<p>These wrappers are currently under review by the team of the Snakemake wrappers repository. Meanwhile, I am trying to design a DADA2 meta-wrapper to nicely assemble these <em>bricks</em>.</p>LEGOlize DADA2: getting DADA2 into Snakemake2020-10-08T12:27:00+02:002020-10-09T10:39:00+02:00Charlie Pauverttag:cpauvert.github.io,2020-10-08:/legolize-dada2.html<p>A (WIP) contribution project that relies on the modularity of Snakemake wrappers to propose a flexible pipeline for processing metabarcoding data.</p><p>I finally have the time to properly use the workflow management system <a href="proxy.php?url=https://github.com/snakemake/snakemake">Snakemake</a>, which is great because so many developments have happened since I first heard of this tool during my master's degree. The authors and contributors of Snakemake recently published a <a href="proxy.php?url=https://doi.org/10.5281/zenodo.4067137">preprint</a> highlighting these features.</p>
<p>Among these features, one could be compared to LEGO® bricks: Snakemake <em>wrappers</em>. They are dedicated Snakemake rules that let you plug in common tools, much like bricks, to perform your analysis. These wrappers must pass automatic tests prior to their integration in the repository, which safeguards against typos made during development that would hinder your workflow.</p>
<p>Bricks can even be assembled into dedicated sets, which in the case of Snakemake means that wrappers can be combined into a <em>meta-wrapper</em>, where a common analysis workflow is crafted from a selection of wrappers. This convenient idea lets users finely tune the level of modularity they want when designing their Snakemake workflow: from custom rules to wrappers to meta-wrappers<sup id="fnref:1"><a class="footnote-ref" href="proxy.php?url=#fn:1">1</a></sup>.</p>
<p>Snakemake use is rising and there was even recently a <a href="proxy.php?url=https://doi.org/10.1101/2020.05.17.095679">preprint</a> for a <a href="proxy.php?url=https://benjjneb.github.io/dada2/">DADA2</a> workflow with Snakemake. I was really excited about this huge contribution that, in my opinion, filled a gap. However, I realized that some steps were not part of my personal workflow (such as the taxonomy) and that instead of this one smoothly running Snakemake workflow (a huge LEGO set), I’d rather choose from several DADA2 wrappers (piles of bricks) to build a more flexible workflow.</p>
<p>It all started when I realised that there were no DADA2 wrappers, and only a few metabarcoding-related wrappers, in the repository.<br>
Last week, I started writing my first wrappers by mimicking previous R wrappers listed in the <a href="proxy.php?url=https://snakemake-wrappers.readthedocs.io/en/stable/index.html">repository</a>. I tried to design toy examples to test the wrappers and I even managed to propose a few pull requests on the <a href="proxy.php?url=https://github.com/snakemake/snakemake-wrappers/pulls">GitHub repository</a>. However, after going over the corrections proposed by the reviewer, I realized that I needed to think ahead carefully about my wrappers and how they fit together. Indeed, I first proposed wrappers using paired-end reads only, which is not truly flexible. Hence some of my wrappers contained duplicated lines of code to cope with the orientation, which violates the <em>don’t repeat yourself</em> <a href="proxy.php?url=https://en.wikipedia.org/wiki/Don%27t_repeat_yourself">rule</a>.
Even worse, it sometimes added complexity to steps that could be processed regardless of the read orientation.</p>
<p>Therefore, I put my pull requests in draft status and will go back to writing them properly. I have already put on paper how each wrapper depends on read orientation in order to optimize this workflow. I hope to propose these DADA2 wrappers soon, and eventually a DADA2 meta-wrapper as well.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Check the <a href="proxy.php?url=https://snakemake.readthedocs.io/en/stable/snakefiles/modularization.html">documentation</a> for more details on modularisation with Snakemake. <a class="footnote-backref" href="proxy.php?url=#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Hello (again) world!2020-10-07T16:05:00+02:002020-10-07T16:05:00+02:00Charlie Pauverttag:cpauvert.github.io,2020-10-07:/hello-again-world.html<p>At last, I’ve updated my personal page using <a href="proxy.php?url=https://getpelican.com">Pelican</a> this time, but still relying on the awesome <a href="proxy.php?url=https://pages.github.com">Github Pages</a>.</p>
<p>I am hoping to document my personal coding projects here a bit more often, especially when I learn new skills. More importantly, I will try to document my coding errors, how I tried to dodge them, and what I learned in the process.</p>