Neurostuff Development Environment (2020-09-10)

This post is a chronicle of how I’m setting up a development environment for neurostuff using VSCode (version 1.48.2 as of this writing).

Prerequisites

Step 0: clone the repository

Clone the repository:

git clone https://github.com/PsychoinformaticsLab/neurostuff.git

and then cd into the created folder.

cd neurostuff

Step 1: Open VSCode

Open VSCode:

code .

Step 2: Generate the Container Files

Click on the green section in the lower left-hand corner of the VSCode bottom banner.

That will open a menu where you will select Remote-Containers: Add Development Container Configuration Files...

which progresses the menu to the next choice of which file to use to build the container. You will select From 'docker-compose.yml'.

The next menu will ask which service you wish to create; select neurostuff.

Following those choices should result in a folder named .devcontainer with two files:

  1. devcontainer.json
  2. docker-compose.yml

Step 3: Edit devcontainer.json

There are several edits needed to set up devcontainer.json for the neurostuff repository.

The first field to edit in devcontainer.json is dockerComposeFile:

Within neurostuff, there is an additional docker-compose.dev.yml file to add to this field.

The second field to edit in devcontainer.json is workspaceFolder, which by default is /workspace.

Change /workspace to /neurostuff

Finally, the third field we want to edit is extensions. neurostuff is written in python, so we want the python extension to be installed with vscode on this remote container. Add ms-python.python to extensions.
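
After those three edits, devcontainer.json might look roughly like the sketch below (the exact contents generated by your VSCode version may differ, and the relative docker-compose paths are illustrative):

// a sketch of .devcontainer/devcontainer.json after the edits above
{
    "name": "neurostuff",
    // paths are relative to the .devcontainer folder
    "dockerComposeFile": [
        "docker-compose.yml",
        "../docker-compose.dev.yml"
    ],
    "service": "neurostuff",
    "workspaceFolder": "/neurostuff",
    "extensions": [
        "ms-python.python"
    ]
}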

Step 4: Edit docker-compose.yml

Within docker-compose.yml we will change the volumes mount from /workspace to /neurostuff.
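
For example, the service entry might end up looking something like this (a sketch assuming the repository root is bind-mounted; match the paths to your generated file):

services:
  neurostuff:
    volumes:
      # mount the repository at /neurostuff instead of the default /workspace
      - ..:/neurostuff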

Step 5: Open VSCode with the remote container

Click on the green section in the lower-left corner of the bottom banner again.

Select Remote-Containers: Reopen in Container

Once the images are built and the containers are created, you should be working from within the neurostuff container, yay!

Step 6: Set up python testing

Press Ctrl+Shift+P and type into the menu bar: Python: Configure Tests

neurostuff uses pytest so we will select pytest as our test framework.

When the menu progresses, select neurostuff as the base directory.

The menu should close and VSCode will find all tests written for neurostuff and you will see a new icon that looks like a beaker.
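
Behind the scenes, that menu writes your choices into .vscode/settings.json; the result should look approximately like this:

{
    "python.testing.pytestEnabled": true,
    "python.testing.unittestEnabled": false,
    "python.testing.pytestArgs": [
        "neurostuff"
    ]
}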

To try debugging a test, create a breakpoint in one of the test files.

Then click on Debug Test above the test function.

Clicking Debug Test should run the test until it reaches the breakpoint at which point you have total control to inspect variables and test your understanding.

Additional Reading

Development Environment with VS Code (2019-06-26)

Oftentimes you have to choose between installing the complex web of dependencies to debug fmriprep from a graphical text editor and using a container where you can only debug from a terminal. VS Code has several extensions that make it easier to develop fmriprep and have the best of both worlds. The VS Code Docker and Remote Development extensions will give you a great experience running tests and debugging fmriprep on your laptop or work machine.

1. Download Test Data

To test fmriprep, we use test data modified from openneuro datasets. We will download the data into a downloads folder at the project root:

$ mkdir -p ./downloads && cd ./downloads

Then using a script similar to the one below, we download all necessary data into the downloads folder:

mkdir -p data/reports

# regression data for pytest
if [[ ! -d data/fmriprep_bold_truncated ]]; then
    wget --retry-connrefused --waitretry=5 --read-timeout=20 --timeout=15 -t 0 -q \
    -O fmriprep_bold_truncated.tar.gz "https://osf.io/286yr/download"
    tar xvzf fmriprep_bold_truncated.tar.gz -C data
else
    echo "Truncated BOLD series were already downloaded"
fi

if [[ ! -d data/fmriprep_bold_mask ]]; then
    wget --retry-connrefused --waitretry=5 --read-timeout=20 --timeout=15 -t 0 -q \
    -O fmriprep_bold_mask.tar.gz "https://osf.io/s4f7b/download"
    tar xvzf fmriprep_bold_mask.tar.gz -C data
else
    echo "Pre-computed masks were already downloaded"
fi

# data for test fmriprep runs
if [[ ! -d data/ds005 ]]; then
    wget --retry-connrefused --waitretry=5 --read-timeout=20 --timeout=15 -t 0 -q \
    -O ds005_downsampled.tar.gz "https://files.osf.io/v1/resources/fvuh8/providers/osfstorage/57f32a429ad5a101f977eb75"
    tar xvzf ds005_downsampled.tar.gz -C data
else
    echo "Dataset ds000005 was already downloaded"
fi

if [[ ! -d data/ds054 ]]; then
    wget --retry-connrefused --waitretry=5 --read-timeout=20 --timeout=15 -t 0 -q \
    -O ds054_downsampled.tar.gz "https://files.osf.io/v1/resources/fvuh8/providers/osfstorage/57f32c22594d9001ef91bf9e"
    tar xvzf ds054_downsampled.tar.gz -C data
else
    echo "Dataset ds000054 was already downloaded"
fi

if [[ ! -d data/ds210 ]]; then
    wget --retry-connrefused --waitretry=5 --read-timeout=20 --timeout=15 -t 0 -q \
    -O ds210_downsampled.tar.gz "https://files.osf.io/v1/resources/fvuh8/providers/osfstorage/5ae9e37b9a64d7000ce66c21"
    tar xvzf ds210_downsampled.tar.gz -C data
else
    echo "Dataset ds000210 was already downloaded"
fi

if [[ ! -d ds005/derivatives/freesurfer ]]; then
    mkdir -p ds005/derivatives
    wget --retry-connrefused --waitretry=5 --read-timeout=20 --timeout=15 -t 0 -q \
    -O ds005_derivatives_freesurfer.tar.gz "https://files.osf.io/v1/resources/fvuh8/providers/osfstorage/58fe59eb594d900250960180"
    tar xvzf ds005_derivatives_freesurfer.tar.gz -C ds005/derivatives
else
    echo "FreeSurfer derivatives of ds000005 were already downloaded"
fi

2. Install Prerequisite Software

You can see the documentation for installing VS Code extensions; for this guide you will need the Docker and Remote Development extensions mentioned in the introduction.
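
If you prefer the command line, the same extensions can also be installed with the code CLI (extension IDs current as of this writing; double-check them in the marketplace):

$ code --install-extension ms-azuretools.vscode-docker
$ code --install-extension ms-vscode-remote.vscode-remote-extensionpack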

3. Create .devcontainer.json

.devcontainer.json specifies how we build our development environment. The file lives in the root (i.e. top) directory of the fmriprep repository at the same level as the Dockerfile.

We are following the directions given by VS Code to create a development environment with a Dockerfile.

To create the .devcontainer.json we will open VS Code in the root of the fmriprep project:

$ cd $HOME/projects/fmriprep
$ code .

Once VS Code is open, we will press Ctrl+Shift+P on the keyboard and type Remote-Containers: Create Container Configuration File. Selecting that command will create a .devcontainer.json for us, but we will change the json file to meet our needs.

The contents of .devcontainer.json will look like the following:

// See https://aka.ms/vscode-remote/devcontainer.json for format details.
{
    "name": "fmriprep_dev",
    "image": "fmriprep:dev",
    "dockerFile": "Dockerfile",
    "workspaceMount": "src=${env:PWD},dst=/src/fmriprep,type=bind",
    "workspaceFolder": "/src/fmriprep",
    "extensions": [
        "ms-python.python",
        "visualstudioexptteam.vscodeintellicode"
    ],
    "runArgs": ["--entrypoint", "",
                "-v", "${env:PWD}/downloads:/tmp",
                "-e", "FMRIPREP_REGRESSION_SOURCE=/tmp/data/fmriprep_bold_truncated",
                "-e", "FMRIPREP_REGRESSION_TARGETS=/tmp/data/fmriprep_bold_mask",
                "-e", "FMRIPREP_REGRESSION_REPORTS=/tmp/data/reports",
                "-e", "FS_LICENSE=/tmp/license.txt",
                "-e", "FMRIPREP_DEV=1"],
    "postCreateCommand": "pip uninstall -y fmriprep && python setup.py develop && conda install -y flake8 && cd /tmp && echo 'cHJpbnRmICJrcnp5c3p0b2YuZ29yZ29sZXdza2lAZ21haWwuY29tXG41MTcyXG4gKkN2dW12RVYzelRmZ1xuRlM1Si8yYzFhZ2c0RVxuIiA+IGxpY2Vuc2UudHh0Cg==' | base64 -d | sh"
}

The keys are documented so we will not re-hash them here, but we will explain the motivation for the values referenced by some of the keys.

  • We changed the default workspaceMount to bind our local fmriprep repository to the container.
  • Similarly, we changed the workspaceFolder to open VS Code in the correct directory where our local fmriprep repository is now bound.
  • We installed two essential plugins for working with the code: ms-python.python and visualstudioexptteam.vscodeintellicode.
    • ms-python.python helps with debugging, linting, intellisense, etc. for python.

    • visualstudioexptteam.vscodeintellicode helps with code completion based on common code patterns.

  • The runArgs removes the entrypoint for the container (it was set to only run fmriprep), mounts data downloaded from step 1, and sets environment variables for the container to know where the test data should be and that we are in a testing environment.
  • The postCreateCommand performs three miscellaneous tasks:

    • reinstalls fmriprep under development mode so edits in /src/fmriprep make a difference in the call to fmriprep (as opposed to patching as we've seen in the above sections).
    • installs flake8 for code linting in VS Code.
    • creates a freesurfer license file in /tmp, which is necessary to run fmriprep with freesurfer.

4. Create Container Environment

Press Ctrl+Shift+P on the keyboard and type/select Remote-Containers: Open Folder in Container. This should open a folder browser. Navigate to your fmriprep project folder and select open. The build process should begin.

5a. Set up Debugging with .vscode/launch.json

.vscode/launch.json is a file that helps VS Code run debugging sessions. Go to the debug view in the activity bar on the side of VS Code. From the debug view, select the configure gear icon on the Debug view top bar.

.vscode/launch.json should now exist and have a couple default entries. We will remove those entries and replace them with the following:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [

        {
            "name": "python: ds005-anat",
            "type": "python",
            "request": "launch",
            "program": "/usr/local/miniconda/bin/fmriprep",
            "args": [
                "-w", "/tmp/ds005/work",
                "/tmp/data/ds005",
                "/tmp/ds005/derivatives",
                "participant",
                "--skull-strip-template", "OASIS30ANTs:res-1",
                "--output-spaces", "MNI152NLin2009cAsym", "MNI152NLin6Asym",
                "--sloppy", "--write-graph",
                "--anat-only", "-vv", "--notrack"
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        },
        {
            "name": "python: ds005-full",
            "type": "python",
            "request": "launch",
            "program": "/usr/local/miniconda/bin/fmriprep",
            "args": [
                "-w", "/tmp/ds005/work",
                "/tmp/data/ds005",
                "/tmp/ds005/derivatives",
                "participant",
                "--sloppy", "--write-graph",
                "--use-aroma",
                "--skull-strip-template", "OASIS30ANTs:res-1",
                "--output-space", "T1w", "template", "fsaverage5", "fsnative",
                "--template-resampling-grid",  "native",
                "--use-plugin", "/src/fmriprep/.circleci/legacy.yml",
                "--cifti-output", "-vv", "--notrack"
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        },
        {
            "name": "python: ds054",
            "type": "python",
            "request": "launch",
            "program": "/usr/local/miniconda/bin/fmriprep",
            "args": [
                "-w", "/tmp/ds054/work",
                "/tmp/data/ds054",
                "/tmp/ds054/derivatives",
                "participant",
                "--fs-no-reconall", "--sloppy",
                "--output-spaces", "MNI152NLin2009cAsym:res-2", "anat", "func",
                "-vv",
                "--notrack"
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        },
        {
            "name": "python: ds210-anat",
            "type": "python",
            "request": "launch",
            "program": "/usr/local/miniconda/bin/fmriprep",
            "args": [
                "-w", "/tmp/ds210/work",
                "/tmp/data/ds210",
                "/tmp/ds210/derivatives",
                "participant",
                "--fs-no-reconall", "--sloppy", "--write-graph",
                "--anat-only", "-vv", "--notrack"
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        },
        {
            "name": "python: ds210-full",
            "type": "python",
            "request": "launch",
            "program": "/usr/local/miniconda/bin/fmriprep",
            "args": [
                "-w", "/tmp/ds210/work",
                "/tmp/data/ds210",
                "/tmp/ds210/derivatives",
                "participant",
                "--t2s-coreg", "--use-syn-sdc",
                "--template-resampling-grid", "native",
                "--dummy-scans", "1",
                "--fs-no-reconall", "--sloppy", "--write-graph",
                "--anat-only", "-vv", "--notrack"
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}

After adding those entries, you should be able to hit the green arrow and debug any changes you made to fmriprep.

You can edit this file to test on your own data or some other configuration. Please see python debugging in VS Code to learn more about the configurations.

5b. pytest in VS Code

In addition to debugging, you can also interactively run pytest. Please see the VS Code directions to get the testing framework setup.
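
If you want a quick sanity check from the integrated terminal inside the container first, a plain pytest invocation along these lines should also work (the path is illustrative; point it at whichever tests you care about):

$ pytest -v /src/fmriprep/fmriprep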

Rules for problem solving and understanding code (2019-04-06)

  • Create a minimal dataset to test the code

  • Learn to use a debugger (see the sketch after this list)

  • Step through the code line by line
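
A minimal sketch of these rules in practice, using Python’s built-in debugger (the function and data are made up for illustration):

import pdb

def running_total(values):
    total = 0
    for v in values:
        pdb.set_trace()  # pause here; 'n' steps to the next line, 'p total' inspects a variable
        total += v
    return total

running_total([1, 2, 3])  # a minimal dataset to step through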

Python User Learning R (2019-02-05)

    Just as I’m getting a feel for python, why not tack on another language? R is very popular for statistical computing, and the number of packages R has for this functionality confirms that intuition.

    So in order to be maximally effective in my graduate work (and use the right model for the job), I should at least have a passing knowledge of R in addition to python. I am using this blog format to chronicle my adventures and missteps while learning R.

    WYSIWYG (what you see is what you get)

    This is the assumption I’ve operated under with python. When I make a list and print out the results, I get something like this:

    list(range(1, 6))
    [1, 2, 3, 4, 5]
    

    I explicitly see the structure of the list, and the square brackets tell me this is a list, as opposed to parentheses, which would indicate a tuple.

    However in R, I may get something like:

    c(1, 2, 3, 4, 5)
    [1] 1 2 3 4 5
    

    It took me a little while to get comfortable with the [1] prepended to the output: it is a convenience showing the index of the first element printed on that line. R gets more confusing when trying to show a more complex object. However, I’m beginning to appreciate R as an interactive programming language, and the seemingly strange way of printing data structures is great for the interactive user who does not need to concern themselves with the underlying data structures R is using.

    But if you are interested in the structure, then you should use the str() function, which brings the output closer to what I’m used to seeing in python and helps me understand the data types I use in R.
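
    For example (a tiny sketch):

    x <- c(1, 2, 3, 4, 5)
    str(x)
    #  num [1:5] 1 2 3 4 5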

    The next post will probably be about non-standard evaluation (this still blows my mind)

Automating Data Quality Assurance (2018-11-20)

    This is an ongoing process where we are attempting to collect data and visualize it quickly so we can see if anything looks off.

    I still think the best/simplest scenario is to use psychopy for stimulus presentation and use other python utilities to generate a figure from the data.

    Alas, we are stuck with eprime and I am struck with inspiration to make things much more complicated to practice using utilities that are not directly designed for this use-case.

    Before I dive in, here is a list of tools/services/utilities I will be using to set up the quality assurance service. They each link to a tutorial/explanation.

    Prerequisites

    QA script exists

    We’ve already written/borrowed code to generate an svg file from the output of an eprime task, so I will not be covering that. We will assume there is a script that generates some form of figure output (in a BIDS-organized fashion). You can look at the end of the guide for an example script.

    You have a github account

    sign up for github

    You have a circleci account

    sign up for circleci and connect your github account

    You have a dockerhub account

    sign up for dockerhub

    Step 1: Create a reproducible environment to run the QA code

    Our QA code for this example is written in python, and currently a good way to share the environment necessary to run/reproduce the code is anaconda.

    If you developed the qa code while working in a conda environment, great! Otherwise you will create a conda environment with:

    conda create -n eprime_convert python=3.6
    

    where eprime_convert is the name of the environment (you can make this be anything you want) and python=3.6 is selecting the specific version of python (we currently use 3.6). To activate the newly created environment:

    source activate eprime_convert
    

    Now you will look at the import statements at the top of your script and conda install the necessary packages.

    from convert_eprime import convert
    import pandas as pd
    import seaborn as sns
    from argparse import ArgumentParser
    import os
    from matplotlib import pyplot as plt
    from glob import glob
    import shutil
    import re
    

    From this, it appears I need to install: convert_eprime, pandas, seaborn, and matplotlib. All the other imports are from built-in packages in python, so they are available by default (you will notice which packages are built in with practice). My first pass to install everything would be:

    conda install convert_eprime pandas seaborn matplotlib
    

    This would install everything if I didn’t include convert_eprime. convert_eprime is not tracked by anaconda, and isn’t even tracked by pypi. It’s a pet project from another graduate student who was fed up with e-merge. To install convert_eprime I need to know how to install a github repo. Luckily, stackoverflow has an answer for everything. So the real commands to install everything are:

    conda install pandas seaborn matplotlib
    pip install git+https://github.com/tsalo/convert-eprime.git
    

    Test your script to make sure it works with these installs. If it complains that you are missing something, install it. Now you can export your environment to a file so it can be reproduced.

    conda env export > eprime_convert.yml
    

    Open up that eprime_convert.yml because we need to edit it (the Dockerfile below expects this file name). It may look something like this:

    name: eprime_convert
    channels:
      - defaults
    dependencies:
      - blas=1.0=mkl
      - ca-certificates=2018.03.07=0
      - certifi=2018.10.15=py36_0
      - cycler=0.10.0=py36_0
      - dbus=1.13.2=h714fa37_1
      - expat=2.2.6=he6710b0_0
      - fontconfig=2.13.0=h9420a91_0
      - freetype=2.9.1=h8a8886c_1
      - glib=2.56.2=hd408876_0
      - gst-plugins-base=1.14.0=hbbd80ab_1
      - gstreamer=1.14.0=hb453b48_1
      - icu=58.2=h9c2bf20_1
      - intel-openmp=2019.0=118
      - jpeg=9b=h024ee3a_2
      - kiwisolver=1.0.1=py36hf484d3e_0
      - libedit=3.1.20170329=h6b74fdf_2
      - libffi=3.2.1=hd88cf55_4
      - libgcc-ng=8.2.0=hdf63c60_1
      - libgfortran-ng=7.3.0=hdf63c60_0
      - libpng=1.6.35=hbc83047_0
      - libstdcxx-ng=8.2.0=hdf63c60_1
      - libuuid=1.0.3=h1bed415_2
      - libxcb=1.13=h1bed415_1
      - libxml2=2.9.8=h26e45fe_1
      - matplotlib=3.0.1=py36h5429711_0
      - mkl=2019.0=118
      - mkl_fft=1.0.6=py36h7dd41cf_0
      - mkl_random=1.0.1=py36h4414c95_1
      - ncurses=6.1=hf484d3e_0
      - numpy=1.15.4=py36h1d66e8a_0
      - numpy-base=1.15.4=py36h81de0dd_0
      - openssl=1.0.2p=h14c3975_0
      - pandas=0.23.4=py36h04863e7_0
      - patsy=0.5.1=py36_0
      - pcre=8.42=h439df22_0
      - pip=18.1=py36_0
      - pyparsing=2.3.0=py36_0
      - pyqt=5.9.2=py36h05f1152_2
      - python=3.6.6=h6e4f718_2
      - python-dateutil=2.7.5=py36_0
      - pytz=2018.7=py36_0
      - qt=5.9.6=h8703b6f_2
      - readline=7.0=h7b6447c_5
      - scipy=1.1.0=py36hfa4b5c9_1
      - seaborn=0.9.0=py36_0
      - setuptools=40.5.0=py36_0
      - sip=4.19.8=py36hf484d3e_0
      - six=1.11.0=py36_1
      - sqlite=3.25.2=h7b6447c_0
      - statsmodels=0.9.0=py36h035aef0_0
      - tk=8.6.8=hbc83047_0
      - tornado=5.1.1=py36h7b6447c_0
      - wheel=0.32.2=py36_0
      - xz=5.2.4=h14c3975_4
      - zlib=1.2.11=ha838bed_2
      - pip:
        - convert-eprime==0.0.1a0
        - future==0.17.1
    prefix: /home/james/.conda/envs/eprime_convert
    

    If we were only going to run this environment on identical (or near identical) hardware, then this is fine, but if we want a more flexible yml, then we need to start editing. A few things to do:

    • remove the prefix
    • change the convert-eprime version to the github repo
    • remove the dependency installs and the machine-specific install codes

    After editing, the file should look something like this:

    name: eprime_convert
    channels:
      - defaults
    dependencies:
      - matplotlib=3.0.1
      - numpy=1.15.4
      - pandas=0.23.4
      - seaborn=0.9.0
      - python=3.6
      - pip:
        - git+https://github.com/tsalo/convert-eprime.git
    

    Much cleaner (I kept numpy as its own install just to be explicit, I don’t believe it’s actually necessary to include).

    We have created the yml to basically build the same environment that we want to use/build our code with. This will be good for deploying/sharing the code in multiple contexts. However, we are going to lock down the environment in which the code runs even further using docker. Basically, we are going to build a docker container that has our conda environment installed on it.

    We can do that by making a Dockerfile that could look like this:

    # https://medium.com/@chadlagore/conda-environments-with-docker-82cdc9d25754
    FROM continuumio/miniconda3:4.5.11
    
    COPY eprime_convert.yml /env/
    
    RUN conda env create -f /env/eprime_convert.yml &&\
        conda clean --all
    
    # Pull the environment name out of the yml to activate it in the login shell
    RUN echo "source activate $(head -1 /env/eprime_convert.yml | cut -d' ' -f2)" > ~/.bashrc
    # ENV does not perform command substitution, so the environment name is hardcoded
    ENV PATH /opt/conda/envs/eprime_convert/bin:$PATH
    
    ENTRYPOINT [ "/bin/bash", "-c" ]
    

    and we can build the Dockerfile with this command:

    docker build -t jdkent/eprime_convert .
    

    The tag is linked to my dockerhub account, so when I push the container to dockerhub it will go to the correct location. I will push the container to dockerhub with the following command:

    docker push jdkent/eprime_convert
    

    The container can be seen on dockerhub.

    Excellent! With this in place we can move on to setting up circleci

    Step 2: Use circleci to run the code after each data commit

    circleci is an online service that can run arbitrary code whenever something happens in a github repository. The vagueness of the description hides the power behind this service. Essentially, your imagination is the limit for what you can do.

    Follow the official circleci docs to add the repository to circleci so that circleci will begin triggering builds when commits appear in that repository.

    Inside your git repository add a .circleci folder and make a config.yml inside that folder; that is what circleci will read.

    Here is a full example config.yml for circleci; I will break it down below.

    # Python CircleCI 2.0 configuration file
    #
    # Check https://circleci.com/docs/2.0/language-python/ for more details
    #
    version: 2
    jobs:
      build:
        docker:
          # specify the version you desire here
          - image: jdkent/eprime_convert:latest
    
        working_directory: ~/repo
    
        steps:
          - run:
              name: clone github repo
              command: |
                git clone https://${GITHUB_TOKEN}@github.com/HBClab/BetterTaskSwitch.git
    
          - run:
              name: check if data QA should be skipped
              command: |
                cd ~/repo/BetterTaskSwitch
                if [[ "$( git log --format=oneline -n 1 $CIRCLE_SHA1 | grep -i -E '\[skip[ _]?ci\]' )" != "" ]]; then
                  echo "Skipping Data QA"
                  circleci step halt
                fi
    
          - run:
              name: run eprime convert
              command: |
                  source activate eprime_convert
                  ~/repo/BetterTaskSwitch/code/eprime_convert.py \
                    -b ~/repo/BetterTaskSwitch/bids \
                    -r ~/repo/BetterTaskSwitch/task-full_resp-srbox \
                    -c ~/repo/BetterTaskSwitch/code/config_file/task_switch.json \
                    -a mri \
                    --sub-prefix GE120
          - run:
              name: add and commit files
              command: |
                cd ~/repo/BetterTaskSwitch
                git config credential.helper 'cache --timeout=120'
                git config user.email "[email protected]"
                git config user.name "QA Bot"
                # Push quietly to prevent showing the token in log
                git add .
                git commit -m "[skip ci] $(date)"
                git push -q https://${GITHUB_TOKEN}@github.com/HBClab/BetterTaskSwitch.git master
    
    • version: 2: the overall version of circleci to use; they are deprecating version 1, so all configs should be version 2
    • jobs: the list of things I want circleci to run.
      • build:: this provides the option to choose what machinery I want circleci to run on
        • docker:: I want to use docker to select the environment my jobs are run using.
          • - image: jdkent/eprime_convert:latest: this selects the docker image stored on dockerhub that we just made in the last step.
        • working_directory: ~/repo: where the command-line interface will drop me when I’m running commands in the docker container we selected (I don’t really take advantage of this option).
        • steps:: the steps we will take to run the job.
          • - run:: instantiation of a step to take in the job
            • name: clone github repo: the name of the step we are taking.
            • command: |: the actual command we will be running in the docker container (the | (pipe) allows us to type the command on a separate line so the line of code does not look crowded).
          • - run
            • name: check if data QA should be skipped
            • command: |: this command checks whether [skip ci] or [skip_ci] is in the most recent commit message and halts the circleci build if it is.
          • - run:
            • name: run eprime convert
            • command: |: this command activates the conda environment and runs our data qa script with the appropriate inputs generating the figure output.
          • - run:
            • name: add and commit files
            • command: |: this command creates a github identity so the bot can push the new data to the github repository (importantly, the commit message contains [skip ci]; what would happen if that wasn’t there?)

    One important detail I’ve left out is what’s up with ${GITHUB_TOKEN}. That is a special variable I’ve defined using circleci’s environment variable settings. This is great for storing variables that represent some type of authentication (e.g. passwords) that you don’t want everyone to be able to see. In this instance I’m using a github token. You can make your own github token by going to your github profile, clicking on settings, clicking on developer settings, and then creating a new token. See the github announcement about tokens.

    Warning: you will only have explicit access to your token when you create it, so make sure you copy the token somewhere safe on your computer.

    Once you have circleci setup and the config file inside your repository, you are ready to add the files and push the changes back up to github, and observe your first circleci build. The steps would look something like this:

    git add .circleci/config.yml
    git commit -m 'add circleci build configuration'
    git push origin master
    

    Note: the error I ran into when doing this was incorrect permissions of eprime_convert.py in my repository. I gave the file executable permissions with the following command:

    git update-index --chmod=+x eprime_convert.py
    

    Step 3: display the figures using github-pages

    We have created a reproducible environment and set up circleci to run every time we push a new commit to the repository. The next step is to easily visualize all the figures we have created. We will do this using github-pages.

    Follow the github instructions to have github start hosting your repository as a static webpage (using github-pages). I’m using the minimal theme and I suggest that you use that theme too. Pull the changes to your repository. You will have an _config.yml file in your base directory. Change the file to look something like this:

    theme: jekyll-theme-minimal
    plugins:
      - jekyll-relative-links
    title: [BetterTaskSwitch]
    description: [Monitoring BetterTaskSwitch Data]
    logo: https://avatars0.githubusercontent.com/u/24659915?s=400&u=12a4f626488fe0f692d77f355d9dd9f3e4e63f7a&v=4
    baseurl: /BetterTaskSwitch
    

    You will change the title, description, and baseurl to match the repository you are working on. The logo points to our (HBClab) github logo.

    Next we will add liquid syntax to display all the swarmplots that are in our repository. You will place this code in your README.md file located at the base of your repository.

    {% assign my_files = site.static_files | where:"extname",".svg" | sort:"modified_time" | reverse %}
    
    {% capture sevendays %}{{'now' | date: "%s" | minus : 604800 }}{% endcapture %}
    
    {% for taskswitch in my_files %}
        {% if taskswitch.name contains "swarmplot" %}
            {% capture file_mod %}{{taskswitch.modified_time | date: "%s"}}{% endcapture %}
            {% if file_mod > sevendays %}
    
    ### Recent
    
            {% else %}
    
    ### Older
    
            {% endif %}
    **{{taskswitch.name}}**
    ![{{taskswitch.name}}]({{ taskswitch.path | prepend:site.baseurl }})
        {% endif %}
    {% endfor %}
    

    Note: This stackoverflow answer helped me figure out how to parse and compare dates.

    I will explain important bits of this code:

    {% assign my_files = site.static_files | where:"extname",".svg" | sort:"modified_time" | reverse %}

    This line creates a variable called my_files by searching through all static files where the file extension is .svg. The resulting array is piped to sort by the date each file was last modified (from oldest -> newest). Finally, the result is reversed so that the array is sorted from newest -> oldest.

    {% capture sevendays %}{{'now' | date: "%s" | minus : 604800 }}{% endcapture %}

    This line creates a variable called sevendays which measures the current time using seconds %s and then subtracts seven days worth of seconds (7 * 24 * 60 * 60 = 604800). This will be used to tell whether an image is seven days old or not.

    {% capture file_mod %}{{taskswitch.modified_time | date: "%s"}}{% endcapture %}

    This line creates the variable file_mod. file_mod is the date (in seconds) when the file was last modified. This means we can directly compare file_mod and sevendays to test whether the file is older or newer than seven days.

    ![{{taskswitch.name}}]({{ taskswitch.path | prepend:site.baseurl }})

    This is the last line I will explain since it may look confusing. It combines both markdown syntax and liquid syntax. Here is the markdown portion: ![name](url). That markdown syntax displays an inline image. The double curly brackets are liquid syntax. These return strings that can be interpreted by markdown. taskswitch.path is the path to the file relative to the top directory of the repository (e.g. /some/dir/file.svg). However, with how github parses the url, we also need to include the website basename as well, so we prepend the site’s baseurl. If you look back, you can see we defined the baseurl variable in _config.yml. This is the difference between searching for a file using this https://hbclab.github.io as our baseurl and this https://hbclab.github.io/BetterTaskSwitch (we want this one)

    Next we want to check to make sure we did everything correctly. We can do this by serving the jekyll website we made locally. Please follow the github instructions to do this.

    Once we are satisfied with how the website looks, we can add/commit/push the changes to github.

    git add _config.yml Gemfile README.md
    git commit -m 'add website functionality'
    git push origin master
    

    That’s it! Once you’ve done all that, you can reap the benefits of having an automated system that generates figures and makes them visible via a website.

    Example Script

    This code was written to work, not to be beautiful; be aware that it may not represent best (or even recommended) practices.

    #!/usr/bin/env python
    # generate pipelines that read in the eprime txt files and output a
    # machine readable summary and a useful figure for quality assurance.
    
    from convert_eprime import convert
    import pandas as pd
    import numpy as np
    from argparse import ArgumentParser
    import os
    from glob import glob
    import shutil
    import re
    from matplotlib import pyplot as plt
    plt.style.use('ggplot')
    import seaborn as sns
    sns.set_palette("bright")
    
    # expressions
    session_dict = {1: 'pre', 2: 'post'}
    
    
    def get_parser():
        """Build parser object for cmdline processing"""
        parser = ArgumentParser(description='betterVTSM.py: converts '
                                            'eprime output to tsv in BIDS format')
        parser.add_argument('-b', '--bids', action='store',
                            help='root folder of a BIDS valid dataset')
        parser.add_argument('-r', '--raw-dir', action='store',
                            help='directory where edat and txt files live')
        parser.add_argument('-p', '--participant-label', action='store', nargs='+',
                            help='participant label(s) to process')
        parser.add_argument('-s', '--session-label', action='store', nargs='+',
                            help='session label(s) to process (either 1 or 2)')
        parser.add_argument('-c', '--config', action='store', required=True,
                            help='config file to process the eprime txt. '
                                 'see convert_eprime for details')
        parser.add_argument('--sub-prefix', action='store',
                            help='add additional characters to the prefix of the participant label')
        return parser
    
    def copy_eprime_files(src, dest):
        # collect edat2 and txt files
        types = ('*.edat2', '*.txt')
        raw_files = []
        for type in types:
            raw_files.extend(glob(os.path.join(src, type)))
    
        # copy all files into sourcedata (if not already there)
        copied_files = 0
        for file in raw_files:
            out_file = os.path.join(dest, os.path.basename(file))
            if not os.path.isfile(out_file):
                shutil.copy(file, dest)
                copied_files += 1
        return copied_files
    
    
    def main():
        """Entry point"""
        opts = get_parser().parse_args()
    
        # set input/output directories
        bids_dir = os.path.abspath(opts.bids)
        # ensure bids directory exists
        os.makedirs(bids_dir, exist_ok=True)
    
    
    
        sourcedata = os.path.join(bids_dir, 'sourcedata', 'VSTM')
        derivatives = os.path.join(bids_dir, 'derivatives')
    
        # ensure sourcedata and derivatives exist
        os.makedirs(sourcedata, exist_ok=True)
        os.makedirs(derivatives, exist_ok=True)
    
    
        # assume data is already copied over if raw_dir isn't specified
        if opts.raw_dir:
            raw_dir = os.path.abspath(opts.raw_dir)
            # output is only the number of copied files, throwing away
            files_copied = copy_eprime_files(raw_dir, sourcedata)
            print('{num} file(s) copied'.format(num=files_copied))
        else:
            print('-r not specified, assuming data are in the correct location: '
                  '{dir}'.format(dir=sourcedata))
    
        # collect participant labels
        if opts.participant_label:
            participants = opts.participant_label
        else:
            participant_files = glob(os.path.join(sourcedata, 'VSTM_*.txt'))
            sub_expr = re.compile(r'^.*VSTM_PACR-(?P<sub_id>[0-9]{3})-(?P<ses_id>[1-2]).txt')
            participants = []
            for participant_file in participant_files:
                print(participant_file)
                sub_dict = sub_expr.search(participant_file).groupdict()
                participants.append(sub_dict['sub_id'])
    
        # collect sessions
        if opts.session_label:
            sessions = opts.session_label
        else:
            sessions = [1, 2]
    
        filename_template = 'VSTM_PACR-{sub}-{ses}.{ext}'
        participant_dict = {}
        for participant in participants:
            participant_dict[participant] = {}
            for session in sessions:
                # initialize sub/ses dictionary
                participant_dict[participant][session] = {'edat': None, 'txt': None}
    
                # get the edat file (if it exists)
                edat_file = filename_template.format(sub=participant,
                                                     ses=session,
                                                     ext='edat2')
    
                if os.path.isfile(os.path.join(sourcedata, edat_file)):
                    participant_dict[participant][session]['edat'] = os.path.join(
                        sourcedata, edat_file
                    )
                else:
                    print('{edat} missing!'.format(edat=edat_file))
                    participant_dict[participant].pop(session)
                    continue
    
                # get the txt file (if it exists)
                txt_file = filename_template.format(sub=participant,
                                                    ses=session,
                                                    ext='txt')
    
                if os.path.isfile(os.path.join(sourcedata, txt_file)):
                    participant_dict[participant][session]['txt'] = os.path.join(
                        sourcedata, txt_file
                    )
                else:
                    print('{txt} missing!'.format(txt=txt_file))
                    participant_dict[participant].pop(session)
                    continue
    
        # process the data per session
        for participant in participant_dict.keys():
            if opts.sub_prefix:
                participant_label = opts.sub_prefix + participant
            else:
                participant_label = participant
            for session in participant_dict[participant].keys():
                # type coercion to integer
                session = int(session)
                session_label = session_dict[session]
                edat_file = participant_dict[participant][session]['edat']
                txt_file = participant_dict[participant][session]['txt']
                config = os.path.abspath(opts.config)
    
                folder = 'beh'
    
                work_file = os.path.join(sourcedata, 'work', 'sub-' + participant_label,
                                         'ses-' + session_label, 'beh',
                                         'sub-{sub}_ses-{ses}_task-VSTM_raw.csv'.format(sub=participant_label, ses=session_label))
                # ensure directory exists
                os.makedirs(os.path.dirname(work_file), exist_ok=True)
                # conversion to csv
                convert.text_to_rcsv(txt_file, edat_file, config, work_file)
                # create dataframe
                df = pd.read_csv(work_file)
    
    
                #drops practice trials
                df.drop(df[(df.Running == 'ColorPractice') | (df.Running == 'ShapePractice') | (df.Running == 'PracticeBoth')].index, inplace=True)
                # drop all NaN entries, re: trials where no response was desired (at the beginning of all VSTM blocks)
                df.dropna(how='all', inplace=True)
                # rename column headers
                df.rename(index=str, columns={"Running": "trial_type",
                                  "Probe.ACC": "correct",
                                  "Probe.RT": "response_time",
                                  "Probe.CRESP": "probe_novelty"}, inplace=True)
                # convert response_time into seconds
                df['response_time'] = df['response_time'] / 1000
                # change 'correct' column from float to int
                df.correct = df.correct.astype(int)
                # create new column for block number
                df['block'] = df['trial_type']
                # replace trial_type elements with simpler description
                df['trial_type'].replace({'SimColour':'color', 'SimShape':'shape',
                                          'SimBoth':'color_and_shape'}, inplace=True)
                # replace probe_novelty elements with a more sensible set
                # {/} -> novel -> 1
                # z -> repeat -> 0
                df['probe_novelty'].replace({'{/}': 1, 'z': 0}, inplace=True)
    
                # write processed data to file
                base_file = 'sub-{sub}_ses-{ses}_task-VSTM_events.tsv'
                bids_file = os.path.join(bids_dir,
                                         'sub-' + participant_label,
                                         'ses-' + session_label,
                                         folder,
                                         base_file.format(
                                            sub=participant_label,
                                            ses=session_label)
                                         )
    
                # make sure the directory exists
                os.makedirs(os.path.dirname(bids_file), exist_ok=True)
                df.to_csv(bids_file, sep='\t', index=False)
    
    
                # Do some quality assurance
                derivatives_dir = os.path.join(derivatives, 'VSTMQA')
                os.makedirs(derivatives_dir, exist_ok=True)
                base_json = 'sub-{sub}_ses-{ses}_task-VSTM_averages.json'
                out_json = os.path.join(derivatives_dir,
                                        'sub-' + participant_label,
                                        'ses-' + session_label,
                                        folder,
                                        base_json.format(
                                           sub=participant_label,
                                           ses=session_label)
                                        )
                base_fig = 'sub-{sub}_ses-{ses}_task-VSTM_swarmplot.svg'
                out_fig = os.path.join(derivatives_dir,
                                       'sub-' + participant_label,
                                       'ses-' + session_label,
                                       folder,
                                       base_fig.format(
                                        sub=participant_label,
                                        ses=session_label)
                                       )
    
                # make the derivatives directory for the participant/session in VSTMQA
                os.makedirs(os.path.dirname(out_json), exist_ok=True)
    
                # get average response time and average correct
                json_dict = {'response_time': None, 'correct': None}
                json_dict['response_time'] = df['response_time'].where(df['correct'] == 1).mean()
                json_dict['correct'] = df['correct'].mean()
                ave_res = pd.Series(json_dict)
                ave_res.to_json(out_json)
                if not os.path.isfile(out_fig):
                    # make a swarmplot
                    myplot = sns.swarmplot(x="trial_type", y="response_time",
                                           hue="correct", data=df, size=6)
                    # set the y range larger to fit the legend
                    myplot.set_ylim(0, 10.0)
                    # remove the title of the legend
                    myplot.legend(title=None)
                    # rename the xticks
                    myplot.set_xticklabels(['Color', 'Shape', 'Shape and Color'])
                    # rename xlabel
                    myplot.set_xlabel('trial type')
                    myplot.set_ylabel('response time (seconds)')
                    # rename the legend labels
                    new_labels = ['incorrect', 'correct']
                    for t, l in zip(myplot.legend_.texts, new_labels):
                        t.set_text(l)
                    # save the figure
                    myplot.figure.savefig(out_fig, dpi=72)
                    # remove all plot features from memory
                    plt.clf()
    
    
    if __name__ == '__main__':
        main()
    
My Workflow (2018-10-14)

    We work with data on a lab server, basically network attached storage… There is no ftp access or any other web service integrations, so we do not treat it as our own private git server. However, I am “softly” mimicking that functionality in my current workflow. Once I’ve generated data I want to analyze/explore on the server (via some heavy data chugging analysis through the cluster), I can make the output directory a git repository and clone it locally. Since multiple people can access the server at the same time, this is useful so as to not step on each other’s toes as we access/modify data. When I’m done fooling around locally, I can try to push, but since the repository I cloned from isn’t bare, I can’t. To get around that I had to use this command, run inside the repository on the server (stolen from stack overflow):

    git config --local receive.denyCurrentBranch updateInstead
    

    And now I can push to the server git repo. We can also benefit from branches if multiple people are working on the data at the same time.
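
    To make the round trip concrete, the whole dance looks roughly like this (the paths and hostname are hypothetical):

    # on the server: turn an output directory into a repository that accepts pushes
    cd /data/project/output
    git init && git add . && git commit -m 'initial data snapshot'
    git config --local receive.denyCurrentBranch updateInstead

    # locally: clone over ssh, do the analysis, then push back
    git clone user@labserver:/data/project/output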

    In addition, I can make a python environment with conda and wrap up any python notebooks I make with a yml file, which means anyone else can (ostensibly) replicate my environment and run the code (reproducible!)
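
    That step is as simple as exporting the environment next to the notebooks (the file name is just a convention):

    conda env export > environment.yml
    # a labmate can then rebuild the same environment with
    conda env create -f environment.yml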
