Essential Techniques in Data Analysis and Code Management for Research Projects

Welcome to the Advanced Python Workshop! This 20-hour course is designed to help researchers move beyond single-script, single-data-file analyses and adopt best practices in version control, collaboration, data management, modular coding, workflow orchestration, and environment management. If you’re eager to make your research code more robust, reproducible, and scalable, you’re in the right place!

Preparation Checklist

Before the workshop, please ensure you have installed the following tools:

  1. Git
  2. Conda (via Anaconda, Miniconda, or Miniforge)
  3. Visual Studio Code (VSCode)
  4. GIN command-line client (GIN CLI)
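To confirm that the tools are reachable from your terminal before the first session, you could run a small check like the one below. This is a minimal sketch, not part of the official course material. It assumes the executables are named `git`, `conda`, `code` (the VSCode command-line launcher) and `gin` (the GIN client), and that they are on your `PATH`.

```python
from __future__ import annotations

import shutil

# Assumed executable names for each tool in the checklist (hypothetical mapping).
REQUIRED_TOOLS = {
    "Git": "git",
    "Conda": "conda",
    "Visual Studio Code": "code",
    "GIN CLI": "gin",
}


def check_tools(tools: dict[str, str]) -> dict[str, str | None]:
    """Map each tool name to its executable's full path, or None if not found on PATH."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    results = check_tools(REQUIRED_TOOLS)
    for name, path in results.items():
        # Print the resolved path, or a warning if the executable is missing.
        print(f"{name:<20} {path if path else 'NOT FOUND on PATH'}")
    missing = [name for name, path in results.items() if path is None]
    if missing:
        print("\nMissing:", ", ".join(missing))
```

On Windows, `conda` is often only on `PATH` inside the Anaconda or Miniforge Prompt, so a "NOT FOUND" result there does not necessarily mean Conda is missing.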

Course Plan

| Date | Topic | Short Description |
|---|---|---|
| Feb 24 (9–12:30) | Git, GitHub, Conda, VSCode, & READMEs | An introduction to reproducibility concepts, environment management, and collaborative coding practices. Learn to manage your code and dependencies via Git, GitHub, Conda, and VSCode. |
| Mar 3 (9–12:30) | Functions, Modules, & Testing | Dive into writing reusable functions, structuring larger projects into modules, and using Pytest to ensure code reliability. |
| Mar 10 (9–12:30) | Dependency Inversion | Implement advanced design patterns for testability and modularity, making your codebase easier to extend and maintain. |
| Mar 24 (9–12:30) | Scientific Data File Storage with HDF5 | Explore the JSON, YAML, NumPy, and HDF5 file formats for storing scientific data, from small configuration files to large-scale datasets, and learn how to integrate them into your Python workflows. |
| Mar 31 (9–12:30) | Workflow Management with Snakemake and Papermill | Orchestrate multi-step pipelines, manage complex data analysis workflows, and ensure reproducibility using Snakemake and parameterized notebooks run with Papermill. |
| Apr 7 (9–12:30) | Data Packaging with Xarray and GIN | Make complex data easy to work with using Xarray's labeled multi-dimensional arrays, and easy to access by sharing it through GIN. |

Note: An optional “Joker” session is tentatively planned for April 9 (9–12:30). Content will be determined based on class progress and participant feedback.