table1/framework-project

Quick Start

Preview: During setup, you'll be asked to choose:

  • Project type: project (full-featured), course (teaching), or presentation (single talk)
  • Notebook format: Quarto .qmd (recommended) or RMarkdown .Rmd
  • Git: Whether to initialize a git repository
  • Package management: Whether to use renv for package management

Not sure? Choose the defaults. You can always change these later in settings.yml.

Option 1: CLI Tool (Recommended)

Install the CLI:

curl -fsSL https://raw.githubusercontent.com/table1/framework/main/inst/bin/install-cli.sh | bash

And get started:

# Create projects
framework new myproject
framework new slides presentation
framework new

See Command Line Interface for full details.

Option 2: One-Time Script (No CLI Installation)

One-liner (macOS/Linux/Windows with Git Bash):

curl -fsSL https://raw.githubusercontent.com/table1/framework-project/main/new-project.sh | bash

This guides you through creating a new project without installing the CLI.

Option 3: Manual Setup

Clone the template and customize init.R to your preferences:

git clone https://github.com/table1/framework-project my-project
cd my-project

Open init.R in your favorite editor to set your project name, type, and options, then run it:

framework::init(
  project_name = "MyProject",
  type = "project",                                  # or "course" or "presentation"
  use_renv = FALSE,
  default_notebook_format = "quarto",
  author_name = "Your Name",                         # Allows auto-filling Notebook author (optional)
  author_email = "you@example.com",                  # Placeholder address
  author_affiliation = "Johns Hopkins University"  
)

# Then run your code from your IDE. Or save your changes and run:
source("init.R")

Project Types

  • project (default): Full-featured research projects with exploratory notebooks, production scripts, organized data management, and documentation
  • course: Teaching materials with presentations, student notebooks, and example data
  • presentation: Single talks or presentations with minimal overhead: just data, helper functions, and output

Not sure? Use type = "project". You can always delete directories you don't need; you won't break anything.

Example structure:

project/
├── notebooks/              # Exploratory analysis
├── scripts/                # Production pipelines
├── data/
│   ├── source/private/     # Raw data (gitignored)
│   ├── source/public/      # Public raw data
│   ├── cached/             # Computation cache (gitignored)
│   └── final/private/      # Results (gitignored)
├── functions/              # Custom functions
├── results/private/        # Analysis outputs (gitignored)
├── docs/                   # Documentation
├── settings.yml            # Project configuration
├── framework.db            # Metadata/tracking database
└── .env                    # Secrets (gitignored)

Why Use Framework?

Framework reduces boilerplate and enforces best practices for data analysis:

  • Project scaffolding: Standardized directories, config-driven setup
  • Data management: Declarative data catalog, integrity tracking, encryption (on roadmap)
  • Auto-loading: Load the packages you use in every file with one command; no more file juggling with your library() calls
  • Pain-free renv integration: Use renv for reproducible package management without having to fight renv or babysit it.
  • Caching: Smart caching for expensive computations
  • Database helpers: PostgreSQL, SQLite with credential management
  • Supported file formats: CSV, TSV, RDS, Stata (.dta), SPSS (.sav), SAS (.xpt, .sas7bdat)

What Gets Created

When you run init(), Framework creates:

  • Project structure: Organized directories (varies by type)
  • Configuration files: settings.yml and optional settings/ files
  • Git setup: .gitignore configured to protect private data
  • Tooling: .lintr, .editorconfig for code quality
  • Database: framework.db for metadata tracking
  • Environment: .env template for secrets

Framework

A lightweight R package for structured, reproducible data analysis projects focusing on convention over configuration.

⚠️ Active Development: APIs may change. Version 1 with stable API coming soon.

Core Workflow

1. Initialize Your Session

library(framework)
scaffold()  # Loads packages, functions, config, standardizes working directory

2. Load Data

Via config:

# config.yml or settings/data.yml
data:
  source:
    private:
      survey:
        path: data/source/private/survey.dta
        type: stata
        locked: true

# Load using dot notation
df <- data_load("source.private.survey")
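The dot-notation key above maps onto the nesting of the YAML catalog. As a rough sketch of that lookup (not Framework's actual internals), the following base R code walks a nested list one key at a time. The `catalog` list and the `resolve_dot()` helper are hypothetical names introduced only for illustration.

```r
# Hypothetical nested catalog mirroring the YAML above (not Framework's internals).
catalog <- list(
  source = list(
    private = list(
      survey = list(
        path = "data/source/private/survey.dta",
        type = "stata",
        locked = TRUE
      )
    )
  )
)

# resolve_dot() is an illustrative helper: split the key on dots, then
# descend through the nested list one level per key part.
resolve_dot <- function(catalog, key) {
  parts <- strsplit(key, ".", fixed = TRUE)[[1]]
  node <- catalog
  for (part in parts) {
    if (!is.list(node) || is.null(node[[part]])) {
      stop("No catalog entry for '", key, "' (failed at '", part, "')")
    }
    node <- node[[part]]
  }
  node
}

entry <- resolve_dot(catalog, "source.private.survey")
entry$path
```

A key that does not exist in the catalog stops with an error naming the part that failed, which is usually easier to debug than a silent `NULL`.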

Direct path:

df <- data_load("data/my_file.csv")       # CSV
df <- data_load("data/stata_file.dta")    # Stata
df <- data_load("data/spss_file.sav")     # SPSS

For statistical formats (Stata/SPSS/SAS), data_load() strips metadata such as variable labels by default for safety. Pass keep_attributes = TRUE to preserve labels.

3. Cache Expensive Operations

model <- get_or_cache("model_v1", {
  expensive_model_fit(df)
}, expire_after = 1440)  # Cache for 24 hours
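To make the caching pattern concrete, here is a minimal base R sketch of a get-or-compute cache with expiry. It is not Framework's implementation: `cache_sketch()`, its `cache_dir` argument, and the `.rds` storage are assumptions, and `expire_after` is treated as minutes because the comment above equates 1440 with 24 hours.

```r
# cache_sketch() is a hypothetical stand-in for get_or_cache(); it is not
# Framework's implementation. Values are stored as .rds files in cache_dir.
cache_sketch <- function(name, expr, expire_after = Inf, cache_dir = tempdir()) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    age_minutes <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "mins"))
    if (age_minutes <= expire_after) {
      return(readRDS(path))  # fresh hit: expr is never evaluated
    }
  }
  value <- expr              # forcing the lazy argument runs the expensive code
  saveRDS(value, path)
  value
}

# Demo: count how often the "expensive" function actually runs.
demo_dir <- tempfile("cache_demo")
dir.create(demo_dir)

calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1
  x^2
}

first  <- cache_sketch("square_demo", slow_square(12), expire_after = 1440, cache_dir = demo_dir)
second <- cache_sketch("square_demo", slow_square(12), expire_after = 1440, cache_dir = demo_dir)
```

Because R evaluates function arguments lazily, the second call returns the stored value without ever running `slow_square()`, so `calls` stays at 1.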

4. Save Results

# Save data
data_save(processed_df, "final.private.clean", type = "csv")

# Save analysis output
result_save("regression_model", model, type = "model")

# Save notebook (blinded)
result_save("report", file = "report.html", type = "notebook",
            blind = TRUE, public = FALSE)

5. Query Databases

# config.yml
connections:
  db:
    driver: postgresql
    host: !expr Sys.getenv("DB_HOST")
    database: !expr Sys.getenv("DB_NAME")
    user: !expr Sys.getenv("DB_USER")
    password: !expr Sys.getenv("DB_PASS")

df <- query_get("SELECT * FROM users WHERE active = true", "db")

Configuration

Simple:

default:
  packages:
    - dplyr
    - ggplot2
  data:
    example: data/example.csv

Advanced: Split config into settings/ files:

default:
  data: settings/data.yml
  packages: settings/packages.yml
  connections: settings/connections.yml
  security: settings/security.yml

Use .env for secrets:

DB_HOST=localhost
DB_PASS=secret
DATA_ENCRYPTION_KEY=key123

Reference in config:

security:
  data_key: !expr Sys.getenv("DATA_ENCRYPTION_KEY")
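The `Sys.getenv()` calls only work once the values from `.env` are present in the R session's environment. The source does not say how Framework loads the file, so the following is only a sketch of the general idea in base R. The `load_env_sketch()` helper is hypothetical and handles plain `KEY=value` lines only.

```r
# load_env_sketch() is a hypothetical helper, not part of Framework.
# It reads KEY=value lines and exports them with Sys.setenv().
load_env_sketch <- function(path = ".env") {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]  # skip blanks and comments
  lines <- lines[grepl("=", lines, fixed = TRUE)]          # keep only KEY=value lines
  keys <- trimws(sub("=.*$", "", lines))                   # text before the first "="
  values <- trimws(sub("^[^=]*=", "", lines))              # text after the first "="
  if (length(keys) > 0) {
    do.call(Sys.setenv, as.list(stats::setNames(values, keys)))
  }
  invisible(keys)
}

# Demo with a throwaway file so no real secrets are involved.
env_file <- tempfile(fileext = ".env")
writeLines(c("# demo secrets", "DB_HOST=localhost", "DB_PASS=secret"), env_file)
load_env_sketch(env_file)
Sys.getenv("DB_HOST")
```

Keeping secrets in the process environment rather than in the config file means the YAML can be committed while `.env` stays gitignored.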

Key Functions

Function              Purpose
scaffold()            Initialize session (load packages, functions, config)
data_load()           Load data from path or config
data_save()           Save data with integrity tracking
query_get()           Execute SQL query, return data
query_execute()       Execute SQL command
get_or_cache()        Lazy evaluation with caching
result_save()         Save analysis output
result_get()          Retrieve saved result
scratch_capture()     Quick debug/temp file save
renv_enable()         Enable renv for reproducibility (opt-in)
renv_disable()        Disable renv integration
packages_snapshot()   Save package versions to renv.lock
packages_restore()    Restore packages from renv.lock

Data Integrity & Security

  • Hash tracking - All data files tracked with SHA-256 hashes
  • Locked data - Flag files as read-only, errors on modification
  • Encryption - AES encryption for sensitive data/results
  • Gitignore by default - Private directories auto-ignored
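Hash tracking and locking can be pictured as "record a checksum, then refuse to proceed if it changes". The sketch below shows that idea in base R only. It swaps in MD5 via `tools::md5sum()` for the SHA-256 hashes named above, to avoid extra packages, and `record_hash()` and `check_locked()` are hypothetical helpers rather than Framework functions.

```r
# Illustrative only: MD5 from base R's tools package stands in for SHA-256.
record_hash <- function(path) unname(tools::md5sum(path))

# check_locked() is a hypothetical helper: error if the file changed
# since its hash was recorded.
check_locked <- function(path, recorded_hash) {
  current <- record_hash(path)
  if (!identical(current, recorded_hash)) {
    stop("Locked file was modified: ", path)
  }
  invisible(TRUE)
}

# Demo: record a hash, verify it, then simulate tampering.
data_file <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,90"), data_file)
recorded <- record_hash(data_file)

unchanged_ok <- check_locked(data_file, recorded)

writeLines(c("id,score", "1,95"), data_file)  # simulate an unexpected edit
tampered <- tryCatch(
  check_locked(data_file, recorded),
  error = function(e) conditionMessage(e)
)
```

The first check passes silently, while the second raises an error whose message is captured in `tampered`.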

Reproducibility with renv

Framework includes optional renv integration (OFF by default):

# Enable renv for this project
renv_enable()

# Your packages are now managed by renv
# Use snapshot after installing new packages
packages_snapshot()

# Disable renv if you prefer
renv_disable()

Version pinning in config.yml:

packages:
  - dplyr              # Latest from CRAN
  - [email protected]     # Specific version
  - tidyverse/dplyr@main  # GitHub with ref
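The three entry shapes above (bare name, name with a version, GitHub repository with a ref) can be told apart with simple string handling. This is a sketch of one possible parser, not how Framework reads the list. `parse_pkg_spec()` and the `examplepkg@1.2.3` entry are hypothetical.

```r
# parse_pkg_spec() is a hypothetical helper, not part of Framework.
# It splits "name@ref" and treats "owner/repo" names as GitHub sources.
parse_pkg_spec <- function(spec) {
  parts <- strsplit(spec, "@", fixed = TRUE)[[1]]
  name <- parts[1]
  ref <- if (length(parts) > 1) parts[2] else NA_character_
  source <- if (grepl("/", name, fixed = TRUE)) "github" else "cran"
  list(name = name, ref = ref, source = source)
}

# "examplepkg@1.2.3" is a made-up entry used only to show the version form.
specs <- c("dplyr", "examplepkg@1.2.3", "tidyverse/dplyr@main")
parsed <- lapply(specs, parse_pkg_spec)

parsed[[3]]$source
```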

See renv integration docs for details.

Roadmap

  • Excel file support
  • Quarto codebook generation
