Read this before creating any diagnostic tests or reports.
This document establishes mandatory naming conventions for all diagnostic work in GIMBAL. Following these conventions ensures:
- Tests are easy to find and understand
- Results are traceable and reproducible
- Reports clearly link to their corresponding tests
- The codebase remains organized as complexity grows
Each test plan document in plans/ should have a corresponding directory in tests/diagnostics/ with a matching name structure.
Example:
- Plan: `plans/v0.2.1_divergence_test_plan.md`
- Directory: `tests/diagnostics/v0_2_1_divergence/`
Within each diagnostic directory, use the three-file pattern for each test:
- `test_*.py` - Executable test code
- `results_*.json` - Raw numerical output
- `report_*.md` - Human-readable analysis
Pattern: `test_group_N_<short_description>.py`
Rules:
- Must start with the `test_` prefix (makes them identifiable as test files)
- Include `_group_N_`, where N is the test group number from the plan
- Use lowercase with underscores for the description
- Description should be 2-4 words maximum
Examples:
test_group_1_baseline_no_hmm.py
test_group_2_baseline_with_hmm.py
test_group_3_state_count.py
test_group_5_divergence_localization.py
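The rules above can be checked mechanically. A minimal sketch using Python's `re` module (the pattern and helper name are illustrative, not part of the convention):

```python
import re

# Illustrative pattern: test_ prefix, a group number, then a
# 2-4 word lowercase description separated by underscores
TEST_NAME_RE = re.compile(
    r"^test_group_(?P<n>\d+)_(?P<desc>[a-z0-9]+(?:_[a-z0-9]+){1,3})\.py$"
)

def is_valid_test_filename(name: str) -> bool:
    """Return True if `name` follows the test-script naming convention."""
    return TEST_NAME_RE.match(name) is not None
```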
What to include:
- Docstring at top explaining purpose and linking to test plan
- Configuration parameters clearly defined
- Function(s) to run the test and collect metrics
- Code to save results to corresponding JSON file
- Return metrics for interactive use
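Put together, a test script skeleton might look like the following. The metric values and configuration keys are hypothetical placeholders; only the naming pattern and the JSON output shape come from this document:

```python
"""Test Group 1: baseline sampling without the HMM component.

See plans/v0.2.1_divergence_test_plan.md, Test Group 1.
"""
import json
from datetime import datetime, timezone
from pathlib import Path

# Configuration parameters, clearly defined in one place
CONFIG = {"num_samples": 1000, "num_chains": 4}

RESULTS_PATH = Path("results_group_1_baseline_no_hmm.json")

def run_test(config=CONFIG):
    """Run the test, collect metrics, and save them as JSON."""
    # Hypothetical metric collection; replace with real sampling code
    metrics = {"primary_metric": 0.0, "secondary_metrics": {}}
    results = {
        "test_group": 1,
        "description": "baseline_no_hmm",
        "configuration": dict(config),
        "metrics": metrics,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "environment": {"relevant": "version info"},
    }
    RESULTS_PATH.write_text(json.dumps(results, indent=2))
    return metrics  # returned for interactive use

if __name__ == "__main__":
    run_test()
```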
Pattern: `results_group_N_<short_description>.json`
Rules:
- Must start with the `results_` prefix
- Use the same `_group_N_` and description as the corresponding test file
- JSON format for structured data
- Include metadata (timestamp, configuration, environment)
Examples:
results_group_1_baseline_no_hmm.json
results_group_2_baseline_with_hmm.json
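Because the three files share the same `_group_N_<description>` stem, the results, report, and plot names can all be derived from the test filename. A small illustrative helper (the function name is not part of the convention):

```python
from pathlib import Path

def sibling_names(test_filename: str) -> dict:
    """Derive results/report/plots names from a test script's filename."""
    stem = Path(test_filename).stem  # e.g. "test_group_1_baseline_no_hmm"
    assert stem.startswith("test_"), "not a test filename"
    suffix = stem[len("test_"):]     # "group_1_baseline_no_hmm"
    return {
        "results": f"results_{suffix}.json",
        "report": f"report_{suffix}.md",
        "plots": f"plots/{suffix}/",
    }
```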
Required fields:
```json
{
  "test_group": 1,
  "description": "Short description matching filename",
  "configuration": {
    "parameter1": "<value1>",
    "parameter2": "<value2>"
  },
  "metrics": {
    "primary_metric": "<value>",
    "secondary_metrics": {}
  },
  "timestamp": "ISO 8601 format",
  "environment": {
    "relevant": "version info"
  }
}
```

Pattern: `report_group_N_<short_description>.md`
Rules:
- Must start with the `report_` prefix
- Use the same `_group_N_` and description as the corresponding test file
- Markdown format for readability
- Include interpretation and recommendations
Examples:
report_group_1_baseline_no_hmm.md
report_group_2_baseline_with_hmm.md
report_summary.md (special case: synthesizes all tests)
Required sections:
- Test Header - Group number, description, configuration, date
- Results Summary - Key metrics in table format
- Interpretation - What the results mean
- Diagnostic Artifacts - Links to plots, traces, etc.
- Recommendations - Next steps based on findings
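A minimal report skeleton covering these sections might look like the following (the group, metric, and plot names are taken from the examples in this document; the `...` entries are placeholders to fill in):

```markdown
# Test Group 1: baseline_no_hmm

## Test Header
Group 1 — baseline without HMM. Configuration: ... Date: ...

## Results Summary
| Metric | Value |
|--------|-------|
| primary_metric | ... |

## Interpretation
...

## Diagnostic Artifacts
![Trace plot](plots/group_1_baseline_no_hmm/trace_plot.png)

## Recommendations
...
```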
Pattern: `plots/group_N_<short_description>/`
Rules:
- Subdirectory named `plots/` within the diagnostic directory
- Each test group gets its own subdirectory
- Use the same `_group_N_` and description as the test files
- Contains PNG/PDF files with descriptive names
Example structure:
```
tests/diagnostics/v0_2_1_divergence/
├── plots/
│   ├── group_1_baseline_no_hmm/
│   │   ├── divergence_pairs.png
│   │   ├── trace_plot.png
│   │   └── energy_plot.png
│   ├── group_2_baseline_with_hmm/
│   │   └── ...
```
When writing tests that generate plots, follow these guidelines:
Always configure matplotlib to use a non-interactive backend at the start of your test module:
```python
import matplotlib
matplotlib.use('Agg')  # Must be set BEFORE importing pyplot
import matplotlib.pyplot as plt
```

Why:
- Prevents the test from hanging waiting for user interaction
- Allows tests to run unattended in CI/CD or batch modes
- Necessary for reliable agent handoff and automation
After saving every figure, explicitly close it to free memory:
```python
fig, ax = plt.subplots(figsize=(12, 8))
# ... plot content ...
fig.savefig(plot_path, dpi=150, bbox_inches='tight')
plt.close(fig)  # Critical: prevents memory accumulation
```

Why:
- Matplotlib keeps figures in memory by default
- Accumulation can exhaust memory for long-running tests
- Explicit cleanup ensures deterministic behavior
Organize plot paths using the diagnostic directory structure:
```python
from pathlib import Path

# Define output directory matching naming conventions
test_name = "group_1_baseline_no_hmm"
diagnostic_dir = Path("plots") / test_name
diagnostic_dir.mkdir(parents=True, exist_ok=True)

# Save with descriptive filenames
fig.savefig(diagnostic_dir / f"{test_name}_trace.png", dpi=150, bbox_inches="tight")
```

Result: plots are automatically organized by test group with clear naming.
Wrap plot generation in try-except to prevent test failures from plot issues:
```python
try:
    fig, ax = plt.subplots(figsize=(12, 8))
    # ... plot content ...
    fig.savefig(plot_path, dpi=150, bbox_inches='tight')
except Exception as e:
    print(f"Warning: Plot generation failed: {e}")
    # Test continues even if visualization fails
finally:
    plt.close('all')  # Ensure cleanup happens regardless
```

Why:
- Visualization is auxiliary to test results
- Test failure should not depend on plot generation success
- Allows troubleshooting visualization issues separately
To include generated plots in markdown reports:
```markdown
## Diagnostic Visualization

![Posterior trace](plots/group_1_baseline_no_hmm/trace_plot.png)

*Figure 1: Posterior trace showing parameter evolution over iterations.*
```

File path format:
- Use relative paths from the report location to the plot file
- Include descriptive figure captions
- Reference specific plot features in accompanying text
For Test Group 1 from v0.2.1_divergence_test_plan.md:
```
tests/diagnostics/v0_2_1_divergence/
├── test_group_1_baseline_no_hmm.py       ← Runs the test
├── results_group_1_baseline_no_hmm.json  ← Stores metrics
├── report_group_1_baseline_no_hmm.md     ← Analyzes results
└── plots/
    └── group_1_baseline_no_hmm/
        ├── divergence_pairs.png
        └── ess_comparison.png
```
Workflow:
1. Write `test_group_1_baseline_no_hmm.py` following the plan
2. Run it: `pixi run python test_group_1_baseline_no_hmm.py`
3. It generates `results_group_1_baseline_no_hmm.json` automatically
4. It generates plots in `plots/group_1_baseline_no_hmm/`
5. Write `report_group_1_baseline_no_hmm.md` interpreting the results
6. Update `report_summary.md` with findings from this test
Name: report_summary.md (in the diagnostic directory root)
Purpose: Synthesizes findings across all test groups
Contents:
- Table comparing key metrics across all tests
- Overall interpretation of what causes the issue
- Prioritized recommendations
- Links to individual group reports for details
Pattern: `utils_<purpose>.py` or `<purpose>_utils.py`
Examples:
test_utils.py (shared test utilities)
plotting_utils.py (visualization helpers)
analysis_utils.py (metric computation)
Rules:
- Do NOT start with `test_`, `results_`, or `report_`
- Use descriptive names indicating their utility purpose
- Can be imported by multiple test scripts
Clarity:
- File purpose is immediately obvious from name
- Test group numbers link related files together
- No confusion between tests, results, and reports
Traceability:
- Each result file traces to exactly one test script
- Each report references specific result files
- Summary report synthesizes everything
Scalability:
- Easy to add new test groups without naming conflicts
- Multiple test plans can coexist in separate directories
- Consistent structure across all diagnostics
Maintainability:
- Future developers understand structure instantly
- AI assistants can navigate the organization
- Documentation naturally stays aligned with code
❌ Don't do this:
test1.py (no description)
baseline_test.py (no group number)
hmm_test.py (no group number)
output.json (which test?)
results.json (which test?)
analysis.md (which test?)
report.md (which test?)
✅ Do this instead:
test_group_1_baseline_no_hmm.py
test_group_2_baseline_with_hmm.py
results_group_1_baseline_no_hmm.json
results_group_2_baseline_with_hmm.json
report_group_1_baseline_no_hmm.md
report_group_2_baseline_with_hmm.md
report_summary.md
Before creating files for a new test:
1. Read the test plan document in `plans/`
2. Identify the test group number
3. Choose a short, descriptive name (2-4 words)
4. Create the test script: `test_group_N_<description>.py`
5. Ensure it saves to: `results_group_N_<description>.json`
6. Create the plot directory: `plots/group_N_<description>/`
7. Write the report: `report_group_N_<description>.md`
8. Update `report_summary.md` with the new findings
9. Verify all files have matching `_group_N_<description>` suffixes
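The final checklist item can also be automated. A sketch (the function name is illustrative) that verifies a set of files share a single matching `_group_N_<description>` stem:

```python
from pathlib import Path

def matching_suffixes(filenames) -> bool:
    """Return True if all test/results/report files share one group stem."""
    stems = set()
    for name in filenames:
        stem = Path(name).stem
        for prefix in ("test_", "results_", "report_"):
            if stem.startswith(prefix):
                stems.add(stem[len(prefix):])
                break
        else:
            return False  # file does not follow any known prefix
    return len(stems) == 1
```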
This file is named IMPORTANT_FILE_NAMING_CONVENTIONS.md with "IMPORTANT" in all caps at the root level to ensure:
- It appears at the top of directory listings (capitals sort before lowercase)
- The filename itself communicates urgency (IMPORTANT grabs attention)
- It's in the root directory (impossible to miss)
- It's explicit about content (NAMING_CONVENTIONS tells you what's inside)
When working on GIMBAL diagnostics, this file should be your first reference for how to organize your work.
Last updated: December 10, 2025
Applies to: All diagnostic testing in GIMBAL v0.2.1 and beyond