We welcome contributions. Here are specific ways you can help right now.
Impact: HIGH — This is the single most valuable contribution.
We're testing whether structured interaction frameworks reduce output noise across AI systems. We have 188 runs remaining.
What you need:
- Access to any of: Claude, ChatGPT, Grok, DeepSeek, Gemini, Copilot, Manus, Kimi, Euria, Perplexity (or any other system)
- 10 minutes per run
How to do it:
- Read the experiment protocol
- Pick a system and a question from the grid
- Run the question twice: once normally (Condition A), once with the Universal Prompt (Condition B)
- Record: system name, question, word count A, word count B, and both full responses
- Submit a pull request adding your data to `experiments/data/`
Data format: See `experiments/schema.py` for the exact schema.
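For illustration, here is a minimal sketch of assembling one run record in Python. The field names and the `build_record` helper are assumptions made for this example; `experiments/schema.py` defines the authoritative schema.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words (a simple, reproducible metric)."""
    return len(text.split())


def build_record(system: str, question: str,
                 response_a: str, response_b: str) -> dict:
    """Assemble one Condition A / Condition B run as a dict.

    Field names here are illustrative -- check experiments/schema.py
    for the exact keys the repository expects.
    """
    return {
        "system": system,
        "question": question,
        "word_count_a": word_count(response_a),
        "word_count_b": word_count(response_b),
        "response_a": response_a,
        "response_b": response_b,
    }


record = build_record(
    "ExampleSystem",
    "What is drift?",
    "A long unstructured answer with many words.",
    "A shorter answer.",
)
print(record["word_count_a"], record["word_count_b"])
```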
Impact: MEDIUM — Helps calibrate our early warning thresholds.
If you have time-series data from any monitoring system (not just AI), you can test our drift detector against it:
```python
from tools.drift_detector import DriftDetector

d = DriftDetector()
for score in your_time_series:
    d.record(score, exchange_id="your-id")
alert = d.check()
print(alert.level, alert.rate)
```

Report back: does it detect real declines? Does it fire false positives? What thresholds work for your domain?
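If you don't have real monitoring data handy, a synthetic declining series makes a reasonable smoke test. The sketch below (stdlib only) generates one; `declining_series` and its parameters are inventions for this example, not part of the repository.

```python
import random
from typing import List


def declining_series(n: int = 100, start: float = 0.9,
                     slope: float = 0.004, noise: float = 0.05,
                     seed: int = 42) -> List[float]:
    """Generate a noisy, gradually declining score series in [0, 1],
    a stand-in for real monitoring data."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return [max(0.0, min(1.0, start - slope * i + rng.gauss(0, noise)))
            for i in range(n)]


scores = declining_series()

# Feed the synthetic series to the detector exactly as in the snippet
# above (requires the repository on your PYTHONPATH):
#
#   from tools.drift_detector import DriftDetector
#   d = DriftDetector()
#   for i, score in enumerate(scores):
#       d.record(score, exchange_id=f"synthetic-{i}")
#   alert = d.check()
```

A detector worth trusting should fire on this series and stay quiet when `slope` is set to 0.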
Impact: MEDIUM — Our detectors are tested on English text. We need:
- Non-English text (does absurdity detection work across languages?)
- Domain-specific text (medical, legal, poetic)
- Edge cases that break the detectors
```python
from detectors.absurdity import detect

result = detect("your text here")
```

Impact: HIGH (theoretical) — The gate function D = A × L × M is the core of the framework. We need:
- Mathematical critique: is non-compensatory multiplication the right operator?
- Alternative formulations (geometric mean? weighted?)
- Edge case analysis: what happens at boundary values?
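To make the critique concrete, here is a small sketch contrasting the multiplicative gate with two of the alternatives above at a boundary value. Only D = A × L × M comes from the framework itself; the function names and the weighted-mean baseline are illustrative.

```python
def gate_product(a: float, l: float, m: float) -> float:
    """D = A * L * M: non-compensatory -- any zero factor zeroes D."""
    return a * l * m


def gate_geometric(a: float, l: float, m: float) -> float:
    """Geometric mean: keeps the zero-annihilation property, but
    rescales so equal inputs map to themselves (g(x, x, x) == x)."""
    return (a * l * m) ** (1.0 / 3.0)


def gate_weighted(a: float, l: float, m: float) -> float:
    """Equal-weight arithmetic mean: compensatory -- strong factors
    can mask a collapsed one."""
    return (a + l + m) / 3.0


# Boundary behavior: one factor collapses to 0.
print(gate_product(1.0, 1.0, 0.0))    # 0.0
print(gate_geometric(1.0, 1.0, 0.0))  # 0.0
print(gate_weighted(1.0, 1.0, 0.0))   # ~0.667 despite total collapse
```

The boundary case shows the design question plainly: multiplication and the geometric mean both treat a single collapsed factor as fatal, while any additive formulation lets the other factors compensate.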
Impact: MEDIUM — All tools are Python. Ports to JavaScript, Rust, or Go would broaden adoption.
Priority: `drift_detector.py` and `agency_score.py` (stdlib only, straightforward to port).
- Fork the repository
- Create a branch: `contrib/your-name/what-you-did`
- Make your changes
- Run existing examples to make sure nothing breaks
- Submit a pull request with:
- What you did
- What you found
- Your confidence level (C1–C5):
- C1: Anecdotal (single observation)
- C2: Repeated (seen multiple times)
- C3: Systematic (controlled conditions)
- C4: Validated (independent replication)
- C5: Robust (survives adversarial testing)
- Python 3.8+
- Type hints encouraged
- Docstrings required for public functions
- Prefer stdlib over external dependencies
- Each tool should be independently runnable (`python tool.py` should produce output)
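As a sketch of the "independently runnable" convention, here is the shape a hypothetical tool file might take. The `score` function is invented for illustration and is not one of the repository's real tools.

```python
"""example_tool.py -- sketch of the 'independently runnable' pattern.

The tool exposes a library function and, when executed directly with
`python example_tool.py`, demonstrates itself on sample input
(stdlib only, per the guidelines above).
"""


def score(text: str) -> float:
    """Toy scorer: fraction of words longer than three characters.

    Illustrative only -- stands in for a real analysis function.
    """
    words = text.split()
    if not words:
        return 0.0
    return sum(1 for w in words if len(w) > 3) / len(words)


if __name__ == "__main__":
    # Running the file directly should produce output.
    sample = "a toy run of the tool"
    print(f"score({sample!r}) = {score(sample):.2f}")
```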
- Changes that break existing experiment data compatibility
- Dependencies on proprietary services or APIs
- Code without documentation or examples
- Findings presented without confidence levels
Open an issue or email [email protected].