Unbiased financial product recommendations — credit cards, EWA, BNPL, savings, loans. Live web search, no affiliate links, interactive.
This repo is a tutorial on building AI products properly: response design → evals → improve → monitor.
```bash
git clone https://github.com/youruser/clarifi.git && npm install && npm run dev
```

The system prompt is the product. Before writing code, decide what "good" looks like: write 5-6 ideal responses by hand. This is where Taste comes from.
What we decided: under 200 words, specific dollar amounts, must surface tradeoffs ("who should skip this"), always search live data, cover all product types not just credit cards.
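Those decisions translate directly into code graders. A minimal sketch (function name, regexes, and thresholds are illustrative, not the repo's actual graders):

```python
import re

# Hypothetical code grader mirroring the response rules above.
def check_response(text: str) -> dict:
    return {
        # Rule: under 200 words
        "under_200_words": len(text.split()) < 200,
        # Rule: specific dollar amounts, e.g. "$95 annual fee"
        "has_dollar_amounts": bool(re.search(r"\$\d[\d,]*(\.\d+)?", text)),
        # Rule: surfaces tradeoffs ("who should skip this")
        "surfaces_tradeoffs": bool(re.search(
            r"skip this|not a good fit|downside|tradeoff", text, re.I)),
    }

sample = ("The Chase Freedom Flex has a $0 annual fee and 5% rotating categories. "
          "Who should skip this: anyone who won't track quarterly activations.")
print(check_response(sample))
```

Checks like these are cheap and deterministic, so they can run on every eval before the (slower, noisier) LLM judge.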
~30 realistic scenarios. Quality > quantity — one sloppy eval sends you chasing the wrong problem.
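One scenario might look like this — the schema (field names, `must_mention` hooks) is an assumption for illustration, not the repo's actual format:

```python
# Illustrative eval scenarios; field names are assumptions, not the repo's schema.
SCENARIOS = [
    {
        "id": "cc_grocery_rewards",
        "category": "credit_cards",
        "prompt": "I spend $800/month on groceries. Which card maximizes rewards?",
        "must_mention": ["annual fee", "$"],   # hooks for code graders
    },
    {
        "id": "bnpl_vs_card",
        "category": "bnpl",
        "prompt": "Is BNPL or a 0% intro APR card better for a $1,200 laptop?",
        "must_mention": ["late fee", "credit"],
    },
]

def by_category(cat: str) -> list[dict]:
    return [s for s in SCENARIOS if s["category"] == cat]

print(by_category("credit_cards")[0]["id"])
```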
Write an LLM-as-Judge rubric (ours weights: specificity 25%, data freshness 20%, tradeoff transparency 20%, brevity 15%, coverage 10%, tone 10%). Then judge the judge — go through the grades manually, tweak until they match your Taste. Sometimes removing a criterion is the fix.
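Aggregating the judge's per-criterion scores with those weights can be sketched as follows (the 0-10 scale and function name are assumptions):

```python
# Weights stated in the rubric above; must sum to 1.0.
WEIGHTS = {
    "specificity": 0.25, "data_freshness": 0.20, "tradeoff_transparency": 0.20,
    "brevity": 0.15, "coverage": 0.10, "tone": 0.10,
}

def weighted_score(scores: dict) -> float:
    # Fail loudly if the judge skipped or invented a criterion.
    assert set(scores) == set(WEIGHTS), "judge must grade every criterion"
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

scores = {"specificity": 9, "data_freshness": 7, "tradeoff_transparency": 8,
          "brevity": 10, "coverage": 6, "tone": 9}
print(round(weighted_score(scores), 2))
```

Keeping the aggregation outside the judge prompt means rebalancing weights (or deleting a criterion) never requires re-running the judge.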
Impact is asymptotic: each additional lever buys less than the last. Our order: system prompt (0→80% of quality), web search as live RAG (fixed stale data), inline widgets (better UX for context gathering). We skipped multi-agent and fine-tuning: 10x the complexity for marginal gains at this stage.
CI runs evals on every prompt change + weekly (catches silent model drift). Weekly human QA: 10 sampled interactions, rated by the person who wrote the rubric — not outsourced to cheap raters.
Latest from Anthropic's skill-creator 2.0: parallel isolated evals (no context bleed), blind A/B comparison (judge doesn't know which prompt version is which), trigger optimization (60/40 train/test split on skill descriptions).
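The blind-comparison idea can be sketched as: shuffle which prompt version appears as "A" before the judge sees the pair, then unblind after the verdict (`judge_fn` stands in for an LLM call; all names here are illustrative):

```python
import random

# Blind A/B: the judge only ever sees anonymous labels "A" and "B".
def blind_ab(response_v1: str, response_v2: str, judge_fn, rng=random) -> str:
    pair = [("v1", response_v1), ("v2", response_v2)]
    rng.shuffle(pair)  # judge never learns which prompt version is which
    verdict = judge_fn(pair[0][1], pair[1][1])  # judge_fn returns "A" or "B"
    return pair[0][0] if verdict == "A" else pair[1][0]

# Toy judge that prefers the shorter response (stand-in for an LLM judge).
shorter = lambda a, b: "A" if len(a) <= len(b) else "B"
print(blind_ab("terse answer", "a much longer, rambling answer", shorter))
```

Shuffling removes position bias and version-name bias in one move; the winner is still reported in terms of the real version labels.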
User → Claude API + web_search tool → Response + inline widgets
No database. No product JSON. The AI searches current data on every query. Freshness without maintenance.
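That flow can be sketched as a single Messages API call with Anthropic's hosted web search tool. The model name and the `web_search_20250305` tool version string are assumptions — check the current Anthropic docs before copying:

```python
import os

# Sketch of the single-call architecture: one request, hosted web search,
# no local product database. Model/tool strings are assumptions.
request = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "tools": [{"type": "web_search_20250305", "name": "web_search", "max_uses": 3}],
    "messages": [{"role": "user",
                  "content": "Best no-annual-fee cash back card right now?"}],
}

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic  # pip install anthropic
    reply = anthropic.Anthropic().messages.create(**request)
    print(reply.content)
```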
```bash
pip install pyyaml anthropic
export ANTHROPIC_API_KEY=sk-...
python evals/run_evals.py                          # all 25 scenarios
python evals/run_evals.py --category credit_cards
python evals/run_evals.py --no-llm-judge           # code graders only, faster
```

Results land in `evals/results/<timestamp>.json`. Exit code 1 if the pass rate drops below 80%.
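A minimal sketch of that pass-rate gate — function and field names are illustrative, not the repo's actual runner:

```python
# The runner returns a non-zero exit code below the threshold, so CI fails the build.
def gate(results: list[dict], threshold: float = 0.80) -> int:
    passed = sum(1 for r in results if r["passed"])
    rate = passed / len(results)
    print(f"pass rate: {rate:.0%} (threshold {threshold:.0%})")
    return 0 if rate >= threshold else 1

results = [{"passed": True}] * 19 + [{"passed": False}] * 6  # 19/25 = 76%
exit_code = gate(results)
print(exit_code)
# sys.exit(exit_code) would go here in the real runner
```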
Two CI triggers (`.github/workflows/evals.yml`):

| Trigger | What happens |
|---|---|
| PR touching `prompts/**` | Runs full eval suite; posts pass/fail delta vs `main` as a PR comment |
| Weekly cron (Mon 9am UTC) | Runs full eval suite; opens a GitHub issue if failures exceed 25% — catches silent model drift |
Add ANTHROPIC_API_KEY as a repository secret to enable CI.
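A workflow wiring up both triggers might look like the sketch below — step names, action versions, and the cron string are illustrative, not the repo's actual file:

```yaml
# Sketch of .github/workflows/evals.yml (assumed layout).
name: evals
on:
  pull_request:
    paths: ["prompts/**"]        # PRs touching prompts
  schedule:
    - cron: "0 9 * * 1"          # Mondays 9am UTC
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install pyyaml anthropic
      - run: python evals/run_evals.py
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```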
MIT License.