
New tools #2

@wooly905

AnswerCode Tool Verification and Implementation Guide

Goal

This document verifies the proposed pseudo code for the remaining candidate tools in AnswerCode and explains how to implement them inside the current architecture.

The current system already covers literal search, file discovery, file reading, lightweight structure analysis, symbol-focused reads, reference lookup, test discovery, static call graph analysis, and repository architecture mapping. This guide therefore focuses only on the tools that remain unimplemented, which should continue improving three areas:

  1. symbol-aware navigation
  2. feature-flow understanding
  3. natural-language retrieval

Current Architecture Constraints

Any new tool should fit the existing design:

  1. Create a class under Services/Tools that implements ITool.
  2. Keep the tool class thin.
  3. Put heavy analysis in reusable services under a new folder such as Services/Analysis.
  4. Return compact plain-text output because the agent loop and UI already expect text results.
  5. Register the tool in Program.cs as builder.Services.AddSingleton<ITool, NewTool>();.
  6. Update ToolResultFormatter so the UI can show better summaries and detail items.

Shared infrastructure now present in the repository:

  • IWorkspaceFileService for file enumeration, exclusions, and path normalization
  • ILanguageHeuristicService for multi-language symbol, reference, and test heuristics
  • ICSharpCompilationService for Roslyn-backed in-memory compilation
  • ISymbolAnalysisService for definition lookup, symbol boundaries, and symbol metadata
  • IReferenceAnalysisService for reference lookup and classification
  • ICallGraphService for static call graph generation with multi-language support
  • IRepoMapService for repository architecture mapping, module detection, and dependency analysis
  • IMemoryCache for compilation and analysis caching

Still useful future additions:

  • IGitService for git log, git blame, and commit lookups
  • IConfigurationAnalysisService for config source and usage tracing
  • a future semantic index service for embeddings-based retrieval

For C#-first accuracy, the key technical addition is Roslyn. The current implementation added:

  • Microsoft.CodeAnalysis.CSharp

Without Roslyn, several tools can still be built with regex and file scanning, but their results remain heuristic rather than precise. That heuristic path is now what the non-C# language support uses.
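To make the contrast concrete, here is a minimal sketch of the Roslyn-backed path using only Microsoft.CodeAnalysis.CSharp: parse a single file and list its method declarations with exact line spans. The file path is a hypothetical example, not a real path in the repository.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical path, for illustration only.
var path = "Services/Tools/GrepSearchTool.cs";

var tree = CSharpSyntaxTree.ParseText(File.ReadAllText(path));
foreach (var method in tree.GetRoot().DescendantNodes().OfType<MethodDeclarationSyntax>())
{
    // Roslyn gives precise line spans, which regex scanning can only approximate.
    var span = tree.GetLineSpan(method.Span);
    Console.WriteLine(
        $"{method.Identifier.Text}: lines {span.StartLinePosition.Line + 1}-{span.EndLinePosition.Line + 1}");
}
```

This is exactly the kind of precise symbol boundary information the heuristic path has to approximate for other languages.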


Verification Summary

| Tool | Current status | Notes |
| --- | --- | --- |
| semantic_code_search | Planned | Still needs indexing and embedding infrastructure |
| trace_execution_path | Planned | Still needs branch ranking and side-effect detection |
| impact_analysis | Planned | Should separate direct vs transitive impact |
| config_lookup | Planned | Should model configuration precedence |
| git_history_lookup | Planned | Should add line-range blame and rename-aware history |

This guide intentionally omits tools that are already implemented in the repository.


Common Implementation Template

Every new tool should follow the same implementation pattern:

  1. Define tool input parameters in GetChatToolDefinition().
  2. Parse JSON arguments in ExecuteAsync().
  3. Resolve relative paths against ToolContext.RootPath.
  4. Call a reusable analysis service.
  5. Format the output as short deterministic text.
  6. Register the tool in DI.
  7. Add a formatter rule in ToolResultFormatter.

The tool class should not contain the full algorithm unless the algorithm is trivial.
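The template above can be sketched as a thin tool class. The exact `ITool` signature lives in the AnswerCode repository, so the shapes below (`GetChatToolDefinition()`, `ExecuteAsync(string, ToolContext)`, `ToolContext.RootPath`) are assumed from this guide, and `IExampleAnalysisService` is a hypothetical service; the `ChatTool.CreateFunctionTool` helper is from the OpenAI .NET SDK.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;
using OpenAI.Chat;

public sealed class ExampleTool : ITool
{
    public const string ToolName = "example_tool";

    private readonly IExampleAnalysisService _analysis; // hypothetical reusable service

    public ExampleTool(IExampleAnalysisService analysis) => _analysis = analysis;

    // 1. Define tool input parameters as a JSON schema.
    public ChatTool GetChatToolDefinition() => ChatTool.CreateFunctionTool(
        ToolName,
        "Illustrates the thin-tool pattern.",
        BinaryData.FromString("""
        { "type": "object",
          "properties": { "path": { "type": "string" } },
          "required": ["path"] }
        """));

    public async Task<string> ExecuteAsync(string argumentsJson, ToolContext context)
    {
        // 2. Parse JSON arguments and return a helpful error on bad input.
        using var doc = JsonDocument.Parse(argumentsJson);
        if (!doc.RootElement.TryGetProperty("path", out var pathProp))
            return "Error: 'path' argument is required.";

        // 3. Resolve relative paths against the workspace root.
        var fullPath = Path.GetFullPath(Path.Combine(context.RootPath, pathProp.GetString()!));

        // 4. Delegate the heavy analysis to the reusable service.
        var result = await _analysis.AnalyzeAsync(fullPath);

        // 5. Return short deterministic text, truncating oversized output.
        return result.Length > 4000 ? result[..4000] + "\n[truncated]" : result;
    }
}
```

Steps 6 and 7 (DI registration and the `ToolResultFormatter` rule) then happen outside the class, exactly as for the existing tools.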


1. semantic_code_search

Verification

The current pseudo code is correct in concept, but it is missing the most important prerequisite: code chunk indexing. Query-time embedding only works if the repository has already been split into chunks and stored in a searchable index.

Corrected flow

flowchart TD
        A[Index repository files] --> B[Split files into chunks or symbols]
        B --> C[Create embeddings and metadata]
        C --> D[Persist searchable index]
        D --> E[User submits natural-language query]
        E --> F[Normalize query]
        F --> G[Create embedding and keyword query]
        G --> H[Run hybrid retrieval]
        H --> I[Optional rerank top candidates]
        I --> J[Return top matches with file, lines, score, snippet]

How to implement it

Tool contract

  • Inputs:
    • query
    • include
    • language
    • top_k
  • Output:
    • ranked matches with file path, line range, symbol name if known, score, and short snippet

Services to add

  • ISemanticIndexService
  • SemanticIndexService
  • SemanticChunk model
  • SemanticSearchTool

Recommended implementation steps

  1. Reuse the same excluded-directory rules as existing tools.
  2. Enumerate code files.
  3. Chunk files by symbol when possible.
    • For C#, use Roslyn symbols.
    • For other languages, fall back to get_file_outline-style parsing or fixed windows with overlap.
  4. Store per-chunk metadata:
    • file path
    • start line
    • end line
    • language
    • symbol name
    • plain text chunk content
  5. Extend the provider layer with an embedding API, or add a dedicated embedding service.
  6. Generate embeddings and store them in memory plus an on-disk cache such as .answercode/index.
  7. At query time, compute the query embedding.
  8. Run hybrid search:
    • cosine similarity on vectors
    • keyword boost from grep_search-style terms
  9. Rerank the top N results if needed.
  10. Return the top K matches in a compact text format.
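The chunk metadata from step 4 and the hybrid scoring from step 8 can be sketched together. `SemanticChunk` mirrors the metadata list above; the 0.7/0.3 weights are an arbitrary starting point, not a tuned value.

```csharp
using System;
using System.Linq;

// Per-chunk metadata from step 4, plus the embedding from step 6.
public sealed record SemanticChunk(
    string FilePath, int StartLine, int EndLine,
    string Language, string? SymbolName, string Content,
    float[] Embedding);

public static class HybridScorer
{
    // Step 8: cosine similarity on vectors plus a keyword boost.
    public static double Score(SemanticChunk chunk, float[] queryEmbedding, string[] keywords)
    {
        double vector = Cosine(chunk.Embedding, queryEmbedding);

        // Keyword boost: fraction of query terms literally present in the chunk.
        double hits = keywords.Count(k =>
            chunk.Content.Contains(k, StringComparison.OrdinalIgnoreCase));
        double keyword = keywords.Length == 0 ? 0 : hits / keywords.Length;

        return 0.7 * vector + 0.3 * keyword; // weights: a starting point, tune per repo
    }

    private static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-12);
    }
}
```

Ranking is then just sorting all chunks by `Score` descending and taking the top K.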

Minimum viable version

Start with a lightweight hybrid implementation:

  • chunk by file sections
  • use keyword extraction plus embeddings
  • skip cross-file symbol grouping

That version already adds major value.

Important notes

  • Rebuild the index only when files change.
  • Cache by project root path.
  • Keep chunk size small enough for precision and large enough for context.
  • This tool is high-value, but it depends on index infrastructure.

2. trace_execution_path

Verification

The pseudo code is useful, but it is too high-level for implementation. It needs an explicit branch-selection policy. Execution tracing should highlight the main path, not every branch.

Corrected flow

flowchart TD
        A[Input entry point or feature] --> B[Resolve entry symbol]
        B --> C[Build constrained call graph]
        C --> D[Detect conditions and side effects]
        D --> E[Rank important branches]
        E --> F[Flatten into primary execution steps]
        F --> G[Return readable path summary]

How to implement it

Tool contract

  • Inputs:
    • entry_symbol
    • goal_hint optional
    • max_depth
  • Output:
    • ordered main steps
    • conditions
    • side effects such as DB writes, HTTP calls, file writes, queue publishes, emitted events

Services to add

  • IExecutionTraceService
  • ExecutionTraceService
  • TraceExecutionPathTool

Recommended implementation steps

  1. Resolve the entry symbol.
  2. Build a shallow downstream call graph.
  3. Detect side-effect operations using heuristics:
    • EF Core save calls
    • repository writes
    • HTTP client calls
    • queue publish/send calls
    • logging and notification sends
  4. Detect important conditions such as authorization checks, validation gates, or feature flags.
  5. Collapse helper-only methods that do not change state.
  6. Return a numbered path summary rather than a raw graph.
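The side-effect heuristics in step 3 can start as simple substring patterns over a method body. This is a sketch: the pattern list below is illustrative, not exhaustive, and will produce false positives (for example, `.Add(` also matches in-memory collections).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SideEffectHeuristics
{
    // Illustrative pattern table; extend and refine per codebase.
    private static readonly (string Pattern, string Kind)[] Patterns =
    {
        ("SaveChanges", "database write"),          // EF Core save calls
        (".Add(",       "repository/collection write"),
        ("HttpClient",  "HTTP call"),
        ("PostAsync",   "HTTP call"),
        ("Publish",     "queue publish"),
        ("Send",        "message/notification send"),
        ("File.Write",  "file write"),
    };

    public static IEnumerable<string> Detect(string methodBody) =>
        Patterns
            .Where(p => methodBody.Contains(p.Pattern, StringComparison.Ordinal))
            .Select(p => p.Kind)
            .Distinct();
}
```

Step 5 (collapsing helper-only methods) then amounts to dropping call-graph nodes for which `Detect` returns nothing.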

Minimum viable version

Target entry points first:

  • controller actions
  • background job handlers
  • public service methods

Important notes

  • This tool should reuse call_graph, not reimplement symbol traversal from scratch.
  • It is a summarization-oriented tool, not only a parser.

3. impact_analysis

Verification

The pseudo code is correct, but it should separate direct impact from transitive impact. Those two categories should not be mixed.

Corrected flow

flowchart TD
        A[Input symbol or file] --> B[Resolve target]
        B --> C[Find direct references and related files]
        C --> D[Expand upstream and downstream impact]
        D --> E[Find affected configs and tests]
        E --> F[Separate direct and transitive risk]
        F --> G[Return impact report]

How to implement it

Tool contract

  • Inputs:
    • symbol
    • file_path
    • change_type
    • depth
  • Output:
    • direct dependents
    • transitive dependents
    • related configuration
    • test files to run
    • risk summary

Services to add

  • IImpactAnalysisService
  • ImpactAnalysisService
  • ImpactAnalysisTool

Recommended implementation steps

  1. Resolve the target symbol or file.
  2. Reuse the existing reference-analysis capability for direct usage.
  3. Reuse adjacent-file analysis for nearby dependencies.
  4. Reuse the current test-discovery capability for validation suggestions.
  5. Optionally reuse call_graph for behavior change impact.
  6. Separate output into:
    • direct impact
    • transitive impact
    • runtime/config impact
    • test impact
  7. Add a coarse risk score such as low, medium, high.
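One possible shape for the coarse risk score in step 7, as a sketch: it uses only counts the earlier steps already produce, and the weights and thresholds are arbitrary starting points.

```csharp
public static class RiskScoring
{
    // Direct dependents weigh more than transitive ones; config changes add risk.
    public static string RiskLevel(int directDependents, int transitiveDependents, bool touchesConfig)
    {
        int score = directDependents * 2 + transitiveDependents + (touchesConfig ? 3 : 0);
        return score switch
        {
            <= 3  => "low",
            <= 10 => "medium",
            _     => "high",
        };
    }
}
```

Keeping the score this coarse is deliberate: the agent only needs a rough triage signal, not a precise metric.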

Minimum viable version

Ship a direct-impact-only report first. Add transitive depth in a second pass.

Important notes

  • This tool is an orchestrator of other analysis services.
  • It should not be implemented before the underlying reference-analysis and test-discovery foundations are in place.

4. config_lookup

Verification

The pseudo code is mostly correct. The missing piece is configuration precedence. The tool must distinguish where a value is defined from which source wins at runtime.

Corrected flow

flowchart TD
        A[Input config key or feature] --> B[Find config definitions]
        B --> C[Find loading and binding logic]
        C --> D[Model source precedence]
        D --> E[Find overrides and usage sites]
        E --> F[Return config chain]

How to implement it

Tool contract

  • Inputs:
    • config_key
    • feature_name
  • Output:
    • source files
    • bound options class if any
    • override order
    • usage sites

Services to add

  • IConfigurationAnalysisService
  • ConfigurationAnalysisService
  • ConfigLookupTool

Recommended implementation steps

  1. Search config files:
    • appsettings.json
    • environment-specific JSON files
    • local override files
    • .env or similar if present
  2. Search bootstrapping code for config providers.
  3. Search for bindings such as:
    • GetSection(...)
    • Bind(...)
    • IOptions<T>
    • indexer usage like Configuration["Key"]
  4. Build a precedence chain.
  5. Return both definition sites and runtime winner information.
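The precedence chain in step 4 can be modeled by ordering sources from lowest to highest priority, mirroring the default ASP.NET Core ordering (base JSON, environment JSON, environment variables, command line): the last source that defines the key wins at runtime. A minimal sketch, with illustrative source names:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ConfigPrecedence
{
    // Sources are ordered lowest to highest priority; later sources override earlier ones.
    public static (string Winner, IReadOnlyList<string> Chain) Resolve(
        string key,
        IReadOnlyList<(string Source, IDictionary<string, string> Values)> sourcesLowToHigh)
    {
        var chain = sourcesLowToHigh
            .Where(s => s.Values.ContainsKey(key))
            .Select(s => s.Source)
            .ToList();

        // The last match is the runtime winner; the chain explains every override.
        return (chain.Count > 0 ? chain[^1] : "not found", chain);
    }
}
```

Step 5 then reports both the full chain (definition sites) and the winner (runtime value source).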

Minimum viable version

Implement a .NET-focused version first because the current repository is a .NET app.

Important notes

  • In this repository, appsettings.Local.json is intentionally loaded as a local override.
  • This tool will be especially useful for support and deployment questions.

5. git_history_lookup

Verification

The pseudo code is correct, but it needs two implementation details: line-range blame and rename-aware history. File history without blame is often too broad.

Corrected flow

flowchart TD
        A[Input file, symbol, or line range] --> B[Resolve exact code region]
        B --> C[Run blame and history lookup]
        C --> D[Collect relevant commits]
        D --> E[Follow renames when needed]
        E --> F[Summarize timeline and intent]

How to implement it

Tool contract

  • Inputs:
    • file_path
    • symbol
    • line_range
  • Output:
    • commit hashes
    • author
    • timestamp
    • subject line
    • short summary of relevant changes

Services to add

  • IGitHistoryService
  • GitHistoryService
  • GitHistoryLookupTool

Recommended implementation steps

  1. Resolve the target region:
    • if symbol is given, first resolve its file and line span
  2. Use the git CLI or LibGit2Sharp.
  3. For file history, run rename-aware history.
  4. For exact lines, use blame for the range.
  5. Collect a small number of relevant commits.
  6. Optionally read commit bodies for additional explanation.
  7. Return a concise historical summary.
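The two key git invocations from steps 3 and 4 are standard git flags; the file path below is a hypothetical example, not a real repository path.

```shell
# Step 3: rename-aware history for a file.
git log --follow --oneline -- Services/Tools/GrepSearchTool.cs

# Step 4: blame restricted to a line range (-L takes start,end).
git blame -L 40,80 -- Services/Tools/GrepSearchTool.cs

# History of how those exact lines evolved over time.
git log -L 40,80:Services/Tools/GrepSearchTool.cs --oneline
```

All three commands are read-only, which keeps the tool safe to run against any workspace.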

Minimum viable version

Use the git CLI first. That matches the current lightweight external-tool style already used for ripgrep.

Important notes

  • This tool is valuable for debugging regressions and explaining design history.
  • It should fail gracefully when the project root is not a Git repository.

Recommended Build Order

If the question is business priority, the most valuable additions are semantic_code_search and impact_analysis.

If the question is engineering dependency order, the better sequence is:

Phase 1: Repository understanding

  1. config_lookup
  2. impact_analysis

Reason:

  • these reuse the current file, reference, and test foundations
  • they improve repository-level reasoning with relatively low implementation risk
  • they immediately help architecture and maintenance questions

Phase 2: Higher-complexity reasoning

  1. semantic_code_search
  2. trace_execution_path
  3. git_history_lookup

Reason:

  • semantic_code_search needs indexing infrastructure
  • trace_execution_path needs reliable symbol resolution (call_graph is now available as a foundation)
  • git_history_lookup is independent, but easiest to add once region resolution exists

Concrete Coding Checklist for Any New Tool

When implementing a tool in AnswerCode, complete all of these steps:

  1. Create the tool class in Services/Tools.
  2. Add a stable ToolName constant.
  3. Define a precise JSON schema in GetChatToolDefinition().
  4. Validate arguments and return helpful errors.
  5. Resolve project-relative paths against ToolContext.RootPath.
  6. Put analysis logic in a reusable service.
  7. Reuse current exclusion rules for bin, obj, .git, and similar directories.
  8. Limit output size and include truncation messages.
  9. Register the tool in Program.cs.
  10. Update ToolResultFormatter for:
    - running summary
    - completed summary
    - detail items
  11. Add unit tests for:
    - happy path
    - ambiguous symbol
    - missing file/symbol
    - truncated result handling
  12. Add a short section to README.md once the tool is production-ready.

Final Recommendation

The original pseudo code was generally sound. The main issue was not wrong direction; it was missing implementation detail around indexing, symbol identity, ambiguity handling, caching, multilingual fallback, and confidence labeling.

If the goal is to make AnswerCode materially better at answering natural-language questions about source code, the strongest next investments are:

  1. semantic retrieval via semantic_code_search
  2. change-risk reporting via impact_analysis
  3. execution understanding via trace_execution_path (building on the existing call_graph)

The current combination already gives the agent better recall, better precision, and lower token cost than the original tool set.

That would move the product from "an AI that can use search tools" toward "an AI that actually understands code structure and behavior."
