
New tools #2

@wooly905

AnswerCode Tool Verification and Implementation Guide

Goal

This document verifies the proposed pseudo code for the remaining candidate tools in AnswerCode and explains how to implement them inside the current architecture.

The current system already covers literal search, file discovery, file reading, lightweight structure analysis, symbol-focused reads, reference lookup, test discovery, static call graph analysis, and repository architecture mapping. This guide therefore focuses only on the tools that remain unimplemented, which should continue improving three areas:

  1. symbol-aware navigation
  2. feature-flow understanding
  3. natural-language retrieval

Current Architecture Constraints

Any new tool should fit the existing design:

  1. Create a class under Services/Tools that implements ITool.
  2. Keep the tool class thin.
  3. Put heavy analysis in reusable services under a new folder such as Services/Analysis.
  4. Return compact plain-text output because the agent loop and UI already expect text results.
  5. Register the tool in Program.cs as builder.Services.AddSingleton<ITool, NewTool>();.
  6. Update ToolResultFormatter so the UI can show better summaries and detail items.

Shared infrastructure now present in the repository:

  • IWorkspaceFileService for file enumeration, exclusions, and path normalization
  • ILanguageHeuristicService for multi-language symbol, reference, and test heuristics
  • ICSharpCompilationService for Roslyn-backed in-memory compilation
  • ISymbolAnalysisService for definition lookup, symbol boundaries, and symbol metadata
  • IReferenceAnalysisService for reference lookup and classification
  • ICallGraphService for static call graph generation with multi-language support
  • IRepoMapService for repository architecture mapping, module detection, and dependency analysis
  • IMemoryCache for compilation and analysis caching

Still useful future additions:

  • IGitService for git log, git blame, and commit lookups
  • IConfigurationAnalysisService for config source and usage tracing
  • a future semantic index service for embeddings-based retrieval

For C#-first accuracy, the key technical addition is Roslyn. The current implementation added:

  • Microsoft.CodeAnalysis.CSharp

Without Roslyn, several tools can still be built with regex and file scanning, but their results remain heuristic rather than precise. That heuristic path is now what the non-C# language support uses.
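To make the contrast concrete, here is a minimal sketch of the Roslyn-backed path using only Microsoft.CodeAnalysis.CSharp: parse a single file and list its method declarations with exact line spans. The file path is a hypothetical example, not a real path in the repository.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical path, for illustration only.
var path = "Services/Tools/GrepSearchTool.cs";

var tree = CSharpSyntaxTree.ParseText(File.ReadAllText(path));
foreach (var method in tree.GetRoot().DescendantNodes().OfType<MethodDeclarationSyntax>())
{
    // Roslyn gives precise line spans, which regex scanning can only approximate.
    var span = tree.GetLineSpan(method.Span);
    Console.WriteLine(
        $"{method.Identifier.Text}: lines {span.StartLinePosition.Line + 1}-{span.EndLinePosition.Line + 1}");
}
```

This is exactly the kind of precise symbol boundary information the heuristic path has to approximate for other languages.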


Verification Summary

| Tool | Current status | Notes |
| --- | --- | --- |
| semantic_code_search | Planned | Still needs indexing and embedding infrastructure |
| trace_execution_path | Planned | Still needs branch ranking and side-effect detection |
| impact_analysis | Planned | Should separate direct vs transitive impact |
| config_lookup | Planned | Should model configuration precedence |
| git_history_lookup | Planned | Should add line-range blame and rename-aware history |

This guide intentionally omits tools that are already implemented in the repository.


Common Implementation Template

Every new tool should follow the same implementation pattern:

  1. Define tool input parameters in GetChatToolDefinition().
  2. Parse JSON arguments in ExecuteAsync().
  3. Resolve relative paths against ToolContext.RootPath.
  4. Call a reusable analysis service.
  5. Format the output as short deterministic text.
  6. Register the tool in DI.
  7. Add a formatter rule in ToolResultFormatter.

The tool class should not contain the full algorithm unless the algorithm is trivial.
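The template above can be sketched as a thin tool class. The exact `ITool` signature lives in the AnswerCode repository, so the shapes below (`GetChatToolDefinition()`, `ExecuteAsync(string, ToolContext)`, `ToolContext.RootPath`) are assumed from this guide, and `IExampleAnalysisService` is a hypothetical service; the `ChatTool.CreateFunctionTool` helper is from the OpenAI .NET SDK.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;
using OpenAI.Chat;

public sealed class ExampleTool : ITool
{
    public const string ToolName = "example_tool";

    private readonly IExampleAnalysisService _analysis; // hypothetical reusable service

    public ExampleTool(IExampleAnalysisService analysis) => _analysis = analysis;

    // 1. Define tool input parameters as a JSON schema.
    public ChatTool GetChatToolDefinition() => ChatTool.CreateFunctionTool(
        ToolName,
        "Illustrates the thin-tool pattern.",
        BinaryData.FromString("""
        { "type": "object",
          "properties": { "path": { "type": "string" } },
          "required": ["path"] }
        """));

    public async Task<string> ExecuteAsync(string argumentsJson, ToolContext context)
    {
        // 2. Parse JSON arguments and return a helpful error on bad input.
        using var doc = JsonDocument.Parse(argumentsJson);
        if (!doc.RootElement.TryGetProperty("path", out var pathProp))
            return "Error: 'path' argument is required.";

        // 3. Resolve relative paths against the workspace root.
        var fullPath = Path.GetFullPath(Path.Combine(context.RootPath, pathProp.GetString()!));

        // 4. Delegate the heavy analysis to the reusable service.
        var result = await _analysis.AnalyzeAsync(fullPath);

        // 5. Return short deterministic text, truncating oversized output.
        return result.Length > 4000 ? result[..4000] + "\n[truncated]" : result;
    }
}
```

Steps 6 and 7 (DI registration and the `ToolResultFormatter` rule) then happen outside the class, exactly as for the existing tools.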


1. semantic_code_search

Verification

The current pseudo code is correct in concept, but it is missing the most important prerequisite: code chunk indexing. Query-time embedding only works if the repository has already been split into chunks and stored in a searchable index.

Corrected flow

flowchart TD
        A[Index repository files] --> B[Split files into chunks or symbols]
        B --> C[Create embeddings and metadata]
        C --> D[Persist searchable index]
        D --> E[User submits natural-language query]
        E --> F[Normalize query]
        F --> G[Create embedding and keyword query]
        G --> H[Run hybrid retrieval]
        H --> I[Optional rerank top candidates]
        I --> J[Return top matches with file, lines, score, snippet]

How to implement it

Tool contract

  • Inputs:
    • query
    • include
    • language
    • top_k
  • Output:
    • ranked matches with file path, line range, symbol name if known, score, and short snippet

Services to add

  • ISemanticIndexService
  • SemanticIndexService
  • SemanticChunk model
  • SemanticSearchTool

Recommended implementation steps

  1. Reuse the same excluded-directory rules as existing tools.
  2. Enumerate code files.
  3. Chunk files by symbol when possible.
    • For C#, use Roslyn symbols.
    • For other languages, fall back to get_file_outline-style parsing or fixed windows with overlap.
  4. Store per-chunk metadata:
    • file path
    • start line
    • end line
    • language
    • symbol name
    • plain text chunk content
  5. Extend the provider layer with an embedding API, or add a dedicated embedding service.
  6. Generate embeddings and store them in memory plus an on-disk cache such as .answercode/index.
  7. At query time, compute the query embedding.
  8. Run hybrid search:
    • cosine similarity on vectors
    • keyword boost from grep_search-style terms
  9. Rerank the top N results if needed.
  10. Return the top K matches in a compact text format.
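The chunk metadata from step 4 and the hybrid scoring from step 8 can be sketched together. `SemanticChunk` mirrors the metadata list above; the 0.7/0.3 weights are an arbitrary starting point, not a tuned value.

```csharp
using System;
using System.Linq;

// Per-chunk metadata from step 4, plus the embedding from step 6.
public sealed record SemanticChunk(
    string FilePath, int StartLine, int EndLine,
    string Language, string? SymbolName, string Content,
    float[] Embedding);

public static class HybridScorer
{
    // Step 8: cosine similarity on vectors plus a keyword boost.
    public static double Score(SemanticChunk chunk, float[] queryEmbedding, string[] keywords)
    {
        double vector = Cosine(chunk.Embedding, queryEmbedding);

        // Keyword boost: fraction of query terms literally present in the chunk.
        double hits = keywords.Count(k =>
            chunk.Content.Contains(k, StringComparison.OrdinalIgnoreCase));
        double keyword = keywords.Length == 0 ? 0 : hits / keywords.Length;

        return 0.7 * vector + 0.3 * keyword; // weights: a starting point, tune per repo
    }

    private static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-12);
    }
}
```

Ranking is then just sorting all chunks by `Score` descending and taking the top K.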

Minimum viable version

Start with a lightweight hybrid implementation:

  • chunk by file sections
  • use keyword extraction plus embeddings
  • skip cross-file symbol grouping

That version already adds major value.

Important notes

  • Rebuild the index only when files change.
  • Cache by project root path.
  • Keep chunk size small enough for precision and large enough for context.
  • This tool is high-value, but it depends on index infrastructure.

2. trace_execution_path

Verification

The pseudo code is useful, but it is too high-level for implementation. It needs an explicit branch-selection policy. Execution tracing should highlight the main path, not every branch.

Corrected flow

flowchart TD
        A[Input entry point or feature] --> B[Resolve entry symbol]
        B --> C[Build constrained call graph]
        C --> D[Detect conditions and side effects]
        D --> E[Rank important branches]
        E --> F[Flatten into primary execution steps]
        F --> G[Return readable path summary]

How to implement it

Tool contract

  • Inputs:
    • entry_symbol
    • goal_hint optional
    • max_depth
  • Output:
    • ordered main steps
    • conditions
    • side effects such as DB writes, HTTP calls, file writes, queue publishes, emitted events

Services to add

  • IExecutionTraceService
  • ExecutionTraceService
  • TraceExecutionPathTool

Recommended implementation steps

  1. Resolve the entry symbol.
  2. Build a shallow downstream call graph.
  3. Detect side-effect operations using heuristics:
    • EF Core save calls
    • repository writes
    • HTTP client calls
    • queue publish/send calls
    • logging and notification sends
  4. Detect important conditions such as authorization checks, validation gates, or feature flags.
  5. Collapse helper-only methods that do not change state.
  6. Return a numbered path summary rather than a raw graph.
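The side-effect heuristics in step 3 can start as simple substring patterns over a method body. This is a sketch: the pattern list below is illustrative, not exhaustive, and will produce false positives (for example, `.Add(` also matches in-memory collections).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SideEffectHeuristics
{
    // Illustrative pattern table; extend and refine per codebase.
    private static readonly (string Pattern, string Kind)[] Patterns =
    {
        ("SaveChanges", "database write"),          // EF Core save calls
        (".Add(",       "repository/collection write"),
        ("HttpClient",  "HTTP call"),
        ("PostAsync",   "HTTP call"),
        ("Publish",     "queue publish"),
        ("Send",        "message/notification send"),
        ("File.Write",  "file write"),
    };

    public static IEnumerable<string> Detect(string methodBody) =>
        Patterns
            .Where(p => methodBody.Contains(p.Pattern, StringComparison.Ordinal))
            .Select(p => p.Kind)
            .Distinct();
}
```

Step 5 (collapsing helper-only methods) then amounts to dropping call-graph nodes for which `Detect` returns nothing.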

Minimum viable version

Target entry points first:

  • controller actions
  • background job handlers
  • public service methods

Important notes

  • This tool should reuse call_graph, not reimplement symbol traversal from scratch.
  • It is a summarization-oriented tool, not only a parser.

3. impact_analysis

Verification

The pseudo code is correct, but it should separate direct impact from transitive impact. Those two categories should not be mixed.

Corrected flow

flowchart TD
        A[Input symbol or file] --> B[Resolve target]
        B --> C[Find direct references and related files]
        C --> D[Expand upstream and downstream impact]
        D --> E[Find affected configs and tests]
        E --> F[Separate direct and transitive risk]
        F --> G[Return impact report]

How to implement it

Tool contract

  • Inputs:
    • symbol
    • file_path
    • change_type
    • depth
  • Output:
    • direct dependents
    • transitive dependents
    • related configuration
    • test files to run
    • risk summary

Services to add

  • IImpactAnalysisService
  • ImpactAnalysisService
  • ImpactAnalysisTool

Recommended implementation steps

  1. Resolve the target symbol or file.
  2. Reuse the existing reference-analysis capability for direct usage.
  3. Reuse adjacent-file analysis for nearby dependencies.
  4. Reuse the current test-discovery capability for validation suggestions.
  5. Optionally reuse call_graph for behavior change impact.
  6. Separate output into:
    • direct impact
    • transitive impact
    • runtime/config impact
    • test impact
  7. Add a coarse risk score such as low, medium, high.
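One possible shape for the coarse risk score in step 7, as a sketch: it uses only counts the earlier steps already produce, and the weights and thresholds are arbitrary starting points.

```csharp
public static class RiskScoring
{
    // Direct dependents weigh more than transitive ones; config changes add risk.
    public static string RiskLevel(int directDependents, int transitiveDependents, bool touchesConfig)
    {
        int score = directDependents * 2 + transitiveDependents + (touchesConfig ? 3 : 0);
        return score switch
        {
            <= 3  => "low",
            <= 10 => "medium",
            _     => "high",
        };
    }
}
```

Keeping the score this coarse is deliberate: the agent only needs a rough triage signal, not a precise metric.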

Minimum viable version

Ship a direct-impact-only report first. Add transitive depth in a second pass.

Important notes

  • This tool is an orchestrator of other analysis services.
  • It should not be implemented before the underlying reference-analysis and test-discovery foundations are in place.

4. config_lookup

Verification

The pseudo code is mostly correct. The missing piece is configuration precedence. The tool must distinguish where a value is defined from which source wins at runtime.

Corrected flow

flowchart TD
        A[Input config key or feature] --> B[Find config definitions]
        B --> C[Find loading and binding logic]
        C --> D[Model source precedence]
        D --> E[Find overrides and usage sites]
        E --> F[Return config chain]

How to implement it

Tool contract

  • Inputs:
    • config_key
    • feature_name
  • Output:
    • source files
    • bound options class if any
    • override order
    • usage sites

Services to add

  • IConfigurationAnalysisService
  • ConfigurationAnalysisService
  • ConfigLookupTool

Recommended implementation steps

  1. Search config files:
    • appsettings.json
    • environment-specific JSON files
    • local override files
    • .env or similar if present
  2. Search bootstrapping code for config providers.
  3. Search for bindings such as:
    • GetSection(...)
    • Bind(...)
    • IOptions<T>
    • indexer usage like Configuration["Key"]
  4. Build a precedence chain.
  5. Return both definition sites and runtime winner information.
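The precedence chain in step 4 can be modeled by ordering sources from lowest to highest priority, mirroring the default ASP.NET Core ordering (base JSON, environment JSON, environment variables, command line): the last source that defines the key wins at runtime. A minimal sketch, with illustrative source names:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ConfigPrecedence
{
    // Sources are ordered lowest to highest priority; later sources override earlier ones.
    public static (string Winner, IReadOnlyList<string> Chain) Resolve(
        string key,
        IReadOnlyList<(string Source, IDictionary<string, string> Values)> sourcesLowToHigh)
    {
        var chain = sourcesLowToHigh
            .Where(s => s.Values.ContainsKey(key))
            .Select(s => s.Source)
            .ToList();

        // The last match is the runtime winner; the chain explains every override.
        return (chain.Count > 0 ? chain[^1] : "not found", chain);
    }
}
```

Step 5 then reports both the full chain (definition sites) and the winner (runtime value source).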

Minimum viable version

Implement a .NET-focused version first because the current repository is a .NET app.

Important notes

  • In this repository, appsettings.Local.json is intentionally loaded as a local override.
  • This tool will be especially useful for support and deployment questions.

5. git_history_lookup

Verification

The pseudo code is correct, but it needs two implementation details: line-range blame and rename-aware history. File history without blame is often too broad.

Corrected flow

flowchart TD
        A[Input file, symbol, or line range] --> B[Resolve exact code region]
        B --> C[Run blame and history lookup]
        C --> D[Collect relevant commits]
        D --> E[Follow renames when needed]
        E --> F[Summarize timeline and intent]

How to implement it

Tool contract

  • Inputs:
    • file_path
    • symbol
    • line_range
  • Output:
    • commit hashes
    • author
    • timestamp
    • subject line
    • short summary of relevant changes

Services to add

  • IGitHistoryService
  • GitHistoryService
  • GitHistoryLookupTool

Recommended implementation steps

  1. Resolve the target region:
    • if symbol is given, first resolve its file and line span
  2. Use the git CLI or LibGit2Sharp.
  3. For file history, run rename-aware history.
  4. For exact lines, use blame for the range.
  5. Collect a small number of relevant commits.
  6. Optionally read commit bodies for additional explanation.
  7. Return a concise historical summary.
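The two key git invocations from steps 3 and 4 are standard git flags; the file path below is a hypothetical example, not a real repository path.

```shell
# Step 3: rename-aware history for a file.
git log --follow --oneline -- Services/Tools/GrepSearchTool.cs

# Step 4: blame restricted to a line range (-L takes start,end).
git blame -L 40,80 -- Services/Tools/GrepSearchTool.cs

# History of how those exact lines evolved over time.
git log -L 40,80:Services/Tools/GrepSearchTool.cs --oneline
```

All three commands are read-only, which keeps the tool safe to run against any workspace.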

Minimum viable version

Use the git CLI first. That matches the current lightweight external-tool style already used for ripgrep.

Important notes

  • This tool is valuable for debugging regressions and explaining design history.
  • It should fail gracefully when the project root is not a Git repository.

Recommended Build Order

If the question is business priority, the most valuable additions are semantic_code_search and impact_analysis.

If the question is engineering dependency order, the better sequence is:

Phase 1: Repository understanding

  1. config_lookup
  2. impact_analysis

Reason:

  • these reuse the current file, reference, and test foundations
  • they improve repository-level reasoning with relatively low implementation risk
  • they immediately help architecture and maintenance questions

Phase 2: Higher-complexity reasoning

  1. semantic_code_search
  2. trace_execution_path
  3. git_history_lookup

Reason:

  • semantic_code_search needs indexing infrastructure
  • trace_execution_path needs reliable symbol resolution (call_graph is now available as a foundation)
  • git_history_lookup is independent, but easiest to add once region resolution exists

Concrete Coding Checklist for Any New Tool

When implementing a tool in AnswerCode, complete all of these steps:

  1. Create the tool class in Services/Tools.
  2. Add a stable ToolName constant.
  3. Define a precise JSON schema in GetChatToolDefinition().
  4. Validate arguments and return helpful errors.
  5. Resolve project-relative paths against ToolContext.RootPath.
  6. Put analysis logic in a reusable service.
  7. Reuse current exclusion rules for bin, obj, .git, and similar directories.
  8. Limit output size and include truncation messages.
  9. Register the tool in Program.cs.
  10. Update ToolResultFormatter for:
    - running summary
    - completed summary
    - detail items
  11. Add unit tests for:
    - happy path
    - ambiguous symbol
    - missing file/symbol
    - truncated result handling
  12. Add a short section to README.md once the tool is production-ready.

Final Recommendation

The original pseudo code was generally sound. The main issue was not wrong direction; it was missing implementation detail around indexing, symbol identity, ambiguity handling, caching, multilingual fallback, and confidence labeling.

If the goal is to make AnswerCode materially better at answering natural-language questions about source code, the strongest next investments are:

  1. semantic retrieval via semantic_code_search
  2. change-risk reporting via impact_analysis
  3. execution understanding via trace_execution_path (building on the existing call_graph)

The current combination already gives the agent better recall, better precision, and lower token cost than the original tool set.

That would move the product from "an AI that can use search tools" toward "an AI that actually understands code structure and behavior."
