Inspiration

We started with an observation: people are letting AI codegen tools pip install and npm install their way into unmaintainable Frankenstacks, and nobody actually audits what’s being pulled in. The problem isn’t just direct dependencies but rather the silent, transitive ones buried ten layers down.

Furthermore, supply chain attacks have evolved beyond simple dependency confusion; we're now seeing sophisticated campaigns targeting transitive dependencies through typosquatting, package namespace pollution, and malicious maintainer takeovers. Traditional SCA tools scan for known CVEs but miss the nuanced behavioral patterns that indicate supply chain compromise.

What it does

Our approach was to make the LLM an automated cybersecurity analyst, but feed it everything, not just the first 128k tokens of your lockfile.

We built Code Canary as an automated security agent that combines SBOM generation, vulnerability enrichment, and LLM-powered threat analysis. The system ingests dependency manifests across package ecosystems (npm, PyPI, Maven, Go modules, Cargo), generates comprehensive Software Bills of Materials (SBOMs), and performs deep semantic analysis on dependency relationships.

The core architecture leverages Modal's serverless infrastructure for parallel SBOM processing, with Python workers handling language-specific parsing (JavaScript/TypeScript via dependency trees, Python via AST analysis, Go via mod files). Each package gets enriched with vulnerability data from multiple sources, including OSV, NVD, and GitHub Security Advisories, along with metadata like maintainer trust scores, download patterns, and repository health indicators.
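The enrichment step boils down to merging advisories for the same package from several sources. A minimal sketch of that merge, deduplicating by advisory ID and tracking the worst severity seen; the record shapes, `enrich_package` name, and severity scale are illustrative, not our actual schema:

```python
# Hypothetical merge of advisories for one package from multiple sources
# (e.g. OSV, NVD, GHSA). Field names and severity levels are illustrative.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def enrich_package(name: str, sources: list) -> dict:
    """Deduplicate advisories by ID and keep the worst severity seen."""
    advisories = {}
    worst = "low"
    for source in sources:
        for adv in source:
            # First source to report a given ID wins; later duplicates are skipped.
            advisories.setdefault(adv["id"], adv)
            if SEVERITY_RANK[adv["severity"]] > SEVERITY_RANK[worst]:
                worst = adv["severity"]
    return {"package": name,
            "advisories": list(advisories.values()),
            "max_severity": worst}
```

In practice the same advisory often appears in both GHSA and NVD under different IDs, which is why deduplication has to happen before any severity rollup.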

How we built it

We embed package metadata, vulnerability descriptions, and dependency relationships into a vector store using OpenAI's text-embedding-ada-002 for a RAG-style pipeline. When analysts query the system ("show me packages with suspicious maintainer changes" or "find transitive dependencies with privilege escalation vulns"), the agent performs semantic search across the embedded knowledge base and generates contextual analysis using GPT-4.
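The retrieval step is cosine similarity over embedded documents. A self-contained sketch with hand-made two-dimensional vectors standing in for text-embedding-ada-002 output; `semantic_search` and the toy index are ours, not a real API:

```python
# Illustrative retrieval step: rank embedded documents by cosine similarity
# to a query vector. Tiny hand-made vectors stand in for real embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_search(query_vec, index, top_k=2):
    """index maps document text -> embedding; return the best-matching docs."""
    scored = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc for doc, _ in scored[:top_k]]
```

The retrieved documents are then stuffed into the GPT-4 prompt as context, which is what makes the analyst queries answerable without fitting the whole SBOM into one context window.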

We handle typosquatting detection through Levenshtein edit-distance scoring against known legitimate packages. The system flags suspicious packages based on creation dates, download patterns, and semantic similarity to established libraries. Supply chain intelligence comes from cross-referencing package authors, commit patterns, and behavioral analysis of dependency updates.
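The typosquat check reduces to: is this package name within a small edit distance of a known-good name, without being an exact match? A minimal sketch; the distance threshold and package names are illustrative:

```python
# Minimal typosquat check: Levenshtein distance against a known-good list.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def typosquat_candidates(name: str, known: list, max_dist: int = 2) -> list:
    """Flag known packages within a small edit distance (excluding exact matches)."""
    return [k for k in known if 0 < levenshtein(name, k) <= max_dist]
```

Distance alone produces false positives on legitimately similar names, which is why the creation-date and download-pattern signals above are combined with it before flagging.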

For visualization, we use NetworkX to build the dependency graph from the SBOM output and export it for browser rendering. Each node represents a package, labeled with its name, and colored according to a small discrete severity palette. The graph layout is generated using a force-directed algorithm, giving a clear view of package clusters and dependency chains. Even with simple node attributes, the color coding and layout make it easy to spot high-risk packages and see their relationships at a glance, providing a quick visual summary before diving into detailed analysis.
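The visualization pipeline described above can be sketched as follows; the severity palette, edge list, and `build_graph` helper are illustrative stand-ins for our actual SBOM output:

```python
# Sketch: build a dependency graph from SBOM-style edges, color nodes by
# severity, and compute a force-directed layout for browser rendering.
import networkx as nx

PALETTE = {"none": "#9e9e9e", "low": "#ffd54f", "high": "#ff8a65", "critical": "#e53935"}

def build_graph(edges, severities):
    G = nx.DiGraph()
    for parent, child in edges:          # edge means: parent depends on child
        G.add_edge(parent, child)
    for node in G.nodes:
        G.nodes[node]["color"] = PALETTE[severities.get(node, "none")]
    return G

G = build_graph([("app", "lodash"), ("app", "minimist"), ("lodash", "minimist")],
                {"minimist": "critical"})
pos = nx.spring_layout(G, seed=42)       # force-directed layout, fixed seed
```

The node positions and colors are then serialized to JSON for the browser-side renderer, so the heavy layout computation happens once on the backend.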

Challenges we ran into

Technical challenges included handling polyglot dependency resolution (especially npm's nested dependency hell and Python's version conflicts), optimizing vector embeddings for large dependency graphs, and building reliable fallback mechanisms when external APIs rate-limit. We ended up implementing a hybrid approach combining local vulnerability scanning with periodic enrichment from cloud services.
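The shape of that hybrid fallback is simple to state: prefer the fresh remote source, and degrade to the local database when it rate-limits or fails. A sketch under those assumptions; `RateLimited` and both lookup callables are hypothetical stand-ins:

```python
# Hypothetical fallback: try remote enrichment, degrade to a local scan on
# rate limiting or network failure. RateLimited and the callables are stand-ins.
class RateLimited(Exception):
    pass

def enrich(package: str, remote, local) -> dict:
    """Prefer fresh remote data; fall back to the local database on failure."""
    try:
        return {"source": "remote", "vulns": remote(package)}
    except (RateLimited, OSError):
        return {"source": "local", "vulns": local(package)}
```

Tagging each result with its source, as above, also lets downstream consumers know whether a finding came from possibly stale local data.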

Accomplishments that we're proud of

The Modal integration was particularly interesting. We use Modal serverless functions for CPU-intensive operations like dependency graph traversal and vulnerability correlation, while keeping the core API and vector operations local for low latency. The system can scale to handle enterprise codebases with thousands of dependencies by parallelizing SBOM generation across Modal workers.
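The fan-out pattern itself is independent of Modal; here is a sketch using stdlib threads in place of Modal's remote map, with `parse_manifest` as a hypothetical stand-in for the per-ecosystem workers:

```python
# Parallel SBOM fan-out, illustrated with stdlib concurrency standing in for
# Modal workers. parse_manifest is a hypothetical per-ecosystem parser.
from concurrent.futures import ThreadPoolExecutor

def parse_manifest(manifest: str) -> dict:
    # Stand-in for language-specific parsing (npm, PyPI, Go modules, ...).
    return {"manifest": manifest, "packages": []}

def generate_sbom(manifests: list) -> list:
    """Parse all manifests in parallel; map preserves input order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(parse_manifest, manifests))
```

With Modal, the same shape becomes a remote `.map` over serverless workers, which is what lets a thousand-dependency codebase be parsed without a long serial pass.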

As our proof of concept, we focused on demonstrating real vulnerability detection in popular open-source projects. The system successfully identifies known supply chain compromises like the PyTorch torchtriton incident and npm package squatting campaigns. The LLM agent can explain complex attack vectors in natural language, making it accessible to both security teams and developers.

Performance-wise, we're processing medium-sized projects (100-500 dependencies) in under 30 seconds including vulnerability enrichment and embedding generation. The vector search typically returns relevant results in sub-200ms, making the system viable for integration into CI/CD pipelines and security review workflows.

What we learned

Building Code Canary reinforced just how fragmented and inconsistent dependency management is across ecosystems. Even with established tooling, the quirks of each package manager, from npm’s deep nesting and duplicated subtrees to Python’s version conflict resolution, create edge cases that force bespoke parsing logic. Handling those differences in a way that still produces a clean, unified SBOM was a bigger lift than we expected.

We also gained a deeper appreciation for the tradeoffs between real-time enrichment and bulk offline scanning. Directly querying OSV, NVD, and GHSA provided the freshest data, but API rate limits and response variability meant we had to design a hybrid model that gracefully degrades to local scanning without losing accuracy. Getting that fallback logic correct, especially while parallelizing workloads across Modal, was a critical part of making the system feel responsive.

On the retrieval side, we learned that embedding and indexing a dependency graph is less about raw vector search speed and more about how you preserve relationships. Treating the graph as a first-class structure, rather than just a flat list of documents, made it much easier for the agent to reason about coordinated upgrades and blast radius. Even with relatively simple node attributes, structuring the context properly improved LLM output quality more than tweaking prompt wording.
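"Blast radius" has a concrete graph meaning: every package that transitively depends on a compromised node. A minimal sketch as a reverse-adjacency BFS; the graph shape and `dependents` mapping are illustrative:

```python
# Blast radius of a compromised package: all packages that transitively
# depend on it. A BFS over a reverse-dependency map (illustrative shape).
from collections import deque

def blast_radius(dependents: dict, compromised: str) -> set:
    """dependents maps package -> packages that depend on it directly."""
    seen, queue = set(), deque([compromised])
    while queue:
        pkg = queue.popleft()
        for parent in dependents.get(pkg, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Feeding this set into the LLM's context, rather than a flat package list, is what lets it reason about which applications a single compromised transitive dependency actually reaches.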

What's next for Code Canary

Future work includes implementing behavioral analysis of dependency update patterns and building integration APIs for popular security platforms like Snyk, Veracode, and GitHub Advanced Security.

Built With

  • click
  • d3.js
  • ghsa
  • modal
  • networkx
  • next.js
  • nvd
  • openai
  • osv.dev
  • puppeteer
  • python
  • react
  • tailwind
  • typescript/javascript
  • vis-network