HackerStats

Inspiration

Hackathons generate a massive, fragmented graph of people, projects, tech stacks, and events. It’s hard to answer deceptively simple questions like “Who should I team with?” or “What ideas have traction?” without spending hours searching.
We built HackerStats to make the hidden structure of hackathons visible: discover relationships, map communities, and surface novel ideas using graph analysis and modern NLP vectorization.

What it does

Interactive graph of the hackathon ecosystem:
- Visualizes Hackers, Projects, and Hackathons as a connected network.
- Click any node to see rich details with deep links to Devpost profiles and project pages.
- Fuzzy search to instantly locate people/projects; matching nodes are highlighted.
Idea Brainstorming:
- Type a prompt; we vectorize it and retrieve the most similar projects from a precomputed corpus.
- Get quick signal on originality, saturation, and competitiveness with heuristic diagnostics.
Smart subgraph expansion:
- Start from a single hacker and explore multi-hop relationships (contributors, shared events, related projects), with depth and size limits that keep the subgraph large and connected without adding noise.

How we built it

Web crawling:
- Multithreaded Selenium instances share a work queue and run a breadth-first crawl, expanding by hacker and by related Devpost project.
- AWS EC2 instances run the crawlers remotely with dynamic load balancing.
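The shared-queue crawl described above can be sketched in a few lines. This is a minimal illustration, not our production crawler: `FAKE_SITE` and `fetch_links` are hypothetical stand-ins for a Selenium page load, and the page names are made up. Because several workers drain the queue at once, the visit order is only approximately breadth-first.

```python
import threading
from queue import Queue

# Hypothetical stand-in for the real site: each page maps to the pages it links to.
FAKE_SITE = {
    "hacker:alice": ["project:graphviz", "project:chatbot"],
    "project:graphviz": ["hacker:alice", "hacker:bob"],
    "project:chatbot": ["hacker:alice", "hacker:carol"],
    "hacker:bob": ["project:graphviz"],
    "hacker:carol": ["project:chatbot"],
}


def fetch_links(page):
    """Hypothetical replacement for a Selenium fetch; returns linked pages."""
    return FAKE_SITE.get(page, [])


def crawl(seed, max_depth=2, num_workers=4):
    """Crawl outward from seed with worker threads sharing one FIFO queue."""
    frontier = Queue()
    seen = {seed}              # every page ever enqueued, for deduplication
    lock = threading.Lock()    # guards the shared 'seen' set
    frontier.put((seed, 0))

    def worker():
        while True:
            item = frontier.get()
            try:
                if item is None:           # sentinel: shut this worker down
                    return
                page, depth = item
                if depth >= max_depth:     # depth guard: do not expand further
                    continue
                for link in fetch_links(page):
                    with lock:
                        if link in seen:
                            continue
                        seen.add(link)
                    frontier.put((link, depth + 1))
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    frontier.join()                        # wait until every queued page is processed
    for _ in threads:
        frontier.put(None)                 # one sentinel per worker
    for t in threads:
        t.join()
    return seen
```

Calling `crawl("hacker:alice", max_depth=1)` visits the seed and the two projects it links to; raising `max_depth` to 2 also reaches the other contributors.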
Frontend (Next.js + D3):
- Custom D3 force-directed graph with zoom/drag, collision, layered hover labels, and highlight states.
- Node detail panel with auto-generated external links (Devpost user and software pages), social URL extraction, and metadata display.
- Serverless API routes act as a control plane for graph queries and NLP calls.
Graph data (Neo4j):
- We ingest and normalize entities (Hacker, Devpost/Project, Hackathon).
- Variable-length Cypher queries collect expansive yet connected subgraphs rooted at a specific node, then densify edges among selected nodes to maintain coherence.
Vectorization + NLP:
- Custom DevpostVectorizer combines multiple representation spaces:
  - Sentence embeddings (transformers) for semantics.
  - TF-IDF–derived signals for topical specificity.
  - Domain and user-segmentation embeddings from curated ontologies.
  - Tech-stack, awards, and team-composition features projected into fixed-length vectors.
- We concatenate these into a high-dimensional “combined” vector and compute cosine similarity for retrieval and clustering.
- Precomputed project vectors are stored on disk and loaded into memory for sub-100ms similarity queries.
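As a rough illustration of the fusion step, the sketch below L2-normalizes each feature block before concatenating, then ranks a small corpus by cosine similarity. It is not the DevpostVectorizer itself: the function names, block names, weights, and toy vectors are all assumptions, and plain Python lists are used only so the example runs without dependencies.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; leave all-zero vectors unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(blocks, weights=None):
    """Normalize each named block, apply an optional weight, and concatenate.

    Normalizing per block stops a subspace with large raw magnitudes
    from dominating the cosine similarity of the combined vector.
    """
    weights = weights or {}
    combined = []
    for name in sorted(blocks):            # fixed order keeps dimensions aligned
        w = weights.get(name, 1.0)
        combined.extend(w * x for x in l2_normalize(blocks[name]))
    return combined


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, corpus, k=3):
    """Return the k (name, similarity) pairs most similar to the query."""
    scored = [(name, cosine(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy corpus: the "tech" blocks have very different raw scales on purpose.
corpus = {
    "graph-explorer": fuse({"semantic": [0.9, 0.1], "tech": [100, 0, 0]}),
    "recipe-bot": fuse({"semantic": [0.1, 0.9], "tech": [0, 50, 0]}),
}
query = fuse({"semantic": [0.8, 0.2], "tech": [1, 0, 0]})
best_match = top_k(query, corpus, k=1)[0][0]
```

Here `best_match` is `"graph-explorer"`, since both its semantic and tech blocks point the same way as the query. The optional `weights` argument is one possible way to tune how much each block contributes.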
Python service (FastAPI):
- Exposes /api/brainstorm and /api/vectorizer endpoints for vectorization and retrieval.
- Caches model and dataset to keep latency low; returns clean JSON for the frontend.
Infrastructure:
- Monorepo on Railway: separate services for frontend (Next.js) and backend (FastAPI), communicating via HTTP. Neo4j hosted with secure credentials.
- Environment-driven configuration; no filesystem coupling between services.

Challenges we ran into

Schema design for a heterogeneous graph:
- Balancing expressiveness (rich relationships) with query performance and visual clarity was nontrivial.
Vector fusion:
- Combining transformer embeddings with categorical/ontology-driven vectors required careful normalization to avoid any single subspace dominating cosine similarity.
Subgraph expansion at scale:
- Unbounded variable-length graph queries balloon quickly. We had to implement guards, deduplication, and a two-phase approach (select nodes first, then fetch all edges among the selected nodes).
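The two-phase idea can be shown on a small in-memory graph. This is a sketch under stated assumptions rather than our Cypher: the edge list, node names, and the `max_depth` / `max_nodes` parameters are hypothetical, and the real queries run against Neo4j.

```python
from collections import deque

# Hypothetical undirected edges between hackers, projects, and a hackathon.
EDGES = [
    ("alice", "graphviz"), ("bob", "graphviz"), ("graphviz", "hackmit"),
    ("bob", "chatbot"), ("chatbot", "hackmit"), ("carol", "chatbot"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def select_nodes(adj, root, max_depth, max_nodes):
    """Phase 1: bounded BFS from root. 'selected' doubles as the dedup set."""
    selected = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:               # depth guard
            continue
        for neighbor in sorted(adj.get(node, ())):
            if neighbor in selected:         # deduplication
                continue
            if len(selected) >= max_nodes:   # size guard
                return selected
            selected.add(neighbor)
            queue.append((neighbor, depth + 1))
    return selected


def induced_edges(edges, selected):
    """Phase 2: keep every edge whose two endpoints were both selected."""
    return [(a, b) for a, b in edges if a in selected and b in selected]


adjacency = build_adjacency(EDGES)
nodes = select_nodes(adjacency, "alice", max_depth=3, max_nodes=10)
edges = induced_edges(EDGES, nodes)
```

With these inputs the second phase picks up the `chatbot`/`hackmit` edge even though the traversal reached `chatbot` through `bob`, which is the densification that keeps the rendered subgraph coherent.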
Robust scraping/normalization:
- Profiles and project pages vary. We built conservative parsers to extract usernames, social URLs, and participation data.
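A conservative parser in this spirit returns nothing rather than guessing. The sketch below is illustrative only: it assumes profile URLs have the form `https://devpost.com/<username>`, and the `RESERVED_PATHS` and `SOCIAL_HOSTS` sets are hypothetical allow/deny lists, not the ones we shipped.

```python
from urllib.parse import urlparse

# Assumed lists for illustration; a real parser would maintain these carefully.
RESERVED_PATHS = {"software", "hackathons", "settings"}
SOCIAL_HOSTS = {"github.com", "linkedin.com", "twitter.com", "x.com"}


def _host(url):
    """Lowercased hostname without a leading 'www.' (empty string if absent)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def devpost_username(url):
    """Return the username from a Devpost profile URL, or None if unsure."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or _host(url) != "devpost.com":
        return None
    segments = [s for s in parsed.path.split("/") if s]
    # A profile URL has exactly one path segment that is not a site section.
    if len(segments) != 1 or segments[0].lower() in RESERVED_PATHS:
        return None
    return segments[0]


def social_urls(links):
    """Keep http(s) links on known social hosts, in order, without duplicates."""
    found = []
    for link in links:
        is_web = urlparse(link).scheme in ("http", "https")
        if is_web and _host(link) in SOCIAL_HOSTS and link not in found:
            found.append(link)
    return found
```

For example, `devpost_username("https://devpost.com/software/graphviz")` returns `None` because the path has two segments, so a project page is never mistaken for a profile.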

Accomplishments that we’re proud of

A genuinely useful, highly interactive graph experience with real-time search and crisp labels.
A practical hybrid NLP pipeline that feels “intelligent” for ideation without relying solely on one model.
Clear, production-friendly separation of concerns (frontend ↔ backend ↔ database) suitable for cloud deployment.

What we learned

Graph UX is as much about hiding edges as showing them; clarity beats completeness.
Hybrid vector spaces outperform a single embedding on sparse, structured domains like hackathon data.
Pairing serverless API routes with a dedicated Python inference service is a sweet spot for latency and iteration speed.

What’s next

Temporal analysis: visualize career trajectories and project lineages over time.
Community detection: identify clusters, bridges, and emerging idea neighborhoods.
Advanced retrieval: re-ranking with cross-encoders, few-shot concept search, and semantic filters.
Contributor reputation and influence scores powered by graph centrality and outcome signals.

Tech stack

Frontend: Next.js, TypeScript, D3, Framer Motion
Backend: FastAPI (Python), sentence-transformers, NumPy, scikit-learn
Database: Neo4j
Infra: Railway (monorepo, multi-service), environment-based config
Integrations: Devpost deep links; social URL parsing

Try it

Explore the Graph to discover relationships and projects.
Use Brainstorm to validate and refine your next hackathon idea with similarity search and quick diagnostics.

Updates

Isaac Chacko started this project — Oct 04, 2025 11:30 PM EDT