Inspiration
- Hackathons generate a massive, fragmented graph of people, projects, tech stacks, and events. It’s hard to answer deceptively simple questions like “Who should I team with?” or “What ideas have traction?” without spending hours searching.
- We built HackerStats to make the hidden structure of hackathons visible: discover relationships, map communities, and surface novel ideas using graph analysis and modern NLP vectorization.
What it does
- Interactive graph of the hackathon ecosystem:
- Visualizes Hackers, Projects, and Hackathons as a connected network.
- Click any node to see rich details with deep links to Devpost profiles and project pages.
- Fuzzy search to instantly locate people/projects; matching nodes are highlighted.
- Idea Brainstorming:
- Type a prompt; we vectorize it and retrieve top similar projects from a precomputed corpus.
- Get quick signal on originality, saturation, and competitiveness with heuristic diagnostics.
- Smart subgraph expansion:
- Start from a single hacker and explore multi-hop relationships (contributors, shared events, related projects), with depth and node limits that keep subgraphs large and connected without adding noise.
How we built it
- Web crawling:
- Multithreaded Selenium instances pull from a shared queue to run a recursive breadth-first crawl, expanding by hacker and by related Devpost project.
- Remote AWS EC2 instances for dynamic load balancing.
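A minimal sketch of the shared-queue crawl pattern described above, not our actual crawler. It assumes a hypothetical `fetch_links(url)` callable standing in for the Selenium page visit, and the names `crawl_bfs`, `max_pages`, and `workers` are illustrative. With several workers the visit order is only approximately breadth-first.

```python
import queue
import threading
from typing import Callable, Iterable, Set


def crawl_bfs(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],  # hypothetical: visits a page, returns linked URLs
    max_pages: int = 100,
    workers: int = 4,
) -> Set[str]:
    """Crawl outward from the seeds using worker threads on one FIFO queue."""
    frontier: "queue.Queue" = queue.Queue()
    seen: Set[str] = set()
    lock = threading.Lock()

    for seed in dict.fromkeys(seeds):  # drop duplicate seeds, keep order
        seen.add(seed)
        frontier.put(seed)

    def worker() -> None:
        while True:
            url = frontier.get()
            if url is None:  # sentinel: no more work
                frontier.task_done()
                return
            try:
                for link in fetch_links(url):
                    with lock:  # the seen-set is shared across threads
                        if link in seen or len(seen) >= max_pages:
                            continue
                        seen.add(link)
                    frontier.put(link)
            except Exception:
                pass  # a real crawler would log and retry here
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    frontier.join()  # returns once every queued URL has been processed
    for _ in threads:
        frontier.put(None)
    for t in threads:
        t.join()
    return seen
```

New links are queued before the current page is marked done, so `frontier.join()` cannot return while work is still pending.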
- Frontend (Next.js + D3):
- Custom D3 force-directed graph with zoom/drag, collision, layered hover labels, and highlight states.
- Node detail panel with auto-generated external links (Devpost user and software pages), social URL extraction, and metadata display.
- Serverless API routes act as a control plane for graph queries and NLP calls.
- Graph data (Neo4j):
- We ingest and normalize entities (Hacker, Devpost/Project, Hackathon).
- Variable-length Cypher queries collect expansive yet connected subgraphs rooted at a specific node, then densify edges among selected nodes to maintain coherence.
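The two-phase query shape can be sketched roughly as follows. This is an illustration under assumptions, not our production queries: the `Hacker` label comes from the entities above, but the `username` property, the parameter names, and the helper `build_subgraph_queries` are hypothetical. Cypher does not accept a parameter for the variable-length bound, so the depth is validated as an integer and interpolated into the text; `elementId()` assumes Neo4j 5 or later.

```python
from typing import Tuple


def build_subgraph_queries(max_depth: int = 3) -> Tuple[str, str]:
    """Return (node-selection query, edge-densification query) as Cypher text."""
    # The hop bound must be a literal, so validate strictly before interpolating.
    if type(max_depth) is not int or not 1 <= max_depth <= 6:
        raise ValueError("max_depth must be an int between 1 and 6")

    # Phase 1: pick a bounded set of nodes reachable from the root.
    select_nodes = (
        "MATCH (root:Hacker {username: $username})-[*0.." + str(max_depth) + "]-(n) "
        "RETURN DISTINCT elementId(n) AS id LIMIT $limit"
    )

    # Phase 2: fetch every relationship whose endpoints are both in the selection.
    fetch_edges = (
        "MATCH (a)-[r]->(b) "
        "WHERE elementId(a) IN $ids AND elementId(b) IN $ids "
        "RETURN elementId(a) AS source, type(r) AS type, elementId(b) AS target"
    )
    return select_nodes, fetch_edges
```

The first query would run with `username` and `limit` parameters, and the ids it returns would be passed as `ids` to the second.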
- Vectorization + NLP:
- Custom DevpostVectorizer combines multiple representation spaces:
- Sentence embeddings (transformers) for semantics.
- TF-IDF–derived signals for topical specificity.
- Domain and user-segmentation embeddings from curated ontologies.
- Tech-stack, awards, and team-composition features projected into fixed-length vectors.
- We concatenate these into a high-dimensional “combined” vector and compute cosine similarity for retrieval and clustering.
- Precomputed project vectors are stored on disk and loaded into memory for sub-100ms similarity queries.
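A minimal sketch of cosine-similarity retrieval over a precomputed matrix, assuming project vectors are held in memory as a NumPy array with one row per project. The function names here are illustrative, not the DevpostVectorizer API.

```python
from typing import List, Tuple

import numpy as np


def normalize_rows(matrix: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.clip(norms, 1e-12, None)


def top_k_similar(query: np.ndarray, corpus_unit: np.ndarray, k: int = 5) -> List[Tuple[int, float]]:
    """Return (row index, cosine score) for the k rows most similar to the query."""
    k = min(k, corpus_unit.shape[0])
    if k <= 0:
        return []
    q = query / max(float(np.linalg.norm(query)), 1e-12)
    scores = corpus_unit @ q  # one matrix-vector product scores the whole corpus
    idx = np.argpartition(-scores, k - 1)[:k]  # unordered top k
    idx = idx[np.argsort(-scores[idx])]  # order those k by descending score
    return [(int(i), float(scores[i])) for i in idx]
```

Normalizing the corpus once at load time means each query costs a single matrix-vector product plus a partial sort.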
- Python service (FastAPI):
- Exposes /api/brainstorm and /api/vectorizer endpoints for vectorization and retrieval.
- Caches model and dataset to keep latency low; returns clean JSON for the frontend.
- Infrastructure:
- Monorepo on Railway: separate services for frontend (Next.js) and backend (FastAPI), communicating via HTTP. Neo4j hosted with secure credentials.
- Environment-driven configuration; no filesystem coupling between services.
Challenges we ran into
- Schema design for a heterogeneous graph:
- Balancing expressiveness (rich relationships) with query performance and visual clarity was nontrivial.
- Vector fusion:
- Combining transformer embeddings with categorical/ontology-driven vectors required careful normalization to avoid any single subspace dominating cosine similarity.
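One way to keep a single subspace from dominating, sketched under assumptions: L2-normalize each block before concatenating, and scale each block by the square root of a weight. The block names and the `fuse` helper are hypothetical, and this is not necessarily the normalization we shipped.

```python
from typing import Dict, Optional

import numpy as np


def fuse(blocks: Dict[str, np.ndarray], weights: Optional[Dict[str, float]] = None) -> np.ndarray:
    """Concatenate per-subspace vectors after scaling each block to unit length."""
    parts = []
    for name in sorted(blocks):  # fixed order keeps dimensions aligned across projects
        v = np.asarray(blocks[name], dtype=float)
        norm = np.linalg.norm(v)
        unit = v / norm if norm > 0 else v
        w = weights.get(name, 1.0) if weights else 1.0
        parts.append(np.sqrt(w) * unit)  # sqrt so the dot product is weighted by w
    return np.concatenate(parts)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

When every block is non-zero, the cosine of two fused vectors is the weighted average of the per-block cosines, so a block with large raw magnitudes (for example unscaled counts) contributes no more than its weight.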
- Subgraph expansion at scale:
- Unbounded variable-length graph queries balloon quickly. We had to implement guards, deduplication, and a two-phase approach (select nodes, then fetch all edges among the selected nodes).
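The guards and the two-phase idea can be illustrated on a plain in-memory adjacency map. This is a sketch assuming an undirected graph stored as a dict of neighbor lists; `expand_subgraph`, `max_depth`, and `max_nodes` are illustrative names, not our Neo4j code.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def expand_subgraph(
    adjacency: Dict[str, Iterable[str]],
    root: str,
    max_depth: int = 2,
    max_nodes: int = 50,
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Phase 1: bounded BFS to select nodes. Phase 2: collect edges among them."""
    selected = {root}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:  # depth guard
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor in selected:  # deduplication
                continue
            if len(selected) >= max_nodes:  # size guard
                break
            selected.add(neighbor)
            frontier.append((neighbor, depth + 1))

    # Phase 2: keep every edge whose endpoints were both selected, not just
    # the edges the BFS happened to traverse. Sorting dedupes (a, b) vs (b, a).
    edges = {
        tuple(sorted((a, b)))
        for a in selected
        for b in adjacency.get(a, ())
        if b in selected and a != b
    }
    return selected, edges
```

The second phase is what restores links between nodes that were reached along different paths, which the traversal alone would leave out.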
- Robust scraping/normalization:
- Profiles and project pages vary. We built conservative parsers to extract usernames, social URLs, and participation data.
Accomplishments that we’re proud of
- A genuinely useful, highly interactive graph experience with real-time search and crisp labels.
- A practical hybrid NLP pipeline that feels “intelligent” for ideation without relying solely on one model.
- Clear, production-friendly separation of concerns (frontend ↔ backend ↔ database) suitable for cloud deployment.
What we learned
- Graph UX is as much about hiding edges as showing them; clarity beats completeness.
- Hybrid vector spaces outperform a single embedding on sparse, structured domains like hackathon data.
- Pairing serverless API routes with a dedicated Python inference service is a sweet spot for latency and iteration speed.
What’s next
- Temporal analysis: visualize career trajectories and project lineages over time.
- Community detection: identify clusters, bridges, and emerging idea neighborhoods.
- Advanced retrieval: re-ranking with cross-encoders, few-shot concept search, and semantic filters.
- Contributor reputation and influence scores powered by graph centrality and outcome signals.
Tech stack
- Frontend: Next.js, TypeScript, D3, Framer Motion
- Backend: FastAPI (Python), sentence-transformers, NumPy, scikit-learn
- Database: Neo4j
- Infra: Railway (monorepo, multi-service), environment-based config
- Integrations: Devpost deep links; social URL parsing
Try it
- Explore the Graph to discover relationships and projects.
- Use Brainstorm to validate and refine your next hackathon idea with similarity search and quick diagnostics.
Built With
- amazon-web-services
- neo4j
- nextjs
- python
- selenium
- tailwind
