Project Title: Semantify
Overview
Semantify is an intelligent file organizer and hierarchical vector database designed to bring order to unstructured text documents. Users can upload disorganized files, which Semantify then categorizes based on their semantic content.
Through an interactive web interface, users can explore their documents as a visualized vector database. To streamline document retrieval, the platform features a Retrieval-Augmented Generation (RAG) chat agent, which not only answers queries but also guides users to relevant document clusters within the visualized database.
Inspiration
Finding relevant documents in large, unstructured datasets (such as legal document reviews or corporate archives) can be time-consuming and costly. Many users, like Rickey (who keeps all his files on his desktop), struggle to keep their files organized, which makes search and retrieval inefficient.
Manual sorting is impractical for large-scale datasets, especially in legal proceedings where document dumps can span thousands of files. Semantify addresses this challenge by automatically structuring files into meaningful directories and providing an advanced semantic search mechanism.
Key Features
Semantic File Organization
- Automatically clusters and categorizes files based on their content.
- Uses hierarchical clustering and topic modeling to create structured, easy-to-navigate directories.
- Supports recursive clustering to handle large and complex document collections.
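The recursive clustering described above can be sketched in plain Python. This is a minimal illustration, not Semantify's code: it assumes each file already has an embedding vector (in the project these come from Sentence Transformers), uses average-linkage agglomerative clustering on cosine distance, and splits any folder holding more than a hypothetical `max_files` limit into sub-folders. The function names, the `cluster_k` folder names, and the parameters are all illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (0 = same direction, 1 = orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(vectors: List[Vector], n_clusters: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering: start from singletons and
    repeatedly merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(cosine_distance(vectors[a], vectors[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def build_tree(names: List[str], vectors: List[Vector],
               max_files: int = 3, branching: int = 2) -> Dict:
    """Recursively split any group larger than max_files into `branching`
    sub-clusters, returning a nested {folder_name: subtree} dict."""
    if branching < 2:
        raise ValueError("branching must be at least 2")
    if len(names) <= max(1, max_files):
        return {"files": sorted(names)}
    tree = {}
    for k, members in enumerate(agglomerative(vectors, branching)):
        # Hypothetical placeholder names; the project labels folders via KeyBERT.
        tree[f"cluster_{k}"] = build_tree([names[m] for m in members],
                                          [vectors[m] for m in members],
                                          max_files, branching)
    return tree
```

Pure Python keeps the sketch dependency-free, but the pairwise loop is slow. For real collections, scikit-learn's `AgglomerativeClustering` (or HDBSCAN, which the project also lists) is the practical choice.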
Visual Vector Database
- Displays document embeddings in a 2D space using UMAP for an intuitive, interactive visualization.
- Color-codes documents by cluster labels, making it easy to explore semantic relationships.
RAG-Powered Chat Assistant
- Allows users to query documents using natural language.
- Retrieves the most relevant document sections and generates responses using a language model (e.g., DeepSeek).
- Highlights source files in the vector database to ensure transparency and traceability.
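The retrieval half of the chat assistant can be sketched as follows, assuming documents are pre-split into chunks with precomputed embeddings. It ranks chunks by cosine similarity to the query, builds a prompt that tags each chunk with its source file, and returns the set of files the interface could highlight. The `Chunk` class, the prompt wording, and the function names are hypothetical, and the language-model call itself is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Chunk:
    source: str             # file the chunk was cut from
    text: str
    embedding: List[float]  # assumed precomputed, e.g. by a sentence-transformer


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_emb: Sequence[float], chunks: List[Chunk], k: int = 3) -> List[Chunk]:
    """Rank chunks by cosine similarity to the query and keep the top k."""
    return sorted(chunks, key=lambda c: cosine(query_emb, c.embedding), reverse=True)[:k]


def build_prompt(question: str, hits: List[Chunk]) -> str:
    """Number each retrieved chunk and tag it with its file so the model can cite it."""
    context = "\n".join(f"[{i}] ({h.source}) {h.text}" for i, h in enumerate(hits, 1))
    return ("Answer using only the numbered context below and cite sources by number.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


def prepare_rag_turn(question: str, query_emb: Sequence[float],
                     chunks: List[Chunk], k: int = 3) -> Tuple[str, List[str]]:
    """Return the prompt for the language model and the source files to
    highlight in the visualization."""
    hits = retrieve(query_emb, chunks, k)
    sources = sorted({h.source for h in hits})
    return build_prompt(question, hits), sources
```

Returning the source list alongside the prompt is what would let the frontend highlight the matching points, which is the traceability the feature list describes.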
How We Built It
Semantify integrates state-of-the-art NLP models and frameworks, including:
- Sentence Transformers – For generating high-quality document embeddings.
- UMAP/t-SNE – For dimensionality reduction and visualization.
- KeyBERT – For extracting keywords and generating topic labels.
- Agglomerative Clustering / HDBSCAN – For hierarchical document organization.
- FastAPI – For building a scalable backend API.
- React – For an interactive and user-friendly frontend.
- Ollama / DeepSeek – For AI-powered document retrieval and response generation.
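For the generation step, here is a minimal sketch of calling a local model through Ollama's REST API, which listens on `http://localhost:11434` by default. It assumes a DeepSeek model has been pulled under the tag `deepseek-r1` (adjust to whatever tag is installed) and uses only the standard library; the source does not say whether Semantify calls Ollama this way or through a client library, and the function names are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "deepseek-r1",
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Ollama for a single,
    non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(body: bytes) -> str:
    """With stream=False the reply is one JSON object; the generated text
    is in its "response" field."""
    return json.loads(body)["response"]


def generate(prompt: str, model: str = "deepseek-r1", timeout: float = 120.0) -> str:
    """Send the request to a running Ollama server and return the answer text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_generate_response(response.read())
```

Keeping request construction separate from `urlopen` makes the payload testable without a running server; `generate(prompt)` performs the actual call.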
What’s Next for Semantify?
- Enhanced User Interface – Improving the UX with in-browser document previews and highlighting relevant sections after a semantic search.
- Cloud Integration – Allowing users to organize and search documents stored on platforms like Google Drive or Dropbox.
- Collaboration Features – Supporting multiple users to collaboratively organize and query shared document collections.
Try It Out
You can try Semantify by cloning the repository and following the setup instructions in the README. Start organizing, visualizing, and querying your documents today!
📂 GitHub Repository: Semantify Repo
🎥 Demo Video: Watch Here