A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
Fully neural approach for text chunking
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
RAG boilerplate with semantic/propositional chunking, hybrid search (BM25 + dense), LLM reranking, query enhancement agents, CrewAI orchestration, Qdrant vector search, Redis/Mongo sessioning, Celery ingestion pipeline, Gradio UI, and an evaluation suite (Hit-Rate, MRR, hybrid configs).
Rust CLI implementing the Recursive Language Model (RLM) pattern for Claude Code. Process documents 100x larger than context windows through intelligent chunking, SQLite persistence, and recursive sub-LLM orchestration.
🍶 llm-distillery ⇢ use LLMs to run map-reduce summarization tasks on large documents until a target token size is met.
Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.
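The embedding-based splitting these libraries describe can be sketched in a few lines: embed each sentence, then start a new chunk wherever the similarity between adjacent sentences drops below a threshold. This is a minimal illustration of the technique, not the library's actual API; real use would take embeddings from a Sentence Transformers model, while here they are toy vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_chunks(sentences, embeddings, threshold=0.5):
    """Split sentences into chunks at semantic boundaries.

    A new chunk starts whenever the cosine similarity between a sentence's
    embedding and the previous sentence's embedding falls below `threshold`.
    """
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append(" ".join(current))
            current = [sentences[i]]
        else:
            current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks

# Toy 2-D "embeddings": the first two sentences are similar, the third is not.
sentences = ["Cats purr.", "Dogs bark.", "Stocks fell sharply."]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(semantic_chunks(sentences, embeddings))
# → ['Cats purr. Dogs bark.', 'Stocks fell sharply.']
```

The threshold trades chunk size against topical purity: lower values merge loosely related sentences, higher values split aggressively.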
Advanced semantic text chunking with custom structural markers, whole-text coherence preservation, and flexible token management. Features async processing, LangChain integration, and dynamic drift detection. Ideal for RAG systems, augmented text processing, and domain-specific document analysis.
A research framework to evaluate how document parsing quality determines downstream RAG performance.
Advanced local-first RAG system powered by Ollama and LangGraph. Optimized for high-performance sLLM orchestration featuring adaptive intent routing, semantic chunking, intelligent hybrid search (FAISS + BM25), and real-time thought streaming. Includes integrated PDF analysis and secure vector caching.
Semantic chunking algorithm in (mostly) Go
A hands-on guide to RAG techniques using LangGraph.
treechunk is a TypeScript library for AST-based semantic segmentation of JS/TS code. It extracts functions, classes, methods, and exports into coherent, context-preserving blocks, optimized for RAG, embeddings, code search, and large-scale repository analysis.
Cutting-edge semantic text processing system that uses hierarchical clustering and advanced language models to automatically organize and summarize large volumes of text.
A modular RAG pipeline for automated document processing using Semantic Chunking and Qdrant Vector Database.
A Sidecar service for applications that need vector database functionality to augment their LLMs. This service provides embeddings and retrieval capabilities by abstracting embeddings generation (LiteLLM) and vector storage and search (Qdrant).
All in One-Solution for converting documents to finetune LLMs
A high-performance Retrieval-Augmented Generation pipeline for technical Q&A workloads. Combines hybrid retrieval (dense + BM25), query expansion, Reciprocal Rank Fusion (RRF), and cross-encoder re-ranking to improve retrieval precision and answer grounding. Evaluated with Ragas, showing measurable gains in context recall and faithfulness.
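Reciprocal Rank Fusion, mentioned above, merges ranked lists from multiple retrievers (e.g. dense and BM25) by scoring each document as the sum of 1/(k + rank) over the lists it appears in. A minimal sketch follows; the document IDs are hypothetical and `k=60` is the value commonly used in practice, not a setting taken from this repository.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores 1/(k + rank) per list it appears in (rank is
    1-based); documents ranked highly by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from a dense retriever and a BM25 retriever:
dense = ["a", "b", "c"]
bm25 = ["a", "c", "b"]
print(rrf([dense, bm25]))  # → ['a', 'c', 'b']
```

Because RRF uses only ranks, not raw scores, it needs no score normalization across retrievers, which is why it is a common choice for hybrid (dense + sparse) pipelines.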
Chomper - Chomp through any document. MCP server for parsing 36+ file formats with semantic chunking & TOON token optimization for Claude and AI systems.