DOCK_BYTE

The DOCK_BYTE module extracts text from PDF and TXT documents and lets you explore the extracted content through an interactive chat with a language model. It uses PyMuPDF and Tesseract for document processing and Streamlit for its GUI.

Features

  • Extract text from PDF documents using PyMuPDF.
  • Perform OCR on PDF documents using Tesseract.
  • Extract text from TXT files.
  • Use a language model to chat with the content of the documents.
  • GUI support with Streamlit for interactive usage.
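
To illustrate the extraction features listed above, here is a minimal sketch of dispatching on file type. It is not DOCK_BYTE's actual implementation: the helper name `extract_text` is hypothetical, and the PDF branch assumes PyMuPDF is installed (importable as `fitz`). OCR with Tesseract is left out of this sketch.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the plain text of a .txt or .pdf file.

    Hypothetical helper for illustration; not part of the DOCK_BYTE API.
    """
    suffix = Path(path).suffix.lower()

    if suffix == ".txt":
        # Plain text files are read directly (UTF-8 is an assumption).
        return Path(path).read_text(encoding="utf-8")

    if suffix == ".pdf":
        # PyMuPDF is imported lazily so TXT-only use needs no extra install.
        import fitz  # PyMuPDF

        with fitz.open(path) as doc:
            # Concatenate the embedded text layer of every page.
            return "\n".join(page.get_text() for page in doc)

    raise ValueError(f"unsupported file type: {suffix!r}")
```

Scanned PDFs without a text layer would return little or no text from this sketch, which is the case the OCR feature is meant to cover.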

Installation

pip install DOCK_BYTE

Usage

from dock_byte import chat_with_doc

chat_with_doc("gemma:2b", "data.txt", use_gui=True)

Running

streamlit run CODE_FILE.py

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository

For more information and to contribute, please visit the GitHub repository.

Demo

Watch the video