This project is an AI agent that leverages the capabilities of large language models to extract answers from large PDF documents and post the results on Slack. It is built using OpenAI's language models and other technologies such as FastAPI, LangChain, and ChromaDB for efficient processing and retrieval.
- PDF Text Extraction: Uses Langchain PyPDFLoader to extract and chunk text from PDF documents.
- Question Answering: Utilizes OpenAI's LLM to answer questions based on the extracted text.
- Parallel Processing: Employs concurrent futures for efficient, parallel processing of multiple LLM queries for faster response times.
- Custom Chain Logic: Implements custom logic for handling multi-step query processing and retrieval with tools and agent framework.
- Microservice Architecture: This promotes modularity and maintainability by allowing individual components to be developed, tested, and deployed independently.
- Dependency Injection: This facilitates the decoupling of services, making the system more flexible and easier to scale, as components can be swapped or updated without extensive reconfiguration.
- Slack Integration: Json answers are generated which will be later posted as answers directly to a specified Slack channel. (Future work)
-
Clone the Repository:
git clone https://github.com/charu01smita28/pdf-chat-agent.git cd pdf-chat-agent -
Create and Activate Virtual Environment:
python -m venv venv source venv/bin/activate # On Windows use `venv\Scripts\activate`
-
Install Dependencies:
pip install -r requirements.txt
-
Set Up Environment Variables: Update the .env file in the root directory as per your installation or use the one in project.
-
Build and Run Docker Containers:
docker-compose up -d
-
Running the Application: You can run the FastAPI Application:
uvicorn app:app --reload
-
Access the Application: Open your browser and go to http://localhost:8000.
-
Swagger Endpoints: To try the endpoints, go to http://localhost:8000/docs
- POST /process-and-query/: This endpoint takes in a list of questions and the PDF document on which you need to chat with.
├── app.py # Main application entry point
├── dependencies.py # Dependency Injection
├── agents/
│ ├── model/
│ │ ├── question.py # Question model (for the input list of questions)
│ │ ├── answer.py # Answer model (for the answer to be returned)
│ ├── api/
│ │ └── endpoints.py # FastAPI endpoints for the app
│ ├── services/
│ │ ├── agent_service.py # AI Agent Logic
│ │ ├── slack_service.py # Slack integration
│ │ ├── pdf_loader.py # PDF loading and text extraction
│ │ ├── embedding_service.py # Model and vector embeddings usage
│ │ └── chromadb_service.py # Vector database interactions
| ├── utils.py # API Cost calculation function and more
| ├── frontend/ # Folder for adding front end components
├── tests/ # Unit and integration tests
│ └── test_chat_service.py # Test cases for PDF Chat service
├── Dockerfile # Dockerfile for the application
├── docker-compose.yml # Docker Compose configuration
├── logger.py # Custom Logger for logging
├── .env # Environment variables
├── .gitignore # Git ignore file
├── README.md # Project readme file
└── requirements.txt # Python dependencies
- Upload a PDF file and submit your questions via the provided API endpoint.
- The application processes the PDF, answers the questions using the AI model, and posts the results as json at the moment but future improvement to Slack.
Run unit and integration tests to verify functionality:
```sh
pytest tests/
- This project demonstrates the integration of advanced AI capabilities with real-world applications by leveraging OpenAI's language models to process and extract information from PDF documents.
- With features such as parallel processing and usage of LangChain agents, this solution provides an efficient and scalable way to automate question-answering tasks which can be further enhanced and easily appended with more tools (like Google Search or Wikipedia Search).
- The project also follows a microservice architecture with dependency injection, which promotes modularity and maintainability by allowing individual components to be developed, tested, and deployed independently. This approach enables seamless integration of various services, ensuring that each service can evolve without impacting others. Dependency injection facilitates the decoupling of services, making the system more flexible and easier to scale, as components can be swapped or updated without extensive reconfiguration.
-
How to make solution more accurate ?
-
Fine-Tuning the Language Model:
- Using domain-specific datasets to fine-tune the language model to better understand and answer questions related to the PDF.
-
Better Prompt Engineering:
- Crafting precise and contextually rich prompts to elicit more accurate responses from the language model.
- Experiment with different prompt formats and templates to see which yields the best results.
-
Larger Context Window:
- Utilizing language models with larger context windows to process and understand more extensive sections of the PDF at once, thereby providing more coherent and contextually informed responses.
-
-
Enhancing Modularity, Scalability, and Production Readiness
- Already used a microservice architecture for handling tasks like PDF processing, question answering, and Slack integration. This allows for independent scaling and deployment of each component.
- Already used dependency injection frameworks to manage and configure dependencies more cleanly, allowing for easier testing and flexibility in changing implementations.
- Already used parallel processing by using thread pools or asynchronous frameworks to handle multiple requests concurrently without blocking. This can be made better.
- We can utilize serverless computing (e.g., AWS Lambda) for functions that can benefit from event-driven scalability.
- We need to Implement a full suite of unit, integration, and performance tests to ensure reliability and to catch issues early in the development process.
- Robust Logging and Monitoring There is also an important need of Incorporate logging for critical operations and errors to aid in debugging and understanding system behavior. We can use MLFLow and DagsHub for monitoring and logging and also checking the trace of Large Language Models.
