RAGiFinance Challenge


Inspiration

Creating quizzes and questions is a labor-intensive process that demands specialized expertise. By leveraging Retrieval-Augmented Generation (RAG), we aim to automate this task and make quiz production scalable. The project grew out of the need to provide focused, high-quality educational content for young students interested in financial literacy. We wanted to move away from the generic output of most language models and build something that teaches real-world financial topics (borrowing, buying a car, credit, financial decisions, financial institutions, income, saving, and independent living), using authoritative, publicly available FDIC (Federal Deposit Insurance Corporation) learning materials as our foundation.

What it does

Our tool combines a Retrieval-Augmented Generation (RAG) architecture with a fine-tuned Llama base model to generate multiple-choice quiz questions. By parsing FDIC educational PDFs into markdown, we extract the key concepts and passages students need to understand each topic in depth. The website, built with HTML, CSS, and Flask, delivers these quizzes and includes a leaderboard that ranks participants, encouraging competition and continuous learning.
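The generator's raw text has to become a quiz item the website can display and grade, so a small parser sits between the model and the site. The sketch below assumes the model is prompted to answer in a `Question:` / `A)` to `D)` / `Answer:` layout; that format, the `QuizQuestion` class, and `parse_question` are illustrative assumptions, not the project's actual code.

```python
import re
from dataclasses import dataclass


@dataclass
class QuizQuestion:
    """One multiple-choice question with four lettered options."""
    question: str
    options: dict[str, str]  # e.g. {"A": "...", "B": "...", "C": "...", "D": "..."}
    answer: str  # letter of the correct option


def parse_question(text: str) -> QuizQuestion:
    """Parse model output in an assumed (hypothetical) layout:

    Question: ...
    A) ...   B) ...   C) ...   D) ...   (one per line)
    Answer: <letter>
    """
    question = None
    options: dict[str, str] = {}
    answer = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower().startswith("question:"):
            question = line.split(":", 1)[1].strip()
        elif line.lower().startswith("answer:"):
            # Keep only the first character, e.g. "B" from "B) Deposits".
            answer = line.split(":", 1)[1].strip().upper()[:1]
        else:
            match = re.match(r"^([A-D])[).]\s*(.+)$", line)
            if match:
                options[match.group(1)] = match.group(2).strip()
    # Reject malformed generations instead of showing a broken question.
    if not question:
        raise ValueError("missing question text")
    if sorted(options) != ["A", "B", "C", "D"]:
        raise ValueError(f"expected options A-D, got {sorted(options)}")
    if answer not in options:
        raise ValueError(f"answer {answer!r} is not one of the options")
    return QuizQuestion(question, options, answer)


if __name__ == "__main__":
    sample = (
        "Question: What does the FDIC insure?\n"
        "A) Stocks\n"
        "B) Deposits at insured banks\n"
        "C) Mutual funds\n"
        "D) Cryptocurrency\n"
        "Answer: B"
    )
    q = parse_question(sample)
    print(q.answer, q.options[q.answer])  # B Deposits at insured banks
```

Raising `ValueError` on malformed output lets the app simply regenerate a question rather than serve one with a missing option or an answer key that points nowhere.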

How we built it

Data Extraction:
- We started by parsing FDIC PDFs into markdown format.
- Extracted critical passages and information related to various financial topics.
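The writeup does not say how passages were cut out of the converted markdown. One common approach, sketched here as an assumption, is to split on markdown headings so each passage keeps its topic label, then break long sections into fixed-size word windows. `split_markdown_passages` and `max_words` are hypothetical names.

```python
import re


def split_markdown_passages(markdown: str, max_words: int = 120) -> list[dict]:
    """Split markdown into heading-scoped passages of at most max_words words.

    Hypothetical helper: heading-based chunking is an assumed strategy.
    """
    passages: list[dict] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        # Emit the buffered section as consecutive windows of max_words words.
        words = " ".join(buffer).split()
        for start in range(0, len(words), max_words):
            passages.append({"heading": heading,
                             "text": " ".join(words[start:start + max_words])})
        buffer.clear()

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            flush()  # close the previous section under its own heading
            heading = match.group(1).strip()
        else:
            buffer.append(line)
    flush()  # close the final section
    return passages


if __name__ == "__main__":
    md = "# Saving\nPay yourself first.\nStart small.\n## Credit\nCheck your credit report."
    for p in split_markdown_passages(md, max_words=50):
        print(p["heading"], "->", p["text"])
```

Keeping the heading with each passage makes it easy to retrieve passages by topic (saving, credit, and so on) when generating a quiz on that topic.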
Model Fine-Tuning & RAG:
- Utilized a Retrieval-Augmented Generation approach to integrate the high-quality extracted data with the Llama base model.
- Fine-tuned the Llama 7B-parameter base model on this data to generate accurate, tailored quiz questions.
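At generation time, a RAG pipeline retrieves the passages most relevant to a topic and places them in the prompt so the model's question stays grounded in the FDIC text. The sketch below substitutes plain bag-of-words cosine similarity for whatever retriever the project actually uses (production systems typically use dense embeddings and a vector index); the prompt wording and the `retrieve` / `build_prompt` helpers are hypothetical. Sending the prompt to the fine-tuned Llama model is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (bag-of-words, assumed retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


def build_prompt(topic: str, context: list[str]) -> str:
    """Assemble a hypothetical generation prompt grounded in retrieved passages."""
    joined = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Using only the FDIC passages below, write one multiple-choice question "
        f"about {topic}.\n\n{joined}\n\n"
        "Format:\nQuestion: ...\nA) ...\nB) ...\nC) ...\nD) ...\nAnswer: <letter>"
    )


if __name__ == "__main__":
    docs = [
        "A credit score summarizes how you repay borrowed money.",
        "Saving early lets interest compound over time.",
        "An auto loan is one way to pay for buying a car.",
    ]
    print(build_prompt("credit", retrieve("How does a credit score work?", docs, k=1)))
```

Asking for a fixed output format in the prompt makes the generated text easy to parse and validate before it reaches the quiz page.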
Web Development:
- Developed a web application using HTML, CSS, and Flask to serve the generated quiz questions.
- Incorporated a leaderboard system to maintain engagement and track user performance.
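The scoring and ranking logic behind a leaderboard is small. This sketch grades a submitted quiz against its answer key, keeps each player's best score in memory, and assigns standard competition ranks so tied players share a rank. The in-memory storage, the best-score rule, and the `grade` / `Leaderboard` names are assumptions; a deployed Flask app would need persistent storage and routes to expose this state.

```python
from dataclasses import dataclass, field


def grade(submitted: dict[int, str], key: dict[int, str]) -> int:
    """Count answers matching the key; unanswered questions score zero."""
    return sum(1 for qid, letter in key.items() if submitted.get(qid) == letter)


@dataclass
class Leaderboard:
    """Hypothetical in-memory leaderboard keyed by player name."""
    best: dict[str, int] = field(default_factory=dict)

    def submit(self, player: str, correct: int) -> None:
        # Only a player's highest score counts toward their rank.
        self.best[player] = max(correct, self.best.get(player, 0))

    def top(self, n: int = 10) -> list[tuple[int, str, int]]:
        # Sort by score (descending), then name; ties share a rank (1, 2, 2, 4).
        ordered = sorted(self.best.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked: list[tuple[int, str, int]] = []
        for i, (player, score) in enumerate(ordered):
            if i > 0 and score == ordered[i - 1][1]:
                rank = ranked[-1][0]
            else:
                rank = i + 1
            ranked.append((rank, player, score))
        return ranked[:n]


if __name__ == "__main__":
    key = {1: "B", 2: "A", 3: "D"}
    board = Leaderboard()
    board.submit("ana", grade({1: "B", 2: "A", 3: "C"}, key))
    board.submit("ben", grade({1: "B", 2: "A", 3: "D"}, key))
    print(board.top())
```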
Hyperparameter Optimization:
- Experimented with various hyperparameters for the fine-tuned model to improve the accuracy and quality of the generated quiz questions.
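A simple way to organize hyperparameter experiments is an exhaustive grid search. The writeup does not list which hyperparameters were tuned, so the parameter names below are hypothetical, and `evaluate` is a placeholder for a function that fine-tunes with a given configuration and returns a validation score (for example, answer accuracy on held-out questions).

```python
import itertools
from typing import Callable


def grid_search(space: dict[str, list], evaluate: Callable[[dict], float]) -> tuple[dict, float]:
    """Try every combination in `space` and return the best-scoring config.

    If any value list is empty there are no combinations, and (None, -inf) is returned.
    """
    best_cfg, best_score = None, float("-inf")
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = evaluate(cfg)  # placeholder: fine-tune with cfg, score on validation data
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


if __name__ == "__main__":
    # Hypothetical search space; the real tuned parameters are not stated.
    space = {"learning_rate": [1e-4, 2e-4], "epochs": [1, 2, 3]}

    def toy_evaluate(cfg: dict) -> float:
        # Stand-in objective so the example runs without training a model.
        return -abs(cfg["learning_rate"] - 2e-4) * 1e4 - abs(cfg["epochs"] - 2)

    print(grid_search(space, toy_evaluate))
```

Each `evaluate` call would be a full fine-tuning run, so the cost grows multiplicatively with every parameter added to the grid; random search is a common cheaper alternative.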

Challenges we ran into

Fine-Tuning with Niche Data:
Adapting a general-purpose language model to the niche content of the FDIC materials proved challenging. A key hurdle was ensuring that the model understood and prioritized the important financial concepts.
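Fine-tuning on niche material starts with a dataset pairing source passages with the questions the model should learn to produce. The record layout below (`instruction` / `input` / `output` fields written as JSON Lines) is an assumption modeled on common instruction-tuning datasets, not a format the writeup specifies; `to_training_record` and `write_jsonl` are hypothetical helpers.

```python
import json


def to_training_record(passage: str, question: str, options: dict[str, str], answer: str) -> dict:
    """Turn one passage plus a reference question into an instruction-tuning example.

    Hypothetical record layout; field names are assumptions.
    """
    lettered = "\n".join(f"{k}) {v}" for k, v in sorted(options.items()))
    return {
        "instruction": "Write one multiple-choice question grounded in the passage.",
        "input": passage,
        "output": f"Question: {question}\n{lettered}\nAnswer: {answer}",
    }


def write_jsonl(records: list[dict], path: str) -> None:
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Keeping the `output` field in the same layout the quiz parser expects means the fine-tuned model learns to emit text the website can consume directly.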
Hyperparameter Tuning:
Identifying the optimal hyperparameter settings to balance model performance and data specificity required extensive experimentation and validation.
Accurate Data Extraction:
Converting complex PDF materials into clean, usable markdown without compromising critical information was both time-consuming and technically challenging.
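Part of PDF cleanup is mechanical. The sketch below, an assumption about what such a pass might include rather than the project's actual converter, rejoins words hyphenated across line breaks, drops lines that contain only a page number, and collapses runs of blank lines. `clean_extracted_text` is a hypothetical name. Both heuristics can misfire (a genuine hyphenated compound split at a line end, or a line holding only a year), so the output still needs spot checks against the source.

```python
import re


def clean_extracted_text(text: str) -> str:
    """Tidy raw PDF-extracted text before treating it as markdown (hypothetical helper)."""
    # Rejoin words hyphenated across a line break: "fin-\nancial" -> "financial".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Drop lines that are only a page number, e.g. "12" or "Page 12".
        if re.fullmatch(r"(page\s+)?\d+", stripped, flags=re.IGNORECASE):
            continue
        kept.append(line.rstrip())
    # Collapse three or more consecutive newlines into a single blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(kept)).strip()


if __name__ == "__main__":
    raw = "Building good fin-\nancial habits\n\n\n\n12\nstarts early.\nPage 3\n"
    print(clean_extracted_text(raw))
```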

Accomplishments that we're proud of

High-Quality Model Production:
We successfully fine-tuned a base model to generate educational content that is precise, reliable, and tailored to FDIC financial materials.
Scalability:
The pipeline can be adapted to other educational PDF resources, paving the way for applications across subjects beyond finance.
Engaging User Experience:
The integrated website and leaderboard system create an engaging and competitive environment for students, enhancing learning outcomes.

What we learned

Specialization vs. Generalization:
Tailoring a language model with niche and high-quality data can significantly improve its utility for specific educational purposes compared to generic LLM outputs.
Iterative Improvement:
Fine-tuning models and hyperparameter adjustments are critical components that require iterative refinement to achieve production-level accuracy and performance.
Interdisciplinary Approach:
Combining data science, machine learning, and full-stack web development can yield robust and scalable educational tools.

What's next for RAGiFinance Challenge

Expanding Topics:
We plan to extend the framework to cover more educational subjects, leveraging the adaptable nature of the model to create quizzes from various academic PDFs.
Enhanced Interactivity:
Future updates may include more interactive elements, such as personalized learning paths, detailed performance analytics, and adaptive questioning based on user progress.
Community and Feedback:
Integrating more user feedback will allow us to further refine the question-generation process and improve the overall learning experience.
Increased Accessibility:
We aim to enhance the user interface and support additional devices and platforms to reach a broader audience of students.

We are excited about the current achievements and look forward to building on this foundation to expand our impact in the educational technology space.

Built With

amazon-web-services
css
fine-tuning
flask
html
javascript
llama
python
pytorch
rag
transformers

Updates

Kasra Ahmadi started this project — Apr 13, 2025 03:25 AM EDT
