This repository contains the code and QA dataset associated with the dissertation paper: "Keep Facts as Facts: Hybrid Graph-Based RAG for Semi-Structured FMEA Spreadsheets".
The dataset and QA set are located under the data directory:
- The dataset subdirectory includes the file fmea_dataset_filled.csv, which is the main data used for loading and experimentation.
- The questions subdirectory includes the file fmea_qa_model.xlsx, which includes the evaluation questions, model answers, and associated operation type categories and nuggets.
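As a quick sanity check before loading, the two file locations described above can be resolved with pathlib. This is a minimal sketch, assuming the script is run from the repository root and that the subdirectories are named dataset and questions under data; the helper name data_paths is hypothetical and not part of the repository.

```python
from pathlib import Path


def data_paths(repo_root="."):
    """Return the expected locations of the dataset and QA files.

    Assumes the data/dataset and data/questions layout described above.
    """
    data_dir = Path(repo_root) / "data"
    return {
        "dataset": data_dir / "dataset" / "fmea_dataset_filled.csv",
        "questions": data_dir / "questions" / "fmea_qa_model.xlsx",
    }


if __name__ == "__main__":
    # Report whether each expected file is present on disk.
    for name, path in data_paths().items():
        status = "found" if path.exists() else "missing"
        print(f"{name}: {path} ({status})")
```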
To run the code in this repository, follow the instructions in this section to set up packages and configurations for external systems.
To install required packages for Python, run inside your virtual environment:
pip install -r requirements.txt
A configuration file named .env should be set up with your details. A template version is located under .env template. Duplicate and rename this file, and fill in the values.
An instance of Neo4j should be set up and running on your system, as a new project. In the configuration file, set the correct URI, along with the username and password used to access this project.
An OpenAI key is also needed to access GPT models. Replace the placeholder with your key value.
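Before loading anything, it can help to confirm that the .env file has no empty values. The sketch below parses simple KEY=VALUE lines and lists any required keys that are missing. The key names in REQUIRED_KEYS are assumptions made for illustration; the template file defines the names the code actually reads, so adjust the tuple to match it.

```python
import os

# Assumed key names for illustration only; use the names from the template file.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            print("Missing keys:", missing_keys(parse_env(handle.read())))
    else:
        print("No .env file found in the current directory.")
```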
Once systems have been set up, the load.py file is set up as a pseudo-command-line interface. Calling it with the appropriate arguments will set up related structures according to the paper.
First, run:
cd src
python3 load.py property_text skb
Then, to fully set up one of the structures (property_text, concept_text, row_text, row_all), run:
python3 load.py [structure] chroma
python3 load.py [structure] neo4j # except row_all
The relevant graph structure will be set up and ready to query from other code or from the Streamlit interface described below.
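The loading order above can be summarised as a small helper that lists the commands for a given structure, leaving out the neo4j step for row_all. This is only a sketch of the sequence described here: the function load_commands is hypothetical, it does not call load.py itself, and the commands it prints are meant to be run from the src directory.

```python
STRUCTURES = ("property_text", "concept_text", "row_text", "row_all")


def load_commands(structure, include_skb=True):
    """List the load.py invocations needed to set up one structure."""
    if structure not in STRUCTURES:
        raise ValueError(f"unknown structure: {structure!r}")
    commands = []
    if include_skb:
        # The first step is always run with property_text.
        commands.append(["python3", "load.py", "property_text", "skb"])
    commands.append(["python3", "load.py", structure, "chroma"])
    if structure != "row_all":
        # row_all has no neo4j step.
        commands.append(["python3", "load.py", structure, "neo4j"])
    return commands


if __name__ == "__main__":
    for name in STRUCTURES:
        print(f"# {name}")
        # Skip the shared skb step in the listing to avoid repeating it.
        for command in load_commands(name, include_skb=False):
            print(" ".join(command))
```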
Structures that have been set up are accessible to evaluation code. The evaluate.py file is set up in a similar way to load.py.
To automatically extract nuggets from a set of model answers (already done in the model answer Excel files):
cd src
python3 evaluate.py property_descriptive nugget
Then, to do a full experiment run-through for a strategy defined in src/scopes/__init__.py, use:
python3 evaluate.py [strategy] rag
The RAG responses will be located under the evaluation/experiment_runs directory. To use LLM-as-a-Judge, run:
python3 evaluate.py [strategy] eval
Alternatively, the same metric calculation after manual marking can be done with:
python3 evaluate.py [strategy] metric
The evaluations are appended to the same files in the directory.
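The evaluation stages for one strategy follow a fixed order: rag first, then either eval (LLM-as-a-Judge) or metric (after manual marking). The sketch below captures that choice. The function evaluation_commands and its marking parameter are hypothetical names used for illustration, and the strategy name in the example is simply the one that appears in the nugget command above.

```python
# Maps the marking approach to the evaluate.py stage that scores the run.
SCORING_STAGE = {"llm": "eval", "manual": "metric"}


def evaluation_commands(strategy, marking="llm"):
    """List the evaluate.py invocations for one experiment run."""
    if marking not in SCORING_STAGE:
        raise ValueError(f"marking must be one of {sorted(SCORING_STAGE)}")
    return [
        # Generate the RAG responses first.
        ["python3", "evaluate.py", strategy, "rag"],
        # Then score them with the judge or from manual marks.
        ["python3", "evaluate.py", strategy, SCORING_STAGE[marking]],
    ]


if __name__ == "__main__":
    for command in evaluation_commands("property_descriptive", marking="manual"):
        print(" ".join(command))
```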
This project includes a Streamlit interface that allows easy access to the configured RAG strategies and vector search collections.
To use the interface, run the following commands:
cd src
python3 -m streamlit run app/streamlit_app.py
This will automatically open your browser to the app's local address. Alternatively, to run/refresh the interface without starting a new browser tab, use:
cd src
python3 -m streamlit run app/streamlit_app.py --server.headless true