Honours 2025 - Heidi Leow

This repository contains the code and QA dataset associated with the dissertation paper: "Keep Facts as Facts: Hybrid Graph-Based RAG for Semi-Structured FMEA Spreadsheets".

Dataset

The dataset and QA set are located under the data directory:

The dataset subdirectory includes the file fmea_dataset_filled.csv, which is the main data used for loading and experimentation.
The questions subdirectory includes the file fmea_qa_model.xlsx, which contains the evaluation questions, model answers, and their associated operation type categories and nuggets.
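Before loading, it can help to confirm that the CSV is where the list above says it is and to see its columns. The following is a minimal sketch using only the standard library. It assumes the file has a header row, is UTF-8 encoded, and sits at data/dataset/fmea_dataset_filled.csv relative to the repository root; the helper name summarise_csv is hypothetical and not part of this repository.

```python
import csv
from pathlib import Path


def summarise_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first line is assumed to be the header
        row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return header, row_count


if __name__ == "__main__":
    # Path assumed from the description above; run from the repository root.
    dataset = Path("data/dataset/fmea_dataset_filled.csv")
    if dataset.exists():
        columns, rows = summarise_csv(dataset)
        print(f"{rows} rows, columns: {columns}")
    else:
        print(f"Dataset not found at {dataset}")
```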

Loading

To run the code in this repository, follow the instructions in this section to set up packages and configurations for external systems.

Requirements

To install required packages for Python, run inside your virtual environment:

pip install -r requirements.txt

Configuration for External Systems

A configuration file named .env should be set up with your details. A template is provided in the file named .env template. Duplicate this file, rename the copy to .env, and fill in the values.

An instance of Neo4j should be set up and running on your system, as a new project. In .env, set the URI of this instance, along with the username and password used to access the project.

An OpenAI key is also needed to access GPT models. Replace the placeholder with your key value.
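A quick check that every value has been filled in can save a failed run later. The sketch below parses a .env file with the standard library and reports keys that are missing or empty. The key names in REQUIRED_KEYS are assumptions made for illustration; the authoritative names are the ones in the .env template file, so adjust the tuple to match.

```python
from pathlib import Path

# Hypothetical key names; replace with the names used in the ".env template" file.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY")


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("Missing or empty keys:", missing_keys(read_env_file(env_path)))
    else:
        print("No .env file found in the current directory.")
```

This only checks that values are present; it does not verify that the Neo4j instance is reachable or that the OpenAI key is valid.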

Loading Data

Once the external systems have been configured, data is loaded through load.py, which acts as a pseudo-command-line interface. Calling it with the appropriate arguments sets up the corresponding structures described in the paper.

First, run:

cd src
python3 load.py property_text skb

Then, to fully set up a structure, replace [structure] with one of property_text, concept_text, row_text, or row_all, and run:

python3 load.py [structure] chroma
python3 load.py [structure] neo4j # except row_all

The relevant graph structure will then be set up and ready to query from other code or from the Streamlit interface described below.
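The loading steps above can be sketched as a small helper that builds the same commands in order, which is convenient when setting up several structures. This is an illustration under stated assumptions, not part of the repository: the function names setup_commands and run_all are hypothetical, the skb step is assumed to be safe to repeat, and the commands are assumed to run from the src directory as shown above.

```python
import subprocess
import sys

# Structure names taken from the instructions above.
STRUCTURES = ("property_text", "concept_text", "row_text", "row_all")


def setup_commands(structure, include_skb=True):
    """Build the load.py invocations for one structure, in the documented order."""
    if structure not in STRUCTURES:
        raise ValueError(f"Unknown structure: {structure!r}")
    commands = []
    if include_skb:
        # Initial step: python3 load.py property_text skb
        commands.append([sys.executable, "load.py", "property_text", "skb"])
    commands.append([sys.executable, "load.py", structure, "chroma"])
    if structure != "row_all":
        # The neo4j step applies to every structure except row_all.
        commands.append([sys.executable, "load.py", structure, "neo4j"])
    return commands


def run_all(structure, cwd="src", include_skb=True):
    """Run each command in turn, stopping at the first failure."""
    for command in setup_commands(structure, include_skb=include_skb):
        subprocess.run(command, cwd=cwd, check=True)
```

Calling run_all("row_text") from the repository root would execute the skb, chroma, and neo4j steps in sequence using the current interpreter, so it picks up the active virtual environment.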

Evaluation

Structures that have been set up are accessible to evaluation code. The evaluate.py file is set up in a similar way to load.py.

To automatically extract nuggets from a set of model answers (already done in the model answer Excel files):

cd src
python3 evaluate.py property_descriptive nugget

Then, to do a full experiment run-through for a strategy defined in src/scopes/__init__.py, use:

python3 evaluate.py [strategy] rag

The RAG responses will be located under the evaluation/experiment_runs directory. To evaluate them with LLM-as-a-Judge, run:

python3 evaluate.py [strategy] eval

Alternatively, if the responses have been marked manually, the same metric calculation can be done with:

python3 evaluate.py [strategy] metric

The evaluations are appended to the same files in the directory.

Interface

This project includes a Streamlit interface that allows easy access to the configured RAG strategies and vector search collections.

To use the interface, run the following commands:

cd src
python3 -m streamlit run app/streamlit_app.py

This will automatically open your browser to the app's local address. Alternatively, to run or refresh the interface without opening a new browser tab, use:

cd src
python3 -m streamlit run app/streamlit_app.py --server.headless true
