This repository contains a Python script for processing and uploading research project listings to the CourseConnect Research page using Firebase Firestore. The tool parses JSON data containing research project information, formats it correctly, and optionally uploads it to the Firebase database.
The JSON data used in this project is generated from the uf-research-labs-data repository. If more data is needed in the future, see that repository.
The tool performs the following functions:
- Reads research project data from JSON files in the data directory
- Processes faculty mentor and PhD student mentor information into the required format
- Creates a standardized data structure for each research listing
- Saves the processed data to a timestamped JSON file in the output directory
- Optionally uploads the processed data to Firebase Firestore
- Python 3.6+
- Firebase project with Firestore database
- Firebase service account credentials
- Clone the repository:
    git clone https://github.com/yourusername/cc-research-script.git
    cd cc-research-script
- Create and activate a virtual environment:
    # Using venv
    python -m venv env
    # On Windows
    env\Scripts\activate
    # On macOS/Linux
    source env/bin/activate
- Install the required packages:
    pip install -r requirements.txt
- Create a Firebase project at https://console.firebase.google.com/
- Set up a Firestore database in your Firebase project
- Generate a private key for your service account:
  - Go to Project Settings > Service Accounts
  - Click "Generate New Private Key"
  - Save the JSON file as serviceAccount.json in the root directory of this project
- Create a .env file in the project root with the following Firebase configuration:
    NEXT_PUBLIC_FIREBASE_API_KEY="your_api_key"
    NEXT_PUBLIC_FIREBASE_AUTH_DOMAIN="your_project_id.firebaseapp.com"
    NEXT_PUBLIC_FIREBASE_PROJECT_ID="your_project_id"
    NEXT_PUBLIC_FIREBASE_STORAGE_BUCKET="your_project_id.appspot.com"
    NEXT_PUBLIC_FIREBASE_MESSAGING_SENDER_ID="your_messaging_sender_id"
    NEXT_PUBLIC_FIREBASE_APP_ID="your_app_id"
    FIREBASE_COLLECTION_NAME="data_destination_collection_name"
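As an illustration of how these values can be read from Python, here is a minimal sketch that uses only the standard library. The helper name `load_env_file` is hypothetical, and the actual script may load the file differently (for example with a package such as python-dotenv).

```python
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY="value" lines into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

With the file above in place, `load_env_file()["FIREBASE_COLLECTION_NAME"]` would give the destination collection name.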
Place your JSON files containing research project data in the data directory. The expected format should match the structure in the provided data/all_projects.json file.
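The loading step can be pictured with the sketch below. It assumes every file in the data directory holds a JSON array of project objects, as in the format described in this README; the function name `load_projects` is hypothetical and not taken from the script.

```python
import json
from pathlib import Path


def load_projects(data_dir="data"):
    """Collect project records from every *.json file in data_dir."""
    projects = []
    for json_path in sorted(Path(data_dir).glob("*.json")):
        with json_path.open(encoding="utf-8") as handle:
            records = json.load(handle)
        # Each file is expected to hold a JSON array of project objects.
        if not isinstance(records, list):
            raise ValueError(f"{json_path.name}: expected a JSON array of projects")
        projects.extend(records)
    return projects
```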
- Run the script:
    python script.py
- The script will:
  - Process all JSON files in the data directory
  - Save the processed data to the output directory with a timestamp
  - Ask if you want to upload the data to Firebase
- If you choose to upload to Firebase, the script will:
  - Connect to Firebase using your service account credentials
  - Upload each research listing to the collection specified in the .env file
  - Create an empty "applications" subcollection for each listing
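The connect and upload steps might look roughly like the sketch below. It relies on the firebase-admin package (`credentials.Certificate`, `initialize_app`, `firestore.client`) and on the Firestore client's `collection(...).add(...)` call; the function names `connect` and `upload_listings` are hypothetical, and this is not necessarily how script.py is written.

```python
def connect(service_account_path="serviceAccount.json"):
    """Return a Firestore client (requires the firebase-admin package)."""
    # Imported lazily so the rest of the module works without the package.
    import firebase_admin
    from firebase_admin import credentials, firestore

    cred = credentials.Certificate(service_account_path)
    firebase_admin.initialize_app(cred)
    return firestore.client()


def upload_listings(db, collection_name, listings):
    """Add each listing under an auto-generated ID and return the new IDs."""
    doc_ids = []
    for listing in listings:
        # add() returns a (update_time, document_reference) tuple.
        _, doc_ref = db.collection(collection_name).add(listing)
        doc_ids.append(doc_ref.id)
    return doc_ids
```

A call such as `upload_listings(connect(), "research-listings", listings)` would then perform the upload. The sketch leaves out the "applications" subcollection: Firestore only materializes a collection once it contains a document, so how the script creates it is not assumed here.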
The script expects input JSON files to have the following structure:
[
{
"project_title": "Example Research Project",
"department": "Computer and Information Sciences and Engineering",
"faculty_mentor": "Faculty Name, [email protected]",
"terms_available": "Fall, Spring, Summer",
"student_level": "Junior, Senior",
"prerequisites": "Programming experience, coursework in related field",
"credit": "0-3 credits via EGN 4912",
"stipend": "None unless selected for University Scholars",
"application_requirements": "Resume, transcript, faculty interview",
"application_deadline": "Rolling basis",
"website": "https://example.com",
"project_description": "Description of the research project",
"ph.d._student_mentor(s)": "PhD Student, [email protected]"
}
]
The script transforms the input JSON data for Firebase Firestore storage:
- What's Being Parsed:
  - faculty_mentor: Parses name and email from strings like "Faculty Name, [email protected]"
  - ph.d._student_mentor(s): Similarly extracts name and email information
- How Data is Restructured: From this raw input format:
    {
      "project_title": "Example Research Project",
      "faculty_mentor": "Faculty Name, [email protected]",
      "ph.d._student_mentor(s)": "PhD Student, [email protected]"
      // ... other fields
    }
  To a Firestore-ready format:
    {
      "data": {
        "project_title": "Example Research Project",
        "department": "Computer and Information Sciences and Engineering",
        "faculty_mentor": { "[email protected]": "Lisa Anthony" },
        "phd_student_mentor": { "info": "TBD based on project and availability" },
        // ... other fields preserved
      }
    }
- Benefits of This Structure:
  - Makes querying more efficient in Firestore
  - Separates data components for better display and filtering
  - Includes required metadata for the CourseConnect Research platform
  - Enables proper functioning of the application system for each research listing
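The mentor parsing and restructuring described above can be sketched as follows. This is an assumption-laden illustration, not the script's actual code: the function names `parse_mentor` and `to_listing` are hypothetical, and the rule that text without a "Name, email" pattern is stored under an "info" key is inferred from the example output.

```python
def parse_mentor(raw):
    """Turn "Name, email" into {email: name}; otherwise keep the text under "info"."""
    text = (raw or "").strip()
    # Split on the last comma so names containing commas stay intact.
    name, sep, email = text.rpartition(",")
    name, email = name.strip(), email.strip()
    if sep and name and "@" in email:
        return {email: name}
    return {"info": text}


def to_listing(project):
    """Wrap one raw project record in the {"data": {...}} shape."""
    data = dict(project)  # copy so the input record is left untouched
    data["faculty_mentor"] = parse_mentor(data.get("faculty_mentor"))
    # Rename the awkward input key while parsing its value.
    raw_phd = data.pop("ph.d._student_mentor(s)", "")
    data["phd_student_mentor"] = parse_mentor(raw_phd)
    return {"data": data}
```

Keying the mentor dictionary by email keeps each entry unique even when two mentors share a name. The addresses used when trying this out are placeholders of your own choosing.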
- Service Account Keys: Never commit your serviceAccount.json file to version control. It contains sensitive credentials.
- Firebase Security: Ensure your Firestore database has appropriate security rules to protect the data.
The processed data will be saved to the output directory with a filename like all_processed_data_YYYYMMDD_HHMMSS.json. This file contains the transformed data ready for upload to Firebase.
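A timestamped filename of this shape can be produced with `datetime.strftime`, as in the sketch below. The function names `output_path` and `save_listings` and the indented-JSON formatting are assumptions for illustration.

```python
import json
from datetime import datetime
from pathlib import Path


def output_path(output_dir="output", now=None):
    """Build <output_dir>/all_processed_data_YYYYMMDD_HHMMSS.json."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(output_dir) / f"all_processed_data_{stamp}.json"


def save_listings(listings, output_dir="output", now=None):
    """Write the listings as indented JSON and return the file path."""
    path = output_path(output_dir, now)
    # Create the output directory on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        json.dump(listings, handle, indent=2, ensure_ascii=False)
    return path
```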
When uploaded to Firebase, each research listing is added to the collection named by the FIREBASE_COLLECTION_NAME variable in the .env file. CourseConnect uses the "research-listings" collection. Each listing is stored under an auto-generated document ID, and an empty "applications" subcollection is created for it.