Comprehensive Python Application for Psychometric and Neurological Data Analysis

Table of Contents

1.	Introduction
2.	Project Overview
3.	System Architecture
4.	Implementation Steps
5.	Code Modules
•	Data Ingestion Module
•	Data Preprocessing Module
•	Feature Extraction Module
•	Correlation Analysis Module
•	Visualization Module
6.	Dependencies and Installation
7.	Running the Application
8.	Conclusion
9.	Appendices
•	Appendix A: Full Source Code
•	Appendix B: Dataset Format Examples

Introduction

This guide details the development of a production-ready Python application that ingests and analyzes psychometric, vital-sign, and neurological data, and identifies connections and correlations among them. The application handles data types including:

•	Eye tracking data
•	Face heat map tracking
•	Heart rate & blood pressure readings
•	EEG data
•	Self-response questionnaire surveys

The goal is to uncover previously unidentified correlations and insights from the collected datasets.

Project Overview

The application performs the following key functions:

1.	Data Ingestion: Reads and validates data from various sources.
2.	Data Preprocessing: Cleans and formats data for analysis.
3.	Feature Extraction: Extracts meaningful features from raw data.
4.	Correlation Analysis: Identifies relationships between different data types.
5.	Visualization: Generates graphs and heatmaps to illustrate findings.

System Architecture

The system consists of modular components to ensure scalability and maintainability:

•	Input Layer: Handles data ingestion from multiple sources.
•	Processing Layer: Performs preprocessing and feature extraction.
•	Analysis Layer: Conducts statistical and machine learning analyses.
•	Output Layer: Generates visualizations and reports.

Implementation Steps

Step 1: Set Up the Development Environment

Install Python 3.8 or higher. Create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install required packages:

pip install numpy pandas scipy scikit-learn matplotlib seaborn mne

Step 2: Organize the Project Directory

Create the following directory structure:

project_root/
│
├── data/
│   ├── eye_tracking/
│   ├── face_heatmap/
│   ├── vitals/
│   ├── eeg/
│   └── surveys/
│
├── src/
│   ├── data_ingestion.py
│   ├── data_preprocessing.py
│   ├── feature_extraction.py
│   ├── correlation_analysis.py
│   └── visualization.py
│
├── main.py
└── requirements.txt

Step 3: Develop the Data Ingestion Module

•	Purpose: Load data from various formats (CSV, JSON, EDF for EEG).
•	Implement functions to read each data type.

Step 4: Develop the Data Preprocessing Module

•	Clean data (handle missing values, normalize).
•	Synchronize data based on timestamps.

Step 5: Develop the Feature Extraction Module

•	Extract features like fixation durations from eye tracking.
•	Compute statistical measures from EEG (e.g., band power).
•	Derive scores from survey responses.

Step 6: Develop the Correlation Analysis Module

•	Use statistical methods (Pearson, Spearman correlations).
•	Implement machine learning models if necessary.

Step 7: Develop the Visualization Module

•	Create plots to visualize data distributions.
•	Generate heatmaps for correlation matrices.

Step 8: Integrate Modules in main.py

•	Coordinate the workflow:
	•	Ingest data.
	•	Preprocess data.
	•	Extract features.
	•	Perform analysis.
	•	Visualize results.

Step 9: Test and Debug

•	Run the application with sample data.
•	Use logging to track the application’s execution (a minimal setup is sketched below).
•	Debug any issues that arise.
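
A minimal logging setup for main.py, sketched here with the standard library logging module (the log file name is illustrative):

import logging

logging.basicConfig(
    filename='analysis.log',
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(name)s: %(message)s',
)
logger = logging.getLogger(__name__)
logger.info('Pipeline started')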

Step 10: Document the Application

•	Write docstrings for all functions and classes.
•	Create a README file with instructions.

Code Modules

Below are the detailed implementations of each module.

Data Ingestion Module

File: src/data_ingestion.py

import pandas as pd
import mne  # For EEG data

def load_eye_tracking_data(path):
    """Load eye tracking samples from a CSV file."""
    return pd.read_csv(path)

def load_face_heatmap_data(path):
    """Load face heat map tracking data from a CSV file."""
    return pd.read_csv(path)

def load_vitals_data(path):
    """Load heart rate and blood pressure readings from a CSV file."""
    return pd.read_csv(path)

def load_eeg_data(path):
    """Load raw EEG data from an EDF file."""
    raw = mne.io.read_raw_edf(path, preload=True)
    return raw

def load_survey_data(path):
    """Load questionnaire responses from a CSV file."""
    return pd.read_csv(path)
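
Step 3 also calls for JSON sources. A minimal loader for that case, assuming records-oriented JSON files (the exact layout of your exports may differ), could be added here:

import json

def load_json_data(path):
    # Hypothetical helper: expects a top-level JSON array of records
    with open(path) as f:
        records = json.load(f)
    return pd.DataFrame(records)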

Data Preprocessing Module

File: src/data_preprocessing.py

import pandas as pd
from functools import reduce

def preprocess_eye_tracking(df):
    """Forward-fill missing samples; extend with normalization as needed."""
    df = df.ffill()  # fillna(method='ffill') is deprecated in recent pandas
    # Additional preprocessing steps
    return df

def preprocess_eeg(raw):
    """Apply a 1-50 Hz band-pass filter to the raw EEG signal."""
    raw.filter(1., 50.)  # Band-pass filter
    # Additional preprocessing steps
    return raw

def synchronize_data(dfs, on='timestamp'):
    """Merge a list of DataFrames on a shared timestamp column."""
    df_merged = reduce(lambda left, right: pd.merge(left, right, on=on), dfs)
    return df_merged
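
Exact timestamp matches are rare across independently sampled sensors, so synchronize_data can silently drop rows. A sketch of nearest-timestamp alignment with pd.merge_asof, assuming integer Unix-second timestamps and a one-second tolerance, is:

def synchronize_data_nearest(left, right, on='timestamp', tolerance=1):
    # Align two streams on nearest timestamps within `tolerance` seconds
    left = left.sort_values(on)
    right = right.sort_values(on)
    return pd.merge_asof(left, right, on=on, direction='nearest', tolerance=tolerance)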

Feature Extraction Module

File: src/feature_extraction.py

import numpy as np
import mne

def extract_eye_tracking_features(df):
    """Compute summary statistics from eye tracking samples."""
    features = {}
    features['avg_fixation_duration'] = df['fixation_duration'].mean()
    # Additional features
    return features

def extract_eeg_features(raw):
    """Compute mean spectral power per channel from fixed-length epochs."""
    epochs = mne.make_fixed_length_epochs(raw, duration=2.0, preload=True)
    # psd_welch was removed in recent MNE releases; compute_psd is the current API
    spectrum = epochs.compute_psd(method='welch', fmin=1., fmax=50.)
    psds, freqs = spectrum.get_data(return_freqs=True)
    mean_power = psds.mean(axis=(0, 2))  # Average over epochs and frequencies
    return {f'eeg_mean_power_ch{i}': p for i, p in enumerate(mean_power)}
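
Step 5 calls for band power specifically. A sketch of per-band power extraction, assuming the conventional band boundaries given below, might look like:

BANDS = {'delta': (1, 4), 'theta': (4, 8), 'alpha': (8, 13), 'beta': (13, 30), 'gamma': (30, 50)}

def extract_band_powers(raw):
    # Mean power within each standard EEG band, averaged over epochs and channels
    epochs = mne.make_fixed_length_epochs(raw, duration=2.0, preload=True)
    spectrum = epochs.compute_psd(method='welch', fmin=1., fmax=50.)
    psds, freqs = spectrum.get_data(return_freqs=True)
    features = {}
    for band, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        features[f'eeg_{band}_power'] = psds[..., mask].mean()
    return features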

Correlation Analysis Module

File: src/correlation_analysis.py

import pandas as pd

def compute_correlations(features_df):
    """Compute the Pearson correlation matrix over all feature columns."""
    corr_matrix = features_df.corr(method='pearson')
    return corr_matrix

def identify_significant_correlations(corr_matrix, threshold=0.5):
    """Keep correlations with |r| above the threshold, excluding the diagonal."""
    significant = corr_matrix[(corr_matrix.abs() > threshold) & (corr_matrix != 1.0)]
    return significant
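
Step 6 also mentions Spearman correlations. A sketch using scipy.stats.spearmanr, which additionally returns p-values so a significance cutoff (alpha=0.05 here, an assumption) can supplement the raw threshold:

from scipy import stats

def compute_spearman(features_df, alpha=0.05):
    # Spearman correlation matrix, masked to entries with p < alpha
    # (assumes three or more feature columns and multiple rows)
    rho, pvals = stats.spearmanr(features_df.values)
    cols = features_df.columns
    rho_df = pd.DataFrame(rho, index=cols, columns=cols)
    p_df = pd.DataFrame(pvals, index=cols, columns=cols)
    return rho_df.where(p_df < alpha)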

Visualization Module

File: src/visualization.py

import seaborn as sns
import matplotlib.pyplot as plt

def plot_correlation_heatmap(corr_matrix):
    """Render the correlation matrix as an annotated heatmap."""
    sns.heatmap(corr_matrix, annot=True, fmt=".2f")
    plt.show()

def plot_feature_distributions(features_df):
    """Plot a histogram for each feature column."""
    features_df.hist(bins=15, figsize=(15, 10))
    plt.show()
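
For unattended runs that feed the Output Layer's reports, a variant that writes the heatmap to disk instead of opening a window (the output path is illustrative) could be:

def save_correlation_heatmap(corr_matrix, path='reports/correlation_heatmap.png'):
    # Save the annotated heatmap to a file rather than displaying it
    fig, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(corr_matrix, annot=True, fmt=".2f", ax=ax)
    fig.savefig(path, bbox_inches='tight')
    plt.close(fig)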

Dependencies and Installation

Requirements File: requirements.txt

numpy
pandas
scipy
scikit-learn
matplotlib
seaborn
mne

Install dependencies using:

pip install -r requirements.txt

Running the Application

1.	Prepare the Data
•	Place your datasets in the respective folders within the data/ directory.
•	Ensure all data files are properly formatted (see Appendix B).
2.	Configure main.py

File: main.py

import pandas as pd

from src.data_ingestion import *
from src.data_preprocessing import *
from src.feature_extraction import *
from src.correlation_analysis import *
from src.visualization import *

def main():
    # Data Ingestion
    eye_data = load_eye_tracking_data('data/eye_tracking/eye_data.csv')
    eeg_data = load_eeg_data('data/eeg/eeg_data.edf')
    survey_data = load_survey_data('data/surveys/survey_data.csv')

    # Data Preprocessing
    eye_data = preprocess_eye_tracking(eye_data)
    eeg_data = preprocess_eeg(eeg_data)

    # Feature Extraction
    eye_features = extract_eye_tracking_features(eye_data)
    eeg_features = extract_eeg_features(eeg_data)
    # Derive a summary score from survey responses (question_id excluded)
    survey_features = survey_data.drop(columns=['question_id']).mean().to_dict()

    # Combine Features: one row per recording session. Append a row for each
    # session so the correlation analysis has multiple observations to work with.
    features_df = pd.DataFrame([{**eye_features, **eeg_features, **survey_features}])

    # Correlation Analysis
    corr_matrix = compute_correlations(features_df)
    significant_corrs = identify_significant_correlations(corr_matrix)
    print(significant_corrs)

    # Visualization
    plot_correlation_heatmap(corr_matrix)
    plot_feature_distributions(features_df)

if __name__ == '__main__':
    main()
3.	Execute the Application

python main.py

Conclusion

This application provides a robust framework for analyzing complex psychometric and neurological datasets. By following the outlined steps and utilizing the provided code, you can uncover valuable insights and correlations within your data.

Appendices
Appendix A: Full Source Code

The full source code resides in the src/ directory, following the structure defined earlier.

Appendix B: Dataset Format Examples

Eye Tracking Data (eye_data.csv)

| timestamp  | fixation_duration | saccade_amplitude | pupil_size |
|------------|-------------------|-------------------|------------|
| 1609459200 | 0.2               | 1.5               | 3.1        |
| 1609459201 | 0.15              | 1.2               | 3.0        |

EEG Data (eeg_data.edf)

•	Standard EDF file with properly labeled channels.
•	Sampling rate: 256 Hz.
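
To confirm a file matches this format before a full run, a quick sanity check with MNE (the path is illustrative) could be:

import mne

raw = mne.io.read_raw_edf('data/eeg/eeg_data.edf', preload=False)
print(raw.info['sfreq'])   # Expect 256.0
print(raw.ch_names)        # Verify channel labels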

Survey Data (survey_data.csv)

| question_id | response_score |
|-------------|----------------|
| 1           | 4              |
| 2           | 3              |