Design Document

By YOUR NAME HERE

Scope

What is the purpose of your database?

The purpose of this database is to provide a structured, reliable foundation for analyzing clinical and hospital-level data. It enables research on patient demographics, diagnoses, treatments, medications, surgical procedures, ICU stays, laboratory tests, and outcomes. The database is designed to support evidence-based decision making, quality improvement, and predictive modeling in healthcare.

Which people, places, things, etc. are you including in the scope of your database?

People: Patients admitted to the hospital, their encounters (hospital admissions, ICU stays), and the clinical staff involved in their care.

Places: Hospital wards, intensive care units, and other inpatient settings where treatment and monitoring occurred.

Things:

Clinical events such as diagnoses, prescriptions, surgical procedures, input/output records, and microbiology results.

Administrative details like admissions, discharges, transfers.

Measurements of patient physiology, including vital signs, lab results, and fluid balance data.

Caregiver identifiers (used only to track the source of clinical input, not personal caregiver information).

Which people, places, things, etc. are outside the scope of your database?

People: Outpatient visits, primary care encounters, and follow-up care outside the hospital. Family members, insurers, and non-clinical staff are not represented.

Places: Community health centers, clinics, and any healthcare services provided outside the admitting hospital system.

Things:

Non-clinical aspects such as billing, financial transactions, and patient satisfaction surveys.

Personal identifiers (names, addresses, contact information) are excluded to maintain patient privacy.

Broader social determinants of health (income, employment, housing) not captured in hospital records.

Functional Requirements

What should a user be able to do with your database?

Perform descriptive analysis: Summarize patient demographics, admission trends, ICU utilization, and clinical outcomes.

Investigate clinical relationships: Study how diagnoses, treatments, and interventions (e.g., drugs, fluids, surgeries) affect patient outcomes such as length of stay or mortality.

Evaluate resource utilization: Analyze prescription patterns, procedure frequency, and ICU resource allocation.

Support predictive modeling: Provide structured inputs for machine learning models that predict patient risk, treatment effectiveness, or mortality.

Conduct quality and safety monitoring: Detect variations in care, adverse outcomes, or unusually high-risk treatment patterns across units or patient groups.

What's beyond the scope of what a user should be able to do with your database?

Real-time clinical decision-making: The database is retrospective and cannot replace active hospital information systems for live patient care.

Non-hospital care: Outpatient, rehabilitation, and home care records are not captured.

Personalized identity tracking: The dataset is de-identified; users cannot re-identify or track individual patients or providers.

Patient communication: The system is not intended to provide direct interaction with patients (e.g., appointment booking, messaging).

Representation

Entities

Which entities will you choose to represent in your database?

Patients – core demographic and vital status information.

Admissions – hospital visits, admission/discharge dates, outcomes.

ICU Stays – detailed ICU-level records for critical care episodes.

Diagnoses – coded conditions (ICD-9) assigned during admissions.

Procedures & Surgeries – operations and interventions performed.

Prescriptions & Input Events – drugs and fluids administered.

Output Events – fluid output and related measurements.

Microbiology Events – lab test results, cultures, and pathogens.

Caregivers / Services – unit, department, or staff providing care.

What attributes will those entities have?

Patients: subject_id, gender, date of birth, date of death, expire_flag.

Admissions: hadm_id, subject_id, admission time, discharge time, admission type, hospital_expire_flag.

ICU Stays: icustay_id, hadm_id, careunit, length of stay, first/last careunit.

Diagnoses: hadm_id, icd9_code, description.

Procedures: hadm_id, icd9_code, procedure_time, description.

Prescriptions / Input Events: drug name, dose, start/end time, route, order category, rate/amount.

Output Events: itemid, value, valueuom, storetime.

Microbiology Events: chartdate, spec_type_desc, org_name, antibiotic, interpretation.

Why did you choose the types you did?

Primary/foreign keys (e.g., subject_id, hadm_id, icustay_id) – to maintain relational integrity and allow cross-table analysis.

Timestamps (TEXT/DATE) – to enable duration and trend analysis.

Categorical text fields (gender, service, careunit, drug name) – to support grouping and classification.

Numerical values (REAL/INTEGER) – for quantitative analysis of dosages, lab results, length of stay, and outcomes.

Why did you choose the constraints you did?

NOT NULL on key identifiers – ensures every record can be uniquely traced.

Foreign key constraints – guarantee referential integrity (e.g., every ICU stay links back to an admission, and every admission links to a patient).

CHECK constraints (e.g., age > 0, dose ≥ 0) – prevent invalid or nonsensical values.

Unique constraints where natural keys exist (e.g., row_id for event logs) – to avoid duplicate events.

Domain-specific constraints (e.g., gender limited to ‘M’, ‘F’, or unknown) – to standardize categories for reliable analysis.

Relationships

## Optimizations

Which optimizations (e.g., indexes, views) did you create? Why?

Created B-Tree indexes on high-usage foreign keys such as subject_id, hadm_id, and icustay_id across tables like admissions, icustays, prescriptions, and inputevents_mv.

Why: These keys are frequently used for joins and filters (e.g., finding all ICU stays for a patient or all prescriptions within an admission). Indexes reduce query time significantly.

Added indexes on timestamp columns such as admittime, intime, and charttime.

Why: Many analyses (e.g., length of stay, mortality trends, treatment timelines) involve filtering or ordering by time. Indexed dates improve performance of such range queries.

Built indexes on categorical fields like careunit, drug_name_generic, and icd9_code.

Why: These attributes are often grouped or aggregated (e.g., mortality by ICU unit, average dose by drug), so indexes speed up grouping operations.

Patient Summary View – combines patients and admissions to calculate age, lifespan, and length of hospital stay. Why: Eliminates repeated joins and ensures analysts can start with clean demographic + outcome info.

Limitations

In this section you should answer the following questions:

What are the limitations of your design?

1. Outcome limitations

Outcomes are mostly limited to mortality and length of stay.

Functional recovery, long-term health outcomes, and quality of life are not represented.

2. Ambiguity in coding

Diagnosis and procedure codes (ICD9, CPT) depend on coding practices.

This can introduce bias or noise if used for clinical inference without validation.

3. Data scope

The database represents a single hospital system.

Analyses may not generalize to other hospitals or populations.

4. De-identification impact

Dates are shifted to protect patient privacy.This limits calendar-based analyses (e.g., seasonal trends, exact cohort years).

What might your database not be able to represent very well?

Post-discharge outcomes – readmissions outside the hospital, outpatient care, or rehabilitation are missing.

Genomic/biological data – no lab genomics, imaging, or molecular-level detail beyond routine labs and microbiology.

Causal relationships – the database is observational and retrospective; it cannot establish cause-effect without external modeling.

Data Disclaimer

This project uses de-identified MIMIC-III Clinical Database Demo v1.4 data for educational and demonstration purposes only.
The original data is not redistributed. All queries and scripts were run on a local copy of the dataset.

The MIMIC-III demo is provided under the MIMIC-III License and is available at PhysioNet.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
Ralationship-Diagram.png		Ralationship-Diagram.png
mimic_demo.db		mimic_demo.db
queries.sql		queries.sql
schema.sql		schema.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Design Document

Scope

What is the purpose of your database?

Which people, places, things, etc. are you including in the scope of your database?

Which people, places, things, etc. are outside the scope of your database?

Functional Requirements

What should a user be able to do with your database?

What's beyond the scope of what a user should be able to do with your database?

Representation

Entities

Which entities will you choose to represent in your database?

What attributes will those entities have?

Why did you choose the types you did?

Why did you choose the constraints you did?

Relationships

Which optimizations (e.g., indexes, views) did you create? Why?

Limitations

What are the limitations of your design?

1. Outcome limitations

2. Ambiguity in coding

3. Data scope

4. De-identification impact

Data Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Design Document

Scope

What is the purpose of your database?

Which people, places, things, etc. are you including in the scope of your database?

Which people, places, things, etc. are outside the scope of your database?

Functional Requirements

What should a user be able to do with your database?

What's beyond the scope of what a user should be able to do with your database?

Representation

Entities

Which entities will you choose to represent in your database?

What attributes will those entities have?

Why did you choose the types you did?

Why did you choose the constraints you did?

Relationships

Which optimizations (e.g., indexes, views) did you create? Why?

Limitations

What are the limitations of your design?

1. Outcome limitations

2. Ambiguity in coding

3. Data scope

4. De-identification impact

Data Disclaimer

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages