A system for integrating data from major dinosaur archives and databases into one unified database.
This project combines records from multiple dinosaur databases into a single, standardized format. It includes:
- Unified Schema: Comprehensive data model accommodating all major data sources
- Data Adapters: Source-specific adapters for seamless integration
- Deduplication Engine: Intelligent merging of duplicate records
- Query Interface: Command-line tools for searching and analyzing data
- Export/Import: JSON-based data exchange
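
As a rough illustration of what merging duplicate records can involve, the sketch below groups records by a normalized name and keeps the first non-empty value per field. It is not the project's deduplication engine: the function names, the dictionary-based record layout, and the field names (`name`, `period`, `clade`, `source`) are all assumptions made for the example.

```python
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Collapse case and whitespace so 'tyrannosaurus  rex' matches 'Tyrannosaurus rex'."""
    return " ".join(name.lower().split())


def merge_duplicates(records: list[dict]) -> list[dict]:
    """Group records by normalized name and merge each group into one record.

    For every field the first non-empty value wins; the contributing
    sources are collected in a 'sources' list. (Hypothetical strategy.)
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[normalize_name(record["name"])].append(record)

    merged = []
    for group in groups.values():
        combined: dict = {"sources": []}
        for record in group:
            for key, value in record.items():
                if key == "source":
                    if value not in combined["sources"]:
                        combined["sources"].append(value)
                elif value not in (None, "") and key not in combined:
                    combined[key] = value
        merged.append(combined)
    return merged


# Illustrative input only; the field names are assumptions.
records = [
    {"name": "Tyrannosaurus rex", "period": "Cretaceous", "source": "pbdb"},
    {"name": "tyrannosaurus  rex", "clade": "Theropoda", "source": "dinodata"},
    {"name": "Triceratops horridus", "period": "Cretaceous", "source": "amnh"},
]
merged = merge_duplicates(records)
```

Exact-name matching is the simplest possible key; a real engine would likely also need to handle synonyms and spelling variants.
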
- The Paleobiology Database (PBDB) - Global fossil occurrence data
- American Museum of Natural History (AMNH) - Museum collection records
- DinoData - Comprehensive dinosaur information
- Natural History Museum London - Dino Directory
- DinoAnimals Complete Database - Complete genus and species listings
- National Park Service Archives - US fossil site data
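
Because each of these sources publishes data in its own layout, a source-specific adapter has to map raw fields onto the unified model. The following is a minimal sketch of that idea, not the code in adapters.py: `UnifiedRecord`, the adapter functions, the source keys, and every raw field name (`taxon_name`, `interval`, `genus`, `species`, `age`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UnifiedRecord:
    """Hypothetical unified record; the real model lives in schema.py."""
    name: str
    source: str
    period: Optional[str] = None


def adapt_occurrence(raw: dict) -> UnifiedRecord:
    # Assumed layout for an occurrence-style source: 'taxon_name' and 'interval'.
    return UnifiedRecord(
        name=raw["taxon_name"], source="occurrence_db", period=raw.get("interval")
    )


def adapt_museum(raw: dict) -> UnifiedRecord:
    # Assumed layout for a museum-catalogue source: separate 'genus' and 'species'.
    name = f"{raw['genus']} {raw['species']}".strip()
    return UnifiedRecord(name=name, source="museum", period=raw.get("age"))


# Registry mapping a source key to the function that understands its layout.
ADAPTERS: dict[str, Callable[[dict], UnifiedRecord]] = {
    "occurrence_db": adapt_occurrence,
    "museum": adapt_museum,
}


def adapt(source: str, raw: dict) -> UnifiedRecord:
    """Dispatch a raw record to the adapter registered for its source."""
    try:
        adapter = ADAPTERS[source]
    except KeyError:
        raise ValueError(f"no adapter registered for source {source!r}") from None
    return adapter(raw)


a = adapt("occurrence_db", {"taxon_name": "Allosaurus fragilis", "interval": "Jurassic"})
b = adapt("museum", {"genus": "Triceratops", "species": "horridus"})
```

Keeping adapters in a registry means supporting a new source only requires adding one function and one dictionary entry.
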
# Install the package
pip install .
# Generate sample database
dinosaur-cli sample --output sample_database.json
# Run demonstration
python -m demo
# View statistics
dinosaur-cli stats --database sample_database.json
# Query the database
dinosaur-cli query --name "Tyrannosaurus"
pip install .
For development, install in editable mode:
pip install -e .
This installs the dinosaur-cli command while keeping the existing top-level modules
available for Python imports such as from integrator import DataIntegrator.
✓ Unified schema for all dinosaur data
✓ Automatic deduplication and merging
✓ Support for multiple data sources
✓ Comprehensive taxonomic classification
✓ Geographic and stratigraphic data
✓ Physical characteristics and measurements
✓ Museum collection tracking
✓ Reference management
✓ Data validation and quality checks
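
To give a sense of what data validation and quality checks might look like, here is a small sketch that reports problems for a single record. The rules, the `validate_record` name, and the fields checked (`name`, `period`, `length_m`) are assumptions for illustration and may not match what `dinosaur-cli validate` actually verifies.

```python
# The three periods of the Mesozoic, when non-avian dinosaurs lived.
KNOWN_PERIODS = {"Triassic", "Jurassic", "Cretaceous"}


def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []

    name = record.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("missing name")

    period = record.get("period")
    if period is not None and period not in KNOWN_PERIODS:
        problems.append(f"unknown period: {period}")

    length = record.get("length_m")
    if length is not None and (not isinstance(length, (int, float)) or length <= 0):
        problems.append("length_m must be a positive number")

    return problems


ok = validate_record({"name": "Stegosaurus", "period": "Jurassic"})
bad = validate_record({"name": "", "period": "Holocene", "length_m": -1})
```

Returning a list of messages rather than raising on the first failure lets a caller report every issue with a record in one pass.
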
See INTEGRATION_GUIDE.md for detailed documentation including:
- Architecture overview
- Data schema details
- API reference
- Integration examples
- Extending the system
Dinosaur-combined/
├── schema.py # Unified data model
├── adapters.py # Data source adapters
├── integrator.py # Integration engine
├── dinosaur_cli.py # Command-line interface
├── demo.py # Demonstration script
├── INTEGRATION_GUIDE.md # Complete documentation
├── examples/ # Sample data files
│ ├── pbdb_sample.json
│ ├── dinodata_sample.json
│ └── amnh_sample.json
└── README.md # This file
from integrator import DataIntegrator
from schema import GeologicalPeriod
# Create integrator
integrator = DataIntegrator()
# Import from different sources
integrator.add_records_from_source('pbdb', pbdb_records)
integrator.add_records_from_source('dinodata', dinodata_records)
# Query
trex = integrator.database.get_by_name("Tyrannosaurus rex")
cretaceous = integrator.database.get_by_period(GeologicalPeriod.CRETACEOUS)
# Export
integrator.export_to_json('combined_database.json')
# Import data
dinosaur-cli import pbdb examples/pbdb_sample.json
# Query by period
dinosaur-cli query --period cretaceous
# Query by clade
dinosaur-cli query --clade theropoda
# Show statistics
dinosaur-cli stats
# Validate database
dinosaur-cli validate
Contributions welcome! Areas for enhancement:
- Additional data source adapters
- Improved deduplication algorithms
- Web API and visualization tools
- Integration with online databases
This integration system is provided for educational and research purposes. Please respect the licenses and terms of use for each original data source.