atlanhq/lakehouse-solutions

Lakehouse Solutions

Overview

This repository provides a collection of solutions for the Atlan Metadata Lakehouse (MDLH), enabling Atlan customers to deploy production-ready tools on their own compute environments. Solutions include analytical models (Gold Layer), maintenance utilities, and more.

Supported Platforms

  • Snowflake ❄️
  • Databricks 🔷
  • BigQuery ☁️
  • DuckDB 🦆 (coming soon)
  • Trino 🚀 (coming soon)

Solutions

Gold Layer

The Gold Layer provides curated, analytics-ready metadata views that serve as the single entry point for both human and AI consumption of lakehouse metadata.

What it delivers:

  • Unified Asset Registry: Centralized view of all assets across SQL, BI, pipelines, data quality, and object domains
  • Relational Asset Details: Consolidated metadata for databases, schemas, tables, views, columns, queries, and procedures
  • Governance & Classification: Tag and custom metadata views for data governance
  • Lineage: Complete multi-hop upstream and downstream lineage relationships
  • Data Quality: Views for Atlan-native and third-party data quality rules
  • Pipeline Details: Orchestration and pipeline asset metadata
  • Glossary: Business glossary terms, categories, and hierarchies
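
To illustrate what "multi-hop" lineage means in the list above, here is a minimal sketch that expands direct (one-hop) lineage edges into full upstream and downstream sets with hop counts. The edge list, asset names, and function names are hypothetical and are not taken from the Gold Layer views; the actual views may be structured differently.

```python
from collections import deque

# Hypothetical one-hop lineage edges as (upstream, downstream) pairs.
# These asset names are illustrative only, not part of the Gold Layer schema.
EDGES = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "marts.revenue"),
    ("raw.customers", "marts.revenue"),
    ("marts.revenue", "dashboard.sales"),
]


def downstream(start, edges):
    """Return {asset: hop_count} for every asset reachable downstream of start."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    hops = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in children.get(node, []):
            # Breadth-first order guarantees the first visit is the shortest path.
            if child not in hops and child != start:
                hops[child] = depth + 1
                queue.append((child, depth + 1))
    return hops


def upstream(start, edges):
    """Return {asset: hop_count} for every asset upstream of start."""
    # Upstream traversal is downstream traversal over reversed edges.
    return downstream(start, [(dst, src) for src, dst in edges])


if __name__ == "__main__":
    print(downstream("raw.orders", EDGES))
    print(upstream("dashboard.sales", EDGES))
```

A breadth-first traversal is used here so that each asset is reported at its shortest hop distance, which is usually what impact analysis needs.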

Catalog Integration (Snowflake)

Snowflake supports native Iceberg REST Catalog federation, allowing you to query the Atlan Lakehouse directly without any external scripts or scheduled refresh jobs. The catalog integration guide walks through creating an External Volume, Catalog Integration, and Linked Database.

Foreign/External Iceberg Tables (Databricks, BigQuery)

Databricks and BigQuery do not currently support querying federated Iceberg REST catalogs natively. These scripts provide a workaround by creating local table references (foreign Iceberg tables in Databricks Unity Catalog, external Iceberg tables in BigQuery) that point directly to the Atlan Lakehouse metadata files. The scripts handle both initial table creation and ongoing metadata refresh to keep tables in sync. Databricks supports both AWS S3 and Azure ADLS storage.
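
The create-then-refresh pattern described above can be sketched as a simple planning step. This is an illustration under stated assumptions, not the logic of the scripts in this repository: the function name `plan_sync`, the dictionary inputs, and the example paths are all hypothetical.

```python
def plan_sync(latest, registered):
    """Decide per table whether to create, refresh, or skip it.

    latest:     {table_name: newest metadata file location in the lakehouse}
    registered: {table_name: metadata file location the local table points to}

    Both mappings are assumptions for illustration; a real script would
    obtain them from the lakehouse catalog and the compute platform.
    """
    plan = {}
    for table, location in sorted(latest.items()):
        if table not in registered:
            plan[table] = "create"      # no local table reference yet
        elif registered[table] != location:
            plan[table] = "refresh"     # local reference points at old metadata
        else:
            plan[table] = "skip"        # already in sync
    return plan


if __name__ == "__main__":
    # Hypothetical table names and metadata paths.
    latest = {
        "assets": "s3://example-bucket/assets/metadata/v3.metadata.json",
        "lineage": "s3://example-bucket/lineage/metadata/v7.metadata.json",
        "glossary": "s3://example-bucket/glossary/metadata/v1.metadata.json",
    }
    registered = {
        "assets": "s3://example-bucket/assets/metadata/v3.metadata.json",
        "lineage": "s3://example-bucket/lineage/metadata/v6.metadata.json",
    }
    print(plan_sync(latest, registered))
```

Because the plan depends only on the current state, running it repeatedly with unchanged inputs yields only "skip" actions.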

AI Agent Skill

A skill in the Agent Skills format that teaches AI coding agents (Claude Code, Cortex Code, Genie Code) how to connect to and query the lakehouse. It includes about 45 SQL templates covering metadata completeness, lineage analysis, glossary export, and usage analytics (active users, feature adoption, engagement, retention, health scoring). See skills/atlan-lakehouse/ for installation instructions.

MDLH Table Maintenance (Snowflake only)

A native Snowflake Streamlit app that identifies stale Iceberg tables and provides an option to repair them by refreshing metadata and enabling auto-refresh.
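
As a rough illustration of stale-table detection, the sketch below flags tables whose last metadata refresh is older than a threshold. It is not the app's implementation: the `find_stale` helper, the 24-hour default, and the sample timestamps are assumptions, and the app may define staleness differently.

```python
from datetime import datetime, timedelta, timezone


def find_stale(last_refreshed, now, max_age=timedelta(hours=24)):
    """Return the sorted names of tables not refreshed within max_age.

    last_refreshed: {table_name: timezone-aware datetime of last refresh}
    The 24-hour default is an assumed threshold for illustration only.
    """
    return sorted(
        name for name, refreshed_at in last_refreshed.items()
        if now - refreshed_at > max_age
    )


if __name__ == "__main__":
    now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
    # Hypothetical table names and refresh times.
    last_refreshed = {
        "assets": datetime(2025, 1, 10, 9, 0, tzinfo=timezone.utc),
        "lineage": datetime(2025, 1, 8, 12, 0, tzinfo=timezone.utc),
        "glossary": datetime(2025, 1, 9, 11, 0, tzinfo=timezone.utc),
    }
    print(find_stale(last_refreshed, now))
```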

Getting Started

Prerequisites

  • Access to one of the supported compute environments
  • Appropriate permissions to create databases, schemas, views, and tables
  • Connection to your Atlan metadata catalog

Platform Guides

Navigate to the platform-specific folder for available solutions and setup instructions:

  • Snowflake: snowflake/README.md
  • Databricks: databricks/README.md
  • BigQuery: bigquery/README.md

Use Cases

  • Analytics & Reporting: Query-ready metadata for business intelligence and analytics
  • AI/ML Consumption: Structured metadata for AI agents and machine learning models
  • Data Governance: Standardized views for compliance and governance reporting
  • Lineage Analysis: Complete lineage visualization and impact analysis
  • Asset Discovery: Unified search and discovery across all metadata types

Repository Structure

lakehouse-solutions/
├── README.md                              # This file
├── snowflake/
│   ├── README.md                          # Snowflake solutions overview
│   ├── catalog-integration/
│   │   └── README.md                      # Catalog integration setup guide
│   ├── gold-layer/
│   │   ├── README.md                      # Gold Layer setup guide
│   │   └── MDLH_Gold_layer.sql            # Gold Layer deployment script
│   └── mdlh-table-maintenance/
│       ├── README.md                      # Table maintenance setup guide
│       └── MDLH_table_refresh_repair.py   # Streamlit app
├── databricks/
│   ├── README.md                          # Databricks solutions overview
│   ├── gold-layer/
│   │   ├── README.md                      # Gold Layer setup guide
│   │   ├── MDLH_Gold_layer.sql            # Gold Layer deployment script
│   │   └── refresh_materialized_views.sql # Scheduled refresh script
│   └── foreign-iceberg-tables/
│       ├── README.md                      # Foreign Iceberg Tables setup guide
│       ├── dbx_foreign_iceberg_tables_create.py   # Table creation script
│       └── dbx_foreign_iceberg_tables_refresh.py  # Table refresh script
├── bigquery/
│   ├── README.md                          # BigQuery solutions overview
│   ├── gold-layer/
│   │   ├── README.md                      # Gold Layer setup guide
│   │   └── MDLH_Gold_layer.sql            # Gold Layer deployment script
│   └── external-iceberg-tables/
│       ├── README.md                      # External Iceberg Tables setup guide
│       └── bq_external_iceberg_tables_create_refresh.py  # Create/refresh script
├── skills/
│   └── atlan-lakehouse/
│       ├── README.md                      # Installation & usage guide
│       └── SKILL.md                       # Agent skill definition
├── duckdb/                                # Coming soon
└── trino/                                 # Coming soon

Contributing

This repository is maintained by the Atlan team. For issues, questions, or contributions, please contact the Atlan engineering team.

License

[Specify license here]

Support

For support and questions:

  • Documentation: See platform-specific README files
  • Atlan Support: [Contact information]

Note: This repository contains deployment scripts for customer-managed infrastructure. All scripts are designed to be idempotent and production-ready.
