- The old lakehouse catalog is now named delta, and an iceberg catalog also ships with this deployment.
- Docker Desktop (with k8s enabled)
- make
In this workshop I provide a setup of MinIO with a Hive Metastore (backed internally by Postgres) so you can query Delta tables via Trino.
If you are using an M1 chip:

```shell
make build-image COMPONENT=metastore DOCKER_BUILD="buildx build --platform linux/amd64" CONTEXT="tools/docker"
```

Otherwise:

```shell
make build-image COMPONENT=metastore CONTEXT="tools/docker"
```

Deploy the metastore:

```shell
make deploy-local-metastore ENV=local
```

To delete it again:

```shell
make delete-local-metastore ENV=local
```

For Hive, Postgres, and MinIO we choose StatefulSet as the deployment type over Deployment.
A StatefulSet should be chosen for stateful components that require predictable Pod identities, ordered rollouts, and stable storage access.
A Deployment should be chosen instead when your service is stateless, does not require persistent storage, and is not affected by changes to Pod identities.
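As a sketch, a minimal StatefulSet for the MinIO component might look like the following. This is illustrative only, not the workshop's actual manifest — names, image, and storage size are assumptions:

```yaml
# Illustrative sketch -- not the workshop's actual manifest.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minio
  namespace: metastore
spec:
  serviceName: minio-service        # headless Service gives each Pod a stable DNS name
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: minio/minio
          args: ["server", "/data"]
          ports:
            - containerPort: 9000
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:             # each replica gets its own stable PersistentVolumeClaim
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

The `volumeClaimTemplates` and the stable Pod identity (`minio-0`) are exactly what a plain Deployment would not give you.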
First, add the Trino Helm repo:
```shell
helm repo add trino https://trinodb.github.io/charts
```

Make sure you use chart version >= 0.21.0:
```shell
helm search repo trino
```

This should give:

```
NAME         CHART VERSION  APP VERSION  DESCRIPTION
trino/trino  0.21.0         448          Fast distributed SQL query engine for big data ...
```

To get the latest chart versions:
```shell
helm repo update
helm search repo trino
```

```
NAME         CHART VERSION  APP VERSION  DESCRIPTION
trino/trino  1.36.0         468          Fast distributed SQL query engine for big data ...
```

We use the default values for the deployment, but please note this section:
```yaml
catalogs:
  iceberg: |-
    connector.name=iceberg
    hive.metastore.uri=thrift://hive-service.metastore:9083
    hive.s3.endpoint=http://minio-service.metastore:9000
    hive.s3.path-style-access=true
    hive.s3.aws-access-key=minio
    hive.s3.aws-secret-key=minio123
    iceberg.file-format=parquet
  delta: |-
    connector.name=delta_lake
    hive.metastore.uri=thrift://hive-service.metastore:9083
    hive.s3.endpoint=http://minio-service.metastore:9000
    hive.s3.path-style-access=true
    hive.s3.aws-access-key=minio
    hive.s3.aws-secret-key=minio123
    delta.enable-non-concurrent-writes=true
  rdbms: |-
    connector.name=postgresql
    connection-url=jdbc:postgresql://postgres-service.metastore:5432/hivemetastore?allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=UTC
    connection-user=metastore
    connection-password=password
```

This will ensure we can query our tables from our metastore.
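Since the rdbms catalog points straight at the metastore database, you can also inspect the Hive Metastore's own bookkeeping from Trino. A sketch, assuming the standard Hive Metastore schema in Postgres' `public` schema — the `"TBLS"`/`"DBS"` table and column names come from that schema, not from anything defined in this workshop:

```sql
-- List all tables registered in the Hive Metastore, with their database.
-- Assumes the standard Hive Metastore schema (quoted uppercase identifiers).
SELECT d."NAME" AS database_name,
       t."TBL_NAME" AS table_name
FROM rdbms.public."TBLS" t
JOIN rdbms.public."DBS" d
  ON t."DB_ID" = d."DB_ID";
```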
```shell
make deploy-local-trino NAMESPACE=trino
```

And to delete:

```shell
make delete-local-trino NAMESPACE=trino
```

The Trino cluster is now up and running!
We will have to exec into the coordinator Pod and create a table.
```shell
kubectl get pods -n trino
kubectl exec -it datapains-trino-cluster-coordinator-747bc6ccff-b6tx2 -n trino -- /bin/bash
```

Inside the container, start the Trino CLI:

```
[trino@datapains-trino-cluster-coordinator-747bc6ccff-b6tx2 /]$ trino
```
```
trino> SHOW CATALOGS;
 Catalog
---------
 delta
 iceberg
 system
 tpcds
 tpch
(5 rows)

Query 20240526_120408_00000_kyyrh, FINISHED, 2 nodes
Splits: 20 total, 20 done (100.00%)
2.57 [0 rows, 0B] [0 rows/s, 0B/s]
```

Create a schema and a table in the delta catalog, then query it:

```sql
trino> CREATE SCHEMA delta.bronze;

trino> CREATE TABLE delta.bronze.products (
    name VARCHAR,
    price DOUBLE,
    product_no BIGINT,
    ingest_date DATE,
    created TIMESTAMP WITH TIME ZONE
);

trino> SELECT * FROM delta.bronze.products;
```
```sql
trino> INSERT INTO delta.bronze.products VALUES (CAST('Apple' AS VARCHAR), CAST(889.0 AS DOUBLE), CAST(1654523786273 AS BIGINT), CURRENT_DATE, CURRENT_TIMESTAMP);

trino> SELECT * FROM delta.bronze.products;
```

The same works for the iceberg catalog:

```sql
trino> CREATE SCHEMA iceberg.bronze;

trino> CREATE TABLE iceberg.bronze.products (
    name VARCHAR,
    price DOUBLE,
    product_no BIGINT,
    ingest_date DATE,
    created TIMESTAMP WITH TIME ZONE
);

trino> SELECT * FROM iceberg.bronze.products;
```
```sql
trino> INSERT INTO iceberg.bronze.products VALUES (CAST('Apple' AS VARCHAR), CAST(889.0 AS DOUBLE), CAST(1654523786273 AS BIGINT), CURRENT_DATE, CURRENT_TIMESTAMP);

trino> SELECT * FROM iceberg.bronze.products;
```
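With both tables populated, Trino can also federate across the two catalogs in a single query. A sketch — the aliases are illustrative, and both tables were created with identical schemas above:

```sql
-- Join the Delta and Iceberg copies of the products table on product_no.
trino> SELECT d.name, d.price, i.ingest_date
       FROM delta.bronze.products AS d
       JOIN iceberg.bronze.products AS i
         ON d.product_no = i.product_no;
```

This cross-catalog querying is one of the main reasons to put Trino in front of the metastore in the first place.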