
Setup Guide

Option 1: DuckDB (Recommended — Fastest Setup)

DuckDB runs locally with zero configuration. Perfect for testing these queries.

Install DuckDB

# macOS (installs the duckdb CLI)
brew install duckdb

# Any OS — installs the Python package only (use `import duckdb`; no CLI shell)
pip install duckdb

Download the Dataset

  1. Go to Kaggle: Brazilian E-Commerce Dataset
  2. Download and extract all CSV files into the data/ folder in this repo

Load Data

-- Launch DuckDB
-- In terminal: duckdb olist.db

-- Create tables from CSVs
CREATE TABLE customers AS SELECT * FROM read_csv_auto('data/olist_customers_dataset.csv');
CREATE TABLE orders AS SELECT * FROM read_csv_auto('data/olist_orders_dataset.csv');
CREATE TABLE order_items AS SELECT * FROM read_csv_auto('data/olist_order_items_dataset.csv');
CREATE TABLE order_payments AS SELECT * FROM read_csv_auto('data/olist_order_payments_dataset.csv');
CREATE TABLE order_reviews AS SELECT * FROM read_csv_auto('data/olist_order_reviews_dataset.csv');
CREATE TABLE products AS SELECT * FROM read_csv_auto('data/olist_products_dataset.csv');
CREATE TABLE sellers AS SELECT * FROM read_csv_auto('data/olist_sellers_dataset.csv');
CREATE TABLE category_translation AS SELECT * FROM read_csv_auto('data/product_category_name_translation.csv');
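Once the tables are created, a quick sanity check confirms the load worked (an illustrative query — exact row counts depend on your download):

```sql
-- List the loaded tables
SHOW TABLES;

-- Spot-check row counts and status distribution
SELECT COUNT(*) AS order_count FROM orders;

SELECT order_status, COUNT(*) AS n
FROM orders
GROUP BY order_status
ORDER BY n DESC;
```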

Run Queries

# Run any query file
duckdb olist.db < queries/q01_revenue_kpi_dashboard.sql

Option 2: PostgreSQL

Create Database

createdb olist_analytics

Create Tables & Load Data

-- Create tables (example for orders — repeat for all tables)
CREATE TABLE orders (
    order_id VARCHAR PRIMARY KEY,
    customer_id VARCHAR,
    order_status VARCHAR,
    order_purchase_timestamp TIMESTAMP,
    order_approved_at TIMESTAMP,
    order_delivered_carrier_date TIMESTAMP,
    order_delivered_customer_date TIMESTAMP,
    order_estimated_delivery_date TIMESTAMP
);

-- Load from CSV
\COPY orders FROM 'data/olist_orders_dataset.csv' WITH CSV HEADER;
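The remaining tables follow the same pattern. As one more sketch, a plausible DDL for order_items — column names match the Olist CSV headers, but the types here are reasonable guesses, not a canonical schema:

```sql
CREATE TABLE order_items (
    order_id VARCHAR,
    order_item_id INTEGER,
    product_id VARCHAR,
    seller_id VARCHAR,
    shipping_limit_date TIMESTAMP,
    price NUMERIC(10,2),
    freight_value NUMERIC(10,2)
);

\COPY order_items FROM 'data/olist_order_items_dataset.csv' WITH CSV HEADER;
```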

Option 3: Snowflake (Free Trial)

  1. Sign up at snowflake.com/en/data-cloud/overview/trial
  2. Create a database: CREATE DATABASE olist_analytics;
  3. Use Snowflake's PUT + COPY INTO commands to load CSVs from a stage
  4. All queries in this repo use standard SQL compatible with Snowflake

Option 4: Google BigQuery (Free Tier)

  1. Upload CSVs to a BigQuery dataset
  2. Minor syntax adjustments may be needed:
    • Replace DATE_TRUNC('month', col) with DATE_TRUNC(col, MONTH)
    • Replace ::DATE casts with CAST(col AS DATE)
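As a worked example of both substitutions, here is a hypothetical monthly-revenue rollup in each dialect (table and column names assume the loaded Olist schema; if the timestamp column was ingested as a TIMESTAMP rather than DATETIME, BigQuery may require TIMESTAMP_TRUNC instead):

```sql
-- DuckDB / PostgreSQL
SELECT DATE_TRUNC('month', order_purchase_timestamp)::DATE AS month,
       SUM(payment_value) AS revenue
FROM orders o
JOIN order_payments p USING (order_id)
GROUP BY 1;

-- BigQuery equivalent
SELECT CAST(DATE_TRUNC(order_purchase_timestamp, MONTH) AS DATE) AS month,
       SUM(payment_value) AS revenue
FROM orders o
JOIN order_payments p USING (order_id)
GROUP BY 1;
```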