20 commits
9f8dc72
fix: use synchronous DuckDB constructor to avoid bun runtime timeout
suryaiyer95 Apr 2, 2026
eaa10b7
revert: restore async DuckDB constructor — sync change was bogus
suryaiyer95 Apr 2, 2026
d110d6e
feat: add MSSQL/Fabric dialect mapping and data-parity support
suryaiyer95 Apr 6, 2026
3e6b3e0
feat: add Azure AD authentication to SQL Server driver (7 flows)
suryaiyer95 Apr 6, 2026
54aceed
docs: add MSSQL and Microsoft Fabric documentation to data-parity SKI…
suryaiyer95 Apr 6, 2026
1056c64
fix: delegate Azure AD credential creation to tedious and remove unde…
suryaiyer95 Apr 7, 2026
bfb1295
fix: upgrade `mssql` to v12 with `ConnectionPool` isolation and row f…
suryaiyer95 Apr 13, 2026
32d4afc
fix: resolve TypeScript spread-type errors in Azure AD conditional op…
suryaiyer95 Apr 13, 2026
fda536d
fix: resolve cubic review findings on MSSQL/Fabric PR
suryaiyer95 Apr 14, 2026
d004e1b
test: add fabric connection path and flattenRow coverage
suryaiyer95 Apr 14, 2026
b69a3d2
docs: document minimum versions and make @azure/identity optional
suryaiyer95 Apr 14, 2026
d1cdd1b
fix: acquire Azure AD tokens directly to bypass Bun browser-bundle re…
suryaiyer95 Apr 16, 2026
173d32f
fix: auto-acquire Azure AD token for `azure-active-directory-access-t…
suryaiyer95 Apr 16, 2026
63769f4
fix: side-aware CTE injection for cross-warehouse `data_diff` SQL-que…
suryaiyer95 Apr 17, 2026
1977232
chore: regenerate `bun.lock` to match drivers `peerDependencies` layout
suryaiyer95 Apr 17, 2026
872e082
fix: address all CRITICAL/MAJOR findings from multi-model review
suryaiyer95 Apr 17, 2026
38cfb0e
fix: address PR #705 bot review findings (coderabbitai + cubic + copi…
suryaiyer95 Apr 17, 2026
3ebcec1
chore: drop stale `@azure/identity` peer-dep entries from `bun.lock`
suryaiyer95 Apr 17, 2026
64dd815
fix: CI — isolate `data-diff-cross-dialect` tests from other files
suryaiyer95 Apr 17, 2026
282876b
fix: follow-up PR bot review findings (cubic P1/P2 + coderabbit MAJOR…
suryaiyer95 Apr 17, 2026
66 changes: 66 additions & 0 deletions .opencode/skills/data-parity/SKILL.md
@@ -71,6 +71,19 @@ WHERE table_schema = 'mydb' AND table_name = 'orders'
ORDER BY ordinal_position
```

```sql
-- SQL Server / Fabric
SELECT c.name AS column_name, tp.name AS data_type, c.is_nullable,
dc.definition AS column_default
FROM sys.columns c
INNER JOIN sys.types tp ON c.user_type_id = tp.user_type_id
INNER JOIN sys.objects o ON c.object_id = o.object_id
INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
LEFT JOIN sys.default_constraints dc ON c.default_object_id = dc.object_id
WHERE s.name = 'dbo' AND o.name = 'orders'
ORDER BY c.column_id
```

```sql
-- ClickHouse
DESCRIBE TABLE source_db.events
@@ -409,3 +422,56 @@ Even when tables match perfectly, state what was checked:

**Silently excluding auto-timestamp columns without asking the user**
→ Always present detected auto-timestamp columns (Step 4) and get explicit confirmation. In migration scenarios, `created_at` should be *identical* — excluding it silently hides real bugs.

---

## SQL Server and Microsoft Fabric

### Minimum Version Requirements

| Component | Minimum Version | Why |
|---|---|---|
| **SQL Server** | 2022 (16.x) | `DATETRUNC()` used for date partitioning; `LEAST()`/`GREATEST()` used by the Rust engine |
| **Azure SQL Database** | Any current version | Supports `DATETRUNC()`, `LEAST()`, and `GREATEST()` |
| **Microsoft Fabric** | Any current version | T-SQL surface includes all required functions |
| **mssql** (npm) | 12.0.0 | `ConnectionPool` isolation for concurrent connections, tedious 19 |
| **@azure/identity** (npm) | 4.0.0 | Required only for Azure AD authentication; tedious imports it internally |

> **Note:** Date partitioning (`partition_column` + `partition_granularity`) uses `DATETRUNC()` which is **not available on SQL Server 2019 or earlier**. Basic diff operations (joindiff, hashdiff, profile) work on older versions. If you need partitioned diffs on SQL Server < 2022, use numeric or categorical partitioning instead.
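The version gate above can be applied as a small check before generating partition SQL. This is a sketch that assumes the caller already knows the server's `ProductMajorVersion` (for example from `SERVERPROPERTY('ProductMajorVersion')`); `datePartitionExpr`, `quoteIdent`, and the granularity subset are hypothetical names, not the driver's API. It only models on-premises SQL Server numbering: Azure SQL Database reports a major version of 12 while still supporting `DATETRUNC()`, so a real check would also need to branch on the warehouse type or engine edition.

```typescript
// Hypothetical helper: names are illustrative, not the driver's API.
type DateGranularity = "year" | "quarter" | "month" | "week" | "day" | "hour";

// SERVERPROPERTY('ProductMajorVersion') reports 16 for SQL Server 2022.
const DATETRUNC_MIN_MAJOR_VERSION = 16;

// T-SQL bracket quoting: a literal "]" inside the name is escaped by doubling it.
function quoteIdent(name: string): string {
  return `[${name.replace(/\]/g, "]]")}]`;
}

// Returns a DATETRUNC() partition expression, or null on SQL Server 2019 (15)
// and earlier, where the caller should switch to numeric or categorical partitioning.
function datePartitionExpr(
  column: string,
  granularity: DateGranularity,
  productMajorVersion: number,
): string | null {
  if (productMajorVersion < DATETRUNC_MIN_MAJOR_VERSION) return null;
  return `DATETRUNC(${granularity}, ${quoteIdent(column)})`;
}

console.log(datePartitionExpr("created_at", "month", 16)); // DATETRUNC(month, [created_at])
console.log(datePartitionExpr("created_at", "month", 15)); // null
```

Returning `null` rather than throwing leaves the fallback decision (numeric or categorical partitioning) to the caller.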

### Supported Configurations

| Warehouse Type | Authentication | Notes |
|---|---|---|
| `sqlserver` / `mssql` | User/password or Azure AD | On-prem or Azure SQL. SQL Server 2022+ required for date partitioning. |
| `fabric` | Azure AD only | Microsoft Fabric SQL endpoint. Always uses TLS encryption. |

### Connecting to Microsoft Fabric

Fabric uses the same TDS protocol as SQL Server, so no separate driver is needed. Configuration:

```yaml
type: "fabric"
host: "<workspace-id>-<item-id>.datawarehouse.fabric.microsoft.com"
database: "<warehouse-name>"
authentication: "azure-active-directory-default" # recommended
```
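To make the YAML concrete, here is a sketch of the connection object it might translate into. The shape loosely follows the config that `mssql` passes to tedious (`server`, `port`, `database`, `options.encrypt`, `authentication.type`), but `FabricYamlConfig`, `TdsConnectionConfig`, and `toTdsConfig` are hypothetical and defined locally so the sketch runs without either package. Falling back to `azure-active-directory-default` when `authentication` is omitted is also an assumption.

```typescript
// Hypothetical types: loosely modeled on the mssql/tedious config, not imported from them.
interface FabricYamlConfig {
  type: "fabric";
  host: string;
  database: string;
  authentication?: string;
}

interface TdsConnectionConfig {
  server: string;
  port: number;
  database: string;
  options: { encrypt: boolean; trustServerCertificate: boolean };
  authentication: { type: string; options: Record<string, string> };
}

function toTdsConfig(cfg: FabricYamlConfig): TdsConnectionConfig {
  return {
    server: cfg.host,
    port: 1433, // default TDS port
    database: cfg.database,
    // Fabric SQL endpoints always use TLS, so encryption is forced on.
    options: { encrypt: true, trustServerCertificate: false },
    authentication: {
      // Assumed default when the YAML omits `authentication`.
      type: cfg.authentication ?? "azure-active-directory-default",
      options: {},
    },
  };
}

const conn = toTdsConfig({
  type: "fabric",
  host: "<workspace-id>-<item-id>.datawarehouse.fabric.microsoft.com",
  database: "<warehouse-name>",
});
console.log(conn.options.encrypt, conn.authentication.type); // true azure-active-directory-default
```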

Auth shorthands (mapped to full tedious type names):
- `CLI` or `default` → `azure-active-directory-default`
- `password` → `azure-active-directory-password`
- `service-principal` → `azure-active-directory-service-principal-secret`
- `msi` or `managed-identity` → `azure-active-directory-msi-vm`
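The shorthand table above amounts to a lookup with pass-through for full type names. A minimal sketch, assuming matching is case-insensitive (so `CLI` and `cli` behave the same); `AUTH_SHORTHANDS` and `resolveAuthType` are hypothetical names, not the driver's implementation.

```typescript
// Hypothetical lookup mirroring the shorthand list. A Map is used so that
// inherited object keys such as "constructor" are never mistaken for shorthands.
const AUTH_SHORTHANDS = new Map<string, string>([
  ["cli", "azure-active-directory-default"],
  ["default", "azure-active-directory-default"],
  ["password", "azure-active-directory-password"],
  ["service-principal", "azure-active-directory-service-principal-secret"],
  ["msi", "azure-active-directory-msi-vm"],
  ["managed-identity", "azure-active-directory-msi-vm"],
]);

// Case-insensitive lookup (an assumption); full tedious type names and
// unrecognized values pass through unchanged.
function resolveAuthType(value: string): string {
  return AUTH_SHORTHANDS.get(value.trim().toLowerCase()) ?? value;
}

console.log(resolveAuthType("CLI")); // azure-active-directory-default
console.log(resolveAuthType("azure-active-directory-access-token")); // azure-active-directory-access-token
```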

Full Azure AD authentication types:
- `azure-active-directory-default` — auto-discovers credentials via `DefaultAzureCredential` (recommended; works with `az login`)
- `azure-active-directory-password` — username/password with `azure_client_id` and `azure_tenant_id`
- `azure-active-directory-access-token` — pre-obtained token (does **not** auto-refresh)
- `azure-active-directory-service-principal-secret` — service principal with `azure_client_id`, `azure_client_secret`, `azure_tenant_id`
- `azure-active-directory-msi-vm` / `azure-active-directory-msi-app-service` — managed identity
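The password and service-principal flows each need several settings at once, so a pre-flight check that reports every missing key before connecting can save a round of trial and error. This is a sketch, not the driver's validation: `REQUIRED_FIELDS` and `missingFields` are hypothetical, the `azure_*` keys follow the canonical names in `normalize.ts`, and `user`/`password` are assumed names for the username and password settings. Only the two types whose required settings are spelled out in the list are covered.

```typescript
// Hypothetical required-key table for the two flows whose settings are listed above.
// The `user` and `password` key names are assumptions.
const REQUIRED_FIELDS = new Map<string, readonly string[]>([
  ["azure-active-directory-password", ["user", "password", "azure_client_id", "azure_tenant_id"]],
  [
    "azure-active-directory-service-principal-secret",
    ["azure_client_id", "azure_client_secret", "azure_tenant_id"],
  ],
]);

// Returns the required keys that are absent or empty; other auth types return [].
function missingFields(authType: string, config: Record<string, unknown>): string[] {
  const required = REQUIRED_FIELDS.get(authType) ?? [];
  return required.filter((key) => {
    const value = config[key];
    return value === undefined || value === null || value === "";
  });
}

console.log(
  missingFields("azure-active-directory-service-principal-secret", {
    azure_client_id: "00000000-0000-0000-0000-000000000000",
    azure_tenant_id: "11111111-1111-1111-1111-111111111111",
  }),
); // [ 'azure_client_secret' ]
```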

### Algorithm Behavior

- **Same-warehouse** MSSQL or Fabric → `joindiff` (single FULL OUTER JOIN, most efficient)
- **Cross-warehouse** MSSQL/Fabric ↔ other database → `hashdiff` (automatic when using `auto`)
- The Rust engine maps `sqlserver`/`mssql` to `tsql` dialect and `fabric` to `fabric` dialect — both generate valid T-SQL syntax with bracket quoting (`[schema].[table]`).
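The routing rules above can be sketched as follows, assuming "same warehouse" means both sides reference the same configured connection. `dialectFor`, `chooseAlgorithm`, `quoteTable`, and `DiffSide` are hypothetical TypeScript stand-ins; the real mapping lives in the Rust engine, which this diff does not show.

```typescript
// Hypothetical mirror of the warehouse-type to dialect mapping described above.
function dialectFor(warehouseType: string): "tsql" | "fabric" | undefined {
  switch (warehouseType) {
    case "sqlserver":
    case "mssql":
      return "tsql";
    case "fabric":
      return "fabric";
    default:
      return undefined; // other warehouses are outside this sketch
  }
}

interface DiffSide {
  warehouse: string; // configured connection name (assumption)
}

// Same connection: one FULL OUTER JOIN can compare both tables (joindiff).
// Different connections: rows live in separate engines, so compare hashes (hashdiff).
function chooseAlgorithm(source: DiffSide, target: DiffSide): "joindiff" | "hashdiff" {
  return source.warehouse === target.warehouse ? "joindiff" : "hashdiff";
}

// T-SQL bracket quoting; a literal "]" is escaped by doubling it.
function quoteTable(schema: string, table: string): string {
  const quote = (name: string): string => `[${name.replace(/\]/g, "]]")}]`;
  return `${quote(schema)}.${quote(table)}`;
}

console.log(dialectFor("mssql"), dialectFor("fabric")); // tsql fabric
console.log(chooseAlgorithm({ warehouse: "fabric_prod" }, { warehouse: "snowflake_prod" })); // hashdiff
console.log(quoteTable("dbo", "orders")); // [dbo].[orders]
```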
24 changes: 18 additions & 6 deletions bun.lock


2 changes: 1 addition & 1 deletion packages/drivers/package.json
@@ -17,7 +17,7 @@
"@google-cloud/bigquery": "^8.0.0",
"@databricks/sql": "^1.0.0",
"mysql2": "^3.0.0",
-"mssql": "^11.0.0",
+"mssql": "^12.0.0",
"oracledb": "^6.0.0",
"duckdb": "^1.0.0",
"mongodb": "^6.0.0",
7 changes: 7 additions & 0 deletions packages/drivers/src/normalize.ts
@@ -65,6 +65,12 @@ const SQLSERVER_ALIASES: AliasMap = {
...COMMON_ALIASES,
host: ["server", "serverName", "server_name"],
trust_server_certificate: ["trustServerCertificate"],
authentication: ["authenticationType", "auth_type", "authentication_type"],
azure_tenant_id: ["tenantId", "tenant_id", "azureTenantId"],
azure_client_id: ["clientId", "client_id", "azureClientId"],
azure_client_secret: ["clientSecret", "client_secret", "azureClientSecret"],
access_token: ["token", "accessToken"],
azure_resource_url: ["azureResourceUrl", "resourceUrl", "resource_url"],
}

const ORACLE_ALIASES: AliasMap = {
@@ -104,6 +110,7 @@ const DRIVER_ALIASES: Record<string, AliasMap> = {
mariadb: MYSQL_ALIASES,
sqlserver: SQLSERVER_ALIASES,
mssql: SQLSERVER_ALIASES,
fabric: SQLSERVER_ALIASES,
oracle: ORACLE_ALIASES,
mongodb: MONGODB_ALIASES,
mongo: MONGODB_ALIASES,
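The alias maps are consumed by normalization code that this diff does not show. As a hypothetical sketch of the idea, with `normalizeConfig` invented for illustration and a trimmed copy of `SQLSERVER_ALIASES`, renaming alias keys to their canonical names could look like the block below. Because `DRIVER_ALIASES` now maps `fabric` to `SQLSERVER_ALIASES`, a Fabric config written with `server` or `tenantId` would normalize the same way as a SQL Server one.

```typescript
// Hypothetical sketch; the real normalize.ts logic is not part of this diff.
type AliasMap = Record<string, string[]>;

// Trimmed copy of the SQL Server alias entries shown above.
const SQLSERVER_ALIASES: AliasMap = {
  host: ["server", "serverName", "server_name"],
  authentication: ["authenticationType", "auth_type", "authentication_type"],
  azure_tenant_id: ["tenantId", "tenant_id", "azureTenantId"],
  azure_client_id: ["clientId", "client_id", "azureClientId"],
  access_token: ["token", "accessToken"],
};

// Rewrites alias keys to canonical names. An explicit canonical key wins;
// in that case the alias is simply dropped.
function normalizeConfig(
  raw: Record<string, unknown>,
  aliases: AliasMap,
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...raw };
  for (const [canonical, alts] of Object.entries(aliases)) {
    for (const alt of alts) {
      if (!Object.prototype.hasOwnProperty.call(out, alt)) continue;
      if (out[canonical] === undefined) out[canonical] = out[alt];
      delete out[alt];
    }
  }
  return out;
}

const normalized = normalizeConfig(
  { server: "db.example.com", tenantId: "t-123", auth_type: "msi" },
  SQLSERVER_ALIASES,
);
// Result has host, authentication, and azure_tenant_id; the alias keys are gone.
console.log(normalized);
```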