A scalable, serverless solution for real-time JSON event validation on Google Cloud Platform. It allows anyone to easily validate the quality of any in-app event data (e.g., coming from server-side Google Tag Manager) before it hits your analytics or marketing destinations.
graph TD
%% Actors
User((User))
Source((Event Source))
%% UI Path
LB[HTTPS Load Balancer]
IAP{Identity-Aware Proxy}
WebUI[Streamlit UI<br/>Cloud Run]
%% API Path
APIGW[API Gateway]
CF[Validator Function<br/>Cloud Functions]
%% Data
GCS[(GCS Schema Bucket)]
BQ[(BigQuery Logs)]
%% UI Flow
User -->|HTTPS| LB
LB --> IAP
IAP -->|Authorized| WebUI
WebUI <-->|Read / Write| GCS
%% API Flow
Source -->|REST + API Key| APIGW
APIGW --> CF
CF -->|Fetch Schema| GCS
CF -->|Write Logs| BQ
- validator_src/: Node.js source code for the Cloud Function.
- terraform_backend/: Infrastructure as Code (Terraform) for backend services: Cloud Function, BigQuery, GCS buckets, API Gateway.
- terraform_ui/: Infrastructure as Code (Terraform) for UI services: Streamlit Cloud Run, Load Balancer, IAP authentication.
- streamlit_ev/: (Optional) UI for schema management and parameter building.
- Dynamic Schema Loading: Loads JSON schemas from GCS based on event names.
- BigQuery Logging: Automatically logs validation results and processing errors for auditing.
- Performance Optimized: Parallel fetching and analysis (GCS + Health checks) ensures a snappy UI even with 100+ schemas.
- API Gateway Secured: Protected by API Keys with automated managed service activation.
- Fully Automated: One-click deployment with built-in propagation delays for stability.
- GA4 Ready: Pre-loaded with 36 recommended GA4 event schemas and a master parameter repository.
To validate data from Server-Side Google Tag Manager (sGTM), send the entire Event Data object to the validator function, for example with a "JSON HTTP Request" tag (such as the popular template by stape.io), using the following steps:
- Tag Type: Use the JSON HTTP Request tag.
- Destination URL: Set to your https://<API_GATEWAY_URL>/eventsValidator?key=<API_KEY>.
- Body: Select the "Include in the body all Event Data" option.
- Triggering: It is highly recommended to sample incoming data based on volume to manage costs.
  - Option: Use a specific Trigger in GTM to select only certain event types (e.g., purchase, sign_up) or validation-prone events.
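The sampling and trigger advice above can be sketched as a small decision function. This is an illustration only: `shouldValidate`, its parameters, and the 10% rate are hypothetical and not part of this project. In sGTM the same logic would normally live in a trigger condition or a custom variable.

```javascript
// Hypothetical helper: decides whether an event is forwarded to the validator.
// `sampleRate` is a fraction between 0 and 1; `allowList` optionally restricts event names.
// `rand` is injectable so the decision can be tested deterministically.
function shouldValidate(eventName, sampleRate, allowList = null, rand = Math.random()) {
  // Skip events that are not on the allow-list (when one is given).
  if (allowList && !allowList.includes(eventName)) return false;
  // Forward roughly `sampleRate` of the remaining events.
  return rand < sampleRate;
}

// Example: validate about 10% of purchase and sign_up events.
console.log(shouldValidate("purchase", 0.1, ["purchase", "sign_up"]));
```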
The validator expects a JSON body with a data object containing the event details.
{
"data": {
"event_name": "your_event_name",
"param_1": "value_1",
"param_2": 123
}
}
The validator supports two levels of features: those accessible via the Streamlit UI (for easy management) and the full set supported by the Cloud Function (for advanced use cases via manual schema editing).
The UI allows you to define these validations visually:
- Structure Validation: Automatically checks if required parameters are present in the event.
- Optional Fields: Mark fields as Optional (validation is skipped if the field is missing).
- Conditional Validation: Validate a field ONLY if another specific parameter is present (e.g., item_id required only if item_list_name exists).
- Type Checking: string, number, boolean, array.
- Exact Match: Enforce a specific value (e.g., event_name must be purchase).
- Regex Pattern: Validate strings against a Regular Expression (e.g., ^user_\d+$).
- Array Items: Define a schema for items within an array.
- "Any" Value: Enforces the Type but ignores the specific value.
- Nullable Numbers: In the UI, clearing a number field sets it to "empty" (null/undefined) rather than 0.
The core validation engine (validator_src) supports additional advanced features if you edit JSON schemas manually:
- Object Type: Validate nested JSON objects (not just arrays of objects).
- Exact Length: Validate that an array or string has a specific exact length (via the length property in JSON).
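To make the rules above more concrete, here is a minimal sketch of how required/optional fields, type checks, exact length, and nested schemas could be evaluated. It is not the code in validator_src: the function names and error messages are hypothetical, and it only uses the schema keys that appear in this README (type, optional, length, nestedSchema). It also assumes that nestedSchema on an array applies to every item. Regex and exact-match rules are left out because their schema keys are not shown here.

```javascript
// Illustrative only: a simplified take on the documented rules, not the real validator.
function typeOf(value) {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value; // "string", "number", "boolean", "object", ...
}

function validateAgainstSchema(data, schema, path = "") {
  if (data === null || typeof data !== "object") {
    return [`${path || "data"}: expected an object`];
  }
  const errors = [];
  for (const [name, rule] of Object.entries(schema)) {
    const field = path + name;
    const value = data[name];
    const missing = value === undefined || value === null || value === "";
    if (missing) {
      // Fields are required unless the rule sets "optional": true.
      if (!rule.optional) errors.push(`${field}: missing required field`);
      continue;
    }
    if (typeOf(value) !== rule.type) {
      errors.push(`${field}: expected ${rule.type}, got ${typeOf(value)}`);
      continue;
    }
    // "length" is an exact length check for strings and arrays.
    if (rule.length !== undefined && value.length !== rule.length) {
      errors.push(`${field}: expected length ${rule.length}, got ${value.length}`);
    }
    if (rule.nestedSchema && rule.type === "object") {
      errors.push(...validateAgainstSchema(value, rule.nestedSchema, `${field}.`));
    } else if (rule.nestedSchema && rule.type === "array") {
      value.forEach((item, i) =>
        errors.push(...validateAgainstSchema(item, rule.nestedSchema, `${field}[${i}].`)));
    }
  }
  return errors;
}

// Usage with a hypothetical schema and event.
const exampleSchema = {
  transaction_id: { type: "string", length: 10 },
  promo_code: { type: "string", optional: true },
  user_info: { type: "object", nestedSchema: { user_id: { type: "string" } } },
};
console.log(validateAgainstSchema({ transaction_id: "T123", user_info: {} }, exampleSchema));
```

An empty array returned by `validateAgainstSchema` means the event passed every rule in this simplified model.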
| Type | Description | UI Support | Code Support |
|---|---|---|---|
| string | Text values. | ✅ | ✅ |
| number | Integers or Floats. | ✅ | ✅ |
| boolean | true or false. | ✅ | ✅ |
| array | List of items (can have nested schemas). | ✅ | ✅ |
| object | Nested JSON object. | ❌ | ✅ |
| regex | Regular Expression Pattern. | ✅ | ✅ |
Tip
Regex Best Practice: For strict validation, always use start (^) and end ($) anchors. Without them, the validator accepts partial matches (e.g., pattern \d+ will validly match "abc123xyz").
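The effect of the anchors can be checked directly in Node.js. This sketch assumes the validator evaluates patterns the same way a standard JavaScript RegExp does, which is plausible for a Node.js function but not confirmed here.

```javascript
// Unanchored pattern: matches any string that contains at least one digit.
const unanchored = /\d+/;
// Anchored pattern: the whole string must consist of digits.
const anchored = /^\d+$/;

console.log(unanchored.test("abc123xyz")); // true  (partial match accepted)
console.log(anchored.test("abc123xyz"));   // false (surrounding characters rejected)
console.log(anchored.test("123"));         // true
```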
To utilize features not yet available in the UI (like nested Objects, Regex patterns, or exact length checks), you can manually edit the JSON schema file in your Google Cloud Storage bucket.
Use this structure to validate a generic object (e.g. user_info) containing specific fields.
"user_info": {
"type": "object",
"nestedSchema": {
"user_id": { "type": "string" },
"is_active": { "type": "boolean" }
}
}
Validates that a String or Array has an exact specific length. (Note: This is strictly for exact length, not min/max).
"transaction_id": {
"type": "string",
"length": 10
},
"items": {
"type": "array",
"length": 3,
"nestedSchema": { ... }
}
By default, the validator treats every field defined in the schema as Required. If a required field is missing or empty, validation fails.
Use "optional": true to allow a field to be missing, null, or an empty string.
"promo_code": {
"type": "string",
"optional": true
}
Running this setup on GCP is designed to be cost-effective for validation workloads.
- Approximate Cost: ~$0.50 per day for ~50,000 processed events.
- Includes: Cloud Functions invocations, Cloud Storage class A/B operations, and minimal BigQuery streaming ingestion.
- Note: Costs may vary based on exact payload size and region.
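For a rough budget estimate, the quoted figure can be scaled to other volumes. The sketch below assumes cost grows linearly with event count and uses a 30-day month; both are simplifications introduced here, not project guarantees, and the function name is hypothetical.

```javascript
// Reference point quoted above: about $0.50 per day for about 50,000 events.
const REFERENCE_COST_PER_DAY = 0.5;
const REFERENCE_EVENTS_PER_DAY = 50000;

// Assumption: cost scales linearly with volume (ignores free tiers, payload size, region).
function estimateDailyCost(eventsPerDay) {
  return (eventsPerDay / REFERENCE_EVENTS_PER_DAY) * REFERENCE_COST_PER_DAY;
}

console.log(estimateDailyCost(50000));      // 0.5
console.log(estimateDailyCost(200000));     // 2
console.log(estimateDailyCost(50000) * 30); // 15 (approximate monthly cost in USD)
```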
Configure these flags in terraform_backend/terraform.tfvars to balance visibility with storage costs:
| Flag | Description |
|---|---|
| LOG_VALID_FIELDS_FLAG | Logs every single validated field (even if correct). |
| LOG_PAYLOAD_WHEN_ERROR_FLAG | Attaches the full JSON payload when an error is found. |
| LOG_PAYLOAD_WHEN_VALID_FLAG | Attaches the full JSON payload for successful events. |
- GCP Project: An active Google Cloud Project with Billing Account linked.
- Crucial: Many APIs (Compute, Cloud Build, API Gateway) will fail to activate or describe without an active billing account, causing the installer to hang or error.
- Tools:
- Terraform (>= 1.5.0)
- gcloud CLI (authenticated: gcloud auth application-default login)
- Local Node.js: (Optional, for local testing) Node.js 20+.
You have two ways to authenticate Terraform:
Simply log in with your Google account in the terminal. Terraform will automatically pick up your permissions (ADC). No .json keys required!
gcloud auth application-default login
gcloud auth application-default set-quota-project YOUR_PROJECT_ID
(Note: Your personal account must have sufficient permissions, ideally Editor + Project IAM Admin)
If you are deploying via an automated CI/CD pipeline or strictly require a Service Account:
- Create a Service Account in the IAM Console.
- Assign Roles: Editor (Fastest for testing) OR the following specific roles:
  - API Keys Admin (For Gateway security)
  - ApiGateway Admin (For the entry point)
  - Artifact Registry Administrator (For UI images)
  - BigQuery Admin (For logs)
  - Cloud Functions Admin & Cloud Run Admin (For the validator function and UI)
  - Compute Admin (For Load Balancer and Global IPs)
  - IAP Policy Admin & IAP Settings Admin (For UI authentication)
  - Project IAM Admin (To grant permissions to worker accounts)
  - Service Account Admin & Service Account Key Admin (To manage identities)
  - Service Management Administrator & Service Usage Admin (To enable APIs automatically)
  - Storage Admin (For logs and schemas)
  - Service Account User (Always required for Terraform to deploy resources)
- Generate a JSON key.
- Uncomment credentials_file inside your terraform.tfvars files and point it to the downloaded JSON key (e.g. credentials_file = "credentials.json"). Never commit the .json key to Git.
This project implements a secure GitOps workflow using Workload Identity Federation (Keyless Authentication).
- Terraform Automation:
  - Creates a Workload Identity Pool & Provider for GitHub in GCP.
  - Creates a dedicated Service Account for the CI/CD pipeline (github-actions-uploader).
  - Populates your GitHub Repository with initial schemas and configures secrets automatically:
    - GCP_WORKLOAD_IDENTITY_PROVIDER
    - GCP_SERVICE_ACCOUNT
    - GCS_BUCKET_NAME
- Workflow:
  - Any push to main in your schema repository triggers a GitHub Action.
  - Authentication is handled via OIDC tokens (No long-lived JSON keys stored in secrets!).
  - Changed JSON schemas are validated and instantly synced to the GCS bucket.
To achieve a "Zero-Touch" experience, Terraform manages most configuration files.
| File | Component | Owned By | Description |
|---|---|---|---|
| terraform_backend/credentials.json | Backend | User | Deployer Service Account key (Manual). |
| terraform_backend/terraform.tfvars | Backend | User | Backend project configuration (Manual). |
| terraform_ui/credentials.json | UI | User | Deployer Service Account key (Manual). |
| terraform_ui/terraform.tfvars | UI | User | UI project configuration (Manual). |
| streamlit_ev/.env | UI | Terraform | App config: Bucket, Project, etc. |
Important
Files owned by Terraform are managed automatically. Do not edit them manually.
To make provisioning flawless and "Zero-Touch", we provide a unified Setup Wizard (install.sh) that dynamically configures endpoints, creates Git repositories, and runs Terraform for you without requiring JSON key downloads.
- Authenticate your terminal users:
  - GCP: gcloud auth application-default login
  - GitHub: gh auth login
Execute the installer from the root directory:
./install.sh
The script will interactively:
- Verify access to your desired GCP Project and compute regions.
- Bind Application Default Credentials to bypass limits safely (no .json required!).
- Automatically provision or connect an existing GitHub Repository for schemas.
- Auto-generate all required .tfvars securely without pushing them to Git.
- Deploy the backend and retrieve Gateway URLs.
- Gather OAuth credentials and fully deploy the UI via Cloud Build + Cloud Run.
The infrastructure is split into two independent projects for flexibility:
Important
Before running terraform apply, you must create your own terraform.tfvars file from the provided example in each project directory. The .tfvars files contain sensitive configuration and are not committed to the repository.
- Create a Repository: Create a new, empty private repository on GitHub.com. Do not initialize with README/.gitignore.
- Generate a Token: Create a Personal Access Token (PAT).
  - Classic Token (Recommended): Select scopes repo (full control) and workflow.
  - Fine-grained Token: Must have Read and Write access to:
    - Contents
    - Secrets
    - Workflows
    - Administration (Required for Branch Protection rules)
cd terraform_backend
# Create your configuration file from the example
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with:
# 1. Project details (project_id, region, location)
# 2. GitHub Integration (Optional: github_token, schema_repo_owner, schema_repo_name)
#
# IMPORTANT: Update all placeholder values before proceeding
terraform init
terraform apply
(Note: The deployment includes a 60s delay to allow Google's API Gateway to propagate. Initial GA4 schemas will be pushed to your GitHub repository automatically.)
Save the outputs - you'll need them for the UI deployment:
terraform output schemas_bucket
terraform output bq_dataset
terraform output bq_table
cd terraform_ui
# Create your configuration file from the example
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with:
# 1. Project details (project_id, region)
# 2. schemas_bucket (from backend outputs)
# 3. IAP credentials (iap_client_id, iap_client_secret)
# 4. GitHub Integration (Optional: github_token, schema_repo_owner, etc.) for UI capabilities
# IMPORTANT: Update all placeholder values
terraform init
terraform apply
- Get Endpoint Details:
cd terraform_backend
terraform output api_gateway_url
# API Key is sensitive/hidden by default. Use -raw to reveal it:
terraform output -raw api_key
- Test the Validator:
curl -X POST "https://<URL>/eventsValidator?key=<KEY>" \
  -H "Content-Type: application/json" \
  -d '{ "data": { "event_name": "example", "example_param": "success" } }'
Response:
{"status":"event valid","eventsLogged":1}
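The same test request can be sent from Node.js. The sketch below is a minimal example, assuming Node.js 18+ (for the global fetch) and a gateway URL that includes the https:// scheme. The helper names `buildValidatorRequest` and `validateEvent` are hypothetical and not part of this project.

```javascript
// Builds the request for the /eventsValidator endpoint; URL and key come from the Terraform outputs.
function buildValidatorRequest(gatewayUrl, apiKey, eventData) {
  const url = new URL("/eventsValidator", gatewayUrl);
  url.searchParams.set("key", apiKey);
  return {
    url: url.toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // The validator expects the event under a top-level "data" object.
      body: JSON.stringify({ data: eventData }),
    },
  };
}

// Sends the event and returns the parsed JSON response.
async function validateEvent(gatewayUrl, apiKey, eventData) {
  const { url, options } = buildValidatorRequest(gatewayUrl, apiKey, eventData);
  const response = await fetch(url, options);
  return response.json();
}

// Usage (replace the placeholders with your own values):
// validateEvent("https://<URL>", "<KEY>", { event_name: "example", example_param: "success" })
//   .then(console.log);
```

Keeping request construction separate from the network call makes the payload shape easy to inspect or unit-test without reaching the gateway.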
The streamlit_ev/ application provides a "Parameter Repository" approach to schema management.
If you deployed using the steps above:
- Backend (terraform_backend) has:
  - Created the schema bucket and pre-loaded 36 GA4 schemas and repo.json.
  - Created the BigQuery dataset and table for validation logs.
- UI (terraform_ui) has:
  - Created a dedicated streamlit-worker Service Account.
  - Granted it Storage Object Admin permissions on the schema bucket.
  - Granted it BigQuery Data Viewer and Job User permissions for reading logs.
  - Generated streamlit_ev/.env with your project, bucket, and BigQuery details.
- Authenticate:
gcloud auth application-default login
cd streamlit_ev
- Install Dependencies:
uv sync
- Run:
uv run streamlit run app/app.py
Before deploying to the cloud, you must configure the following using Google Cloud Console:
- OAuth Consent Screen: Set to "Internal" and add the iap.googleapis.com scope.
- OAuth Client ID: Create a "Web application" ID.
- Update Config: Add iap_client_id, iap_client_secret, and the list of authorized_users to your terraform_ui/terraform.tfvars file.
Warning
Important for Direct IAP (Cloud Run Preview):
This project uses the new Identity-Aware Proxy integration directly on Cloud Run (Preview feature).
If you encounter Error 400: redirect_uri_mismatch after logging in, you MUST add the following to your OAuth Client ID's Authorized redirect URIs:
https://<YOUR-CLOUD-RUN-URL>/_gcp_iap/authenticate
(If you switched back to use_classic_load_balancer = true, use: https://iap.googleapis.com/v1/oauth/clientIds/<YOUR_CLIENT_ID>:handleRedirect)
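The two redirect URI patterns above can be derived with a small helper before pasting the value into the OAuth client settings. This is a sketch only: `iapRedirectUri` and its parameter names are hypothetical, and the URL and client ID in the usage lines are made-up placeholders.

```javascript
// Returns the redirect URI to register on the OAuth client, following the two patterns above.
function iapRedirectUri({ cloudRunUrl, clientId, useClassicLoadBalancer = false }) {
  if (useClassicLoadBalancer) {
    // Classic Load Balancer + IAP setup.
    return `https://iap.googleapis.com/v1/oauth/clientIds/${clientId}:handleRedirect`;
  }
  // Direct IAP on Cloud Run; strip trailing slashes so the path is not doubled.
  return `${cloudRunUrl.replace(/\/+$/, "")}/_gcp_iap/authenticate`;
}

console.log(iapRedirectUri({ cloudRunUrl: "https://my-ui.example.run.app/" }));
console.log(iapRedirectUri({ clientId: "1234.apps.googleusercontent.com", useClassicLoadBalancer: true }));
```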
Tip
User Identity Format: In the authorized_users list (within terraform.tfvars), ensure you use the proper prefix:
- Individual: user:<EMAIL_ADDRESS>
- Group: group:<GROUP_EMAIL_ADDRESS>
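A quick pre-flight check can catch entries that are missing the prefix before running terraform apply. The helper below is a hypothetical sketch (not part of this project), and the addresses in the usage line are made-up examples.

```javascript
// Returns the entries that do not follow the "user:<email>" or "group:<email>" format.
function findInvalidMembers(authorizedUsers) {
  const pattern = /^(user|group):[^\s@]+@[^\s@]+$/;
  return authorizedUsers.filter((member) => !pattern.test(member));
}

// The second entry lacks a prefix and is reported as invalid.
console.log(findInvalidMembers(["user:alice@example.com", "bob@example.com", "group:team@example.com"]));
```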
- Build Image:
# Ensure you match the region defined in terraform.tfvars (e.g., europe-west1)
gcloud builds submit --region=europe-west1 --tag europe-west1-docker.pkg.dev/[PROJECT_ID]/event-validator-ui-repo/event-validator-ui:latest ./streamlit_ev
- Terraform Apply:
cd terraform_ui
terraform apply
- Params Repo: Centralized database of parameters with strict type validation.
- Smart Sync & Diff Review: See exactly what changed (JSON diff) before syncing Repo updates to GCS.
- Performance Caching: Session-based memoization for instant tab switching and bulk operations.
- Overridable Values: Assign specific values in the Builder while staying synced with Repo metadata.
- Empty vs Zero Support: Robust handling of numeric fields: set fields to "empty" (null) instead of forcing 0.0.
- Explorer: Direct visibility and health-check analysis for your GCS schema bucket.
- Auto-Sync: Propagate changes from the Repo to all GCS schemas with one click.
- Health Checks: Automatically detect when GCS schemas are out of sync with your repository.
- Direct IAP Integration: Secure access using Google's native Cloud Run authentication (Preview).
- License: This project is licensed under the GNU General Public License (GPL). It is free to use, fork, and modify.
- Contributions: We encourage contributions! Please fork the repository and submit pull requests.
- Roadmap: View the public roadmap at github.com/orgs/defuseddata/projects/1.
- Maintainer: Maintained by Defused Data.