This repository was archived by the owner on Mar 16, 2026. It is now read-only.

OstinUA/Ads.txt-Validator-Analyzer

AdOps Shield: Ads.txt Validator & Analyzer

A production-ready Streamlit analytics toolkit for validating, auditing, and operationalizing ads.txt and app-ads.txt supply-path declarations at scale.

Important

This tool focuses on syntax-level and structural validation of ads.txt and app-ads.txt records aligned with common IAB-style formatting expectations (Domain, Publisher ID, Account Type, optional Certification ID).

Features

  • Multi-source ingestion pipeline:
    • Accepts user-provided domains/URLs.
    • Accepts uploaded local .txt files.
  • Smart URL normalization:
    • Adds https:// when protocol is missing.
    • Appends /app-ads.txt when a raw domain/root URL is provided.
  • Parser for canonical record format:
    • Extracts Domain, Publisher_ID, Account_Type, Certification_ID, and trailing inline comments.
  • IAB-style syntax validation checks:
    • Detects insufficient comma-separated fields (minimum 3 required).
    • Detects invalid account types (must be DIRECT or RESELLER).
  • Error observability workflow:
    • Captures line-level parse errors with original line content and reason.
    • Displays errors as a navigable table in the UI.
  • Analytics-ready data model:
    • Creates a structured pandas.DataFrame for valid records.
    • Generates normalized values (e.g., lowercase domains, uppercase account types).
  • Operational dashboarding:
    • KPI cards for total valid rows, unique partners, DIRECT, and RESELLER counts.
    • Pie chart for account type distribution.
    • Horizontal bar chart for top supply partners.
  • Data explorer and export:
    • Full-table browser with text search by domain or publisher ID.
    • One-click CSV export of the filtered view.
  • Lightweight architecture:
    • Clear separation of concerns between parsing logic (adops_logic.py) and UI (app.py).
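The KPI cards and the top-partners chart reduce the valid records to a few counts. Below is a minimal standard-library sketch of that aggregation. It assumes each valid record is a dict carrying the `Domain` and `Account_Type` fields named above, and it treats "unique partners" as distinct `Domain` values. `summarize` and its output keys are hypothetical and may not match what `get_stats` actually returns.

```python
from collections import Counter


def summarize(records: list[dict], top_n: int = 5) -> dict:
    """Compute dashboard KPIs from already-validated records (hypothetical helper)."""
    # Account types are assumed to be uppercase already (the parser normalizes them).
    type_counts = Counter(r["Account_Type"] for r in records)
    # Partners are counted by advertising-system domain (the Domain field).
    partner_counts = Counter(r["Domain"] for r in records)
    return {
        "total_rows": len(records),
        "unique_partners": len(partner_counts),
        "direct": type_counts["DIRECT"],      # Counter returns 0 for missing keys
        "reseller": type_counts["RESELLER"],
        # Ranked (domain, count) pairs, the shape a horizontal bar chart needs.
        "top_partners": partner_counts.most_common(top_n),
    }


rows = [
    {"Domain": "google.com", "Account_Type": "DIRECT"},
    {"Domain": "google.com", "Account_Type": "RESELLER"},
    {"Domain": "appnexus.com", "Account_Type": "RESELLER"},
]
print(summarize(rows))
# {'total_rows': 3, 'unique_partners': 2, 'direct': 1, 'reseller': 2, 'top_partners': [('google.com', 2), ('appnexus.com', 1)]}
```

Because the function only ever sees rows that passed validation, the KPIs are unaffected by malformed lines.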

Tip

Use this as a first-line QA utility in AdOps workflows before reconciling seller declarations against deeper business rules or external partner allowlists.

Tech Stack & Architecture

Core Stack

  • Language: Python 3.10+
  • Web UI: Streamlit
  • Data Processing: Pandas
  • Charting: Plotly Express
  • HTTP Client: Requests

Project Structure

Ads.txt-Validator-Analyzer/
├── app.py                 # Streamlit interface and dashboard orchestration
├── adops_logic.py         # Fetching, parsing, validation, and stats engine
├── requirements.txt       # Runtime dependencies
├── LICENSE                # Apache-2.0 license text
├── README.md              # Project documentation
└── .github/
    └── FUNDING.yml        # Sponsorship metadata

Key Design Decisions

  1. Separation of concerns

    • app.py owns interaction and visualization.
    • adops_logic.py owns data retrieval, validation, and parsing.
    • This keeps parser logic reusable for CLI, API, or batch integrations.
  2. Fail-soft parsing strategy

    • Invalid lines are collected into an error list instead of terminating parsing.
    • Valid lines still produce analytics and exportable datasets.
  3. Schema-first normalization

    • Parser emits a fixed column model for compatibility with BI workflows.
    • Account types are normalized to uppercase and domains to lowercase.
  4. Operational UI ergonomics

    • Sidebar input for source selection.
    • In-page KPI + charts + searchable table for quick triage.
flowchart TD
    A[User Input] --> B{Source Type}
    B -->|URL/Domain| C[fetch_from_url]
    B -->|Text File| D[Read Uploaded Content]
    C --> E[Raw ads.txt/app-ads.txt Content]
    D --> E
    E --> F[parse_content]
    F --> G[Valid Records DataFrame]
    F --> H[Line-level Validation Errors]
    G --> I[get_stats]
    G --> J[Plotly Visualizations]
    G --> K[Search + CSV Export]
    H --> L[Error Table in UI]
    I --> M[KPI Metrics]
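The fail-soft strategy and the fixed column model from the design decisions can be sketched generically. Each line goes through a per-line parser. A failure becomes an error row carrying the line number, original content, and reason, and every valid record is projected onto the same columns. The `COLUMNS` tuple mirrors the fields listed under Features. `parse_lines`, `toy_parse`, and the error-dict keys are hypothetical names for illustration, not the repository's actual API.

```python
from typing import Callable, Optional

# Fixed output schema; optional fields default to None so every row has the same keys.
COLUMNS = ("Domain", "Publisher_ID", "Account_Type", "Certification_ID", "Comment")


def parse_lines(
    text: str, parse_line: Callable[[str], Optional[dict]]
) -> tuple[list[dict], list[dict]]:
    """Run parse_line on every line, collecting errors instead of aborting."""
    records, errors = [], []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        try:
            record = parse_line(raw)
        except ValueError as exc:
            # Fail soft: keep the offending line and its reason, then continue.
            errors.append({"Line": line_no, "Content": raw, "Reason": str(exc)})
            continue
        if record is not None:  # None signals a blank or comment-only line
            records.append({col: record.get(col) for col in COLUMNS})
    return records, errors


def toy_parse(line: str) -> Optional[dict]:
    """Hypothetical stand-in parser for the demo: checks only the field count."""
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    fields = [f.strip() for f in body.split(",")]
    if len(fields) < 3:
        raise ValueError("Insufficient parameters (minimum 3 required)")
    return {"Domain": fields[0], "Publisher_ID": fields[1], "Account_Type": fields[2]}


rows, errs = parse_lines("a.com, 1, DIRECT\nbroken line\n# note", toy_parse)
print(rows)
print(errs)
```

Keeping the loop separate from the per-line rules means the same error-collection behavior applies no matter which checks the line parser enforces.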
Note

The current parser implements practical syntax checks and normalization. It is intentionally lightweight and does not implement every possible ads.txt semantic enforcement rule.

Getting Started

Prerequisites

  • Python 3.10 or newer
  • pip (latest recommended)
  • Internet connectivity (for URL-based fetching)

Installation

  1. Clone the repository:
git clone https://github.com/OstinUA/Ads.txt-Validator-Analyzer.git
cd Ads.txt-Validator-Analyzer
  2. Create and activate a virtual environment (on Windows, run .venv\Scripts\activate instead of the source command):
python -m venv .venv
source .venv/bin/activate
  3. Install dependencies:
pip install --upgrade pip
pip install -r requirements.txt
  4. Run the application:
streamlit run app.py
  5. Open the local URL shown by Streamlit (typically http://localhost:8501).

Testing

This repository currently does not ship with a formal automated test suite, but you can run the following quality checks to validate runtime health and parser behavior.

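  1. Syntax check (byte-compiles both modules; this catches syntax errors but does not execute imports):
python -m compileall app.py adops_logic.py
  2. Manual validation in the Streamlit interface:
streamlit run app.py
  3. Optional parser smoke test from the shell (a heredoc piped to Python):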
python - <<'PY'
from adops_logic import AdsTxtParser

sample = """
google.com, pub-123, DIRECT, f08c47fec0942fa0
invalid.example, pub-456, WRONG
# comment-only line
"""

parser = AdsTxtParser()
df, errors = parser.parse_content(sample)
print(df)
print(errors)
print(parser.get_stats(df))
PY

Warning

If you add CI later, include deterministic tests for parser edge cases (comments, malformed rows, mixed casing, and missing optional fields) to prevent silent regressions.

Deployment

Production Run (Single-Instance)

Use Streamlit’s standard runtime with explicit host/port settings:

streamlit run app.py --server.address 0.0.0.0 --server.port 8501

Containerization (Recommended)

Minimal Docker build and run commands (the repository does not include a Dockerfile, so add your own first):

docker build -t adops-shield:latest .
docker run --rm -p 8501:8501 adops-shield:latest

CI/CD Integration Guidelines

In your CI pipeline, add stages for:

  • Dependency installation (pip install -r requirements.txt)
  • Static/syntax checks (python -m compileall ...)
  • Optional parser smoke tests
  • Container image build and push

Caution

For internet-facing deployments, place this app behind TLS termination and standard reverse-proxy protections; URL fetching accepts external input and should be monitored with request limits/timeouts.
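One way to apply those limits on the fetch path is to accept only http/https URLs and to cap how many bytes are read from the response. The sketch below is a minimal illustration: `check_scheme`, `read_capped`, and the `MAX_BYTES` value are hypothetical choices, not current behavior of fetch_from_url. The helpers work on plain values so they can be tested offline, and the commented lines show how they could wrap a streamed `requests.get` call.

```python
from typing import Iterable
from urllib.parse import urlparse

MAX_BYTES = 1_000_000  # hypothetical cap; tune to the largest files you expect


def check_scheme(url: str) -> str:
    """Reject anything other than an http(s) URL with a host."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Refusing to fetch non-HTTP URL: {url!r}")
    return url


def read_capped(chunks: Iterable[bytes], max_bytes: int = MAX_BYTES) -> bytes:
    """Concatenate response chunks, aborting once the total exceeds max_bytes."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise ValueError(f"Response larger than {max_bytes} bytes")
    return bytes(buf)


# Possible wiring with requests (not executed here):
#   import requests
#   with requests.get(check_scheme(url), timeout=10, stream=True) as resp:
#       resp.raise_for_status()
#       body = read_capped(resp.iter_content(chunk_size=8192))
```

The requests `timeout` bounds the connection and each individual read, not the total download time, so a size cap is a useful complement to it.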

Usage

Launch the UI

streamlit run app.py

Analyze from URL

  1. Open the sidebar and select Load from URL.
  2. Enter either:
    • a root domain (example.com), which is expanded to https://example.com/app-ads.txt, or
    • a full path (https://example.com/ads.txt), which is fetched as given.
  3. Click Fetch Data.
  4. Inspect KPIs, chart insights, and syntax errors.
  5. Export filtered records via Download CSV.

Analyze from File Upload

  1. Select Upload File in the sidebar.
  2. Upload an ads.txt/app-ads.txt-style .txt file.
  3. Review parsed output and validation report.
  4. Filter by domain or publisher ID and export to CSV.
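The search box and the Download CSV step amount to a case-insensitive substring filter over two columns followed by CSV serialization. The app does this on a pandas DataFrame; the sketch below shows the same idea with the standard library on a list of record dicts. `filter_records` and `to_csv_text` are hypothetical names, and the column list mirrors the fields named under Features.

```python
import csv
import io

COLUMNS = ["Domain", "Publisher_ID", "Account_Type", "Certification_ID", "Comment"]


def filter_records(records: list[dict], query: str) -> list[dict]:
    """Keep rows whose Domain or Publisher_ID contains query (case-insensitive)."""
    q = query.strip().lower()
    if not q:
        return records  # an empty search shows every row
    return [
        r for r in records
        if q in str(r.get("Domain") or "").lower()
        or q in str(r.get("Publisher_ID") or "").lower()
    ]


def to_csv_text(records: list[dict]) -> str:
    """Serialize rows to CSV text; missing or None fields become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


rows = [
    {"Domain": "google.com", "Publisher_ID": "pub-1", "Account_Type": "DIRECT"},
    {"Domain": "appnexus.com", "Publisher_ID": "12345", "Account_Type": "RESELLER"},
]
print(to_csv_text(filter_records(rows, "google")))
```

In a Streamlit page, the returned string could be passed as the `data` argument of `st.download_button`, so the export always reflects the current filter.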

Programmatic Parsing Example

from adops_logic import AdsTxtParser

raw_text = """
google.com, pub-0000000000000000, DIRECT, f08c47fec0942fa0
appnexus.com, 12345, RESELLER
invalid.com, account-1, PARTNER
"""

parser = AdsTxtParser()

# Parse content into structured rows + validation errors.
df, errors = parser.parse_content(raw_text)

# Compute aggregate stats for dashboards or downstream reporting.
stats = parser.get_stats(df)

print("Valid rows:")
print(df)
print("\nErrors:")
print(errors)
print("\nStats:")
print(stats)

Expected behavior:

  • The first two records are accepted as valid rows.
  • The third record is flagged due to invalid account type.
  • stats is calculated only from valid rows.

Configuration

Input and Fetch Behavior

| Option | Current Behavior | Notes |
|---|---|---|
| URL protocol | Auto-prepends https:// if missing | In fetch_from_url |
| Default path | Appends /app-ads.txt if URL does not end with ads.txt/app-ads.txt | Supports quick root-domain input |
| HTTP timeout | 10 seconds | Uses requests.get(..., timeout=10) |
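The two URL rules in the table can be expressed as a small pure function. This is a minimal sketch: `normalize_url` is a hypothetical name, and stripping a trailing slash before appending the default path is an assumption, not documented behavior of fetch_from_url.

```python
def normalize_url(user_input: str) -> str:
    """Apply the documented URL rules: default scheme, then default path."""
    url = user_input.strip()
    # Rule 1: prepend https:// when no protocol is given.
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    # Rule 2: append /app-ads.txt unless the URL already targets an ads.txt file.
    # "app-ads.txt" also ends with "ads.txt", so one check covers both names.
    if not url.endswith("ads.txt"):
        url = url.rstrip("/") + "/app-ads.txt"  # assumption: avoid a double slash
    return url


for raw in ("example.com", "https://example.com/", "example.com/ads.txt"):
    print(raw, "->", normalize_url(raw))
# example.com -> https://example.com/app-ads.txt
# https://example.com/ -> https://example.com/app-ads.txt
# example.com/ads.txt -> https://example.com/ads.txt
```

Note the consequence of the default: a bare domain resolves to app-ads.txt, so auditing a web publisher's ads.txt requires entering the full path.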

Validation Behavior

| Rule | Requirement | Failure Output / Result |
|---|---|---|
| Field count | At least 3 comma-separated fields | Insufficient parameters (minimum 3 required) |
| Account type | Must be DIRECT or RESELLER | Invalid Account Type: ... |
| Comment handling | Inline # comments stripped from parse payload | Preserved in Comment column |
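A per-line sketch of the three rules in the table, assuming the field order Domain, Publisher_ID, Account_Type, optional Certification_ID. `parse_line` is a hypothetical helper. Its error strings reuse the messages listed above, but the text after "Invalid Account Type:" is assumed to be the offending value, and returning None for blank or comment-only lines is also an assumption.

```python
from typing import Optional

VALID_TYPES = {"DIRECT", "RESELLER"}


def parse_line(line: str) -> Optional[dict]:
    """Validate one ads.txt record; raise ValueError with the failure message."""
    # Split off an inline comment so it does not pollute the data fields.
    body, _, comment = line.partition("#")
    body = body.strip()
    if not body:
        return None  # blank or comment-only line: nothing to validate
    fields = [f.strip() for f in body.split(",")]
    if len(fields) < 3:
        raise ValueError("Insufficient parameters (minimum 3 required)")
    account_type = fields[2].upper()  # normalize before checking
    if account_type not in VALID_TYPES:
        # Assumption: the message ends with the raw value that failed.
        raise ValueError(f"Invalid Account Type: {fields[2]}")
    return {
        "Domain": fields[0].lower(),
        "Publisher_ID": fields[1],
        "Account_Type": account_type,
        "Certification_ID": fields[3] if len(fields) > 3 and fields[3] else None,
        "Comment": comment.strip() or None,  # the stripped comment is kept, not lost
    }


print(parse_line("Google.com, pub-1, direct, f08c47fec0942fa0 # main seat"))
```

Uppercasing before the membership check means mixed-case input such as "direct" passes, which matches the normalization described under Key Design Decisions.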

Environment Configuration

This project currently has no mandatory .env file, and nothing in the code reads the variables below yet. If you need environment-driven behavior, consider introducing a pattern like this:

# Example optional runtime settings for future extension
APP_PORT=8501
APP_HOST=0.0.0.0
FETCH_TIMEOUT_SECONDS=10
# Example startup with explicit Streamlit flags
streamlit run app.py --server.address 0.0.0.0 --server.port 8501

Note

A future enhancement could externalize fetch timeout and default path behavior into typed config for reproducible multi-environment deployments.
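One shape that enhancement could take is a frozen dataclass populated from environment variables. This is a sketch under stated assumptions: `FetchConfig` and `from_env` are hypothetical, FETCH_TIMEOUT_SECONDS comes from the example above, and `FETCH_DEFAULT_PATH` is an invented variable name for the default path.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FetchConfig:
    timeout_seconds: float = 10.0       # matches the current hard-coded timeout
    default_path: str = "/app-ads.txt"  # matches the current default path

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "FetchConfig":
        """Build a config from environment variables, falling back to defaults."""
        env = os.environ if env is None else env
        timeout = float(env.get("FETCH_TIMEOUT_SECONDS", cls.timeout_seconds))
        if timeout <= 0:
            raise ValueError("FETCH_TIMEOUT_SECONDS must be positive")
        # FETCH_DEFAULT_PATH is a hypothetical variable name.
        path = env.get("FETCH_DEFAULT_PATH", cls.default_path)
        if not path.startswith("/"):
            path = "/" + path
        return cls(timeout_seconds=timeout, default_path=path)


print(FetchConfig.from_env({"FETCH_TIMEOUT_SECONDS": "5"}))
```

Parsing and checking the values once at startup means a malformed setting fails immediately rather than on the first fetch, and the frozen dataclass keeps the configuration from being mutated mid-session.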

License

This project is licensed under the Apache License 2.0. See LICENSE for full terms.

Contacts & Community Support

Support the Project

Patreon Ko-fi Boosty YouTube Telegram

If you find this tool useful, consider leaving a star on GitHub or supporting the author directly.

About

Professional AdOps utility for automated, IAB-style syntax validation of ads.txt and app-ads.txt files. Features real-time syntax error detection, supply chain visualization (DIRECT vs. RESELLER), and structured CSV exporting to streamline inventory auditing and partner management.
