Pawansingh3889/README.md
Pawan Singh Kapkoti

Data engineer in food manufacturing. I build pipelines, compliance tools, and local AI for factories where data can't leave the building. I also hack on the open-source tools I depend on.

Python · SQL · dbt · Airflow · PostgreSQL · Databricks · Docker · Terraform · GitHub Actions · AWS · Ollama · ChromaDB · LangChain · Streamlit · FastAPI · PySpark · Kafka · Plotly · Power BI

Portfolio · Resume · LinkedIn · Email


What I'm building

OpsMind: on-prem AI for food factories. Ask your database questions in English.

$ ollama pull phi3:mini
$ streamlit run app.py

> "What was yesterday's yield?"
  → SELECT ... FROM ProductionRuns
  → 94.2% yield, 38kg waste
  → "Yesterday's yield was above target..."

Ollama · ChromaDB · LangChain · SQLAlchemy · 36 tests

Docs · Code
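
The question-to-answer loop shown in the session above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `answer_question`, the stub `fake_llm`, and the `ProductionRuns` columns are hypothetical, and SQLite stands in for the factory database. In a real setup the `llm` callable would wrap a locally hosted model.

```python
import sqlite3
from typing import Callable


def answer_question(conn: sqlite3.Connection, question: str,
                    llm: Callable[[str], str]) -> dict:
    """Ask the model for SQL, check it is a single SELECT, then run it."""
    # Give the model the table definitions so it can write valid SQL.
    schema = "\n".join(
        row[0] for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'"
        )
    )
    prompt = f"Schema:\n{schema}\n\nWrite one SELECT statement answering: {question}"
    sql = llm(prompt).strip().rstrip(";")

    # Guardrail: refuse anything that is not a single SELECT statement.
    if not sql.lower().startswith("select") or ";" in sql:
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")

    rows = conn.execute(sql).fetchall()
    return {"sql": sql, "rows": rows}


# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductionRuns (run_date TEXT, yield_pct REAL, waste_kg REAL)"
)
conn.execute("INSERT INTO ProductionRuns VALUES ('2024-01-01', 94.2, 38.0)")


def fake_llm(prompt: str) -> str:
    # Stand-in for a local model call; always returns the same query.
    return "SELECT yield_pct, waste_kg FROM ProductionRuns;"


result = answer_question(conn, "What was yesterday's yield?", fake_llm)
print(result["sql"])
print(result["rows"])
```

Passing the model in as a plain callable keeps the SQL guardrail testable without a running model, which is one plausible way to cover this kind of flow in unit tests.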

manufacturing-compliance-dashboard: BRC/HACCP food safety. Replaces the Excel spreadsheets.

$ make setup && make run

  Traceability score:  97.2%
  Temperature control: 99.1%
  Overall compliance:  PASS
  Batches tracked:     674
  Temp readings:       8,640

Streamlit · PySpark · Databricks · z-score anomaly detection

Live · Code
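
For readers unfamiliar with z-score anomaly detection, here is a minimal sketch of the idea applied to temperature readings. The function name, the threshold, and the sample temperatures are assumptions for illustration; the dashboard's own implementation may differ.

```python
from statistics import mean, pstdev


def zscore_anomalies(readings: list[float], threshold: float = 3.0) -> list[int]:
    """Return indexes of readings more than `threshold` std devs from the mean."""
    mu = mean(readings)
    sigma = pstdev(readings)  # population standard deviation
    if sigma == 0:
        return []  # all readings identical, nothing to flag
    return [i for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold]


# Hypothetical chiller temperatures in °C with one spike at index 8.
temps = [3.1, 3.0, 2.9, 3.2, 3.0, 3.1, 2.8, 3.0, 9.5, 3.1, 2.9, 3.0]
print(zscore_anomalies(temps, threshold=2.5))
```

One caveat with this approach: in a sample of n readings, a single outlier can never score above sqrt(n - 1) with the population standard deviation, so short windows need a lower threshold than long ones.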

uk-crime-pipeline: end-to-end, live API → PostgreSQL → dbt → Streamlit.

$ python ingestion/fetch_crimes.py
  Fetching 10 cities × 6 months...
  Loaded 99,675 records (ON CONFLICT DO NOTHING)

$ dbt run && dbt test
  4 models, 53 tests passed
  fct_crimes_by_city: 814 rows
  fct_crime_hotspots: top 100 streets

PostgreSQL · dbt · Prefect · 3 CI/CD workflows · weekly auto-ingest

Live · Code
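
The `ON CONFLICT DO NOTHING` note in the ingestion log is what makes a weekly re-run safe: rows that were already loaded are skipped instead of duplicated. A minimal sketch of that pattern follows. The `crimes` table, its columns, the `crime_id` key, and the sample rows are hypothetical, and SQLite (3.24 or later) stands in for PostgreSQL, which accepts the same clause.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crimes ("
    "crime_id TEXT PRIMARY KEY, city TEXT, month TEXT, category TEXT)"
)


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM crimes").fetchone()[0]


def load_records(conn: sqlite3.Connection, records: list[tuple]) -> int:
    """Insert records, skipping existing crime_ids; return rows actually added."""
    before = row_count(conn)
    conn.executemany(
        "INSERT INTO crimes (crime_id, city, month, category) "
        "VALUES (?, ?, ?, ?) ON CONFLICT (crime_id) DO NOTHING",
        records,
    )
    conn.commit()
    return row_count(conn) - before


# Hypothetical sample rows.
batch = [
    ("a1", "Leeds", "2024-01", "burglary"),
    ("a2", "York", "2024-01", "theft"),
]
first = load_records(conn, batch)
# Re-running with an overlapping batch only adds the new row.
second = load_records(conn, batch + [("a3", "Leeds", "2024-02", "theft")])
print(first, second)
```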

sql-ops-reviewer: GitHub Action that reviews .sql files in PRs using local AI.

# One file. That's the setup.
- uses: Pawansingh3889/sql-ops-reviewer@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}

⚠ WARNING: SELECT * fetches all columns
✗ ERROR: String concatenation → injection risk
  Total: 2 findings across 1 file

GitHub Actions · Ollama · zero API keys

Code
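
The action itself relies on a local model to review SQL. As a much simpler point of comparison, the two findings in the sample output above could also be caught by plain pattern rules. The sketch below is a rule-based stand-in, not the action's method: `Finding`, `RULES`, `review_sql`, and the sample query are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    level: str
    line: int
    message: str


RULES = [
    ("WARNING", re.compile(r"\bselect\s+\*", re.IGNORECASE),
     "SELECT * fetches all columns"),
    # A quoted literal joined to something with || or + suggests dynamic SQL.
    ("ERROR", re.compile(r"'\s*(\|\||\+)|(\|\||\+)\s*'"),
     "String concatenation, possible injection risk"),
]


def review_sql(text: str) -> list[Finding]:
    """Check each line against every rule and collect the matches."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for level, pattern, message in RULES:
            if pattern.search(line):
                findings.append(Finding(level, lineno, message))
    return findings


sample = """SELECT * FROM orders
WHERE customer = '' + @name + ''
"""
for f in review_sql(sample):
    print(f"{f.level}: line {f.line}: {f.message}")
```

Line-based regex rules like these are cheap and deterministic but miss anything spanning multiple lines or requiring context, which is the gap a model-based reviewer is meant to fill.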


Upstream contributions

| Project | What I shipped |
| --- | --- |
| vllm (75K★) | Improved DCP/PCP error messages with actionable backend guidance |
| pandas (45K★) | Clarified str.cat() return type docs + code review that led to merged test case |
| ChromaDB (18K★) | 220-line HNSW tuning guide + replaced 103 ValueError with InvalidArgumentError across API layer |
| pgspecial/pgcli (12K★) | Added \dS system object metacommands matching psql behavior |
| drt-hub/drt | Snowflake source + ClickHouse, Parquet, Teams, CSV/JSON destinations (6 PRs merged) |

I break things, read source code, and ship fixes upstream.

Yorkshire, UK · Available for collaboration · [email protected]

Pinned

  1. uk-crime-pipeline (Public)

    End-to-end data pipeline: Police UK API → PostgreSQL → dbt → Streamlit dashboard. CI/CD with GitHub Actions

    Python

  2. OpsMind (Public)

    On-premises AI assistant for food factories. Natural language SQL, document search, waste prediction, compliance dashboards. Runs locally with Ollama.

    Python 1

  3. uk-education-attainment (Public)

    ML analysis of UK A-Level attainment gaps by ethnicity, gender & deprivation using DfE data

    Jupyter Notebook

  4. Hackathon-mediask (Public)

    MediAsk — health Q&A platform for factory workers. Flask, PostgreSQL, Gemini AI, Docker. Live on Render.

    Python

  5. manufacturing-compliance-dashboard (Public)

    BRC/HACCP compliance dashboard for food manufacturing. Batch traceability, temperature monitoring, allergen matrix. Streamlit + SQLite.

    Python

  6. sql-ops-reviewer (Public)

    AI-powered SQL review for pull requests. Catches performance anti-patterns, security risks, and optimization opportunities. Uses Ollama locally — no cloud APIs.

    Python