soumitra9/README.md


About Me

Senior Data Scientist at Autodesk with 6+ years of experience building production ML systems that deliver measurable business impact. I specialize in the full ML lifecycle, from feature engineering and model development to deployment, monitoring, and stakeholder dashboards.

  • 🏢 Autodesk → ML for cloud optimization, anomaly detection, LLM-powered tooling
  • 📊 Expedia → Large-scale propensity modeling for 200M+ customers
  • 🤖 Tata Elxsi → CNN-based health monitoring & deep learning R&D
  • 📍 Bay Area, CA | 🎓 MS Computer Science, Penn State

Impact Highlights

💰  ~$4M annual cloud compute savings via cost-aware ML model at Autodesk
📉  12% reduction in customer opt-out rates at Affine Analytics
👥  200M+ customers served through personalized recommendation systems
🚨  Real-time API anomaly detection preventing production incidents
🤖  Text-to-GraphQL interface via LLMs for streamlined developer experience

Tech Stack

Languages

Python PySpark SQL

ML / AI

PyTorch TensorFlow scikit-learn LangChain HuggingFace

Data & Cloud

Snowflake AWS Apache Spark Airflow DBT

MLOps & Tooling

SageMaker Docker Git Looker


Featured Projects

🌟 AI / LLM Projects

Chat with Documents: Advanced RAG

A local RAG web app with two production-grade pipelines built on top of a swappable LLM/embedding/vectorstore backend.

| Feature | Details |
| --- | --- |
| 📄 Document Q&A | Upload PDFs → persistent vector store → grounded answers with citations (filename, page, excerpt) |
| 🎯 Resume Tailor | 6-agent pipeline: Resume Understanding → Job Analysis → Gap Analysis → Suggestions → Tailoring → Judge |
| 🔄 Runtime config | Swap LLM, embeddings & vectorstore from the UI without restarting |
| 🤝 Model support | Ollama (local) · OpenAI · Anthropic |

Python LangChain RAG Vector Store Agentic AI Ollama OpenAI Anthropic
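
The citation format in the Document Q&A row (filename, page, excerpt) can be illustrated with a small, dependency-free sketch. Everything here is hypothetical: `Chunk`, `retrieve`, and `format_citation` are illustrative names, and the word-overlap score is only a stand-in for the embedding similarity search a real vector store would perform. It is not the project's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One piece of an uploaded PDF, kept with its source location."""
    filename: str
    page: int
    text: str


def score(query: str, chunk: Chunk) -> int:
    """Toy relevance score: number of distinct query words found in the chunk.

    Assumption: a real pipeline would compare embedding vectors instead.
    """
    query_words = set(query.lower().split())
    chunk_words = set(chunk.text.lower().split())
    return len(query_words & chunk_words)


def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return up to k chunks with a non-zero score, best match first."""
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    return [ch for ch in ranked[:k] if score(query, ch) > 0]


def format_citation(chunk: Chunk, max_chars: int = 60) -> str:
    """Render a citation as: filename, page, and a short excerpt."""
    excerpt = chunk.text[:max_chars]
    return f'{chunk.filename}, p.{chunk.page}: "{excerpt}"'
```

Because each `Chunk` carries its filename and page from ingestion onward, the answer step can cite sources without a second lookup.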


Snowflake MCP Server: Pure Async

A production-ready MCP (Model Context Protocol) server for Snowflake using the low-level async API, giving full control over server lifecycle, tool registration, and async execution.

| Feature | Details |
| --- | --- |
| 🔌 Tools exposed | `execute_query`, `list_databases`, `list_schemas`, `list_tables`, `describe_table`, `check_database_exists` |
| 🛡️ Query safety | Read-only validation (SELECT, WITH, SHOW, DESCRIBE, EXPLAIN only) |
| ⚙️ Production features | Persistent connection · health checks · timeout control · query tagging · cache control · row limiting |
| 📝 Config | Pydantic models for type-safe, validated configuration with clear startup errors |

Python MCP Snowflake Async LLM Tooling Pydantic
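
The read-only rule in the Query safety row could be approximated as below. This is a minimal sketch, not the server's actual validator: `is_read_only` and `strip_sql_comments` are hypothetical names, and the check is deliberately conservative (it rejects any statement containing a second `;`, even inside a string literal, and does not parse string literals at all).

```python
import re

# Statement keywords allowed by the read-only rule described above.
ALLOWED_KEYWORDS = ("SELECT", "WITH", "SHOW", "DESCRIBE", "EXPLAIN")


def strip_sql_comments(sql: str) -> str:
    """Remove /* block */ and -- line comments (ignores string literals)."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", sql)


def is_read_only(sql: str) -> bool:
    """Return True only for a single statement starting with an allowed keyword."""
    cleaned = strip_sql_comments(sql).strip().rstrip(";").strip()
    if not cleaned or ";" in cleaned:
        # Empty input, or more than one statement: reject.
        return False
    first_word = re.match(r"[A-Za-z]+", cleaned)
    return first_word is not None and first_word.group(0).upper() in ALLOWED_KEYWORDS
```

A keyword check like this is best treated as a first line of defense. Running the server under a Snowflake role that has no write privileges is the stronger guarantee.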


Computer Vision

| Project | Description | Stack |
| --- | --- | --- |
| 🫁 COVID-19 Chest X-ray Detection | Transfer learning with VGG16 for binary medical image classification | PyTorch · VGG16 · OpenCV |

Classical ML & NLP

| Project | Description | Stack |
| --- | --- | --- |
| 🏷️ StackOverflow Tag Predictor | Multi-label classifier on 6M+ posts | scikit-learn · TF-IDF · Linear Models |
| 🧮 Neural Network from Scratch | Two-layer NN with backprop, no frameworks | Python · NumPy |
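
To show what "two-layer NN with backprop" involves, here is a minimal sketch of one hidden layer plus one sigmoid output, trained by stochastic gradient descent. It is an illustration, not the repository's code: it uses only the standard library rather than NumPy so it runs anywhere, and the layer size, learning rate, and function names are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Hidden layer (sigmoid) followed by a single sigmoid output unit."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y


def train(data, hidden=3, lr=0.5, epochs=2000, seed=0):
    """Train on (inputs, target) pairs; returns the parameters and per-epoch mean loss."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    eps = 1e-12  # keeps log() away from 0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            y_c = min(max(y, eps), 1.0 - eps)
            total += -(t * math.log(y_c) + (1 - t) * math.log(1 - y_c))
            # Sigmoid output with cross-entropy loss: dL/dz2 = y - t.
            dz2 = y - t
            # Backpropagate through W2 (before it is updated) and the hidden sigmoid.
            dz1 = [dz2 * W2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                W2[j] -= lr * dz2 * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * dz1[j] * x[i]
                b1[j] -= lr * dz1[j]
            b2 -= lr * dz2
        losses.append(total / len(data))
    return (W1, b1, W2, b2), losses
```

Calling `train` on the four rows of a truth table such as OR and comparing `losses[0]` with `losses[-1]` shows the loss falling as the weights are updated. With NumPy, the inner loops collapse into matrix products.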


What I'm Exploring

current_focus = {
    "LLMs":    ["Fine-tuning", "RAG systems", "Agentic workflows"],
    "MLOps":   ["Feature stores", "Model monitoring", "SageMaker pipelines"],
    "GenAI":   ["LangChain", "LangGraph", "Prompt engineering"],
}

Building ML systems that don't just work in notebooks; they work in production.

Popular repositories

  1. Predict-tags-on-StackOverflow-with-linear-models Public

    A multi-class classifier implemented on Stack Overflow data to predict tags

    Jupyter Notebook 2 6

  2. Deep-Neural-Network-From-scratch Public

    A two-layer deep neural network programmed from scratch in Python

    Python 1 1

  3. Covid-19-Detection-from-chest-Xrays Public

    VGG16 model for detecting Covid-19

    Python 1

  4. InteractiveStory Public

    An Android app with a seven-page story. It asks the user for a username and frames the story around that name.

    Java

  5. Hangman Public

    A Hangman game written in Java using the IntelliJ IDE. The game displays an empty word represented as dashes, and the user is prompted to guess its letters. It shows the remaini…

    Java 2

  6. Stormy Public

    Android Weather Forecast Application

    Java