sainikhilp/sainikhilp

Hi, I'm Sai Nikhil Pillai 👋

I’m an MS in Information Science ’26 student at UT Austin with 3 years of experience at FHLB Dallas and Accenture. I have a strong interest in data engineering, cloud platforms, and AI/ML, and I enjoy building end-to-end data systems that turn raw data into reliable, analytics-ready insights.

About Me

  • MSIS ’26 at The University of Texas at Austin.
  • Background in data engineering, cloud automation, and applied AI.
  • Hands-on experience in Azure, Databricks, PySpark, SQL, Delta Lake, and streaming architectures.
  • Interested in building scalable, production-style data pipelines, intelligent workflows, and ML systems.

Technical Skills

  • Programming & Libraries: Python, Pandas, scikit-learn, PyTorch, SQL, Power Automate, Power Apps, Power Query
  • Cloud & Data Engineering: PySpark, dbt, Azure Data Factory, Azure Data Lake Storage, Azure Event Hubs, Delta Live Tables, Lakeflow Jobs, OutSystems
  • Data Visualization & Reporting: Matplotlib, Seaborn, Power BI, Tableau, Excel
  • Applied AI: Prompt Engineering, LLMs, RAG Pipelines, Vector Search, Semantic Similarity
  • Tools: Git, VS Code, Jupyter Notebook, Visio, monday.com

Projects

  • movielens-dbt-elt-pipeline
    Layered dbt project on Databricks transforming 20M+ MovieLens ratings into clean star-schema dimensions and facts for analytics, with robust data quality tests, SCD Type 2 snapshots for user tag history, seed-based movie enrichment, and interactive dbt docs showcasing end-to-end DAG lineage.

  • RideStream Data Pipeline
    Built an end-to-end Azure lakehouse pipeline for ride-hailing analytics, combining batch ingestion, real-time event streaming, Databricks transformations, and dimensional modeling into a silver OBT and gold star schema.

  • SoundWave Azure Medallion Pipeline
    Developed a medallion-architecture pipeline using Azure Data Factory and Databricks to ingest, transform, and organize music data for analytics and reporting.

  • pi-level-rag-curriculum-mapper
    Designed a multi-method NLP system that maps BSN nursing syllabi to AACN competency domains using LDA, NER, BioWordVec, BERT embeddings, FAISS, and PI-level RAG to deliver interpretable, evidence-backed curriculum alignment.

The full code for these projects is available in the pinned repositories on my profile.

Career Interests

I’m interested in roles in Data Engineering, Analytics Engineering, Cloud Data Platforms, AI/ML Engineering, and Applied Data Science.

Contact
