Akshay Sharma's personal blog: publications, resume, posts, open source contributions, and book reading updates.

Chaiverse: Quick Start Guide (2024-01-30)
https://akshay326.com/2024/01/30/submitting-model-chaiverse

Welcome, fellow explorer! Today we'll dive into how, as a Machine Learning Engineer, I submitted a fine-tuned model via the revolutionary AI crowdsourcing platform, Chaiverse. Hold onto your hats, here we go! Here's the colab notebook: https://colab.research.google.com/drive/1iBopRkUnF5_R0VUZpxICvgOThAoQP3UR?authuser=3#scrollTo=M4rauAXbTWI1

Welcome to Chaiverse! 🚀

Throughout this post, we're going to walk through submitting a Hugging Face model to Chai, gathering real-time user feedback, browsing the Chaiverse leaderboard, retrieving a lost submission ID, and deactivating your model (for when you're ready to make your own submission). Buckle up!

Installation and Login 👋

First, we need to install the chaiverse package and log in. For this, you'll need your developer key, which you can obtain by joining the Chaiverse Discord. Don't worry about remembering it; the client keeps track of it for you.

!pip install -U chaiverse
import chaiverse as chai

chai.developer_login()

Submitting Your First Model 🧑‍🚀

So, you're ready to submit your model? Great! To do this, you first need to push your model to Hugging Face. Alongside the weights, also push the tokenizer and your model's architecture configuration; this helps Chai verify your model type.

import chaiverse as chai

model_url = "ChaiML/phase2_winner_13b2" # Your model URL

generation_params = {
    'temperature': 0.99,
    'top_p': 0.2,
    "top_k": 40,
    "stopping_words": ['\n'],
    "presence_penalty": 0.5,
    "frequency_penalty": 0.5,
    "max_input_tokens": 1024,
    "best_of": 4
    }
submission_parameters = {'model_repo': model_url, 'generation_params': generation_params, 'model_name': 'my-awesome-llama'}

submitter = chai.ModelSubmitter(verbose=True)
submission_id = submitter.submit(submission_parameters)
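A quick sanity check on the sampling parameters can catch typos before you burn a submission. The bounds below are my own conservative assumptions for illustration, not official Chaiverse limits:

```python
def validate_generation_params(params):
    """Return a list of problems found in a generation-params dict.

    The allowed ranges here are illustrative assumptions, not official
    Chaiverse limits; adjust them to the platform's documentation.
    """
    problems = []
    bounds = {
        "temperature": (0.0, 2.0),
        "top_p": (0.0, 1.0),
        "presence_penalty": (-2.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
    }
    for key, (lo, hi) in bounds.items():
        if key in params and not (lo <= params[key] <= hi):
            problems.append(f"{key}={params[key]} outside [{lo}, {hi}]")
    if params.get("max_input_tokens", 0) <= 0:
        problems.append("max_input_tokens must be positive")
    return problems

generation_params = {
    "temperature": 0.99,
    "top_p": 0.2,
    "top_k": 40,
    "stopping_words": ["\n"],
    "presence_penalty": 0.5,
    "frequency_penalty": 0.5,
    "max_input_tokens": 1024,
    "best_of": 4,
}
print(validate_generation_params(generation_params))  # [] -> safe to submit
```

An empty list means the parameters pass the (assumed) checks; anything else tells you which knob to fix before calling `submitter.submit`.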

Chat with Your Submission 💬

Once submitted, let’s verify your model by chatting with the deployed bots. Choose a bot and start up a conversation.

chatbot = chai.SubmissionChatbot(submission_id)
chatbot.show_available_bots()

chatbot.chat('leo', show_model_input=False)

Getting Model Feedback From Real-Life Users 📖

model_feedback = chai.get_feedback(submission_id)
model_feedback.sample()

df = model_feedback.df
df.head()

raw_data = model_feedback.raw_data
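The exact columns of `model_feedback.df` depend on your chaiverse version, so the column names below (`conversation_id`, `thumbs_up`) are assumptions for illustration. A typical first look aggregates the thumbs-up rate with plain pandas:

```python
import pandas as pd

# Stand-in for model_feedback.df; the real columns may differ by version.
df = pd.DataFrame({
    "conversation_id": ["c1", "c2", "c3", "c4"],
    "thumbs_up": [True, False, True, True],
})

# Fraction of conversations that got a thumbs up
thumbs_up_rate = df["thumbs_up"].mean()
print(f"thumbs-up rate: {thumbs_up_rate:.0%}")
```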

Getting Chaiverse Leaderboard 🥇

Wondering how your model is performing? Check the leaderboard.

leaderboard = chai.display_leaderboard()

leaderboard = chai.display_leaderboard(detailed=True)

Retrieving Your Submission IDs + Deactivating Models 😶‍🌫️

Just in case you’ve misplaced your submission IDs, it’s easy to retrieve them.

submission_ids = chai.get_my_submissions()
submission_ids
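To retire a submission, chaiverse exposes a deactivation call (`chai.deactivate_model(submission_id)` at the time of writing; check your installed version). The helper below assumes `get_my_submissions()` returns a mapping shaped like `{submission_id: {"status": ...}}`, which may differ by version, and shows how you might pick which submissions to retire:

```python
def ids_to_deactivate(submissions, keep_status="deployed"):
    """Return submission ids whose status is not the one we want to keep.

    `submissions` is assumed to look like the mapping returned by
    chai.get_my_submissions(); the exact shape may differ by version.
    """
    return [sid for sid, info in submissions.items()
            if info.get("status") != keep_status]

submissions = {
    "my-awesome-llama_v1": {"status": "failed"},
    "my-awesome-llama_v2": {"status": "deployed"},
}
for sid in ids_to_deactivate(submissions):
    print(f"would deactivate {sid}")
    # chai.deactivate_model(sid)  # the real (network-side) call
```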

Conclusion

That wraps up our walkthrough. We’ve covered everything from installations to submitting and evaluating your model. All the best on your journey!

Happy AI building!

Fine-tuning Microsoft's Phi-2 Machine Learning Model with DPO (2024-01-23)
https://akshay326.com/2024/01/23/microsoft-phi-2-dpo-finetuned

Introduction

In this blog, we'll focus on fine-tuning a cutting-edge language model from Microsoft, known as Phi-2, with Direct Preference Optimization (DPO). We'll do this using an open-source preference dataset, modern Python libraries, and the power of Google's Colaboratory. Our code examples are in Python and are designed for simplicity and clarity.

TL;DR

  • Try the model here: https://huggingface.co/akshay326/akshay326-dpo-finetuned-phi-2
  • Finetune the model on your own on Google Colab: https://colab.research.google.com/drive/1nwpBZQQGjYjzWpQdBaf3xhk4S8CDpVM4#scrollTo=YpdkZsMNylvp

Import Necessary Libraries

# -*- coding: utf-8 -*-
"""Fine-tune Phi-2 model with DPO.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1nwpBZQQGjYjzWpQdBaf3xhk4S8CDpVM4
"""

!pip uninstall -y transformers
!pip install git+https://github.com/huggingface/transformers

!pip install -q datasets trl peft bitsandbytes sentencepiece wandb

import os
import gc
import torch

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from trl import DPOTrainer
import bitsandbytes as bnb
from google.colab import userdata
import wandb

Here, we import an array of deep learning and language-modeling libraries, such as Transformers, Datasets, and TRL (which provides the DPO trainer).

Setup Tokens and Model Names

# Defined in the secrets tab in Google Colab
hf_token = userdata.get('HF_TOKEN')
wb_token = userdata.get('WANDB_API_KEY')
wandb.login(key=wb_token)

model_name = "microsoft/phi-2"
new_model = "akshay326-dpo-finetuned-phi-2"

For security purposes, we retrieve the Hugging Face and Weights & Biases tokens from Google Colab’s secrets tab.

Load and Format the Dataset

def chatml_format(example):
    # Format system message (may be empty)
    if len(example['system']) > 0:
        message = {"role": "system", "content": example['system']}
        system = tokenizer.apply_chat_template([message], tokenize=False)
    else:
        system = ""

    # Format the user question as the prompt
    message = {"role": "user", "content": example['question']}
    prompt = tokenizer.apply_chat_template(
        [message], tokenize=False, add_generation_prompt=True
    )

    # Chosen and rejected answers, each terminated with an end-of-turn token
    chosen = example['chosen'] + "<|im_end|>\n"
    rejected = example['rejected'] + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }

# Load dataset and remember its original columns (dropped after formatting)
dataset = load_dataset("Intel/orca_dpo_pairs")['train']
original_columns = dataset.column_names

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True,)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Format dataset
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)

# Print sample
dataset[1]

In this block, we load our training dataset and format it so that it suits our model’s requirements.
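If you want to see what `apply_chat_template` produces without downloading the tokenizer, here is a hand-rolled sketch of the ChatML layout. The `<|im_start|>`/`<|im_end|>` markers are the standard ChatML tokens, but the real template comes from the tokenizer's configuration, so treat this as an approximation:

```python
def chatml(role, content):
    """Render one message in ChatML, the layout apply_chat_template emits
    for ChatML-configured tokenizers."""
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"

# A toy record shaped like one row of Intel/orca_dpo_pairs
example = {
    "system": "You are a helpful assistant.",
    "question": "What is 2 + 2?",
}

prompt = chatml("system", example["system"]) + chatml("user", example["question"])
print(prompt)
```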

Training the Model with DPO

# Use bfloat16 where the GPU supports it, otherwise fall back to fp16
HAS_BFLOAT16 = torch.cuda.is_bf16_supported()

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    fp16=not HAS_BFLOAT16,
    bf16=HAS_BFLOAT16,
    report_to="wandb",
)

# Create the DPO trainer; `model`, `ref_model` and `peft_config` come from
# the model-loading cells of the notebook (not reproduced in this post)
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)

# Fine-tune model with DPO
dpo_trainer.train()

This code chunk contains the main logic to fine-tune the Phi-2 model using Direct Preference Optimization (DPO), training with the hyperparameters above.
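For intuition, the DPO objective is a logistic loss on how much more the policy prefers the chosen answer over the rejected one, relative to the reference model. A minimal scalar sketch of the per-pair loss (this mirrors the math, not TRL's actual tensor implementation):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin), where the margin is
    how much more the policy prefers the chosen answer than the reference
    model does."""
    margin = (policy_chosen_logp - policy_rejected_logp) \
             - (ref_chosen_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If the policy prefers the chosen answer more than the reference does,
# the margin is positive and the loss is small; if it prefers the rejected
# answer, the loss grows.
print(dpo_loss(-1.0, -5.0, -2.0, -3.0))  # positive margin, small loss
print(dpo_loss(-5.0, -1.0, -2.0, -3.0))  # negative margin, larger loss
```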

Saving, Uploading, and Inferencing

# Save artifacts
dpo_trainer.model.save_pretrained("final_checkpoint")
tokenizer.save_pretrained("final_checkpoint")

# Flush memory
del dpo_trainer, model, ref_model
gc.collect()
torch.cuda.empty_cache()

# Reload the base model in fp16, then merge the LoRA adapter into it
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base_model, "final_checkpoint")
model = model.merge_and_unload()

# Save model and tokenizer
model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)

# Push them to the HF Hub
model.push_to_hub(new_model, use_temp_dir=False, token=hf_token)
tokenizer.push_to_hub(new_model, use_temp_dir=False, token=hf_token)

# Generate text with a pipeline built on the merged model
prompt = "What is a large language model?"  # any instruction works here
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
)
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

Once fine-tuning is complete, we save and upload the model using the push_to_hub method. Then we leverage the pipeline utility for inference and print the resulting text.

That's it! This post covered how to fine-tune Microsoft's Phi-2 model using Direct Preference Optimization. Happy coding!

Mistral 7B Document Chat: Your Casual PDF Companion (2024-01-11)
https://akshay326.com/2024/01/11/RAG-mistral-huggingface

Unleashing Mistral 7B: PDF Conversations Made Easy!

Hey, document enthusiasts! 📄 Ready to level up your PDF game? Meet Mistral 7B Document Chat – the coolest sidekick for all your PDF queries.

(Screenshot: the Mistral 7B Document Chat interface)

What’s the Buzz?

Quick and Casual PDF Convos

Mistral 7B is all about easy-breezy PDF interactions. Ask questions, throw in follow-ups – this chat gets it! 🚀

References? Got ‘Em!

Worried about clarity? Fear not! Mistral 7B drops document references in its responses. So, you know where the magic info is stored.

How to Dive In

  1. Ask Away: Hit Mistral 7B with your burning PDF questions.
  2. Chill for Answers: The AI will cook up responses using its retrieval-augmented magic. 🔮
  3. Follow-Up Banter: Keep the convo flowing with follow-up questions.
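Under the hood, the "retrieval-augmented magic" boils down to: embed the question, find the most similar document chunks, and stuff them into the prompt. This toy bag-of-words retriever captures the idea; the real app uses dense embeddings and a proper vector store:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = Counter(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: cosine(q, Counter(c.lower().split())),
                    reverse=True)
    return scored[:k]

chunks = [
    "The invoice total is due within 30 days.",
    "Mistral 7B is a 7-billion-parameter language model.",
]
context = retrieve("When is the invoice due?", chunks)[0]
prompt = f"Context: {context}\nQuestion: When is the invoice due?"
print(prompt)  # the LLM then answers from this augmented prompt
```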

Get Your Hands Dirty

Wanna see it in action? Check out the Live Demo. Feeling adventurous? Dive into the Code and see the behind-the-scenes wizardry.

Wrapping it Up

Mistral 7B Document Chat is your laid-back PDF buddy. No formalities, just the info you need, when you need it. Try it out and embrace a new era of PDF coolness!

Ready to chat? Head over to the Live Demo or peek at the Code. Let the PDF party begin! 🎉

Vertical and Horizontal Scaling in databases (2022-01-26)
https://akshay326.com/2022/01/26/nosql-vs-sql

Before we delve into the nature of scaling, let's dig into the history of databases.

NoSQL v/s SQL

SQL databases were developed in the 1970s to reduce data duplication, since data storage was costly compared to developer/maintenance costs at the time. NoSQL (or 'not only SQL') databases emerged as an alternative after the internet boom of the 2000s, allowing faster scaling, lower database maintenance costs, and easy schema changes.

Vertical v/s Horizontal Scaling

SQL databases require vertical scaling

Vertical scaling refers to adding resources (compute, storage, etc.) to a system built on a rigid base or architecture - the schema, in the case of a database. Horizontal scaling, on the other hand, means adding or removing replicas of a system. Advances in semiconductor technology produced affordable, larger storage solutions, and rapid adoption of cloud technologies followed in tandem.

But how does scaling affect SQL databases?

Consider a hypothetical social media company with a SQL database of ~100 GB held in RAM on the cloud, say on an EC2 instance. If the platform suddenly experienced a ten-fold increase in traffic, it would need roughly 1 TB of RAM on that instance (or less, if the platform's developers find a way to optimize the database schema in real time). This can be a costly operation, considering that a typical platform with over a million users can accumulate databases measured in petabytes!

On the other hand, if the company had a NoSQL database of the same size and experienced a ten-fold increase in traffic, it could simply add nine more EC2 instances of the same size, orchestrated by Kubernetes or AWS Fargate.
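The reason nine extra instances help is that the data can be partitioned across them. A minimal hash-based shard router sketches the idea (production systems prefer consistent hashing or range partitioning so that adding a shard doesn't remap most keys):

```python
import hashlib

def shard_for(key, num_shards):
    """Map a record key to one of num_shards instances.

    Plain modulo hashing: simple, but adding a shard remaps most keys,
    which is why production systems prefer consistent hashing.
    """
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Ten-fold traffic growth: spread users across 10 instances instead of 1.
for user in ["alice", "bob", "carol", "dave"]:
    print(user, "->", "ec2-shard-%d" % shard_for(user, 10))
```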

References

  • https://blog.teamtreehouse.com/should-you-go-beyond-relational-databases
  • http://nickcraver.com/blog/2013/11/22/what-it-takes-to-run-stack-overflow
  • http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
GSoC'20 - A summer full of optimization and Julia (2020-08-23)
https://akshay326.com/2020/08/23/gsoc2-final

Latest updates on the work done, and post-GSoC tasks.

Hi all! This is the final blog in the series marking my progress in Differentiable Optimization Problems. You may enjoy:

  1. Reading my first blog.
  2. Checking the code repository here.
  3. Reading the docs @ https://aks1996.github.io/DiffOpt.jl/dev/.

The project - progress

Milestones completed:

  1. Using MatrixOptInterface.jl as a dependency in DiffOpt.jl - PR#37
  2. Fix MathOptSetDistances.jl dependency - PR#35
  3. Support sparse structures in DiffOpt - PR#47. The biggest bottleneck in inducing sparsity was matrix A; thanks to @blegat, I drew inspiration from an SCS.jl PR
  4. Minor updates in converting MOI model to MatrixOptInterface.jl model - PR#7
  5. Updated docs with manual and examples

What do we mean by differentiating a program?

For a primer on differentiable optimization, refer to the introduction page in the documentation or this TF Dev Summit’20 video.

What have we achieved? How can you use it?

As of now, one can differentiate:

  • convex conic programs (with linear objectives), and
  • convex quadratic programs (with affine constraints)

written in MOI. The theory behind each method comes from the respective reference papers linked in the project documentation.

I’ve included some examples (for both the methods) in the documentation, plus plenty of examples (with reference CVXPY code) in the tests folder. For a primer on matrix inversion, I would suggest this example. If you face any problem, feel free to create an issue.
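To make "differentiating a program" concrete with a toy example: for min_x 0.5*q*x^2 - p*x (with q > 0), the optimum is x*(p) = p/q, so dx*/dp = 1/q. The finite-difference check below is plain Python, not DiffOpt.jl's API, but it is the kind of sanity test the package automates for general quadratic and conic programs:

```python
def argmin_quadratic(p, q=1.0):
    """Closed-form minimizer of f(x) = 0.5*q*x**2 - p*x (q > 0): x* = p/q."""
    return p / q

def d_argmin_dp(p, q=1.0, eps=1e-6):
    """Finite-difference derivative of the optimal solution w.r.t. p."""
    return (argmin_quadratic(p + eps, q) - argmin_quadratic(p - eps, q)) / (2 * eps)

# For q = 1 the solution is x*(p) = p, so the derivative should be 1.
print(d_argmin_dp(3.0))
```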

Post-GSoC improvements

Although we're approaching the last week of GSoC, I'll make sure to keep improving the following things post-GSoC too.

  1. Making the code independent of SCS.jl (Issue#38) - although we moved the SCS-specific code into MatrixOptInterface.jl, part of the differentiation still relies on SCS-specific code. Removing this dependency should generalize differentiation to any available conic solver
  2. Derivatives in terms of ChainRules.jl - this is one of the foremost suggestions by @matbesancon; although we couldn't fit AD into the GSoC timeline, we've begun discussions with the ChainRules.jl community
  3. Time benchmarking - it’ll be really cool to make DiffOpt.jl fast. Started a beginner PR to profile computation time - PR#40
  4. From MOI to JuMP - since the code is already working with MOI backend, it shouldn’t be hard to differentiate JuMP models too. I’ll be interested in improving the interface and API usage using JuMP
  5. Many specific improvements in MODistances.jl and MatrixOptInterface.jl

Final thoughts

I had a really good experience with the GSoC project. As I’ve already mentioned in my previous blog, the JuMP developer community is indeed welcoming and helpful.

A shoutout to Julia project maintainers, JuMP developer community, and my mentors!!

It wouldn't be wrong to say that at times I was confused about how to make an effective segue from optimization theory to working models. We've been able to develop the codebase thanks to the biweekly GSoC standups, the JuMP developer calls, numerous issue/PR discussions, and Slack threads.

100+ commits, 20+ PRs and 3000+ lines of code and counting

Finally, I would like to thank Google Summer of Code team for providing us this great learning opportunity amidst the COVID pandemic.

GSoC'20 - Create Dependencies and Documentation (2020-08-10)
https://akshay326.com/2020/08/10/gsoc2-w8

Move DiffOpt.jl code to other JuMP-dev packages; begin documenting the code.

Hi all! This blog is one of a series marking my progress in Differentiable Optimization Problems. You may enjoy reading my first blog.

The project - progress

Milestones completed:

  1. Adding projections on cones and their derivatives to MathOptSetDistances.jl (PR#5) and using it as a dependency in DiffOpt.jl (PR#35)
  2. Moving the matrix builder code to MatrixOptInterface.jl - PR#7
  3. Initial version of documentation - https://aks1996.github.io/DiffOpt.jl/dev/

Minor contribution (PR#1098) in JuMP Development Sprint :)

What’s next?

Finally, we arrive at the last two weeks of the summer program. We'll focus on

  1. Supporting sparse structures to speed up computations - Issue#31
  2. Comprehensively document the codebase developed

Staying in touch

If you’re interested in knowing more about the project, join the #opt-diff-gsoc channel on julia slack.

GSoC'20 - Support Semidefinite constraints (2020-07-27)
https://akshay326.com/2020/07/27/gsoc2-w6

Support SDP constraints, add a Windows AppVeyor build, and make DiffOpt.jl robust.

Hi all! This blog is one of a series marking my progress in Differentiable Optimization Problems. You may enjoy reading my first blog.

The project - progress

Milestones completed:

  1. Adding appveyor build to CI - PR #29
  2. Testing the whole contconic.jl test suite and supporting the latest Julia version on CI - PR #28
  3. Extracting matrices dynamically and supporting SDP constraints - PR #30

Bonus: Had JuMP Developer call on July 24th - find the related notebook here

What’s next?

For the next two weeks, we’ll focus on

  1. Adding the matrix builder code to MatrixOptInterface.jl and using it as a dependency in DiffOpt.jl
  2. Adding projections on cones and their derivatives to MathOptSetDistances.jl and using it as a dependency in DiffOpt.jl

Staying in touch

If you’re interested in knowing more about the project, join the #opt-diff-gsoc channel on julia slack.

GSoC'20 - Differentiating conic programs (2020-07-10)
https://akshay326.com/2020/07/10/gsoc2-w4

Solving and differentiating convex conic programs w.r.t. problem data, improving the solver interface, and benchmarking with diffcp.

Hi all! This blog is one of a series marking my progress in Differentiable Optimization Problems. You may enjoy reading my first blog. In this post, I will briefly describe how we enabled DiffOpt.jl to differentiate a conic program with an SOCP constraint and improved the interface by adding more tests.

The project - progress

Milestones completed:

  1. Testing solver for quadratic programs contquadratic.jl - PR#19
  2. Find projections (and derivatives) on dual cones. Differentiate a simple conic program with an SOCP constraint Issue#24 - PR#26
  3. Add several MOI tests to improve solver interface - PR#28

What’s next?

For the next two weeks, we’ll focus on

  1. Differentiating conic programs with SDP constraints
  2. Benchmark differentiation of conic programs with diffcp

Staying in touch

If you’re interested in knowing more about the project, join the #opt-diff-gsoc channel on julia slack.

GSoC'20 - Differentiating LPs and QPs (2020-06-29)
https://akshay326.com/2020/06/29/gsoc2-w2

Creating an optimizer layer, implementing MOI tests, and benchmarking with QPTH and CVXPY.

Hi all! This blog is one of a series marking my progress in Differentiable Optimization Problems.

You may enjoy reading my first blog. In this post, I will briefly describe the MOI layer we've created over DiffOpt.jl and how we've made it more robust by adding more tests - both testing interface usage and benchmarking against the existing QPTH and CVXPYLayers projects.

The project - progress

Milestones completed:

  1. Making DiffOpt robust - Implementing contlinear.jl MOI tests for linear programs - PR#19
  2. Turning DiffOpt to a MOI optimization layer Issue#12 - PR#18

What’s next?

For the next two weeks, we’ll focus on

  1. Implementing tests from contquadratic.jl
  2. Supporting conic programs in DiffOpt

Staying in touch

If you’re interested in knowing more about the project, join the #opt-diff-gsoc channel on julia slack.

GSoC'20 begins - Differentiable Optimization problems (2020-06-13)
https://akshay326.com/2020/06/13/gsoc2-intro

Learn more about the initial steps in developing DiffOpt.jl to differentiate some optimization problems.

Hi all! I am excited about my project Differentiable Optimization problems in Google Summer of Code 2020. This is the 1st blog in a series, which was encouraged by our mentors at NumFOCUS. In this blog, I will share progress about the project and my involvement in the JuMP.jl community so far. You can find the proposal and more about the GSoC project here.

All about GSOC

This is my 2nd time doing a GSoC Project. Before drafting this blog, I reflected upon my previous GSoC project and how I shared my experiences then. If you’re interested in how to apply for GSoC, you can find all about it here.

Why I chose this project?

At the onset of the COVID-19 crisis and the beginning of a nationwide lockdown in our country, one of my batchmates suggested this project to me. It was enticing at first sight, as I was pursuing my Master's project in large-scale convex optimization. I began finding out more about the project, JuMP, and Julia. As I went through the details of the project - the requirements, deliverables - it became clear to me that it perfectly aligned with my interests. It was also a great learning opportunity, as I was not well versed in Julia and had almost no experience in differentiable optimization.

A shoutout to the Julia community

While drafting the GSoC proposal and making initial contributions, I had an excellent experience. Specifically, Julia is a really good language - just after my first few Julia/JuMP scripts, I enjoyed writing models in pure symbolic mathematics. The Julia community is welcoming, and I received help from several Julia project maintainers, contributors, and my prospective mentors (cheers to Benoît, Joaquim, Mathieu, Mario). It wouldn't be wrong to say that such a warm experience propelled me to contribute more to this great community!

Why are Differentiable optimization problems important?

Differentiable optimization is a promising subfield of convex optimization with many potential applications in game theory, control theory, and machine learning (specifically deep learning - refer to this video for more). Recent work has shown how to differentiate specific subclasses of convex optimization problems, but several applications remain unexplored (refer to section 8 of this really good thesis). With the help of automatic differentiation, differentiable optimization can have a significant impact on creating end-to-end systems for modelling a neural network, a stochastic process, or a game.

The project - overview

JuMP is a modeling language for mathematical optimization embedded in Julia. It supports many solvers for a variety of problem classes and has many features. The project aims at equipping JuMP with the ability to differentiate specific convex optimization problems in Julia. In conjunction with another concurrent GSoC project, we will enable JuMP to differentiate an optimization problem with respect to parameters (for instance, its problem data).

The project - progress

The GSoC community bonding period and the first two weeks of the coding period have passed at the time of writing this blog. You can find the project repository here (DiffOpt.jl). Currently, DiffOpt.jl is able to differentiate LPs and QPs written in MOI, taking references largely from QPTH. We are in continuous discussion on the Slack channel (see the next section), and we have a bi-weekly catch-up call for discussing issues and setting objectives (plus I'm attending the monthly JuMP developer call too!).

Our objectives for the next 2 weeks:

  1. Making DiffOpt robust - improving testing on several optimization problems
  2. Turning DiffOpt to a MOI optimization layer
  3. Supporting conic programs in DiffOpt

Staying in touch

If you’re interested in knowing more about the project, join the #opt-diff-gsoc channel on julia slack.
