somusan's ckpt
This website is a virtual proof that I'm awesome
https://soumya997.github.io/

Diffusion Policy Visuomotor Policy Learning Via Action Diffusion - Paper Explained The problem statement Diffusion Policy solves is visuomotor manipulation...
- Introduction: We all know diffusion models like DALL-E and Stable Diffusion for their ability to generate stunning images by iteratively removing noise. But what if we applied that exact same principle to robotic control? Diffusion Policy is a groundbreaking approach to visuomotor manipulation that adapts the DDPM framework to imitation learning. Instead of converting a latent vector into an image, it learns to denoise a random sequence into a highly accurate “action chunk”, a trajectory of 7-DoF end-effector poses. By conditioning this denoising process on camera observations rather than text prompts, Diffusion Policy gracefully handles the multi-modal, non-Markovian nature of...
Fri, 13 Mar 2026 00:00:00 -0400
https://soumya997.github.io/2026-03-13-diffusion-policy-visuomotor-policy-learning-via-action-diffusion-paper-explained/

Best Project I've Worked on! My work on an autonomous vehicle at IIIT Delhi
- One of the best projects I worked on was developing and deploying an end-to-end Traffic Light Following ADAS feature on an autonomous vehicle at IIIT Delhi. Typically, an autonomous vehicle is given a start and goal pose, and it must plan an optimal path, avoid obstacles, and maintain a desired velocity while reaching the destination. For the traffic light following feature, the challenge was slightly different: given a start and end position, the vehicle needed to find an optimal path, stop before the stop line if a red light was detected, and resume movement when the...
Tue, 25 Mar 2025 00:00:00 -0400
https://soumya997.github.io/2025-03-25-my-best-projects/

SAR-cus! Tricks that you wanna learn - 102 SAR Image Data 102 Gonna discuss all about EM wave basics, the need for SAR data, pros and cons, data collection, and more details.
- TOC: Complex form of a SAR image; Deep learning applications. Complex form of a SAR image: Physical properties of the terrain change the phase and amplitude of the EM wave, and SAR measures these two properties of the wave: amplitude (A) and phase (phi). They are represented as a number pair, (A cos(phi), A sin(phi)). How bright or dark a pixel is depends on the strength of the echo received. This amplitude and phase information is captured in the form of a complex number. Every complex number consists of a real part and an imaginary part. Here, Real $part(a)...
Thu, 21 Apr 2022 00:00:00 -0400
https://soumya997.github.io/2022-04-21-sar-image-data2/

Elastic InfoGAN - Paper Summary Motives/problems of Elastic InfoGAN, and the solutions to those problems.
- Motive of the Paper: This paper addresses two main shortcomings of the InfoGAN paper while keeping its other good qualities/improvements intact. These two shortcomings are: We have gone through InfoGAN, which mainly focuses on generating disentangled representations by manipulating the latent code vector $c$ [please go through the InfoGAN blog if the terms look ambiguous to you]. There we saw that this latent code vector is assumed to be made of continuous and discrete latent variables. Now, one of the assumptions they used was that the discrete latent variable has a uniform distribution -> $c_1 \sim Cat(K=10,p=0.1)$...
Sun, 17 Apr 2022 00:00:00 -0400
https://soumya997.github.io/2022-04-17-elastic-infogan-paper-summary/

InfoGAN - paper summary and Notes Overview of InfoGAN, need for InfoGAN, workings, objective function, derivations.
- Important Notes: This is the first paper in which the authors talked about “finding meaning in the latent variable”. If we change one dimension of a 100-dimensional vector, then we want to associate meaning with that. “Basically, we want to find the mapping from latent vector to the generated images.” And doing this will give us more control over the generated images. “InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation.” To control the output of the InfoGAN we use two things, latent vector $z$...
Sat, 16 Apr 2022 00:00:00 -0400
https://soumya997.github.io/2022-04-16-infogan-paper-summary/

GAN - Generative Adversarial Nets Paper Summary and Notes Overview of GAN, need for GAN, workings, objective function, derivations.
- We know how a GAN works, right? GAN = “generative adversarial network”; it has two parts, a generative part and an adversarial part. The main objective of a GAN is to mimic a distribution, or we can say we use a GAN for distribution modeling. We know which parameters we need to define a normal distribution or a uniform distribution. But given an arbitrary distribution, how do we know its parameters? To get them, we use a GAN. E.g., you have height data for each individual student of a class. And given that, you want to generate similar heights that match the...
Fri, 15 Apr 2022 00:00:00 -0400
https://soumya997.github.io/2022-04-15-gan-paper-summary/

SAR-cus! Tricks that you wanna learn - 101 SAR Image Data 101 Gonna discuss all about EM wave basics, the need for SAR data, pros and cons, data collection, and more details.
- TOC: Introduction; EM Wave Basics; Comparison of the optical image and SAR image; Comparison between different Remote Sensing Methods; Need of SAR; How it is captured; Deep into range and azimuth resolution; Different ranges (K-band, X-band, etc.) and applications; Scattering; Why it uses side projection; Comparison with bats; What is a footprint; What is an antenna. Introduction: I have always been a computer vision fanatic, and I got to learn a lot more about this domain and its real-life applications from my internship at IIT Kharagpur. I was assigned to work on PolSAR images for various tasks like image segmentation, colorization, etc....
Sun, 27 Mar 2022 00:00:00 -0400
https://soumya997.github.io/2022-03-27-sar-image-data1/

Pytorch Training and Validation Loop Explained [mini tutorial] I always had doubts regarding a few pieces of code used in the training loop, but they actually make more sense when you think of the forward and backward pass.
- Pytorch Training Loop Explained: These three things are part of backpropagation. After doing the forward pass with model(x_input), we need to calculate the loss for each batch and update the parameters based on the derivatives. Calling loss.backward() calculates the derivatives/gradients, and optim.step() updates all the parameters using those gradients. We mainly use optim.zero_grad() to avoid the gradient accumulation problem, since we prefer to calculate the gradients of each batch separately. Looking into the code might help. A general deep learning pipeline looks like this: epochs = 5 for e in range(epochs): train_loss = 0.0 for data, labels in...
Sun, 20 Mar 2022 00:00:00 -0400
https://soumya997.github.io/2022-03-20-pytorch-params/

GPU Utilization Tips for Pytorch Pipeline 8 tips to make your code utilize the GPU like it never has before.
- GPU Utilization Tips for Pytorch Pipeline: I was having this problem, but in the end I was kind of able to figure out the solution; that's why I'm sharing it here, as a note to myself. Check whether the model and data are on CUDA: Make sure to initialize the model on CUDA, model = CNN(in_channels=cfg["in_channels"], num_classes=cfg["num_classes"]).to(device) and move your data to CUDA using data = data.to(device=device) targets = targets.to(device=device) Use W&B to monitor GPU utilization: Log in to W&B with the code below: import wandb try: from kaggle_secrets import UserSecretsClient user_secrets = UserSecretsClient() api_key = user_secrets.get_secret("WANDB") wandb.login(key=api_key) anonymous = None except:...
Sun, 20 Mar 2022 00:00:00 -0400
https://soumya997.github.io/2022-03-20-gpu-utilization/

Make Your Terminal Look Awesome [Few Resources] Make your terminal look awesome: install Windows Terminal, WSL, Oh-my-posh, zsh, terminal icons
- This is the repo (https://github.com/soumya997/windows-terminal-setup) where I have put my terminal setup files and described how I did that. I have 6 shells in my Windows Terminal: PowerShell, PowerShell 7, cmd, WSL Ubuntu, bash, and Azure Cloud Shell. I don't really use every one of them; I usually prefer PowerShell, but I set up all of them anyway. I did not customize the Azure shell as I don't really use it and it takes time to load. I have discussed and shared the resources on how to make the Windows Terminal look like below. If you face any problem, open an issue. Windows terminal...
Wed, 09 Feb 2022 00:00:00 -0500
https://soumya997.github.io/2022-02-09-terminal/

DataLoader in PyTorch Three ways that I know After digging a little bit more I got to know that there are three ways of loading data in a PyTorch model...
- When I started learning PyTorch, it was very difficult for me to understand how to load data as batches in a model, as different tutorials were using different techniques to load data. After digging a little bit more I got to know that there are three ways of loading data in a PyTorch model: datasets.ImageFolder, creating a custom class for loading data, and downloading directly from torchvision datasets and using DataLoader. This is because the file structure and the arrangement of data are different in different cases. By that I mean, say you have the cat vs dog classification dataset, in...
Wed, 09 Feb 2022 00:00:00 -0500
https://soumya997.github.io/2022-02-09-pytorch-dataloader/

Math Tricks
Easily Finding the Answer to any Number Multiplied by 11: Most children do end up memorizing the multiplication tables up to 10. But this can be taken one step further by knowing how to multiply by 11 quickly as well. Let’s try multiplying 45 by 11. Separate the digits 4 and 5 with a space between them, such as 4 [ ] 5. Now, put the sum of the two digits in the centre, such as 4 [4+5 = 9] 5. That’s your answer. 45 x 11 = 495. If the sum happens to be a two-digit number, such...
Wed, 09 Feb 2022 00:00:00 -0500
https://soumya997.github.io/2022-02-09-math-tricks/

Implement RNN Using Tf2.x This is a small and simple guide to RNNs; we will discuss all the basic requirements that you need to get started with RNNs...
- Hi reader, this is a small and simple guide to RNNs; we will discuss all the basic requirements that you need to get started with RNNs, from the underlying concepts to code implementation. We will be implementing it using TensorFlow 2.0. Table of Contents: What are RNNs; Many to One RNN; Something about the data; Loading Data Using Pandas; Some Basic Exploratory Data Analysis; Data Preprocessing; Model Building; Train Test Split; Model Training; Model Evaluation; Conclusion. What are RNNs? The R in RNN stands for recurrent, which literally means repeating. Now you will think, what is repeating here? Here repeating refers...
Wed, 09 Feb 2022 00:00:00 -0500
https://soumya997.github.io/2022-02-09-implement-rnn-using-tf2/

Using kaggle Dataset api [Mini tutorial] This guide is specific to Kaggle NB. The main steps are same, you might need to ...
- [Mini tutorial] Create Kaggle Dataset w/ kaggle API: This guide is specific to Kaggle NB. The main steps are the same; you might need to change some code, that's it. If you are facing a problem copying the file: remove the existing /root/.kaggle directory: !rm -r /root/.kaggle and recreate it: !mkdir /root/.kaggle Follow these steps. Copy your kaggle api token to the root directory: !cp -v /kaggle/input/random-private-files/kaggle.json /root/.kaggle Check out your kaggle.json file [optional]: !cat /kaggle/input/random-private-files/kaggle.json >>> {"username":"soumya9977","key":"xxx99898y9uhoxxxxausdui"} Initialize a dataset: Create a folder containing the files you want to upload, then run the below command on your dataset path, that will create a...
Sat, 29 Jan 2022 00:00:00 -0500
https://soumya997.github.io/2022-01-29-create-kaggle-dataset-w-kaggle-api/

Gleipnir Season 1 review Gleipnir comes under the genres Action fiction, Adventure, Ecchi. It was just the right thing that I was looking for...
- “Gleipnir” was just the right thing that I was looking for. Since the placement season has ended, my long-pledged days of restraining myself from watching anime and using social media have also seen their last. Before that I watched 5 episodes of “Sunny Boy”. I liked it too, but the thrill inside “Gleipnir” was something else. I have always been attracted to action fiction and adventure anime (of a slightly dark type) where the protagonist is not very strong but tries his/her best and does so for a good cause, and “Gleipnir” was a...
Mon, 15 Nov 2021 00:00:00 -0500
https://soumya997.github.io/2021-11-15-Gleipnir-review/