Kavish Hukmani

Laundry Insights From Scraping 4000 Washer/Dryers in San Francisco

2025-12-12T15:46:00+00:00

A few months ago, I had the terrible misfortune of needing to house-hunt in San Francisco’s insanely competitive rental market. After weeks of getting beat out for every apartment I applied to, I finally found one where I wasn’t applicant #11. It was perfect and checked all my boxes except one: an in-unit washer/dryer.

The situation wasn’t the worst though. There was a large laundry room in the basement, and an app that shows the current status of every machine. That got my wheels spinning. Maybe I could figure out the laundry habits of a couple hundred of my new neighbors to avoid standing around waiting for someone else’s cycle to end.

The app, Wash Connect, requires you to sign-up for your specific laundry room by either via Bluetooth (while in proximity of the room) or using an eight digit location code. Once successfully signed up, you don’t need to be physically in the room to view the available machines.

I started poking around to see if they had a web app since that would make my life much easier. But alas, Android and iOS only. I went down the rabbit hole of intercepting traffic from an Android app and ended up with a rooted virtual device and HTTP Toolkit. Next, I created a burner account, found the APIs being called, and a few Claude prompts later, I had a scraper running on a Linode box.

I checked back in after a few days and everything was going better than expected, i.e., nothing broke. I also started looking at the code Claude wrote and noticed there was no authentication on the API! I was baffled that I had somehow missed that. I thought to myself, can I analyze laundry habits for the entire country?

While this was running, I spent the next few days were spent solving a different problem: how do I tie these laundry rooms to their addresses? While digging through the API responses, I found a few nuggets of information that could help. The room name field was the first line of the room’s address. I could easily verify this for my apartment. The other piece of the puzzle was that the first two digits of a location’s identifier represented the state code. I was still missing key details like the city and ZIP code, but I had enough to mostly piece things together. Pasting these two parts into Google usually identified the full address. I ended up using the Google Maps Geocoding API to automate exactly this. It wasn’t perfect, but it seemed good enough and was all I had.

I came back to my scraper a week later to find that my requests were suddenly failing. After some digging, I concluded that my box’s IP had been blocked, probably because it was generating a huge amount of traffic and someone noticed. Determined to finish a project I had gotten so far into, I decided to scale down. I spun up a new Linode box and filtered my scraper for only locations in San Francisco.

I left it running for the entire month of October. Here are the results.

Key Metrics and Plots

• Wash Connect operates 451 laundry rooms in San Francisco with 2,095 washers and 2,023 dryers
• Most Wash operated locations have fewer than 10 machines, likely condos and smaller apartments
• While most rooms have a 1:1 W/D ratio, that’s not always the case. Additionally, Dryer usage is typically higher because the cycles run longer
• Time of day matters far more than day of week. That being said, the peak laundry time is Sunday 11 a.m.

Laundry Room Locations by Neighborhood

Washer/Dryer Usage Trends

Trends by Neighborhood (Click to expand)

Circuit Sketch

2025-11-17T11:53:00+00:00

View source code on GitHub

Circuit Sketch

Draw shapes and match them against real Formula 1 circuit layouts using shape recognition algorithms.

🏎️ Live Demo: https://circuit-sketch.kavi.sh/

Features

🎨 Draw & Match: Draw shapes to match F1 circuits with instant feedback
🏎️ Browse Circuits: Explore all F1 circuits with detailed information
📊 Multiple Algorithms: Choose between Hausdorff, Frechet, or turning angle matching
📚 Rich Data: Pre-loaded Wikipedia facts and statistics
🌓 Dark Mode: Full theme support
⚡ Static Hosting: No server required, works offline

Quick Start

npm install
npm run dev

Deployment

The site is automatically deployed to GitHub Pages at circuit-sketch.kavi.sh on every push to the main branch via GitHub Actions.

⚠️ First-time setup required: To enable deployment, configure GitHub Pages in repository settings:

Go to Settings → Pages
Set “Source” to GitHub Actions (not “Deploy from a branch”)
Push to main branch to trigger deployment

Data Management

Circuit layouts and Wikipedia data are stored locally in src/data/ and updated monthly via GitHub Actions.

Manual update:

npm run data:pull

Individual updates:

npm run data:circuits     # Update circuit layouts from bacinger/f1-circuits
npm run data:wikipedia    # Update Wikipedia data

Adding New Circuits

Add the circuit to bacinger/f1-circuits
Add Wikipedia mapping in scripts/scrape-wikipedia.ts (WIKIPEDIA_MAPPING object)
Run npm run data:pull

Build

npm run build
npm run preview

License

MIT License - Copyright GitHub, Inc.

Acknowledgements

The checkered flag emoji used in the logo and favicon is from Twemoji by Twitter, Inc and other contributors. Licensed under CC-BY 4.0.

ReIndigo

2025-09-20T19:38:00+00:00

ReIndigo

Continuing the legacy of Indigo

View source code on GitHub

ReIndigo is a continuation of the Indigo Jekyll theme, which I’ve been using as my website theme for over 6 years. While Indigo still works great, over time I’ve added some features that improved usability and fit my needs better — and that’s how ReIndigo was born.

Features

Interactive Plots

Embed interactive visualizations in your posts with Bokeh

LaTeX Support

Write math seamlessly using MathJax

\[Y_i = f(X_i, \beta) + \varepsilon_i\]

Improved Tags Page

Tags are now counted and sorted by most common for easier navigation
View Tags →

Sitemap

A sitemap.xml is generated automatically via the jekyll-sitemap gem for better SEO
View Sitemap →

Credits

ReIndigo builds on the excellent work of Sergio Kopplin and all the contributors to the Indigo theme. Without their design and foundation, this project wouldn’t exist.

Purchasing A Domain in 2025 — Enshitiffication

2025-04-26T08:58:00+00:00

This blog is primarily about my experience with acquiring kavi.sh

Acquiring a domain has been a solved problem for several years now. You go to Namecheap.com, type in what you want, and if it’s available you can have it for around $15/year.

All is good while you stay on this happy path. As soon as you want a domain remotely desirable, you fall into this rabbit hole of domain squatting, reverse auctions, and a general enshitiffication of the process.

Recently, I went about trying to acquire my namesake domain, kavi.sh. I was determined to acquire it after five years of regret from not doing so when it was last up for sale. This was compounded by the fact that if the right person acquires it, they would sit on it forever.

I’m sure this is a feeling common to many others who are trying to get that perfect domain for their business or blog and is exactly what the registries and registrars exploit.

Domain squatting has been an issue for as long as my Gen Z brain remembers and is nothing new. A random “entrepreneur” will purchase a large number of desirable domains, do nothing but sell them for a huge markup if you reach out. I’m sure this business model is successful given the high number of competitors thriving in this space and the fact that it has existed for so long. I also know it’s lucrative because certain litigious registrars, who I won’t name for legal reasons, take part in it. If the domain is short or uses common words, you can expect this to be the case. If you also show any signs of interest in a domain, this is often the case. This includes looking up a domain’s availability, WHOIS, or contacting the owner through some of these service providers and suddenly the domain gets marked as premium and is not eligible at the usual price or gets squatted.

A simple way to combat this has been to just wait for the squatter to lose interest and then snipe the domain using a drop catching service.

Now, I approximately knew all of the above and what to expect but I found that this goes way deeper than I expected.

I had been keeping an eye on the domain for years now, occasionally checking the WHOIS information and setting a calendar reminder around its expiry. Once I found out that it had not been renewed and its grace period was about to lapse, I began to check daily while figuring out my drop catching options. I primarily used Namecheap for my lookups since I knew that using a less reputable provider would result in it being squatted again.

While trying to drop catch a .sh domain, I learnt that most of the reputed providers do not support it. I found park.io (not sponsored) as my only option. If others exist I wasn’t able to find any. Snapnames does exist but largely didn’t work in my experience.

Domain Lifecycle (source: ICANN)

As the days went by, I expected to see it get released and caught as per ICANN’s lifecycle and data points for similar domains but was largely surprised when this wasn’t the case. I reached out to the support email at park.io to better understand this and learnt that my domain was selected as a premium domain by the .sh registry (NOT any registrar service)!

There was basically no information online about this for .sh domains and most of what I learnt was through them and my experience. This meant that the domain does not drop immediately and goes through a reverse auction process. This whole process seems to be administered by Identity Digital.

Now this was odd because there were similar domains that dropped directly. I suspect that it was because it received hits on the WHOIS records.

WHOIS Lookup

Soon after, the WHOIS records no longer showed any useful information but mentioned that it was in the Identity Digital dropzone. There was no information about how many people are interested, what was the current price or what the timeline for price drops or general availability would be. All I could do was put in my bid at park.io (who also expressed frustration about these points) and trust the process.

I ended up getting into my own head, increasing my bid over a few days while being stuck with no information and spending way too much on a domain nobody else was probably interested in.

I get that TLDs can be a not insignificant income for island nations [1, 2] but largely think that this process is purposely opaque and manipulative. They could do Second-price auctions like ads if multiple people are interested or hike up the price for all of their domains. Individually exploiting data points to maximize profit is true enshitiffication of the domain registration process.

I realize this conclusion is somewhat rich coming from someone who does Data Science for a living. I guess this is what being at the other end of some analytics team’s KPIs feels like.

Fast Travel

2025-03-21T00:00:00+00:00

View source code on GitHub

Try it out

Fast Travel

Supercharge your search bar

Fast Travel is a Bunnylol-inspired, configurable, command-based search engine replacement that you can easily host on GitHub Pages. It streamlines your searches and navigates the web quickly with a lightweight, static engine designed for speed and simplicity.

For example:

Typing g kittens will search for “kittens” on Google.
Typing ddg privacy will search for “privacy” on DuckDuckGo.
Typing r/technology will take you directly to the r/technology subreddit.
Typing hn will take you directly to Hacker News.
Typing apps balatro will search for “balatro” on Steam on desktop, the Google Play Store on Android, and the App Store on iOS.
Typing $AAPL will open the Yahoo Finance stock quote page for AAPL.

Features

Configurable Commands: Easily modify, add, or remove search commands.
Lightweight & Fast: Optimized for speed with a minimalistic design and static hosting.
Device-Aware Routing: Automatically directs your queries based on your device (Windows, Android, iOS, etc.).
Typo Detection: Automatically detects and suggests corrections for misspelled commands.

Setup Instructions

Fork the Repository:
Click the “Fork” button on the repository page to create your own copy.
Deploy on GitHub Pages:
- Update the page URL in your repository settings (navigate to Settings > Pages > Custom domain) or configure the CNAME file.
- It is recommended to use a subdomain (e.g., fast-travel.yourdomain.com) or a root domain (e.g., yourdomain.com) rather than deploying on a subdirectory (e.g., yourdomain.com/fast-travel), as not all platforms correctly recognize subdirectory URLs for search engines.
Optional – Enable GitHub Actions:
If you experience workflow failures, consider enabling GitHub Actions with write access to ensure necessary updates are applied.
Optional – Update Your Config:
Open the config.json file to modify, add, or remove commands and routes.
Replace Your Search Engine:
Follow these steps to replace your default search engine in popular browsers:

Chrome on Windows/MacOS

Cover Letter Generator LLM

2023-12-17T22:52:00+00:00

View source code on GitHub

Introduction

Cover Letter Generator App was created as part of the interview process at ALMA (https://www.tryalma.ai/). The project took approximately 4-5 hours to complete, with around 3 hours dedicated to creating backend and UI, and an additional 1-2 hours for fine-tuning the prompt to optimize results.

LLM Powered

The LLM used is gpt-3.5-turbo for its cost-effectiveness and impressive capabilities at the given price range.
Local models were avoided due to a lack of available compute resources. However, the design through LangChain allows for easy switching to alternative models. The repo contains two main prompts, each designed with detailed instructions focusing on a different style of cover letter. These prompts can easily be modified or expanded to include additional styles.

Steps

Upload Resume:
- Allows the user to upload a resume in PDF format. The resume is then converted to text using the pypdf2 library.
Input Job Listing:
- Allows the user to input a job listing URL or text. A simple LLM based scraper is used to extract the relevant job description and company information from the URL.
  - Note: For websites which are hard to scrape, please use the text option to enter the job description manually.
Style Selector:
- Allows the user to select a style of cover letter, determining which prompt will be used to generate the cover letter.
  - The Classic style is designed to be a more traditional cover letter with a suitable length and tone.
  - The Modern style is designed to be a more concise cover letter with a more casual tone.
Generate Cover Letter:
- Generates a cover letter based on the selected style and the provided resume and job listing. The generated cover letter is displayed in the UI and can be easily copied to the clipboard.

How to Run

Prerequisites

Make sure you have Poetry installed.

Clone the Repository

git clone https://github.com/DoubleGremlin81/cover-letter-llm.git
cd cover-letter-llm

Install Dependencies

poetry install

Create a `.env` File

echo "OPENAI_API_KEY=" >> .env

Run the Application

poetry run streamlit run app.py

Screenshots

Examples

Example 1

Resume: Resume
Job Listing: Job Listing

Classic Cover Letter

Dear Hiring Manager,

I am writing to express my interest in the Data Scientist position at AppLovin. With a decade of experience in data analysis and statistical modeling, I believe I would be a valuable addition to your team.

Throughout my career, I have demonstrated a strong ability to extract meaningful insights from large datasets and make data-driven decisions. I have extensive experience in conducting in-depth data analysis, identifying trends and patterns, and providing valuable insights to drive business strategies. I also have a deep understanding of statistics and have successfully applied statistical concepts to real-world problems.

In addition, I have strong proficiency in data analysis tools and packages such as Python and SQL, which I can leverage to manipulate and analyze data efficiently. I am also skilled in data visualization using tools like Matplotlib and Tableau, allowing me to effectively communicate insights to both technical and non-technical stakeholders.

I am highly detail-oriented and have a passion for uncovering insights within data. I am also a strong problem solver and have a track record of collaborating effectively in cross-functional teams. I am constantly learning and staying updated with the latest trends and advancements in data analysis and statistical techniques.

I am excited about the opportunity to contribute to AppLovin’s advertising technology and help drive its growth. I believe that my skills and experience align well with the responsibilities of the Data Scientist position, and I am confident that I would be a valuable asset to your team.

Thank you for considering my application. I look forward to the opportunity to discuss my qualifications further and how I can contribute to AppLovin’s success.

Sincerely, Terrence Coleman

Modern Cover Letter

Hi, I’m Terrence Coleman, an analytically minded self-starter with a decade of experience collaborating with cross-functional teams to ensure data accuracy and integrity. I have a strong background in data analysis, statistical expertise, and data visualization using tools like Python and SQL. I have successfully led teams and implemented predictive modeling to drive business efficiency and strategic goals. I am excited about the opportunity to join AppLovin as a Data Scientist and apply my skills to analyze large datasets, uncover insights, and provide valuable recommendations to drive the advertising technology forward. With my strong analytical and problem-solving abilities, attention to detail, and effective communication skills, I am confident that I would be a great fit for this role. I look forward to the opportunity to contribute to AppLovin’s success. Thank you for considering my application.

Example 2

Resume: Resume
Job Listing: Job Listing

Classic Cover Letter

Dear Hiring Manager,

I am excited to apply for the Staff Data Scientist position at Grid. With my decade of experience in data science and analytics, I am confident that I can make a significant impact on Grid’s progress and contribute to the success of its users.

In my current role as a Senior Data Scientist at Best Buy, I have led data extraction and evaluation efforts, resulting in cost savings of over 11M. I have also partnered with the product team to build a production recommendation engine in Python, which generated $450K in incremental annual revenue. These experiences demonstrate my ability to collaborate with cross-functional teams and deliver actionable insights.

Furthermore, my experience at 2U as a Data Scientist and Data Analyst has honed my skills in A/B testing, data extraction, and modeling. I have successfully optimized algorithms to target the learning audience by 15% and increased profitability by 4% through Python clustering methods.

With my proficiency in Python, SQL, and machine learning techniques, I am well-equipped to develop and validate models that align with Grid’s strategic objectives. Additionally, my understanding of the financial industry, gained through building and scaling FinTech products, will enable me to contribute domain knowledge to the team.

I am eager to join Grid and help build out the data science team and practice. I am confident that my analytical mindset, autonomy, and curiosity make me the ideal candidate for this role. Thank you for considering my application.

Sincerely, Terrence Coleman

Modern Cover Letter

Hi, I’m Terrence Coleman, an analytically minded self-starter with a decade of experience in data science and analysis. I have a strong background in statistical inference and machine learning, which aligns well with Grid’s need for a Staff Data Scientist. In my current role at Best Buy, I led data extraction efforts and built a recommendation engine that resulted in increased revenue. Additionally, I developed customer attrition models and improved monthly retention. My experience at 2U allowed me to optimize algorithms and improve learning platforms through A/B testing. I have also worked extensively with Python, SQL, and Excel to extract and analyze data. With my expertise in data science, autonomy, and domain knowledge in the financial industry, I believe I would be a great fit for Grid’s team. I am excited to contribute to Grid’s mission of leveling the financial playing field and would love the opportunity to further discuss how my skills and experience can benefit your company. Thank you for considering my application.

Better Visuals

2023-05-15T13:40:00+00:00

Better Visuals is a Plotly Dash app designed to host interactive and visually appealing dashboards.

Dashboards

Data Privacy

Better Visuals is committed to user privacy. Any data collected is anonymized and not used for commercial purposes. The data is only used to track metrics and conduct educational analysis at an aggregated level.

Contributing

We welcome contributions to Better Visuals. Feel free to open an issue or submit a pull request if you have any suggestions or improvements.

License

This project is licensed under the MIT License - see the LICENSE file for details.

View source code on GitHub

Aspects of Analytics That You Cannot Learn In Class

2022-04-03T23:59:00+00:00

This blog was initially written as part of my coursework for BAX 462 - Practicum Elaboration at UC Davis.

Photo by Markus Winkler on Unsplash

Analytics is hard.

It is not something that can be taught entirely through a textbook. Of course, that is not to say that the classes I took as part of my master’s degree were not effective. They ensured that I understood the theory and thought process behind every process while also giving me a hands-on experience through projects and assignments. However, they do not provide any training on certain aspects related to the scale and complexity of real-world projects.

This is where my year-long practicum came in to picture. For the uninitiated, a practicum, as defined by Merriam-Webster, is a course of study [..] that involves the supervised practical application of previously studied theory. The MS in Business Analytics program at UC Davis consists of a year-long practicum, a critical factor in my decision to join the program. The goal of the practicum is to provide real-world experience in analytics by working on real projects in a team at an external organization.

In this blog, I will be talking about six practical aspects of implementing analytics that I learned through the practicum. While some of these topics are mentioned in various courses, their elaborate nature is hard to gauge. To this end, I will also provide ways that you, the reader, can gain these skills (or at least get a better understanding of them). I shall refer to this as the Best Alternative to a Practicum Experience, or BATPE for short.

1. Dealing with multiple sources of data

While Kaggle datasets might be fantastic for someone new to analytics, they are not representative of working in the industry. Most companies have their data spread across multiple tables, data warehouses, and types of databases. It is common to join multiple tables at different levels of data, each with its own intricacies. This requires a combination of domain knowledge as well as SQL expertise to ensure correct and quick execution. During my practicum, I learned this lesson the hard way when our team ran some inefficient joins, which hogged all the resources, making the database inaccessible to anyone else.

BATPE: Work on an analytics project with multiple related data sources. Since finding a curated dataset in this format is hard, you would have to use multiple public datasets and even scrape your own data. My advice would be to choose a topic revolving around geographic or demographic differences as fields such as County and Gender are present in many diverse datasets which are publically available.

2. Dealing with dirty data

Although this is a topic often covered in courses, it is glossed over as it is time-consuming to try out practically. While most larger companies have Data Engineering teams that deal with data cleaning, it is still an important skill to have for an analyst, especially as one should know the ins and outs of the data before performing any analysis.

BATPE: This might be the easiest of my learnings to replicate as there are quite a few dirty datasets available. It is common to find incomplete information in government-released information. I would recommend Australian Marriage Law Postal Survey, 2017, and NYCs 311 Service Requests from 2010 to Present if you are looking for a challenge. Another option would be to scrape some data yourself.

3. Experiment tracking and collaboration

While group projects provide some exposure to the collaborative nature of working in a team, they are seldom as structured as working in a well-oiled team at a job. They also do not emphasize tracking and versioning models and datasets.

BATPE: Maintaining a proper Git repository for the code while following best practices can drastically reduce the friction of collaboration in university or boot camp group projects. Weights & Biases is another incredible tool that helps with experiment tracking, dataset versioning, and model management.

4. Model runtime

There are two aspects to a model’s runtime- training time and execution/prediction time. Both of these might or might not be significant based on the use case. The importance of training time depends on how long it takes and how regularly the model is updated, whereas prediction time might be critical when dealing with streaming data.

BATPE: Monitor times while creating models. A few tips from my experience would be to use scalable and parallelizable frameworks and algorithms such as Dask or PySpark instead of Pandas and LightGBM instead of XGBoost. Another common trick is to drop excess data and precision, trading off some accuracy for better performance.

5. Model deployment

An ML project does not end with computing the F1-Score on the test set. Deployment is a vital part of the analytics life cycle, without which nobody would be able to use the incredibly beneficial models you built. During my practicum, I deployed the models my team built on Azure and exposed them through a REST API to run nightly batch processes. While this is not very complicated, it is important to know the pros and cons of different kinds of deployments to ensure that the model is being run without any hiccups or manual intervention.

BATPE: There are some high-quality online resources, such as this blog by StackOverflow and this course by DeepLearning.AI which provide an in-depth look into deployments and the world of MLOps.

6. Performance tracking

Deployment is also not the end of an ML project. It is critical to monitor the model inputs and KPIs for drift to detect any external changes affecting the model’s performance. This ensures that the model has not worsened over time, causing it to provide negative business value.

BATPE: Neptune.ai has a beautiful guide that dives deep into monitoring models in production.

Bonus: Timelining

Creating a timeline for any sort of work is hard. Even experienced professionals often misjudge the time needed to solve a problem. People often overestimate the efficiency and do not factor in complexities that might arise mid-way through a project. While there is no BATPE for this, working on more projects helps you get a better grasp on estimating deadlines. In accordance with Hofstadter’s law, a common rule of thumb is to double any initial estimate.

Not impressed?

Is there any aspect that you think that I should have mentioned? Or a BATPE that you recommend? Feel free to tweet at me using the button below.

Tweet to @2gremlin181

Analyzing Online Conversations of the 2022 Russian Invasion of Ukraine

2022-03-19T23:11:00+00:00

This analysis was a part of my BAX-452 Machine Learning course final project at UC Davis.

You can find the GitHub repo here.

Data Characteristics

For the analysis, we used data from two different datasets, which were linked temporally and topically.

Twitter Data

We used tweets between the 02-27-2022 to 03-10-2022 from Ukraine Conflict Twitter Dataset. These tweets were then filtered for English only using Twitter’s language parameter. These Tweets were further preprocessed and cleaned to prepare them for analysis. For Topic Modeling, they were converted to lowercase, tokenized, stripped of stop words, links, numbers, and symbols before being stemmed using the Porter Stemmer algorithm. For Sentiment Analysis, the only preprocessing done was removing URLs and mentions due to the other factors providing additional context to our models.

ACLED Data

The Armed Conflict Location & Event Data Project (ACLED) is a non-profit organization that provides reputable granular information about worldwide conflicts such as battles and protests. We used their Data Export Tool to collect data regarding events of interest in Russia and Ukraine. This data was last updated on 03-11-2022.

Analyses and Insights

Topic Modeling on Tweets

The first step in our analysis was to understand the topics that people were talking about. We started with a naive approach of creating a frequency distribution of the words and plotting a word cloud.

This approach was not able to provide us with any meaningful insights due to the most common word stems, somewhat predictably, being based on the words Ukraine, Russia, Putin, and War.

Our next approach was to use Latent Dirichlet Allocation (LDA) to cluster similar topics. We then mapped out the most prominent topics using the words, their weights, and some context through the content of the Tweet and news.

Word Stems	Interpreted Topic
ukrain, student, border, indian, poland, evacu	Indian students stranded at the Ukraine-Poland border in poor conditions after a failed evacuation attempt.
ukrain, support, help, nuclear, plant, power	Russian attack on a Ukrainian nuclear power plant.
ukrain, russian, mariupol, kharkiv, kyiv, armi	Russian army invasion of the Ukrainian cities of Kyiv, Kharkiv, and Mariupol
close, ukrain, nato, stoprussia, stopputin, un	People asking the UN and NATO to intervene and assist Ukraine
weapon, provid, defend, humanitarian, putin, civilian	People talking about providing Ukrainian civilians humanitarian aid

Sentiment Analysis on Tweets

Bag of Words Approach

The first attempt at sentiment analysis used a rudimentary bag of words approach. We used the NLTK corpus through TextBlob which provided us with two metrics- polarity and subjectivity. The polarity score ranged from -1.0 for a very negative sentiment to +1.0 for a very positive sentiment. The subjectivity ranged from 0.0 for very objective statements to +1.0 for very subjective statements. The results from this approach were not promising due to the lack of depth in the sentiment tags and the inability to parse context which is a limitation of the bag of words approach.

Transformer Models

To improve the sentiment analysis, we switched to using transformers. We used ROBERTa (Robustly Optimized BERT Pre-training Approach) models. ROBERTa is an optimized variant of BERT (Bidirectional Encoder Representations from Transformers), which is a direction agnostic model that makes use of masking. We used two pre-trained models created by Cardiff NLP, a research group at Cardiff University. This allowed us to better predict the sentiments for tweets as the models were fine tuned on the TweetEval dataset. Combined together, we had seven labels for each tweet- positive, neutral, negative, joy, optimism, anger, and sadness.

Hashtag Mapping

Since hashtags are such an essential part of Tweets, it is important to also understand them. It is difficult to directly analyze the sentiment of a hashtag as they are often made up of acronyms and multiple words without delimiters. To better understand them, we extracted the hashtags from each tweet and used the seven sentiment values derived from the previous exercise as target variables for regressions. The hashtags were then converted to dummy variables whose coefficients would provide context regarding the overall sentiment. We limited the analysis to the top 40 most popular hashtags to avoid outliers.

Sentiment Label	Top Hashtags (Based on +ve coefficient value)
negative	‘Mariupol’, ‘SafeAirliftUkraine’, ‘StopPutin’, ‘UkraineUnderAttack’, ‘Putin’
neutral	‘BREAKING’, ‘EU’, ‘China’, ‘US’, ‘NATO’
positive	None
anger	‘UKRAINE’, ‘StopRussia’, ‘StopPutin’, ‘putin’, ‘RussianUkrainianWar’
joy	‘SlavaUkraini’, ‘Zelenskyy’
optimism	‘StandWithUkraine’, ‘StandWithUkraine️’, ‘China’, ‘EU’, ‘SafeAirliftUkraine’
sadness	‘Mariupol’, ‘SafeAirliftUkraine’, ‘UkraineUnderAttack’, ‘BREAKING’, ‘Kharkiv’

As can see from the results, we were able to successfully model the sentiment behind the hashtags which would not be otherwise possible by directly looking at them. Seeming neutral words such as names like Putin and Zelenskyy were correlated with sentiments that make sense given the context.

Correlation with Real-World Events

To link the online sentiment with real-world events, we merged the average values for daily sentiment with the count of each event type from the ACLED dataset.

From the graph we can infer that the sentiment is fairly stable, however, there is a noticeable lagged effect. This can be clearly seen from 03-06-2022 to 03-08-2022 when there were a large number of anti-war protests in Russia leading to a drop in anger and negativity.

Additionally, we can also look at the correlation between the event type and sentiment to provide context.

While not perfect, we can see that types such as protests were positively correlated with positivity and negatively correlated with sadness, whereas event types such as violence against civilians were correlated with anger.

Header Image by Joshua Hoehne on Unsplash

Enhancing Client Engagement Through Better Branding

2022-02-25T23:59:00+00:00

This blog was initially written as part of my coursework for BAX 462 - Practicum Elaboration at UC Davis.

Photo by Austin Distel on Unsplash

Every Job is a Sales Job

I’m sure you’ve heard this quote thrown around before. You might have even read a “bestselling” book that goes by the same title. Not that the terms New York Times Bestseller or Wall Street Journal Bestseller carry much meaning these days. As noted in this Observer article, these lists are easily manipulated by authors and publishers purchasing a large number of books at launch to achieve bestseller status as a marketing tactic. While this might or might not have been the case for this book, I’m sure the irony is not lost on you.

In fact, let me blow your mind by telling you that the author of the previous article exposing these practices runs a company that offers services to market books and make them bestsellers.

Branding is the first step towards engaging with your customers. While these were innovative ways of branding products, I will be talking about a different application of branding that isn’t given as much thought but is equally applicable in our day-to-day lives- Project Branding.

Each project in an organization has a brand attached to it by people consciously or subconsciously. Generally speaking, a team working on new bleeding-edge features is seen as sexy even though it might not be contributing to the bottom line. On the other hand, a team working on maintaining a core product is seen as boring. This perception can have an impact on the engagement and priority given to them.

A similar effect also exists while dealing with clients as consultants. In this blog, I talk about my experience with enhancing client engagement through project branding as a data analyst in the practicum at the UC Davis MSBA. As part of the practicum, a team of fellow students and I worked for an external organization whom we met on a weekly basis. As the primary objective of the practicum is to offer a unique learning experience to students, the onus rests on them to be the primary driver of the projects. Since the degree of learning possible depends on the client’s engagement, it is essential to create and maintain an impression of eagerness to learn and competency in delivery with them.

This has led to one of my key takeaways from the MSBA Practicum this quarter not being about some novel algorithm and its unique application but rather about the importance of project branding. This was heavily influenced by an article in the MIT Sloan Management Review- Why Every Project Needs a Brand (and How to Create One).

While the article mentions several key factors, some of them are not always tweakable, such as the Pitch or desirability of a project. This was also the case of the practicum, where the pitch stage was already completed. However, there are several things that you can do, both individually and as a team, that lead to the client having a favorable impression and, in turn, leading to enhanced engagement.

Here are five activities my team implemented that contributed towards creating a better brand for ourselves

1. Providing an agenda and minutes for each meeting

As we generally only met with the client once a week, we sent out the meeting agenda a day prior. This not only allowed us to set an expectation for the meeting but also demonstrated how we were well prepared for it. Similarly, the minutes also showed our professionality along with acting as a source of truth for the discussions held.

2. Asking intelligent questions

Asking intelligent questions is an essential aspect of one’s brand. It demonstrates the ability to quickly and accurately grasp the core elements of a discussion, and helps develop trust and strengthen rapport. I found The Surprising Power of Questions [HBR May–June 2018] to be an excellent resource on how to ask better questions.

3. Making our efforts visible

Not everything one does is successful. You could be an expert in a field and still face unexpected setbacks. This is especially common as students who are learning along the way. Showcasing setbacks and failures provided the client with an inside eye on the efforts we put in, along with providing an opportunity to get feedback and suggestions based on their domain knowledge on how to approach the issue.

4. Creating a structured plan to tackle problems

Since the nature of the practicum led to rigid overall time constraints, we used Trello to keep track of all the tasks and their deadlines. This showcased our professional competency and commitment to deliver on the project in the given timeframe.

5. Setting additional meetings when required

When faced with a roadblock that required the client’s assistance, reaching out and setting up additional meetings rather than waiting for the weekly call showed a sense of urgency and priority towards the project.

Branding is part of client engagement. It is a sales job.

Client engagement is everywhere; in the case of the practicum, the goal was to extract more in the form of learning, and in the real world, it can ensure buy-in from your client and create customer loyalty. In this case, you, the reader, is engaging with my blog, and I want to ensure that my personal brand makes a good impression that makes you want to come back and interact more with my content. If it did, follow me on GitHub or Tweet at me with your thoughts.

Follow @DoubleGremlin181 Tweet to @2gremlin181

Kavish Hukmani

Laundry Insights From Scraping 4000 Washer/Dryers in San Francisco

Key Metrics and Plots

Laundry Room Locations by Neighborhood

Washer/Dryer Usage Trends

Circuit Sketch

Circuit Sketch

Features

Quick Start

Deployment

Data Management

Adding New Circuits

Build

License

Acknowledgements

ReIndigo

ReIndigo

Features

Interactive Plots

LaTeX Support

Improved Tags Page

Sitemap

Credits

Purchasing A Domain in 2025 — Enshitiffication

Fast Travel

Fast Travel

Features

Setup Instructions

Cover Letter Generator LLM

Introduction

LLM Powered

Steps

How to Run

Prerequisites

Clone the Repository

Install Dependencies

Create a .env File

Run the Application

Screenshots

Examples

Example 1

Classic Cover Letter

Modern Cover Letter

Example 2

Classic Cover Letter

Modern Cover Letter

Better Visuals

Dashboards

Data Privacy

Contributing

License

Aspects of Analytics That You Cannot Learn In Class

Analytics is hard.

1. Dealing with multiple sources of data

2. Dealing with dirty data

3. Experiment tracking and collaboration

4. Model runtime

5. Model deployment

6. Performance tracking

Bonus: Timelining

Not impressed?

Analyzing Online Conversations of the 2022 Russian Invasion of Ukraine

Data Characteristics

Twitter Data

ACLED Data

Analyses and Insights

Topic Modeling on Tweets

Sentiment Analysis on Tweets

Bag of Words Approach

Transformer Models

Hashtag Mapping

Correlation with Real-World Events

Enhancing Client Engagement Through Better Branding

Here are five activities my team implemented that contributed towards creating a better brand for ourselves

1. Providing an agenda and minutes for each meeting

2. Asking intelligent questions

3. Making our efforts visible

4. Creating a structured plan to tackle problems

5. Setting additional meetings when required

Branding is part of client engagement. It is a sales job.

Create a `.env` File