Towards Data Science

Bytes Speak All Languages: Cross-Script Name Retrieval via Contrastive Learning
Deep Learning

Why learn 8 scripts when you can learn 256 bytes?

Vedant Jumle

Apr 26

12 min read
I Reduced My Pandas Runtime by 95% — Here’s What I Was Doing Wrong
Programming

Most slow Pandas code “works”, until it doesn’t. Learn how to spot hidden bottlenecks, avoid…

Ibrahim Salami

Apr 26

18 min read

Latest

Causal Inference Is Different in Business
Data Science

How does decision-gravity dictate this gap?

Alejandro Alvarez Perez

Apr 25

12 min read
The Essential Guide to Effectively Summarizing Massive Documents, Part 2
LLM Applications

We have the document clusters, and it’s time to unlock their true potential! Let’s explore…

Vinayak Sengupta

Apr 25

18 min read
Introduction to Approximate Solution Methods for Reinforcement Learning
Machine Learning

Learn about function approximation and the different choices for approximation functions

Oliver S

Apr 24

9 min read
I Built an AI Pipeline for Kindle Highlights
Large Language Models

A local, zero-cost project that cleans, structures, and summarizes your reading automatically

Pol Marin

Apr 24

12 min read
How to Improve Claude Code Performance with Automated Testing
Agentic AI

Learn how to get the most out of Claude Code

Eivind Kjosbakken

Apr 24

10 min read
How to Select Variables Robustly in a Scoring Model
Data Science

More variables don’t make a better scoring model. Stable variables do. Here’s how to find them.

JUNIOR JUMBONG

Apr 24

7 min read
Using a Local LLM as a Zero-Shot Classifier
Large Language Models

A practical pipeline for classifying messy free-text data into meaningful categories using a locally hosted…

Braden Riggs

Apr 23

8 min read
I Simulated an International Supply Chain and Let OpenClaw Monitor It
Agentic AI

Mario asked me why 18% of his shipments were late when every team hit their…

Samir Saci

Apr 23

9 min read
Your Synthetic Data Passed Every Test and Still Broke Your Model
Data Science

The silent gaps in synthetic data that only show up when your model is already…

Poorna Reddy

Apr 23

11 min read

Editor’s Picks

Using Causal Inference to Estimate the Impact of Tube Strikes on Cycling Usage in London
Data Science

Turning free-to-use data into a hypothesis-ready dataset

Luke Stuckey

Apr 22

19 min read
Ivory Tower Notes: The Methodology
Data Science

A short intro to scientific methodology to combat “prompt in, slop out”

Marina Tosic

Apr 22

6 min read
What Does the p-value Even Mean?
Data Science

And what does it tell us?

Sara A. Metwalli

Apr 20

7 min read
The LLM Gamble
Artificial Intelligence

Why it tickles your brain to use an LLM, and what that means for the…

Stephanie Kirmer

Apr 20

8 min read
AI Agents Need Their Own Desk, and Git Worktrees Give Them One
Agentic AI

Git worktrees, parallel agentic coding sessions, and the setup tax you should be aware of

Ruben Broekx

Apr 18

20 min read
Beyond Prompting: Using Agent Skills in Data Science
Artificial Intelligence

How I turned my eight-year weekly visualization habit into a reusable AI workflow

Yu Dong

Apr 17

7 min read
A Practical Guide to Memory for Autonomous LLM Agents
Agentic AI

Architectures, pitfalls, and patterns that work

Nick Lawson

Apr 17

14 min read
What It Actually Takes to Run Code on 200M€ Supercomputer
Distributed Computing

Inside MareNostrum V: SLURM schedulers, fat-tree topologies, and scaling pipelines across 8,000 nodes in a…

Ferran Alia

Apr 16

11 min read
Your Chunks Failed Your RAG in Production
Large Language Models

The upstream decision no model or LLM can fix once you get it wrong

Priyansh Bhardwaj

Apr 16

22 min read

The Variable Newsletter

Exciting Changes Are Coming to the TDS Author Payment Program
Writing

Authors can now benefit from updated earning tiers and a higher article cap

TDS Editors

Mar 2

2 min read
TDS Newsletter: Vibe Coding Is Great. Until It’s Not.
The Variable

Sorting through the good, bad, and ambiguous aspects of vibe coding

TDS Editors

Feb 5

4 min read

Deep Dives

Lasso Regression: Why the Solution Lives on a Diamond
Machine Learning

It’s simpler than you think.

Nikhil Dasari

Apr 23

24 min read
Correlation vs. Causation: Measuring True Impact with Propensity Score Matching
Data Science

Learn how Propensity Score Matching uncovers true causality in observational data. By finding “statistical twins,”…

Gustavo Santos

Apr 22

12 min read
DIY AI & ML: Solving The Multi-Armed Bandit Problem with Thompson Sampling
Machine Learning

How you can build your own Thompson Sampling Algorithm object in Python and apply it…

Jacob Ingle

Apr 21

17 min read
Git UNDO: How to Rewrite Git History with Confidence
Programming

For any data scientist who works in a team, being able to undo Git actions…

Omer Rosenbaum

Apr 21

24 min read
I Replaced GPT-4 with a Local SLM and My CI/CD Pipeline Stopped Failing
Machine Learning

The hidden cost of probabilistic outputs in systems that demand reliability

Benjamin Nweke

Apr 21

13 min read
Context Payload Optimization for ICL-Based Tabular Foundation Models
Artificial Intelligence

Conceptual overview and practical guidance

Chinmay Kakatkar

Apr 20

16 min read