<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[RecsysML + LLMs]]></title><description><![CDATA[State of the art advances in applied machine learning with a focus on recommender systems]]></description><link>https://recsysml.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!acOO!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2b277d6-34ae-4d98-8e47-b2be639d7d6b_500x500.png</url><title>RecsysML + LLMs</title><link>https://recsysml.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 11 Apr 2026 04:22:43 GMT</lastBuildDate><atom:link href="https://recsysml.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[RecSysML Journal team]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[recsysml@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[recsysml@substack.com]]></itunes:email><itunes:name><![CDATA[Gaurav Chakravorty]]></itunes:name></itunes:owner><itunes:author><![CDATA[Gaurav Chakravorty]]></itunes:author><googleplay:owner><![CDATA[recsysml@substack.com]]></googleplay:owner><googleplay:email><![CDATA[recsysml@substack.com]]></googleplay:email><googleplay:author><![CDATA[Gaurav Chakravorty]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[LLM Evals From Scratch: Run Your First Benchmarks ]]></title><description><![CDATA[Part 1 of the Bonsai LLM eval series &#8212; Beginner]]></description><link>https://recsysml.substack.com/p/llm-evals-from-scratch-run-your-first</link><guid 
isPermaLink="false">https://recsysml.substack.com/p/llm-evals-from-scratch-run-your-first</guid><dc:creator><![CDATA[Vish Sangale]]></dc:creator><pubDate>Thu, 09 Apr 2026 18:03:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Dcyv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d19828a-d892-4199-b7bd-2f09d5f4c953_1220x498.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>Prerequisites</strong></h2><ul><li><p>Python 3.9+</p></li><li><p>A terminal and a virtual environment</p></li><li><p>No GPU required (CPU is fine for GPT-2)</p></li></ul><div><hr></div><h2><strong>Step 1 &#8212; Why training loss isn&#8217;t enough</strong></h2><p>Here&#8217;s a failure mode worth knowing before you spend time training anything. You run a model for a few thousand steps, validation loss drops steadily, you call it converged. Then you throw a basic factual question at it and it hallucinates confidently.</p><p>This happens because loss only measures how well the model predicts the next token on your training data. 
A model that memorizes plausible-sounding token sequences can get low loss while being wrong about nearly everything testable.</p><p>An <strong>eval</strong> ties a task, a metric, and a decision together &#8212; so the number you get means something beyond &#8220;did loss go down.&#8221;</p><p>The taxonomy is worth knowing too, because people mix up format and scoring:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/HKEcd/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7d19828a-d892-4199-b7bd-2f09d5f4c953_1220x498.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34d57dd3-80ca-470b-b0fe-87430c11fb7e_1220x498.png&quot;,&quot;height&quot;:242,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/HKEcd/2/" width="730" height="242" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>GSM8K asks for a final numeric answer &#8212; it&#8217;s a short-answer benchmark scored with exact match, not a &#8220;generation eval&#8221; just because the model generates text. This tutorial uses multiple-choice with log-prob scoring.</p><div><hr></div><h2><strong>Step 2 &#8212; The three benchmarks we&#8217;ll run</strong></h2><h3><strong>ARC-Easy (AI2 Reasoning Challenge)</strong></h3><p>4-way multiple choice, grade-school science. 
Here&#8217;s a real item from the dataset:</p><blockquote><p><em>&#8220;Which of the following best describes a physical change? (A) Burning wood (B) Rusting iron (C) Melting ice (D) Digesting food&#8221;</em></p></blockquote><p>Answer: C. The question requires knowing that melting is reversible and doesn&#8217;t change chemical composition. Random baseline: 25%.</p><h3><strong>PIQA (Physical Interaction QA)</strong></h3><p>2-way multiple choice about physical tasks. Real item:</p><blockquote><p><em>Goal: Separate egg whites from yolk using a water bottle.</em> <em>Solution 1: Squeeze the bottle, hold it over the yolk, then release &#8212; the suction pulls the yolk in.</em> <em>Solution 2: Fill the bottle with water and pour it over the egg to wash away the whites.</em></p></blockquote><p>Answer: Solution 1. The model has to reason about suction and physical manipulation. Random baseline: 50%.</p><h3><strong>HellaSwag</strong></h3><p>4-way sentence completion, adversarially constructed. Correct completions describe what naturally happens next; wrong options were specifically written to fool models that rely on surface-level patterns rather than understanding. That makes it harder than it looks, and harder than ARC and PIQA for most small models. Random baseline: 25%.</p><div><hr></div><h2><strong>Step 3 &#8212; Why we use acc_norm in this tutorial</strong></h2><p>For multiple-choice tasks, the harness scores each answer choice by its log-probability under the model. The problem: a longer choice sums log-probability over more tokens, and each token contributes a negative term, so raw summing biases the comparison toward shorter choices.</p><p><strong>acc_norm</strong> normalizes each choice&#8217;s log-probability by its length (the harness divides by the byte length of the choice text) before comparing. When answer lengths are similar (like PIQA&#8217;s two short options), the difference from raw <code>acc</code> is small. 
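</p>

<p>The two scoring rules can disagree on the same item. Here is a minimal, self-contained sketch with toy numbers (not the harness&#8217;s real code): the raw score sums per-token log-probabilities, and the normalized score divides that sum by the choice&#8217;s byte length.</p>

```python
# Toy per-token log-probs for three answer choices (higher is better).
# These numbers are made up for illustration.
choices = {
    "ice melts":                       [-1.2, -0.9],
    "the ice melts into liquid water": [-0.4, -0.5, -0.6, -0.5, -0.4, -0.5],
    "wood burns":                      [-2.0, -2.1],
}

def raw_score(logprobs):
    # What plain `acc` compares: the summed log-probability.
    return sum(logprobs)

def norm_score(text, logprobs):
    # `acc_norm`-style: divide the sum by the choice's byte length.
    return sum(logprobs) / len(text.encode("utf-8"))

raw_pick  = max(choices, key=lambda c: raw_score(choices[c]))
norm_pick = max(choices, key=lambda c: norm_score(c, choices[c]))
```

<p>Here the raw sum picks the two-token choice while the normalized score picks the six-token completion: summing penalizes length, and normalizing cancels that penalty.</p>

<p>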
When they vary significantly, it matters &#8212; so we use <code>acc_norm</code> throughout to stay consistent.</p><blockquote><p>The harness outputs both <code>acc</code> and <code>acc_norm</code>. The numbers for PIQA will be nearly identical. For HellaSwag, they can differ by a few points because the wrong answers are sometimes much longer than the correct one.</p></blockquote><div><hr></div><h2><strong>Step 4 &#8212; Set up your environment</strong></h2><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;6815ffb7-d764-4ece-9b14-275b6124de71&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

pip install lm_eval transformers torch</code></pre></div><p><code>lm_eval</code> is the <a href="https://github.com/EleutherAI/lm-evaluation-harness">EleutherAI lm-evaluation-harness</a>.</p><div><hr></div><h2><strong>Step 5 &#8212; Get the script</strong></h2><p>Clone the full repo:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;2e081a18-0395-4f50-9169-f3708d057b44&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">git clone https://github.com/vishsangale/bonsai-llm
cd bonsai-llm/posts/part1-llm-evals-intro
python eval_gpt2.py</code></pre></div><div><hr></div><h2><strong>Step 6 &#8212; What the script does</strong></h2><p>The core call:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;06fccfa0-eadf-4ff3-bff9-f53fb15bdc23&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["arc_easy", "piqa", "hellaswag"],
    num_fewshot=0,
    batch_size=8,
)</code></pre></div><p><code>num_fewshot=0</code> means zero-shot: just the question, no worked examples in the prompt.</p><p>On the first run, the harness downloads benchmark datasets from HuggingFace into <code>~/.cache/huggingface/datasets/</code> &#8212; you&#8217;ll see progress bars for each task. If you&#8217;re on a slow connection this is where most of the wait time goes. The GPT-2 weights are about 550MB and land in <code>~/.cache/huggingface/hub/</code>. Subsequent runs skip all of this.</p><p>One thing that trips up beginners: if a task name is wrong or not installed, the harness fails with a <code>KeyError</code> rather than a clear error message. If that happens, run <code>lm_eval --tasks list</code> to see what&#8217;s available and check your spelling.</p><div><hr></div><h2><strong>Step 7 &#8212; Read the output</strong><br></h2><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/gifVE/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cea89657-19ba-425c-956a-059fb6af6569_1220x500.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/444d6fe0-f71d-4afb-83ca-05bc61617b67_1220x500.png&quot;,&quot;height&quot;:243,&quot;title&quot;:&quot;[ Insert title here ]&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/gifVE/1/" width="730" height="243" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The 
<code>Stderr</code> column tells you how reliable the estimate is. For ARC-Easy (~2600 questions) it&#8217;s &#177;0.01, meaning the true number is probably within one percentage point of what&#8217;s shown. For tasks with fewer examples, stderr grows &#8212; keep that in mind when comparing small differences between models.</p><div><hr></div><h2><strong>Step 8 &#8212; Interpret the results</strong></h2><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/tva4Q/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bcd0c339-c1c5-44e8-9d7b-61fe9eb41a97_1220x434.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83151fba-a496-4e78-87a8-7f533b610d0c_1220x434.png&quot;,&quot;height&quot;:209,&quot;title&quot;:&quot;[ Insert title here ]&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/tva4Q/1/" width="730" height="209" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The PIQA score is the interesting one. GPT-2 at 62.5% on a 50% baseline is a larger gain than ARC-Easy&#8217;s 39.5% on a 25% baseline, in relative terms. That&#8217;s not because PIQA is easier &#8212; it&#8217;s because GPT-2&#8217;s training data (web text) is saturated with &#8220;how to&#8221; content about physical tasks. 
GPT-2 picked up enough of that statistical regularity for it to transfer to this benchmark.</p><p>HellaSwag at 31.1% is close to chance. That&#8217;s expected here: GPT-2 was trained to predict tokens, not to reason about what comes next in a scenario. HellaSwag was designed so that wrong answers score high on surface-level statistics &#8212; exactly the thing GPT-2 is good at &#8212; which is why small base completion models consistently score poorly on it.</p><div><hr></div><h2><strong>Step 9 &#8212; Compare model sizes</strong></h2><p>Edit the <code>MODEL</code> variable at the top of <code>eval_gpt2.py</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;36d2f367-afee-42b2-90b5-08f8e51ef6ac&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">MODEL = "gpt2"         # 124M parameters
MODEL = "gpt2-medium"  # 355M parameters
MODEL = "gpt2-large"   # 774M parameters
MODEL = "gpt2-xl"      # 1.5B parameters</code></pre></div><p>All three scores improve with size. Here are the actual numbers across all four variants (acc_norm, zero-shot):</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/JjhuC/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e51e1cb-ab14-4cf2-9bbc-72a1c97da14b_1220x470.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51b85a8c-6975-41a0-b817-aea96fa2a5ad_1220x470.png&quot;,&quot;height&quot;:228,&quot;title&quot;:&quot;[ Insert title here ]&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/JjhuC/2/" width="730" height="228" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>HellaSwag scales the most &#8212; +19.8pp from base to xl &#8212; consistent with the adversarial task benefiting more from additional capacity. PIQA shows diminishing returns: GPT-2&#8217;s web text training already gives it a strong foundation for physical intuition tasks, so additional parameters help less. ARC-Easy sits in the middle.</p><p>The script saves <code>results_&lt;model&gt;.json</code> after each run. 
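</p>

<p>A minimal comparison sketch. The flat <code>{"results": {task: {metric: value}}}</code> layout below is simplified for illustration, and the candidate numbers are hypothetical; recent harness versions name the metric key <code>"acc_norm,none"</code>, so adjust the lookup to match your files.</p>

```python
import json  # in practice: baseline = json.load(open("results_gpt2.json"))

def score_deltas(baseline, candidate, metric="acc_norm"):
    """Per-task score change between two runs; negative values are regressions."""
    return {
        task: round(candidate["results"][task][metric]
                    - baseline["results"][task][metric], 4)
        for task in baseline["results"]
    }

# Baseline numbers from the GPT-2 run above; candidate numbers are made up
# to stand in for a hypothetical post-fine-tuning run.
baseline  = {"results": {"arc_easy": {"acc_norm": 0.395},
                         "piqa":     {"acc_norm": 0.625}}}
candidate = {"results": {"arc_easy": {"acc_norm": 0.410},
                         "piqa":     {"acc_norm": 0.601}}}

deltas = score_deltas(baseline, candidate)
regressions = {t: d for t, d in deltas.items() if d < 0}
```

<p>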
Diff two JSON files to track which tasks regress after fine-tuning &#8212; that&#8217;s the main practical use of these baselines.</p><div><hr></div><h2><strong>Step 10 &#8212; Limits and contamination</strong></h2><p><strong>Benchmark contamination</strong> is worth knowing about before you trust leaderboard numbers. If a model&#8217;s training data contains benchmark test items &#8212; even partially &#8212; the scores overstate real capability. This is a genuine concern for models trained on large web crawls, many of which postdate these benchmarks. GPT-2 is old enough that contamination is less of a concern here.</p><p>The practical implication: public benchmarks are reliable for comparing models trained under similar conditions and for catching regressions. For product decisions, use held-out evals that match your actual use case &#8212; ones you control and the model has never seen.</p><div><hr></div><h2><strong>When to use these evals &#8212; and when not to</strong></h2><p></p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/krw7G/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1dde6280-e8ed-4229-a2eb-43646090934c_1220x600.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef3a47df-b744-4c2c-b041-f63eb969a9d9_1220x600.png&quot;,&quot;height&quot;:294,&quot;title&quot;:&quot;[ Insert title here ]&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/krw7G/1/" width="730" height="294" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var 
r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p></p><p>My rule of thumb: if you can&#8217;t say what decision changes based on the eval result, you probably don&#8217;t need that eval yet.</p><div><hr></div><h2><strong>What&#8217;s next</strong></h2><p><strong>Part 2</strong> moves into open-ended generation evals. We&#8217;ll use a small instruction-tuned model, build a RAG pipeline, and measure it with RAGAS &#8212; scoring faithfulness, answer relevance, and context recall.</p><p><strong>Part 3</strong> compares the major eval frameworks side-by-side: lm-eval-harness, DeepEval, RAGAS, and Inspect.</p><div><hr></div><p><em>Full code: <a href="https://github.com/vishsangale/bonsai-llm/tree/main/posts/part1-llm-evals-intro">github.com/vishsangale/bonsai-llm/tree/main/posts/part1-llm-evals-intro</a></em></p>]]></content:encoded></item><item><title><![CDATA[Personalization at Bluesky]]></title><description><![CDATA[The past, present, and future of personalization of the Discover feed]]></description><link>https://recsysml.substack.com/p/personalization-at-bluesky</link><guid isPermaLink="false">https://recsysml.substack.com/p/personalization-at-bluesky</guid><dc:creator><![CDATA[Ian Wesley-Smith]]></dc:creator><pubDate>Mon, 23 Feb 2026 19:03:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ybxd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At <a href="https://bsky.app/">Bluesky</a>, we are building an open foundation for the social internet, where anyone can create a feed, such as the <a href="https://bsky.app/profile/did:plc:jfhpnnst6flqway4eaeqzj2a/feed/for-science">Science feed</a>, <a href="https://bsky.app/profile/spacecowboy17.bsky.social/feed/for-you">For You feed by 
spacecowboy</a>, or <a href="https://bsky.app/profile/did:plc:rea3amxwqfkfzhilivubtrib/feed/aaabfz334lr66">GLAMS feed</a>. We also aim to provide a great default Discover feed. This post discusses personalization of the Discover feed, from historical attempts to current deployment, and a path forward inspired by Pinterest&#8217;s work. If interested, come work with me at <a href="https://jobs.gem.com/bluesky/am9icG9zdDpJ8TCGYh93XAC00AkK4gXz">Bluesky</a>!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ybxd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ybxd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png 424w, https://substackcdn.com/image/fetch/$s_!Ybxd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png 848w, https://substackcdn.com/image/fetch/$s_!Ybxd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png 1272w, https://substackcdn.com/image/fetch/$s_!Ybxd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ybxd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png" width="528" height="514" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:514,&quot;width&quot;:528,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:185213,&quot;alt&quot;:&quot;A screenshot of the Bluesky Discover feed.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/188820301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A screenshot of the Bluesky Discover feed." title="A screenshot of the Bluesky Discover feed." 
srcset="https://substackcdn.com/image/fetch/$s_!Ybxd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png 424w, https://substackcdn.com/image/fetch/$s_!Ybxd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png 848w, https://substackcdn.com/image/fetch/$s_!Ybxd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png 1272w, https://substackcdn.com/image/fetch/$s_!Ybxd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff590eeaf-58c9-4c4a-9760-0e8edebfee27_528x514.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">An example of the Bluesky Discover feed</figcaption></figure></div><p>As the first MLE at Bluesky, I initially attempted a <a href="https://recsysml.substack.com/p/two-tower-models-for-retrieval-of">two tower model</a>, but it failed to converge, possibly due to insufficient data or being a poor fit for Bluesky&#8217;s short-lifetime items and skewed interaction distributions. Bluesky was (and still is) a small team, so I couldn&#8217;t spend forever debugging this issue. Instead I switched to building a system that would generate post embeddings based on the content of a post, with the idea that I could build a personalization system on top of that.</p><p>Currently, posts are embedded using <a href="https://arxiv.org/abs/2301.12597">BLIP2</a>, a variant of CLIP, which powers our topic models (27 topics users select during onboarding). While this topic model is accurate, it is also quite broad, which hurts the user experience. I&#8217;ve also run <a href="https://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html">HDBSCAN</a> over a sample of the post embedding space to generate ~600 clusters that provide finer-grained grouping of content. By measuring a user&#8217;s interaction with content from these clusters or topics, we have a rudimentary personalization system that can help users find content they might be interested in.</p><p>My goal is to substantially improve personalization of the Discover feed. After reviewing papers, I chose to investigate techniques from Pinterest, specifically their <a href="https://arxiv.org/abs/2007.03634">PinnerSage</a> paper. 
This choice was based on budget fit, simplicity, avoiding extensive fine-tuning, and the requirement to treat user and post representations separately. There are a lot of similarities between the papers published by Pinterest and Twitter, but I chose to use the Pinterest papers because they&#8217;ve continued publishing, providing a path to utilize more advanced models as the ML team at Bluesky grows.</p><h2>Bluesky is hiring!</h2><p>Speaking of growing the team, are you a mid-senior MLE with experience in recommender systems? Do you want to join a team laying the groundwork for how ML will operate at a fast-growing social media platform? Do you want to increase your scope of work? Want to experiment with new, unconventional ideas? Think distributed social media is the way of the future? Then come work with me at <a href="https://jobs.gem.com/bluesky/am9icG9zdDpJ8TCGYh93XAC00AkK4gXz">Bluesky</a>!</p><h2>PinnerSage</h2><p>Published in 2020, <a href="https://arxiv.org/abs/2007.03634">PinnerSage</a> addresses the issue of a single user preference embedding failing to capture a user&#8217;s full range of interests, especially short- and long-term interests. It does this by generating several (10-100) user preference embeddings via an offline path (last 90 days) and an online path (today&#8217;s interactions). This resulted in a 2% increase in user engagement propensity and a 4% increase in engagement volume in online A/B tests.</p><h3>How it works</h3><p>PinnerSage is a rather simple approach to the problem, with intentional design choices that match my own. They specifically mention that item embeddings should be fixed, which is a requirement for me.</p><h4>Step 1: Cluster User Interactions</h4><p>First, for a given user they take the last 90 days of item interactions (i.e. action pins) and gather the item embeddings. 
Next they cluster these embeddings using <a href="https://en.wikipedia.org/wiki/Ward%27s_method">Ward clustering</a>, generating a &#8216;small number&#8217; (10-100) of clusters for a user. Their specific Ward implementation is based on the <a href="https://en.wikipedia.org/wiki/Ward%27s_method#Lance%E2%80%93Williams_algorithms">Lance-Williams algorithm</a>, and has a complexity of <em>O(n^2)</em> where <em>n</em> is the number of items being clustered.</p><h4>Step 2: Calculate the Medoid</h4><p>Second, for each cluster, a medoid&#8212;an actual member of the cluster that minimizes the sum of the squared distances with other members&#8212;is calculated. This simplifies deployment by allowing Pinterest to reuse existing pin infrastructure.</p><h4>Step 3: Importance Scoring</h4><p>Finally, they calculate a user-cluster importance score. Since a user can have 10-100 clusters we need a way to choose which clusters to use during retrieval. They use a simple time decay average model: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Importance(C, \\lambda) = \\sum_{i \\in C} e^{-\\lambda(\\tau_{now} - \\tau_{[i]})}&quot;,&quot;id&quot;:&quot;DLPNRPHLTZ&quot;}" data-component-name="LatexBlockToDOM"></div><p><em>lambda</em> is a hyper-parameter that controls recency, with 0 ignoring time effects and 0.1 emphasizing recent interaction. Pinterest found 0.01 to be a good balance.</p><p>With these three steps we now have a set of per-user interest medoids (i.e. pins) and weights for how much a user interacts with those pins.</p><h3>Integrating with your Recommender System</h3><p>Applying this to retrieval is fairly straightforward. The medoids can be sampled, weighted by importance, and used as candidate sources for an ANN-based candidate generator. 
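</p>

<p>The three steps above can be sketched compactly with scipy. This is a toy illustration on random vectors, not Pinterest&#8217;s implementation: the embeddings, timestamps, and 5-cluster cap are made up, with lambda = 0.01 as in the paper.</p>

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 16))          # toy item embeddings (90-day history)
ages_days = rng.uniform(0, 90, size=50)  # days since each interaction

# Step 1: Ward clustering of the user's interacted-item embeddings,
# capped at a small number of clusters (5 here, purely illustrative).
labels = fcluster(linkage(emb, method="ward"), t=5, criterion="maxclust")

medoids, importance = {}, {}
lam = 0.01                               # recency hyper-parameter from the paper
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    # Step 2: medoid = the member minimizing summed squared distance
    # to the rest of its cluster.
    d2 = squareform(pdist(emb[idx])) ** 2
    medoids[c] = idx[d2.sum(axis=1).argmin()]
    # Step 3: time-decayed importance score for the cluster.
    importance[c] = np.exp(-lam * ages_days[idx]).sum()
```

<p>The output is exactly what the paper deploys per user: a handful of medoid item ids plus a weight saying how much that interest currently matters.</p>

<p>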
Pinterest sampled up to 3 medoids at a time, and applied additional (though unspecified) filtering to remove near duplicates and poor-quality candidates.</p><p>One weakness with PinnerSage is the difficulty of using these user preference embeddings during ranking. Traditionally you create a feature for each item that is the similarity between that item&#8217;s embedding and the user preference embedding. With PinnerSage there are anywhere from 10-100 preference embeddings for each user, so it is unclear which of these embeddings you should choose. You could try using all of them and taking the maximum similarity between a given item and each of the user preference embeddings, but this is expensive to do at runtime (i.e. 100 embeddings x 1,000 items = 100,000 ops). Another option is to take a weighted average of the user preference embeddings to combine them into a single user-preference embedding, but this naive approach will likely lose accuracy by smearing the user&#8217;s preferences together.</p><p>The difficulty of integrating multiple user preference embeddings into ranking was a key motivator for <a href="http://arxiv.org/abs/2205.04507">PinnerFormer (Pancha et al., 2022)</a>, which Pinterest developed to generate a single user preference embedding using Transformers to better capture user interests. 
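</p>

<p>The two work-arounds described above can be contrasted in a few lines of numpy. The shapes and weights here are illustrative, not from the paper:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
user_embs = rng.normal(size=(100, 64))   # 10-100 user preference embeddings
weights = rng.uniform(size=100)
weights /= weights.sum()                 # importance weights, summing to 1
items = rng.normal(size=(1000, 64))      # candidate items to rank

# Option 1: max similarity over all preference embeddings.
# 100 embeddings x 1,000 items = 100,000 dot products at runtime.
max_sim = (items @ user_embs.T).max(axis=1)   # shape (1000,)

# Option 2: collapse to a single importance-weighted embedding first.
# Only 1,000 dot products, but distinct interests get smeared together.
pooled = weights @ user_embs                  # shape (64,)
avg_sim = items @ pooled                      # shape (1000,)
```

<p>Because the pooled score is a convex combination of the per-embedding similarities, it can never exceed the maximum; what it loses is the ability to score an item highly against one niche interest.</p>

<p>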
We will discuss PinnerFormer in a future blogpost.</p><h3>Short Term Interests &amp; Item Embeddings</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!puhE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!puhE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png 424w, https://substackcdn.com/image/fetch/$s_!puhE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png 848w, https://substackcdn.com/image/fetch/$s_!puhE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png 1272w, https://substackcdn.com/image/fetch/$s_!puhE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!puhE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png" width="790" height="626" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:626,&quot;width&quot;:790,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A diagram depicting the PinnerSage recommendation system architecture. It shows the batch and real-time systems, with the batch feeding into a Key-Value store while the real time feeds directly into the candidate generator.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A diagram depicting the PinnerSage recommendation system architecture. It shows the batch and real-time systems, with the batch feeding into a Key-Value store while the real time feeds directly into the candidate generator." title="A diagram depicting the PinnerSage recommendation system architecture. It shows the batch and real-time systems, with the batch feeding into a Key-Value store while the real time feeds directly into the candidate generator." 
srcset="https://substackcdn.com/image/fetch/$s_!puhE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png 424w, https://substackcdn.com/image/fetch/$s_!puhE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png 848w, https://substackcdn.com/image/fetch/$s_!puhE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png 1272w, https://substackcdn.com/image/fetch/$s_!puhE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca0a2d8c-c0ac-4460-82e3-848b55c6a788_790x626.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">PinnerSage architecture diagram from Pal et al., &#8220;PinnerSage.&#8221;</figcaption></figure></div><p>Earlier we alluded to an online system that captures short-term interests: an event-based streaming system performs the same clustering and importance-estimation steps on the twenty most recent actions since the last batch job. These results are combined with the batch results.</p><p>One thing not discussed in this paper is how item embeddings are generated. At the time of publication (2020), Pinterest used a sophisticated graph-based embedding model called <a href="http://arxiv.org/abs/1706.02216">PinSage (Hamilton et al., &#8220;Inductive Representation Learning on Large Graphs.&#8221;)</a>. At Bluesky we are using BLIP2 to generate post embeddings. If you don&#8217;t already have an item embedding model, then you can&#8217;t deploy PinnerSage.</p><h2>Conclusion</h2><p>This blog post presented an overview of PinnerSage, a clustering-based approach to generating user preference embeddings while keeping item embeddings fixed. I also discussed a brief history of personalization at Bluesky, and provided my motivation for investigating PinnerSage. My current plans are to implement PinnerSage as a candidate generator, then move to PinnerFormer to generate a single user preference embedding for ranking. As we make progress on various parts of the stack we will share our work.</p><h2>Bibliography</h2><p>Pal, Aditya, Chantat Eksombatchai, Yitong Zhou, Bo Zhao, Charles Rosenberg, and Jure Leskovec. 
&#8220;PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest.&#8221; <em>Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</em>, August 23, 2020, 2311&#8211;20.<a href="https://doi.org/10.1145/3394486.3403280"> https://doi.org/10.1145/3394486.3403280</a>.</p><p>Hamilton, William L., Rex Ying, and Jure Leskovec. &#8220;Inductive Representation Learning on Large Graphs.&#8221; <em>arXiv:1706.02216 [Cs, Stat]</em>, June 7, 2017.<a href="http://arxiv.org/abs/1706.02216"> http://arxiv.org/abs/1706.02216</a>.</p><p>Pancha, Nikil, Andrew Zhai, Jure Leskovec, and Charles Rosenberg. &#8220;PinnerFormer: Sequence Modeling for User Representation at Pinterest.&#8221; arXiv:2205.04507. Preprint, arXiv, May 9, 2022.<a href="http://arxiv.org/abs/2205.04507">http://arxiv.org/abs/2205.04507</a>.</p>]]></content:encoded></item><item><title><![CDATA[The Mathematics of Intelligence: A Deep Dive into LLM Training]]></title><description><![CDATA[Creating a state-of-the-art Large Language Model (LLM) is not a single act of training.]]></description><link>https://recsysml.substack.com/p/the-mathematics-of-intelligence-a</link><guid isPermaLink="false">https://recsysml.substack.com/p/the-mathematics-of-intelligence-a</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Sat, 31 Jan 2026 17:17:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-K4v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Creating a state-of-the-art Large Language Model (LLM) is not a single act of training. 
It is a multi-stage evolution that transforms a raw neural network from a statistical pattern-matcher into a refined, reliable assistant.</p><p>This post breaks down the technical pipeline&#8212;from foundational knowledge acquisition to parameter-efficient fine-tuning with <strong>LoRA</strong>, and the critical &#8220;leash&#8221; provided by <strong>KL-Divergence</strong>.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://recsysml.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading RecsysML + LLMs! Subscribe for free to receive new posts and support our work. If you want to write with us, we welcome that too!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3>1. Pre-training: The Foundation of Knowledge</h3><p>In this first stage, the model absorbs the statistical structure of language. It is exposed to trillions of tokens with the objective of <strong>Next-Token Prediction</strong>. At this level, the model is learning the &#8220;world model&#8221;&#8212;facts, grammar, and reasoning&#8212;but it has no concept of a &#8220;user&#8221; or a &#8220;task.&#8221;</p><p><strong>The Logic:</strong> We measure how &#8220;surprised&#8221; the model is by the actual next word in a sentence. The goal is to minimize this surprise across the entire internet.</p><p><strong>The Math: Cross-Entropy Loss</strong> The model learns a probability distribution P over its vocabulary. 
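</p><p>As a tiny, self-contained illustration (pure Python, toy numbers, not a real model), the per-token &#8220;surprise&#8221; is just the negative log of the probability the model assigned to the actual next token:</p>

```python
import math

# Toy distribution over a 4-word vocabulary, predicted for the next token.
probs = {"cat": 0.70, "dog": 0.20, "car": 0.05, "sky": 0.05}

actual_next = "cat"
nll = -math.log(probs[actual_next])          # low surprise: ~0.357

surprising_next = "sky"
nll_bad = -math.log(probs[surprising_next])  # high surprise: ~3.0
```

<p>Summed over every position in the corpus, this per-token quantity is the cross-entropy loss minimized during pre-training.</p><p>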
We minimize the <strong>Negative Log-Likelihood (NLL)</strong>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L_{Pre} = -\\sum_{t=1}^{T} \\log P(w_t | w_{<t}; \\theta)&quot;,&quot;id&quot;:&quot;UDPWELZEKP&quot;}" data-component-name="LatexBlockToDOM"></div><ul><li><p><strong>&#952;:</strong> The massive parameter set of the base model.</p></li><li><p><strong>The Result:</strong> The model develops a &#8220;world model,&#8221; internalizing facts, grammar, and reasoning. However, it doesn&#8217;t yet know how to follow instructions.</p></li></ul><div><hr></div><h3>2. Instruction Tuning (SFT) &amp; LoRA</h3><p>To transform a knowledge base into an assistant, we use <strong>Supervised Fine-Tuning (SFT)</strong>. We provide the model with &#8220;Gold Standard&#8221; examples of how to follow a prompt. To make this efficient, we use <strong>LoRA (Low-Rank Adaptation)</strong>.</p><p><strong>The Logic:</strong> Instead of updating all 70+ billion parameters (which is slow and expensive), we freeze the original model and add tiny, specialized &#8220;adapter&#8221; matrices. These matrices are &#8220;Low-Rank,&#8221; meaning they compress the complex task of &#8220;being an assistant&#8221; into a much smaller mathematical space.</p><p><strong>The Math of LoRA:</strong> Instead of updating the full weight matrix W0&#8203;, we freeze it and train two small, low-rank matrices A and B. 
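</p><p>A minimal numerical sketch of the LoRA idea (numpy, toy dimensions; in a real model each attention/MLP projection gets its own A and B):</p>

```python
import numpy as np

d, k, r = 8, 8, 2                     # full dimensions vs. low rank (r << d, k)
rng = np.random.default_rng(0)

W0 = rng.normal(size=(d, k))          # frozen pre-trained weight
A = rng.normal(size=(r, k)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                  # trainable up-projection, zero-init so the delta starts at 0

x = rng.normal(size=k)
h = W0 @ x + B @ (A @ x)              # h = W0 x + B(Ax); identical to W0 x at initialization

n_trainable = A.size + B.size         # r*(d+k) = 32, versus d*k = 64 for the full matrix
```

<p>Only A and B receive gradients; at 70B-parameter scale the savings are far more dramatic than in this toy example.</p><p>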
The rank r is much smaller than the original dimensions (r&#8810;d,k).</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;h = W_0 x + \\Delta W x = W_0 x + B(Ax)&quot;,&quot;id&quot;:&quot;IUPOGOPGMS&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><p><strong>The Loss:</strong> The goal is to maximize the likelihood of the specific human-labeled response: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L_{SFT} = -\\mathbb{E}_{(x,y)} [\\log P(y | x; \\theta)]&quot;,&quot;id&quot;:&quot;RMKLHVPEQN&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><div><hr></div><h3>3. Reinforcement Learning (PPO): Joint Optimization</h3><p>Instruction tuning teaches the model the <strong>format</strong> of being an assistant, but <strong>Reinforcement Learning from Human Feedback (RLHF)</strong> teaches it <strong>quality and safety</strong>. In this stage, we often use a &#8220;Joint Optimization&#8221; where we update the <strong>same LoRA adapter</strong> from the SFT stage.</p><h4>A. The Bradley-Terry Reward Model</h4><p>We don&#8217;t just give the model a raw score. We use an &#8220;Advantage&#8221; calculation.</p><p><strong>The Logic:</strong> First, a <strong>Reward Model</strong> is trained on human preference data over model responses, so that we can predict human preferences for any response, not just the ones we have labels for.</p><p>The Reward Model is a static judge that gives a score. However, a score of &#8220;7/10&#8221; is meaningless without context. Is 7 good for a complex coding prompt? Or is it bad for a simple &#8220;Hello&#8221;? The <strong>Advantage</strong> instead measures how much <em>better</em> a specific answer was compared to what the model usually produces. 
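</p><p>A toy numeric sketch of these two pieces (made-up scores, pure Python), using the Bradley-Terry form for preference probability:</p>

```python
import math

def preference_prob(r_w, r_l):
    """Bradley-Terry: probability the higher-scored answer is preferred."""
    return math.exp(r_w) / (math.exp(r_w) + math.exp(r_l))

# Reward-model scores for two candidate answers to the same prompt.
p = preference_prob(r_w=2.0, r_l=0.5)  # ~0.82: the first answer is likely preferred

# Advantage: actual reward minus the value model's expected reward for this prompt.
reward, expected = 2.0, 1.4
advantage = reward - expected          # positive: push this response to be more likely
```

<p>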
This prevents the model from getting &#8220;lazy&#8221; and ensures it is constantly striving for a higher-than-average response.</p><p><strong>The Math:</strong> The Reward Model (r&#981;) is trained to predict which answer humans prefer (y_w vs y_l): </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P(y_w \\succ y_l | x) = \\frac{\\exp(r_\\phi(x, y_w))}{\\exp(r_\\phi(x, y_w)) + \\exp(r_\\phi(x, y_l))}&quot;,&quot;id&quot;:&quot;SSSYAJDYEX&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then we use a <strong>Value Model</strong> (V) to estimate the &#8220;expected reward&#8221; for a prompt. The Advantage is the difference between the actual Reward we received and what the Value Model expected: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{A}_t = r_\\phi(x, y) - V(x)&quot;,&quot;id&quot;:&quot;NSJAFFCJJZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>If the Advantage &gt; 0, the model performed better than its own baseline, and we &#8220;push&#8221; the LoRA weights to make that response more likely.</p><h4>B. The PPO Clipped Objective (L_PPO)</h4><p>When we &#8220;push&#8221; the model to be better, we have to be careful not to push it so hard that it &#8220;breaks&#8221; and starts generating gibberish.</p><p><strong>The Logic:</strong> We use a <strong>Clipped Objective</strong>. This acts as a mathematical guardrail that prevents the LoRA adapter from changing too drastically in a single update. 
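</p><p>A minimal sketch of one term of the clipped objective (pure Python, a single action; real training averages this over batches of sequences):</p>

```python
def ppo_clipped_term(ratio, advantage, eps=0.2):
    """One term of L_PPO: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# New policy made a good action 1.5x more likely: the gain is capped at (1+eps) * A.
capped = ppo_clipped_term(ratio=1.5, advantage=2.0)    # 2.4, not 3.0
# Small updates inside the trust band pass through unchanged.
uncapped = ppo_clipped_term(ratio=1.1, advantage=2.0)  # 2.2
```

<p>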
If the model wants to change its behavior by 50%, the clip function forces it to only change by a safe 10-20%.</p><p><strong>The Math:</strong></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L_{PPO} = \\mathbb{E}_t [\\min(r_t(\\theta) \\hat{A}_t, \\text{clip}(r_t(\\theta), 1-\\epsilon, 1+\\epsilon) \\hat{A}_t)]&quot;,&quot;id&quot;:&quot;AOYAUOGAYZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Where:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;r_t(\\theta) = \\frac{\\pi_\\theta(a_t|s_t)}{\\pi_{old}(a_t|s_t)}&quot;,&quot;id&quot;:&quot;LFZASNMUEA&quot;}" data-component-name="LatexBlockToDOM"></div><p><strong>&#1013;:</strong> A hyperparameter (usually 0.1 or 0.2) that limits how much the policy can change.</p><h4>C. The Joint Loss</h4><p>On the same adapter, we combine the RL objective with SFT regularization and the KL penalty:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; L_{Total} = L_{PPO} + \\alpha L_{SFT} - \\beta \\text{KL}(\\pi_\\theta || \\pi_{ref})&quot;,&quot;id&quot;:&quot;MWSRFULIHZ&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><p><strong>The Logic:</strong> The KL penalty measures the divergence between the predictions of the new model (with LoRA) and the frozen reference model, ensuring the policy does not drift far from the pre-trained model that learned the language structure.</p><div><hr></div><h3>4. Direct Preference Optimization (DPO): The Sequential Shortcut</h3><p><strong>DPO</strong> is a modern alternative that skips the Reward Model entirely. 
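</p><p>A scalar sketch of the DPO loss (toy log-probabilities, pure Python; real training averages this over a dataset of preference pairs):</p>

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log sigmoid(beta * (winner log-ratio - loser log-ratio))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy raised the preferred answer's log-prob and lowered the rejected one's:
improving = dpo_loss(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.0)
# No change from the reference model: margin is 0, loss is log(2).
neutral = dpo_loss(logp_w=-2.0, logp_l=-2.0, ref_logp_w=-2.0, ref_logp_l=-2.0)
```

<p>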
Unlike PPO, it is typically a <strong>sequential</strong> process regarding adapters.</p><p><strong>The Strategy:</strong></p><ol><li><p><strong>Reference:</strong> You freeze the SFT LoRA adapter as &#960;ref&#8203;.</p></li><li><p><strong>Policy:</strong> You train a <strong>new, separate LoRA adapter</strong> (&#960;&#952;&#8203;) that learns to maximize the log-ratio between preferred and unpreferred responses.</p></li></ol><p><strong>The Math:</strong></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; L_{DPO} = -\\mathbb{E} \\left[ \\log \\sigma \\left( \\beta \\log \\frac{\\pi_\\theta(y_w|x)}{\\pi_{ref}(y_w|x)} - \\beta \\log \\frac{\\pi_\\theta(y_l|x)}{\\pi_{ref}(y_l|x)} \\right) \\right]&quot;,&quot;id&quot;:&quot;VWUPZLBGYU&quot;}" data-component-name="LatexBlockToDOM"></div><div><hr></div><h3>Summary Table: Adapter &amp; Loss Strategies</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-K4v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-K4v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png 424w, https://substackcdn.com/image/fetch/$s_!-K4v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png 848w, https://substackcdn.com/image/fetch/$s_!-K4v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png 
1272w, https://substackcdn.com/image/fetch/$s_!-K4v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-K4v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png" width="1208" height="544" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:544,&quot;width&quot;:1208,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:78933,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/185125411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-K4v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png 424w, https://substackcdn.com/image/fetch/$s_!-K4v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png 848w, 
https://substackcdn.com/image/fetch/$s_!-K4v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png 1272w, https://substackcdn.com/image/fetch/$s_!-K4v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55251318-3711-47b6-b76c-2add5d32b2ff_1208x544.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h3>Conclusion</h3><p>Modern LLM training is a balancing act. 
PPO uses <strong>Advantage</strong> and <strong>Clipping</strong> on a single LoRA adapter to incrementally move toward human preferences, while DPO uses a <strong>Log-Ratio</strong> on sequential adapters to simplify the math. In both cases, the goal is to close the gap between what the model <em>did</em> and what the human <em>wanted</em>.</p><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://recsysml.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading RecsysML + LLMs! Subscribe for free to receive new posts and support my work. 
If you want to write with us, we would love to collaborate!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Modernizing GPT-2: A Journey from 2019 to 2025]]></title><description><![CDATA[How we injected state-of-the-art features into a classic architecture and what we learned about scaling.]]></description><link>https://recsysml.substack.com/p/modernizing-gpt-2-a-journey-from</link><guid isPermaLink="false">https://recsysml.substack.com/p/modernizing-gpt-2-a-journey-from</guid><dc:creator><![CDATA[Vish Sangale]]></dc:creator><pubDate>Sat, 24 Jan 2026 15:01:44 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/183977232/9f7245593d5c4243f453439434fa9ee3.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<blockquote><p><a href="https://recsysml.substack.com/p/training-gpt-2-on-a-budget">Part I</a></p><p><a href="https://github.com/vishsangale/gpt-2">Link to GitHub Repository</a></p></blockquote><h2><strong>Introduction</strong></h2><blockquote><p>GPT-2 is a legendary model, effectively the &#8220;Hello World&#8221; of modern LLMs thanks to Andrej Karpathy&#8217;s nanoGPT. 
But in the fast-moving world of AI, 2019 might as well be ancient history.</p><p>We set out to answer a simple question: <strong>What happens if we take the classic GPT-2 architecture and inject the architectural improvements that power today&#8217;s leading models like Llama 3?</strong></p><p>This post details our journey of implementing RoPE, RMSNorm, and SwiGLU into GPT-2, the backward-compatibility challenges we solved, and the surprising results we found when testing on datasets of different sizes.</p></blockquote><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://recsysml.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading RecsysML + LLMs! Subscribe for free to receive new posts early</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h3><strong>1. The Upgrades: Why We Made Them</strong></h3><blockquote><p>We focused on three key modernizations that have become standard in post-2023 LLMs.</p><p><strong>&#128260; Rotary Positional Embeddings (RoPE)</strong></p></blockquote><ul><li><p><strong>The Old Way:</strong> Standard GPT-2 uses learned absolute positional embeddings. The model learns a unique vector for position 0, position 1, etc. This doesn&#8217;t scale well to longer contexts and fails to capture the relative distance between tokens effectively.</p></li><li><p><strong>The Upgrade:</strong> We replaced this with RoPE. 
Instead of adding a vector, we rotate the query and key vectors in the attention mechanism based on their position. This allows the model to naturally understand &#8220;token A is 5 steps before token B&#8221; regardless of where they appear in the sequence.</p></li></ul><blockquote><p><strong>&#128208; RMSNorm (Root Mean Square Normalization)</strong></p></blockquote><ul><li><p><strong>The Old Way:</strong> LayerNorm. It centers and scales the input.</p></li><li><p><strong>The Upgrade:</strong> RMSNorm. It skips the centering step and only re-scales. It&#8217;s computationally simpler and, empirically, often leads to more stable training at scale. It&#8217;s a small simplification that has helped models like Llama scale to massive sizes.</p></li></ul><blockquote><p><strong>&#129504; SwiGLU Activation</strong></p></blockquote><ul><li><p><strong>The Old Way:</strong> GeLU (Gaussian Error Linear Unit).</p></li><li><p><strong>The Upgrade:</strong> SwiGLU (Swish Gated Linear Unit). This is one of the most impactful changes. It essentially gives the Feed-Forward Network (FFN) a &#8220;gate&#8221; mechanism, increasing the model&#8217;s capacity and expressivity. It requires slightly more parameters (due to the extra gate projection), but performance per parameter is generally higher.</p></li></ul><div><hr></div><h3><strong>2. Challenge: The Backward Compatibility Trap</strong></h3><p>We didn&#8217;t just want a new model; we wanted a unified codebase. We needed to ensure that we could still load old, vanilla GPT-2 checkpoints.</p><pre><code><code>if config.use_rmsnorm:
    self.ln1 = RMSNorm(config.n_embd)
else:
    self.ln1 = nn.LayerNorm(config.n_embd)
</code></code></pre><p>We even wrote a </p><div><hr></div><h3><strong>3. Experiment 1: The Overfitting Trap (Tiny Shakespeare)</strong></h3><p>Our first test was on the classic Tiny Shakespeare dataset. We anticipated the modernized model would crush the baseline.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DUkC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DUkC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png 424w, https://substackcdn.com/image/fetch/$s_!DUkC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png 848w, https://substackcdn.com/image/fetch/$s_!DUkC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png 1272w, https://substackcdn.com/image/fetch/$s_!DUkC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DUkC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png" width="647" height="143" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:143,&quot;width&quot;:647,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14516,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://vishsangale.substack.com/i/181739736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!DUkC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png 424w, https://substackcdn.com/image/fetch/$s_!DUkC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png 848w, https://substackcdn.com/image/fetch/$s_!DUkC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png 1272w, https://substackcdn.com/image/fetch/$s_!DUkC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F280e711f-6297-4143-8bb3-924d86a22a7d_647x143.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>What happened?</strong></p><div><hr></div><h3><strong>4. 
Experiment 2: Redemption (FineWeb)</strong></h3><p>To prove the architecture works, we needed a dataset that could withstand the power of the modernized model. We switched to FineWeb, a high-quality, massive web dataset.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_6ba!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_6ba!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png 424w, https://substackcdn.com/image/fetch/$s_!_6ba!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png 848w, https://substackcdn.com/image/fetch/$s_!_6ba!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png 1272w, https://substackcdn.com/image/fetch/$s_!_6ba!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_6ba!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png" width="809" height="451" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:451,&quot;width&quot;:809,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50060,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://vishsangale.substack.com/i/181739736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!_6ba!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png 424w, https://substackcdn.com/image/fetch/$s_!_6ba!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png 848w, https://substackcdn.com/image/fetch/$s_!_6ba!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png 1272w, https://substackcdn.com/image/fetch/$s_!_6ba!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f10d872-7c49-4081-a141-a13bcf52dab8_809x451.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Training Loss</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rIRR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rIRR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png 424w, 
https://substackcdn.com/image/fetch/$s_!rIRR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png 848w, https://substackcdn.com/image/fetch/$s_!rIRR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png 1272w, https://substackcdn.com/image/fetch/$s_!rIRR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rIRR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png" width="647" height="106" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:106,&quot;width&quot;:647,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:10289,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://vishsangale.substack.com/i/181739736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!rIRR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png 424w, https://substackcdn.com/image/fetch/$s_!rIRR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png 848w, https://substackcdn.com/image/fetch/$s_!rIRR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png 1272w, https://substackcdn.com/image/fetch/$s_!rIRR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42acd906-7b1a-4be6-aca7-abf8dd9fc02d_647x106.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Validation Loss</figcaption></figure></div><p>On the larger dataset, the overfitting vanished. The modernized model leveraged its superior architecture to learn more generalized patterns, achieving an </p><h4><strong>Seeing is Believing</strong></h4><p>We generated samples from the FineWeb models starting with </p><div><hr></div><h3><strong>5. 
Experiment 3: The Ablation Study (Who contributed what?)</strong></h3><p>We also ran a quick ablation to see which architectural feature contributes most to early convergence (at Step 100).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GYLa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GYLa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png 424w, https://substackcdn.com/image/fetch/$s_!GYLa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png 848w, https://substackcdn.com/image/fetch/$s_!GYLa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png 1272w, https://substackcdn.com/image/fetch/$s_!GYLa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GYLa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png" width="647" height="228"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:228,&quot;width&quot;:647,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26199,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://vishsangale.substack.com/i/181739736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!GYLa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png 424w, https://substackcdn.com/image/fetch/$s_!GYLa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png 848w, https://substackcdn.com/image/fetch/$s_!GYLa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png 1272w, https://substackcdn.com/image/fetch/$s_!GYLa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4b45a5-32d6-4dfa-8766-35bcd104ef89_647x228.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><div><hr></div><h3><strong>Conclusion</strong></h3><p>Modernizing legacy architectures isn&#8217;t just about pasting in new code. 
It&#8217;s about understanding the relationship between model expressivity and data scale.</p><ul><li><p><strong>RoPE + SwiGLU + RMSNorm</strong> make the model a more efficient learner.</p></li><li><p>On small data, this efficiency manifests as <strong>overfitting</strong>.</p></li><li><p>On large data, it manifests as <strong>superior performance and generalization</strong>.</p></li></ul><p>We now have a GPT-2 codebase that is backward compatible with 2019 checkpoints but capable of 2025 performance when fed the right data.</p><p></p><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions or opinions stated here are theirs and not representative of or attributable to their current or any prior employer(s). Apart from publicly available information, any other information here is not claimed to refer to any company, including ones the author(s) may have worked in or been associated with.</em></p>]]></content:encoded></item><item><title><![CDATA[A framework for ML Design Interviews]]></title><description><![CDATA[Sharing our ML design framework that has been very successful in FAANG interviews]]></description><link>https://recsysml.substack.com/p/a-framework-for-ml-design-interviews</link><guid isPermaLink="false">https://recsysml.substack.com/p/a-framework-for-ml-design-interviews</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Sat, 10 Jan 2026 15:02:45 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fd4e5c2d-31fd-4565-83c8-3bfd10c4921d_1456x648.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;067ec062-1e70-4776-9804-9636eeb6417c&quot;,&quot;duration&quot;:null}"></div><p>Over time, we have developed a framework for approaching ML Design interviews. Using this framework has had a high success rate.
(11+ successful offers at levels E6 to E9 at Meta, Google, and others; 1 failure). Having a framework allows you to (a) pace yourself to cover all key elements, (b) deliver a coherent narrative, and (c) help the interviewer know what you would have covered if you had more time.</p><p>Please <strong>message us on Substack</strong> and we will share the framework with you. (We don&#8217;t want to post it publicly to avoid it being used by too many people).</p><p></p>]]></content:encoded></item><item><title><![CDATA[The Case for RL-Aligned Ranking in RecSys]]></title><description><![CDATA[Why the recsys industry is stuck on probability estimation, and how RLHF is the missing link.]]></description><link>https://recsysml.substack.com/p/stop-predicting-ctr-start-optimizing</link><guid isPermaLink="false">https://recsysml.substack.com/p/stop-predicting-ctr-start-optimizing</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Sat, 03 Jan 2026 11:02:22 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/180256469/125c4b6ad9a54e97dc24ff704494a798.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>LLMs and Recommender
systems, like the ones used in <a href="https://recsysml.substack.com/p/personalized-short-video-recommender">video recommendation</a> and <a href="https://recsysml.substack.com/p/friend-recommendation-retrieval-in">friend recommendation</a>, may seem very different, but in this post we compare them and show they are surprisingly similar. We also highlight one key opportunity for the recsys community to improve. Under the surface they solve the same problem:</p><blockquote><p><strong>Given a context, choose the next action that maximizes value.</strong><br>The action is the &#8220;next token&#8221; for LLMs and the &#8220;next item/action&#8221; for RecSys.</p></blockquote><p></p><h2>1. Retrieval &#8776; Pretraining+SFT. Ranking is missing Reward optimization.</h2><p>LLMs are developed in three stages:</p><ol><li><p>pretraining with a next-token prediction loss on a large amount of data not specific to any domain,</p></li><li><p>supervised fine-tuning, still with the next-token prediction loss but only on high-quality data for the target domain, typically at smaller learning rates, and</p></li><li><p>reinforcement learning to maximize expected reward (e.g.
using <a href="https://www.youtube.com/watch?v=tqrcjHuNdmQ">Policy Gradient</a>).</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0U39!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0U39!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png 424w, https://substackcdn.com/image/fetch/$s_!0U39!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png 848w, https://substackcdn.com/image/fetch/$s_!0U39!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png 1272w, https://substackcdn.com/image/fetch/$s_!0U39!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0U39!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png" width="1200" height="395.16616314199393" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:436,&quot;width&quot;:1324,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:138009,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/180256469?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0U39!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png 424w, https://substackcdn.com/image/fetch/$s_!0U39!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png 848w, https://substackcdn.com/image/fetch/$s_!0U39!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png 1272w, https://substackcdn.com/image/fetch/$s_!0U39!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ba3987d-3930-4211-8de2-0a6ab9684c7b_1324x436.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Retrieval in recsys is done by finding user and item embeddings that maximize the probability of the next item the user interacts with. <a href="https://recsysml.substack.com/p/recsys-retrieval-llm-pretrainingsft">This post</a> shows that Pretraining/SFT = Recsys Retrieval, and how Semantic ID + clustering blur the line even further.</p><h2>2.
Reward optimization in LLMs vs &#8220;Ranking&#8221; in RecSys</h2><p>This is where the two worlds <strong>spiritually reconnect</strong> but <strong>architecturally diverge</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XtnG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XtnG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png 424w, https://substackcdn.com/image/fetch/$s_!XtnG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png 848w, https://substackcdn.com/image/fetch/$s_!XtnG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png 1272w, https://substackcdn.com/image/fetch/$s_!XtnG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XtnG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png" width="928" height="426" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:426,&quot;width&quot;:928,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:102353,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/180256469?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XtnG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png 424w, https://substackcdn.com/image/fetch/$s_!XtnG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png 848w, https://substackcdn.com/image/fetch/$s_!XtnG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png 1272w, https://substackcdn.com/image/fetch/$s_!XtnG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11ed9e4b-1b1e-4385-8f59-38f0fd2f2c65_928x426.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>Note: RL-Aligned vs. Generative RecSys</strong> It is important to distinguish this proposal from &#8220;Generative Recommendation&#8221; (where an LLM directly generates Item IDs as tokens). What I am proposing here is an evolution of the training paradigm, not a replacement of the inference engine. We move from minimizing classification error to maximizing policy reward. It just changes the training loss without any degradation of inference latency or increase in inference cost.</p><p></p><h2>3. 
LLM Alignment (RLHF, etc.) uses a reward model to maximize user value</h2><ul><li><p>Train a <strong>reward model</strong> that scores outputs.</p></li><li><p>Train the <strong>policy</strong> (the LLM) to maximize that reward using some form of policy gradient.</p></li><li><p>Use KL-regularization to keep the policy safe, stable, and within distribution.</p></li></ul><p>The key point: LLMs <strong>actively optimize</strong> against a reward model.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\max_{\\theta} E_{y \\sim \\pi_{\\theta}(\\cdot|x)}[R(x,y)]&quot;,&quot;id&quot;:&quot;CSGUUGOZKY&quot;}" data-component-name="LatexBlockToDOM"></div><p>Training finds parameters &#952; such that the policy &#960;&#952;(y|x) assigns the highest probability to the outputs y that earn the highest reward under the reward model it is given.</p><h2>4. The RecSys ranking model is a probability estimator</h2><p>See <a href="https://recsysml.substack.com/p/ranking-models-explained-deep-dive">this post for a deep dive on Ranking models in recsys</a>. &#8220;Ranking&#8221; models are, as of today, trained to predict a <em>vector of probabilities</em> like:</p><ul><li><p>p(click)</p></li><li><p>p(engagement)</p></li><li><p>p(return next day)</p></li></ul><p>On top of these predictions, engineers write a <strong>hand-coded value function</strong>, for example: Value/Reward = a * p(click) + b * p(engage) + c * p(return next day)</p><p>This is a modular, transparent, and auditable solution that enables rapid, component-wise experimentation and deployment.</p><p>But critically:</p><blockquote><p><strong>RecSys ranking models do not optimize the value function.<br>They only predict the labels that go into it.</strong></p></blockquote><p>There is <em>no policy gradient step</em>. No RLHF stage.<br>No optimization w.r.t.
actual business or user value.</p><p>This is a <strong>fundamental architectural gap</strong>.</p><p>And unlike LLMs, RecSys systems operate in a near-live feedback loop:</p><ul><li><p>The policy (Ranking estimator + Value model) determines the user experience,</p></li><li><p>the experience determines user actions,</p></li><li><p>those actions populate the training data,</p></li><li><p>and the model is trained on that data again.</p></li></ul><p>Yet RecSys systems still treat the &#8220;ranking&#8221; model as a <em>static classifier</em> rather than as a <em>policy</em>.</p><h2>5. The Opportunity: RecSys Needs Its &#8220;RLHF Moment&#8221;</h2><p>The RecSys world already has all the ingredients LLMs needed for RLHF:</p><ul><li><p>a probability estimator (the ranking model)</p></li><li><p>a scalar value model (the downstream business/value estimator)</p></li><li><p>logged human preference data</p></li><li><p>a feedback loop</p></li><li><p>constraints on drift and safety (analogous to KL-regularization in LLMs)</p></li></ul><p>But RecSys stops short of the final step:</p><blockquote><p><strong>Treating the ranking model as a policy and training it to maximize reward.</strong></p></blockquote><p>Imagine a RecSys training pipeline where:</p><ol><li><p>The <strong>value model becomes the &#8220;reward model&#8221;</strong> (both inference reward model and training reward model).</p></li><li><p>The ranking model is updated to maximize this reward:</p><ol><li><p>The value model score for each item is computed using task predictions.
This is an invocation of the &#8220;inference reward model&#8221;.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;s_i = \\sum\\limits_{t=1}^{T}\\text{vm}_t * p_{\\theta, i}(\\text{t})&quot;,&quot;id&quot;:&quot;YUJUCQRBLZ&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p>From this we compute the probability of this item being ranked first (<a href="https://www.statisticshowto.com/plackett-luce-model/">Plackett-Luce model</a>). &#8216;N&#8217; refers to the number of items being ranked.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\pi_\\theta(i|x)= \\frac{e^{s_i}}{\\sum\\limits_{j=1}^{N} e^{s_j}}&quot;,&quot;id&quot;:&quot;HKXQZUGJXY&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p>Optional but recommended: compute the probability of the logging policy ranking this item first. You will need to have logged the probabilities estimated by that model. If you don&#8217;t have them, you can assume &#960;&#946;(i|x) = 1.</p></li><li><p>Compute the inverse propensity estimate:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\rho_i = \\frac{\\pi_{\\theta}(i|x)}{\\pi_{\\beta}(i|x)}&quot;,&quot;id&quot;:&quot;AGYWPMHBFD&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p>Compute an &#8220;observed reward&#8221;. We can use the &#8220;value model&#8221; as this training reward model.
For instance, this could be</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;r_i = \\sum\\limits_{t=1}^{T}\\text{vm}_t * \\text{user_action}_{i}(\\text{t})&quot;,&quot;id&quot;:&quot;KBYLXLHMUM&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p>Add an RL loss that, when minimized, trains the ranking model to maximize the reward.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{L}_{\\text{RL}}(i) = - \\rho_i \\cdot r_i&quot;,&quot;id&quot;:&quot;QCLPFKQGNZ&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p>Try improvements like GRPO (<a href="https://arxiv.org/abs/2402.03300">DeepSeekMath</a>) / ECPO (<a href="https://arxiv.org/abs/2506.13695v3">OneRec</a>) / GBPO (<a href="https://arxiv.org/abs/2508.20900">OneRec-V2</a>) / CISPO (<a href="https://arxiv.org/abs/2506.13585">Minimax-M1</a>) to improve the off-policy estimate of the reward. These methods reduce the variance of &#961;.</p></li><li><p>Note: The only terms in this loss that are affected by model parameters are the predictions, just as in the current ranking setup. The gradient flows back through the probability estimator p_&#952;(t).</p></li></ol></li></ol><h3>Why is this better?</h3><ol><li><p>The system optimizes the actual long-term metric end-to-end.</p></li><li><p>Training pays more attention to instances that actually deliver metrics instead of chasing accuracy in low-ROI parts of the training data.</p></li><li><p>This opens up a new growth lever for your recsys: improving the training reward model.</p></li></ol><p><strong>&#8220;Ranking&#8221; would finally become reward optimization.</strong></p><h2>6. 
Code implementation</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8UHA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8UHA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png 424w, https://substackcdn.com/image/fetch/$s_!8UHA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png 848w, https://substackcdn.com/image/fetch/$s_!8UHA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png 1272w, https://substackcdn.com/image/fetch/$s_!8UHA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8UHA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png" width="596" height="466.239010989011" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1139,&quot;width&quot;:1456,&quot;resizeWidth&quot;:596,&quot;bytes&quot;:323940,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/180256469?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8UHA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png 424w, https://substackcdn.com/image/fetch/$s_!8UHA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png 848w, https://substackcdn.com/image/fetch/$s_!8UHA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png 1272w, https://substackcdn.com/image/fetch/$s_!8UHA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3b03e54-aa79-47e8-bf74-9639b72d6c49_1928x1508.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://github.com/gauravchak/reward_maximizing_ranking/tree/main&quot;,&quot;text&quot;:&quot;See code here&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://github.com/gauravchak/reward_maximizing_ranking/tree/main"><span>See code here</span></a></p><h2>7. 
Closing Thoughts</h2><p>LLMs and RecSys systems share a deeper architectural similarity than most people realize.<br>Pretraining mirrors retrieval.<br>Policy sampling at inference mirrors top-K ranking.<br>Reward models mirror value models.</p><p>But LLMs have unlocked remarkable capabilities through <strong>policy-gradient-based preference optimization</strong>,<br>while RecSys still primarily relies on <strong>probability estimation + handcrafted value models</strong>.</p><p>The RecSys field is on the cusp of the same evolution.</p><blockquote><p><strong>The next major leap in recommender systems will come when ranking models shift from estimating probabilities to maximizing reward&#8212;just like modern LLMs.</strong></p></blockquote><p>We already have retrieval.<br>We already have value models.<br>We already have logged user preferences.<br>We already have constraints and safety layers.</p><p>All that&#8217;s missing is the optimization layer.</p><p>RecSys is ready for its RLHF moment.</p><p></p><p></p><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</em></p>]]></content:encoded></item><item><title><![CDATA[Learning ML by doing Part1 | GPT-2]]></title><description><![CDATA[Replicating the 124M parameter model on a single consumer GPU.]]></description><link>https://recsysml.substack.com/p/training-gpt-2-on-a-budget</link><guid isPermaLink="false">https://recsysml.substack.com/p/training-gpt-2-on-a-budget</guid><dc:creator><![CDATA[Vish Sangale]]></dc:creator><pubDate>Sat, 20 Dec 2025 14:02:40 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/182108517/34a78d925a82866290b3bd768fed4154.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><strong><a href="https://github.com/vishsangale/gpt-2">Link to GitHub Repository</a></strong></p><div><hr></div><blockquote><p><em>&#8220;What I cannot create, I do not understand.&#8221; &#8212; Richard Feynman</em></p></blockquote><p>In the age of massive API-based LLMs, the art of training your own model from scratch can feel like lost knowledge. It is often assumed that you need a cluster of H100s to do anything meaningful. 
But I wanted to challenge that assumption.</p><p>My goal was simple yet ambitious: <strong>Replicate the 124M parameter GPT-2 Small model from scratch on a single consumer GPU (RTX 5080) and engineer it to run as fast as possible.</strong></p><p>This post describes the journey from a blank Python file to a highly optimized training pipeline that processes over <strong>92,000 tokens per second</strong>&#8212;a <strong>3.1x speedup</strong> over the baseline implementation.</p><h2>1. The Architecture: What is GPT-2?</h2><p>At its core, <a href="https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf">GPT-2</a> is a <strong>Decoder-only Transformer</strong>. Unlike BERT (which uses an Encoder) or T5 (Encoder-Decoder), GPT-2 is built to predict the <em>next token</em> in a sequence. 
This autoregressive property is what makes it a &#8220;Generative&#8221; model.</p><p>Our implementation (<code>model.py</code>) mirrors the original OpenAI specifications for the 124M model:</p><ul><li><p><strong>Parameters:</strong> ~124 Million</p></li><li><p><strong>Layers:</strong> 12 Transformer Blocks</p></li><li><p><strong>Attention Heads:</strong> 12</p></li><li><p><strong>Embedding Dimension:</strong> 768</p></li><li><p><strong>Context Length:</strong> 1024 tokens</p></li><li><p><strong>Tokenizer:</strong> Byte Pair Encoding (BPE) using OpenAI&#8217;s <code>tiktoken</code> library.</p></li></ul><h3>Key Components</h3><ol><li><p><strong>Causal Self-Attention:</strong> The heart of the model. It allows each token to attend to previous tokens but <em>masks</em> future tokens so the model can&#8217;t &#8220;cheat&#8221; by seeing the answer.</p></li><li><p><strong>Learned Positional Embeddings:</strong> Since Transformers process tokens in parallel, they have no inherent sense of order. We learn a vector for each position (0 to 1023) to give the model spatial awareness.</p></li><li><p><strong>Weight Tying:</strong> A clever memory-saving trick where the embedding layer weights are reused for the final output projection.</p></li></ol><h2>2. The Implementation</h2><p>We started with a clean slate. The dataset of choice was <strong>TinyShakespeare</strong>, a classic character modeling benchmark that fits easily into memory but is complex enough to learn grammar and structure.</p><p>Everything was written in pure PyTorch. 
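The standard loop described next can be sketched in a few lines of PyTorch. The tiny model below is a stand-in for illustration only, not the repo's actual GPT-2:

```python
import torch
import torch.nn.functional as F

vocab_size = 100  # toy vocab; the real run uses tiktoken's 50,257 BPE tokens

# Stand-in model: embedding + linear head instead of the full 12-layer GPT-2.
model = torch.nn.Sequential(torch.nn.Embedding(vocab_size, 32),
                            torch.nn.Linear(32, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(5):
    batch = torch.randint(0, vocab_size, (4, 16))      # 1. sample a batch of token ids
    inputs, targets = batch[:, :-1], batch[:, 1:]      #    targets are the shifted input
    logits = model(inputs)                             # 2. forward pass (compute logits)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size),
                           targets.reshape(-1))        # 3. cross-entropy vs. targets
    optimizer.zero_grad()
    loss.backward()                                    # 4. backward pass (gradients)
    optimizer.step()                                   # 5. optimizer step (update weights)
```
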
The training loop was standard:</p><ol><li><p>Sample a batch of text.</p></li><li><p>Forward pass (compute logits).</p></li><li><p>Compute Cross-Entropy Loss against the targets (shifted input).</p></li><li><p>Backward pass (compute gradients).</p></li><li><p>Optimizer step (update weights).</p></li></ol><h3>The &#8220;Naive&#8221; Baseline</h3><p>My first attempt was functional but fragile.</p><ul><li><p><strong>Batch Size:</strong> 4 (The GPU ran out of memory at 8 or 12).</p></li><li><p><strong>Throughput:</strong> ~29,000 tokens/second.</p></li><li><p><strong>GPU Utilization:</strong> Spiky and inefficient.</p></li></ul><p>It worked, but it was painfully slow. At this rate, convergence would take days.</p><h2>3. Engineering the Speedup &#128640;</h2><p>I didn&#8217;t want to wait days. I wanted to engineer my way out of the bottleneck. Here is how we optimized the pipeline to achieve a <strong>3.1x speedup</strong>.</p><h3>Flash Attention 2 &#9889;</h3><p>The standard attention mechanism requires calculating an N&#215;N matrix (Attention Scores). For long sequences, this consumes massive amounts of VRAM (O(<code>N^2</code>) memory complexity).</p><p>We switched to <code>torch.nn.functional.scaled_dot_product_attention</code>, which leverages <strong>Flash Attention 2</strong>. This fused kernel computes attention without materializing the full matrix in high-bandwidth memory.</p><ul><li><p><strong>Result:</strong> Memory usage plummeted. I could instantly double the batch size from 4 to 8, and later 16.</p></li></ul><h3>Vocabulary Padding &#128207;</h3><p>The standard GPT-2 vocabulary size is <strong>50,257</strong>. 
This is an odd number that plays poorly with GPU hardware, which prefers powers of 2 (or multiples of 64/128) for efficient tiling.</p><ul><li><p><strong>Fix:</strong> We padded the vocabulary to <strong>50,304</strong> (the nearest multiple of 64).</p></li><li><p><strong>Result:</strong> The final linear projection layer became significantly faster due to memory alignment.</p></li></ul><h3>Feeding the Beast (Data Loading) &#127869;&#65039;</h3><p>Profiling via <code>nvidia-smi</code> showed the GPU dropping to 0% utilization periodically. This meant the GPU was starving&#8212;waiting for the CPU to load the next batch of data.</p><ul><li><p><strong>Fix:</strong> We implemented a custom <code>GPT2Dataset</code> and used PyTorch&#8217;s <code>DataLoader</code> with <code>num_workers=4</code> and <code>pin_memory=True</code>.</p></li><li><p><strong>Result:</strong> Asynchronous data prefetching kept the GPU pinned at 99% usage. Throughput jumped to <strong>63k tokens/sec</strong>.</p></li></ul><h3>Fused AdamW &#128293;</h3><p>The AdamW optimizer updates all 124 million parameters. Doing this sequentially in Python is slow due to interpreter overhead.</p><ul><li><p><strong>Fix:</strong> We set <code>fused=True</code> in <code>torch.optim.AdamW</code>.</p></li><li><p><strong>Result:</strong> The entire optimizer step is launched as a single CUDA kernel, removing CPU overhead.</p></li></ul><h3><code>torch.compile</code> (The Final Boss) &#128736;&#65039;</h3><p>PyTorch 2.0 introduced <code>torch.compile</code>, which JIT-compiles your model into optimized kernels (Triton). It fuses operations (like LayerNorm + Linear) to reduce memory access.</p><ul><li><p><strong>Result:</strong> This provided the final massive boost, pushing us over <strong>92,000 tokens/sec</strong>.</p></li></ul><h2>4. 
The Results &#128200;</h2><p>After all optimizations, the training run was stable and lightning-fast.</p><p><strong>Loss Curve:</strong> Over 150 iterations, loss drops steadily, showing the model converging.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HM91!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HM91!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png 424w, https://substackcdn.com/image/fetch/$s_!HM91!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png 848w, https://substackcdn.com/image/fetch/$s_!HM91!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png 1272w, https://substackcdn.com/image/fetch/$s_!HM91!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HM91!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png" width="400" height="367" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19456,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://vishsangale.substack.com/i/181642131?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!HM91!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png 424w, https://substackcdn.com/image/fetch/$s_!HM91!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png 848w, https://substackcdn.com/image/fetch/$s_!HM91!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png 1272w, https://substackcdn.com/image/fetch/$s_!HM91!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6024979a-3383-4f6d-9124-6c2f1634f35a_400x367.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Performance Breakdown:</strong></p><h3>What did it learn?</h3><p>After training for just a few hours, the model learned to generate Shakespearean-style text. It captures the vocabulary, the archaic grammar, and the dramatic structure of plays (Speakers, dialogue, stage directions).</p><p><strong>Sample Output:</strong></p><pre><code><code>KING RICHARD II:
What implies this silence?
Tell me, what is the matter?

BOLINGBROKE:
My lord, I have acceptance of your grace,
And with a soul of love, I do beseech needed
remedy.
</code></code></pre><h2>Next Steps:</h2><p>As a next step, I am going to update the architecture with recent advancements:</p><h3><strong>1. Rotary Positional Embeddings (RoPE) &#128260;</strong></h3><ul><li><p><strong>What it replaces</strong>: Learned Absolute Positional Embeddings (<code>nn.Embedding(block_size, n_embd)</code>).</p></li><li><p><strong>Benefits</strong>:</p><ul><li><p>Better generalization to sequence lengths longer than seen during training.</p></li><li><p>Relative position encoding property (tokens know how far apart they are).</p></li><li><p>Standard in Llama, PaLM, Mistral.</p></li></ul></li><li><p><strong>Implementation</strong>: Requires modifying <strong>CausalSelfAttention</strong> to rotate Q and K vectors.</p></li></ul><h3><strong>2. RMSNorm (Root Mean Square Normalization) &#128207;</strong></h3><ul><li><p><strong>What it replaces</strong>: <code>nn.LayerNorm</code>.</p></li><li><p><strong>Benefits</strong>:</p><ul><li><p>Computationally cheaper (re-scaling invariance, no mean calculation).</p></li><li><p>Often leads to slightly better training stability.</p></li></ul></li><li><p><strong>Implementation</strong>: Drop-in replacement for LayerNorm.</p></li></ul><h3><strong>3. SwiGLU Activation &#9889;</strong></h3><ul><li><p><strong>What it replaces</strong>: <code>GELU</code> (Gaussian Error Linear Unit).</p></li><li><p><strong>Benefits</strong>:</p><ul><li><p>Demonstrated better performance than GELU/ReLU in compute-matched experiments (PaLM paper).</p></li></ul></li><li><p><strong>Implementation</strong>: Changes the MLP structure from <code>x -&gt; gelu(x * W_1) W_2</code> to the gated form <code>x -&gt; (swish(x * W_1) &#8857; (x * W_3)) W_2</code>, where &#8857; is an elementwise product.</p><p>This adds parameters, so we usually reduce the hidden dimension from <code>4d</code> to <code>8/3d</code> (or similar) to keep parameter count roughly the same.</p></li></ul><h3><strong>4. 
Grouped Query Attention (GQA) &#127950;&#65039;</strong></h3><ul><li><p><strong>What it replaces</strong>: Multi-Head Attention (MHA).</p></li><li><p><strong>Benefits</strong>:</p><ul><li><p>Massively reduces KV cache size during inference.</p></li><li><p>Faster decoding speed at inference time.</p></li><li><p>Slightly degrades quality compared to MHA, but a huge efficiency win.</p></li></ul></li><li><p><strong>Implementation</strong>: Sharing Key/Value heads across multiple Query heads.</p></li></ul><h3><strong>5. Mixture of Experts (MoE) &#129504;</strong></h3><ul><li><p><strong>What it replaces</strong>: Dense MLP layers.</p></li><li><p><strong>Benefits</strong>:</p><ul><li><p>Scale total parameters (capacity) without increasing compute per token (only top-k experts active).</p></li><li><p>&#8220;Sparse&#8221; model.</p></li></ul></li><li><p><strong>Complexity</strong>: High. Requires complex routing logic and a load-balancing loss to prevent expert collapse. Might be overkill for a small 124M experiment, but fun to try.</p></li></ul><h2>Takeaway</h2><p>You don&#8217;t always need more GPUs. Sometimes you just need better engineering.</p><p>By respecting the hardware&#8212;aligning memory, fusing kernels, and loading data asynchronously&#8212;we more than tripled the throughput of the same card.</p><p>Check out the full code and try it yourself: <strong><a href="https://github.com/vishsangale/gpt-2">View Project on GitHub</a></strong></p><p>Connect with Vish on <a href="https://www.linkedin.com/in/vishsangale/">LinkedIn</a></p><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of or attributable to their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</em></p>]]></content:encoded></item><item><title><![CDATA[Recsys Retrieval ~ LLM Pretraining/SFT]]></title><description><![CDATA[Showing equivalence of Embedding based retrieval in recommender systems and next token prediction based pretraining and supervised fine tuning in LLMs, and how semantic/cluster ids are helping further]]></description><link>https://recsysml.substack.com/p/recsys-retrieval-llm-pretrainingsft</link><guid isPermaLink="false">https://recsysml.substack.com/p/recsys-retrieval-llm-pretrainingsft</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Sun, 30 Nov 2025 22:49:58 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/180265550/c3948b30337fec1c988161207a46387e.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>LLM pretraining is next-token prediction over ~100k vocabulary tokens.<br>RecSys retrieval is next-item prediction over millions of candidates. 
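As an illustrative sketch (shapes and names here are hypothetical, not either system's real code), the shared computation in both cases is a dot product against an embedding table followed by softmax cross-entropy:

```python
import torch
import torch.nn.functional as F

D, V = 64, 1000    # embedding dim; vocab size (LLM) or candidate corpus size (RecSys)
torch.manual_seed(0)
h = torch.randn(D)       # context embedding: transformer hidden state / user-tower output
W = torch.randn(D, V)    # token embedding matrix / item embedding table

logits = h @ W                          # logit_i = h . W[:, i], shape [V]
probs = torch.softmax(logits, dim=0)    # distribution over tokens / items
target = torch.tensor([42])             # observed next token / interacted item
loss = F.cross_entropy(logits.unsqueeze(0), target)   # same training objective
```

Swapping the interpretation of `h` and `W` is the only difference between the two settings.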
Otherwise they are the same mathematically.</p><p>Both systems:</p><ul><li><p>embed the context into a vector</p></li><li><p>compute dot-products between this vector and a large embedding matrix</p></li><li><p>produce a distribution over discrete items/tokens with softmax or sampled softmax</p></li><li><p>train using cross-entropy</p></li></ul><p>In fact, if you look at the final linear layer of an LLM, it <em>is</em> the token embedding matrix.<br>The model outputs logits by:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{logit}_i = h^\\top W_i&quot;,&quot;id&quot;:&quot;ZHLBHFUNHH&quot;}" data-component-name="LatexBlockToDOM"></div><ol><li><p>h: the context embedded into a D-dimensional vector.</p></li><li><p>W [D, V]: the final linear layer, which produces V logits, one for each of the V possible tokens.</p></li></ol><p>Hence the logit (~ log probability) of token i in an LLM is just a function of the dot product of vectors &#8216;h&#8217; and &#8216;W[:, i]&#8217;. </p><p>If you think of W[:, i] as the item i&#8217;s embedding, this is exactly how <a href="https://recsysml.substack.com/p/scalable-embedding-based-retrieval">embedding based retrieval</a>, a.k.a. &#8220;<a href="https://recsysml.substack.com/p/two-tower-models-for-retrieval-of">Two Tower Model</a>&#8221;, computes similarity to candidates.</p><h2>Same loss function - just applied based on corpus size</h2><p>The loss is computed in two steps:</p><ol><li><p>Estimate the probability of each valid candidate (token or item in batch). Here V is the number of valid candidates during training. The expression below is also called &#8220;softmax&#8221;. See equation 2 in <a href="https://arxiv.org/pdf/1310.4546">&#8220;word2vec&#8221; paper Mikolov et al. 
2013</a></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P(i) = \\frac{e^{\\text{logit}_i}}{\\sum_{j=1}^{V} e^{\\text{logit}_j}}&quot;,&quot;id&quot;:&quot;DHHJTQIDFX&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p>The loss measures how different these probabilities are from the ground-truth observation (the next token in the training data, or the item the user interacted with). This is also called &#8220;classification loss&#8221;.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{L}(i) = -\\log\\left(\\frac{e^{\\text{logit}_i}}{\\sum_{j=1}^{V} e^{\\text{logit}_j}}\\right)&quot;,&quot;id&quot;:&quot;ZCPLBRNMYP&quot;}" data-component-name="LatexBlockToDOM"></div></li></ol><p>LLMs can compute this loss since the number of valid tokens, V, is about 100k and this is small enough for modern GPUs to do the operations above. For industrial recommenders, with 100 million+ valid items to recommend, this is infeasible.</p><p>Mikolov et al. proposed two solutions for this:</p><ol><li><p>Hierarchical softmax</p></li><li><p>Sampled softmax: This is mostly what is used in recsys retrieval due to good results and training efficiency.</p></li></ol><h2>Semantic IDs</h2><p>Recent efforts using &#8220;semantic IDs&#8221; are trying to bring these even closer. &#8220;<a href="https://recsysml.substack.com/p/how-to-implement-generative-retrieval">Generative retrieval</a>&#8221; efforts like <a href="https://arxiv.org/abs/2305.05065">TIGER</a> try to bridge this gap with full softmax using hierarchical clustering.</p><p><a href="https://arxiv.org/abs/2509.03746">Another Google Research paper</a> shows that clustering is the key, not inference-time generation. It finds high information density pathways that allow retrieval to discard large irrelevant areas of the item corpus. 
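A rough sketch of that cluster-level idea (all names and shapes hypothetical): take a full softmax over cluster centroids, keep only the top clusters, and score just the items inside them:

```python
import torch

torch.manual_seed(0)
D, C, items_per_cluster = 64, 8, 100
h = torch.randn(D)                              # context/user embedding
centroids = torch.randn(C, D)                   # one centroid per item cluster
corpus = torch.randn(C, items_per_cluster, D)   # item embeddings grouped by cluster

# Full softmax over clusters is cheap because C << corpus size.
cluster_probs = torch.softmax(centroids @ h, dim=0)
top_clusters = torch.topk(cluster_probs, k=2).indices

# Only items in the top clusters are scored; the rest of the corpus is discarded.
candidates = corpus[top_clusters].reshape(-1, D)   # [2 * items_per_cluster, D]
top_items = torch.topk(candidates @ h, k=10).indices
```
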
Adding full softmax over item clusters in the current retrieval pipeline produces results as good or better than generative retrieval while being an order of magnitude faster.</p><p></p><h2>Prior posts to understand generative retrieval and semantic IDs</h2><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e239703f-77ab-4af4-8d98-594d9a3918ee&quot;,&quot;caption&quot;:&quot;Improving Recsys with GenAI&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;How to implement Generative Retrieval&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5668214,&quot;name&quot;:&quot;Gaurav Chakravorty&quot;,&quot;bio&quot;:&quot;- Applied ML in Recommender systems (Facebook / Instagram, Google, Discord)\n- 20 years in Applied ML&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f40461d9-d0cc-4a2b-bc68-d46e2c022079_401x401.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:347886544,&quot;name&quot;:&quot;Samson Komo&quot;,&quot;bio&quot;:&quot;I geek on RecSys&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/22214e93-54fd-4828-9d13-d76e1ee66166_144x144.png&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://samsonkomo.substack.com/subscribe?&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://samsonkomo.substack.com&quot;,&quot;primaryPublicationName&quot;:&quot;Samson 
Komo&quot;,&quot;primaryPublicationId&quot;:5110422}],&quot;post_date&quot;:&quot;2025-06-05T14:30:25.993Z&quot;,&quot;cover_image&quot;:&quot;https://substack-video.s3.amazonaws.com/video_upload/post/164981049/f0dd71d3-9b65-4fd9-85db-402d73c4a9ea/transcoded-373017.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://recsysml.substack.com/p/how-to-implement-generative-retrieval&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:&quot;f0dd71d3-9b65-4fd9-85db-402d73c4a9ea&quot;,&quot;id&quot;:164981049,&quot;type&quot;:&quot;podcast&quot;,&quot;reaction_count&quot;:20,&quot;comment_count&quot;:7,&quot;publication_id&quot;:274781,&quot;publication_name&quot;:&quot;Applied ML | Recommender systems&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;a189b209-d9a5-47fb-b802-f46f469ce169&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Building Generative Friend Recommendations&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5668214,&quot;name&quot;:&quot;Gaurav Chakravorty&quot;,&quot;bio&quot;:&quot;- Applied ML in Recommender systems (Facebook / Instagram, Google, Discord)\n- 20 years in Applied 
ML&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f40461d9-d0cc-4a2b-bc68-d46e2c022079_401x401.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-08-29T23:10:30.276Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/youtube/w_728,c_limit/gc0Jfq3njV8&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://recsysml.substack.com/p/building-generative-friend-recommendations&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:171806043,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:0,&quot;publication_id&quot;:274781,&quot;publication_name&quot;:&quot;Applied ML | Recommender systems&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;250008c2-03fe-44f6-bf8f-3cac1e0fd2d6&quot;,&quot;caption&quot;:&quot;The intent of the post is to explain the ranking model in detail to tee up future posts explaining how this should change learning from LLM advances.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Ranking Models explained: Deep Dive into RecSys Architecture (Features, Embeddings, &amp; Attention)&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5668214,&quot;name&quot;:&quot;Gaurav Chakravorty&quot;,&quot;bio&quot;:&quot;- Applied ML in Recommender systems (Facebook / Instagram, Google, Discord)\n- 20 years in Applied 
ML&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f40461d9-d0cc-4a2b-bc68-d46e2c022079_401x401.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-30T19:48:02.356Z&quot;,&quot;cover_image&quot;:&quot;https://substack-video.s3.amazonaws.com/video_upload/post/180303168/41cd1fdc-9ad5-40fd-ba4d-97350d857d4d/transcoded-00001.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://recsysml.substack.com/p/ranking-models-explained-deep-dive&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:&quot;41cd1fdc-9ad5-40fd-ba4d-97350d857d4d&quot;,&quot;id&quot;:180303168,&quot;type&quot;:&quot;podcast&quot;,&quot;reaction_count&quot;:5,&quot;comment_count&quot;:1,&quot;publication_id&quot;:274781,&quot;publication_name&quot;:&quot;Applied ML | Recommender systems&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</em></p>]]></content:encoded></item><item><title><![CDATA[Ranking Models explained: Deep Dive into RecSys Architecture (Features, Embeddings, & Attention)]]></title><description><![CDATA[Watch now (18 mins) | What happens post retrieval in a recommender system]]></description><link>https://recsysml.substack.com/p/ranking-models-explained-deep-dive</link><guid isPermaLink="false">https://recsysml.substack.com/p/ranking-models-explained-deep-dive</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Sun, 30 Nov 2025 19:48:02 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/180303168/94884a7e8795f21bfb4b816c899c4643.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>The intent of the post is to explain the ranking model in detail to tee up future posts explaining how this should change learning from LLM advances.</p><h2>High level structure</h2><p>Retrieval &#8212;&gt; a relatively small set of candidates &#8212;&gt; [ Ranking + Value Model ] &#8212;&gt; Sorted order presented to the user.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tmlh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tmlh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png 424w, 
https://substackcdn.com/image/fetch/$s_!Tmlh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png 848w, https://substackcdn.com/image/fetch/$s_!Tmlh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!Tmlh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tmlh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png" width="410" height="421.98830409356725" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1056,&quot;width&quot;:1026,&quot;resizeWidth&quot;:410,&quot;bytes&quot;:108284,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/180303168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Tmlh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png 424w, https://substackcdn.com/image/fetch/$s_!Tmlh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png 848w, https://substackcdn.com/image/fetch/$s_!Tmlh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!Tmlh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3833c13b-ddda-4ded-bc17-acd777cd4718_1026x1056.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So &#8220;Ranking&#8221; = Estimator model + VM which:</p><ol><li><p>compute estimates of probabilities of user actions / user experience labels on presenting this item</p></li><li><p>use a <a href="https://recsysml.substack.com/p/declarative-value-model-tuning?utm_source=publication-search">Value model</a> to compute a weighted sum of these probabilities into a single score and rank with it.</p></li></ol><h2>Ranking model architecture</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W1wO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W1wO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png 424w, https://substackcdn.com/image/fetch/$s_!W1wO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png 848w, https://substackcdn.com/image/fetch/$s_!W1wO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png 1272w, 
https://substackcdn.com/image/fetch/$s_!W1wO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W1wO!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png" width="1280" height="988.1318681318681" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:1124,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1280,&quot;bytes&quot;:204128,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/180303168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!W1wO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png 424w, https://substackcdn.com/image/fetch/$s_!W1wO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png 848w, 
https://substackcdn.com/image/fetch/$s_!W1wO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png 1272w, https://substackcdn.com/image/fetch/$s_!W1wO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa05818ea-46a8-4537-a531-dfc93f60794e_1853x1430.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Explanation of terms:</p><ol><li><p>&#8220;float features&#8221; are typically single values, like number of mutual friends between viewer and 
target in <a href="https://recsysml.substack.com/p/friend-recommendation-retrieval-in">friend recommendations</a>, or the cosine similarity between graph neural network embeddings of viewer and target. These are also sometimes called &#8220;dense features&#8221;.</p></li><li><p>&#8220;embedding features&#8221; are full embeddings, like the 128 floating-point values of a 128-dimensional graph neural network embedding. These are not simply treated as &#8220;float features&#8221; because processing them in the model as a vector/tensor leads to better model accuracy.</p></li><li><p>&#8220;sparse features&#8221; can be either class-based features, like the category-id of a song in music recommendation, or lists of ids, like &#8220;last N user ids messaged by the user&#8221;, if they are summed into a single embedding rather than retained as a list.</p></li><li><p>&#8220;User history sequences&#8221; are also typically lists of ids, like &#8220;last N user ids messaged by the user&#8221;, but here the entire sequence is retained and available to the model.</p></li><li><p>The &#8220;Interaction Arch&#8221; can be thought of as processing the non-sequence features into a &#8220;user static pathway&#8221;, to borrow the term from the <a href="https://arxiv.org/pdf/2506.13695">OneRec Technical Report</a>.</p></li><li><p>To read the dimensions in the image above: for example, [B, E, D] refers to [batch size, number of embeddings, embedding dimension], where D is a fixed dimension shared by most embeddings, similar to d_model in LLMs or <a href="https://arxiv.org/pdf/2506.13695">OneRec</a>.</p></li></ol><h1>Prior posts on ranking</h1><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;4bd94174-0cd4-4a95-bfde-405cabeb6f09&quot;,&quot;caption&quot;:&quot;We show the importance of calibration in ranking models and how to implement it efficiently.&quot;,&quot;cta&quot;:&quot;Read full 
story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Ranking model calibration in recommender systems&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5668214,&quot;name&quot;:&quot;Gaurav Chakravorty&quot;,&quot;bio&quot;:&quot;- Applied ML in Recommender systems (Facebook / Instagram, Google, Discord)\n- 20 years in Applied ML&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f40461d9-d0cc-4a2b-bc68-d46e2c022079_401x401.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:1502916,&quot;name&quot;:&quot;Marc Ferradou&quot;,&quot;bio&quot;:&quot;Recsys + neovim = &#10084;&#65039;&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552c29ae-6319-4174-a8f6-84c5998eca15_512x512.png&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null,&quot;primaryPublicationSubscribeUrl&quot;:&quot;https://codingisfun.substack.com/subscribe?&quot;,&quot;primaryPublicationUrl&quot;:&quot;https://codingisfun.substack.com&quot;,&quot;primaryPublicationName&quot;:&quot;Coding is 
Fun&quot;,&quot;primaryPublicationId&quot;:2670520}],&quot;post_date&quot;:&quot;2024-06-09T00:27:32.696Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://recsysml.substack.com/p/ranking-model-calibration-in-recommender&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144999165,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:22,&quot;comment_count&quot;:2,&quot;publication_id&quot;:274781,&quot;publication_name&quot;:&quot;Applied ML | Recommender systems&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;fef6e231-757b-41d5-96fa-f0abbe93e8f1&quot;,&quot;caption&quot;:&quot;- Jointly with Ameya Raul author of Conformity-Aware Multi-task Ranking&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Reducing selection bias / popularity bias in ranking&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5668214,&quot;name&quot;:&quot;Gaurav Chakravorty&quot;,&quot;bio&quot;:&quot;- Applied ML in Recommender systems (Facebook / Instagram, Google, Discord)\n- 20 years in Applied ML&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f40461d9-d0cc-4a2b-bc68-d46e2c022079_401x401.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null},{&quot;id&quot;:197346736,&quot;name&quot;:&quot;Ameya Raul&quot;,&quot;bio&quot;:&quot;Ranking engineer in Video recommendations at 
Meta&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f88810c-f5a7-4bd0-870c-e1c58aca7201_144x144.png&quot;,&quot;is_guest&quot;:true,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-01-20T15:15:12.475Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F385c0174-b4ce-49f9-a60c-79d5afa8821f_1542x1120.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://recsysml.substack.com/p/reducing-selection-bias-popularity&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:140439645,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:274781,&quot;publication_name&quot;:&quot;Applied ML | Recommender systems&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;c2481555-c9b7-4c7b-886b-690ac471fe28&quot;,&quot;caption&quot;:&quot;Summary&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Does your model get better at task T when you rank by estimated probability p(T) ?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5668214,&quot;name&quot;:&quot;Gaurav Chakravorty&quot;,&quot;bio&quot;:&quot;- Applied ML in Recommender systems (Facebook / Instagram, Google, Discord)\n- 20 years in Applied 
ML&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f40461d9-d0cc-4a2b-bc68-d46e2c022079_401x401.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-12-22T19:38:09.490Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/youtube/w_728,c_limit/IHH47nZ7FZU&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://recsysml.substack.com/p/does-your-model-get-better-at-task&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:139870906,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:274781,&quot;publication_name&quot;:&quot;Applied ML | Recommender systems&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><p></p><p><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</p>]]></content:encoded></item><item><title><![CDATA[Building Generative Friend Recommendations]]></title><description><![CDATA[OneRec has established a blueprint for Generative Video Recs. 
This post shows what to change for Generative Friend Recommendations]]></description><link>https://recsysml.substack.com/p/building-generative-friend-recommendations</link><guid isPermaLink="false">https://recsysml.substack.com/p/building-generative-friend-recommendations</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Fri, 29 Aug 2025 23:10:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/gc0Jfq3njV8" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;def92794-7e21-402c-a1b7-0bf22f40c4ff&quot;,&quot;duration&quot;:null}"></div><p>The main difference between <a href="https://recsysml.substack.com/p/personalized-short-video-recommender">recommending videos</a> and <a href="https://recsysml.substack.com/p/friend-recommendation-retrieval-in">recommending friends</a> is that in friend recs we are working with 1000 times less positive signal, most of it delayed.</p><p>Of the five parts of <a href="https://recsysml.substack.com/p/how-to-implement-generative-retrieval">generative recs</a> illustrated in the <a href="https://arxiv.org/abs/2506.13695">OneRec Technical Report</a>, namely (a) Semantic Embeddings, (b) Tokenization, (c) Modeling, (d) Training Losses for modeling, and (e) Reward modeling<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, <strong>Semantic Embeddings</strong> and <strong>Training Losses</strong> are chiefly the ones that need to change when building generative recommendations to <a href="https://recsysml.substack.com/p/friend-recommendation-retrieval-in">recommend friends</a> like &#8220;people you may know&#8221;. 
<br>(Yes, this is a simplified view, but one that helps you get to a good-enough MVP.)</p><p><strong>Outline:</strong> In the rest of the post:</p><ol><li><p>We explain the seismic shift happening in the recommender systems industry after the OneRec paper, and why.</p></li><li><p>A high-level summary of OneRec</p></li><li><p>Which parts need to change for Generative Friend Recs, and why</p></li></ol><div id="youtube2-gc0Jfq3njV8" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;gc0Jfq3njV8&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/gc0Jfq3njV8?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>^ video explaining the post and presenting slides</p><h2>Generative recs is changing the world of recsys</h2><p><a href="https://arxiv.org/abs/2506.13695">OneRec</a> achieves lower system complexity, better app stay time, and lower organizational investment using generative 
recommendations.</p><p>To elaborate, <a href="https://arxiv.org/abs/2305.05065">Youtube&#8217;s TIGER paper</a> demonstrated that if we can represent the videos to be recommended in under 100K tokens<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, we can use <a href="https://recsysml.substack.com/p/how-to-implement-generative-retrieval">LLM models to recommend</a>. <a href="https://arxiv.org/abs/2506.13695">OneRec</a> goes one step further and builds a reward model that enables them to replace the <a href="https://recsysml.substack.com/p/early-stage-ranking-in-recommender">entire recommender and not just retrieval</a>. Now they can retire their multi-stage recommender system and just use this LLM-like model end to end to directly output a list of recommendation items.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bfAR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bfAR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png 424w, https://substackcdn.com/image/fetch/$s_!bfAR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png 848w, 
https://substackcdn.com/image/fetch/$s_!bfAR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png 1272w, https://substackcdn.com/image/fetch/$s_!bfAR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bfAR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png" width="1456" height="485" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58024f06-6d59-4c71-aa65-185628115b87_2076x692.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108034,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/171806043?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bfAR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png 424w, 
https://substackcdn.com/image/fetch/$s_!bfAR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png 848w, https://substackcdn.com/image/fetch/$s_!bfAR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png 1272w, https://substackcdn.com/image/fetch/$s_!bfAR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58024f06-6d59-4c71-aa65-185628115b87_2076x692.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Fig 1: From-To states of OneRec</figcaption></figure></div><p>Most big-tech recsys teams are making generative recommendations their big bet, for three reasons:</p><ol><li><p>OneRec collapses recsys from the traditional multi-stage process into a single-stage process, which makes organizational investment much more streamlined. If OneRec can do it all, you no longer need separate <a href="https://recsysml.substack.com/p/two-tower-models-for-retrieval-of">retrieval</a>, <a href="https://recsysml.substack.com/p/early-stage-ranking-in-recommender">early-stage ranking</a>, final-stage ranking, <a href="https://recsysml.substack.com/p/declarative-value-model-tuning">value modeling</a>, and list-generation (a.k.a. post-ranking) teams.</p></li><li><p>This makes the recommendation system amenable to being &#8220;driven&#8221; by product. Imagine you don&#8217;t want viral, clickbaity videos: that used to be a costly multi-team effort, but with OneRec, specifying it in the reward works. Likewise, if you want the recommender system to drive Daily Active Users instead of sessions (or app opens), you can do that by shaping the reward.</p></li><li><p>The single most effective ML strategy has been, metaphorically, to get on trains others have built for you and ride them close to your destination. The LLM world is building such a train, with optimized kernels and infrastructure. 
If recsys can ride it, that can unlock a step change, as OneRec&#8217;s results demonstrate.</p></li></ol><h2>High-level summary of OneRec</h2><p>Let&#8217;s start with an overall schematic, excluding reward modeling since it does not change for Friend-Recs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3wcw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3wcw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png 424w, https://substackcdn.com/image/fetch/$s_!3wcw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png 848w, https://substackcdn.com/image/fetch/$s_!3wcw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png 1272w, https://substackcdn.com/image/fetch/$s_!3wcw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3wcw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png" width="1456" height="865" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:865,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:165356,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/171806043?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3wcw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png 424w, https://substackcdn.com/image/fetch/$s_!3wcw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png 848w, https://substackcdn.com/image/fetch/$s_!3wcw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png 1272w, https://substackcdn.com/image/fetch/$s_!3wcw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d41396c-0019-43dd-ae5a-e8ca245487f6_1834x1090.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 2: Process everything we know about the actor into a summary. The decoder then attends to this summary and starts generating tokens. All we need now is a way to tokenize the recommendable items (videos, friends, shopping items, whatever) into tokens.</figcaption></figure></div><p>The parts of OneRec:</p><ol><li><p>Tokenization of recommendable items into a &#8220;vocabulary&#8221; of a few thousand tokens, so that LLM-like generation can train reliably.</p><ol><li><p>Semantic Embeddings: this is a sort of high-dimensional postcode for each recommendable item that captures similarity in this domain.</p></li><li><p>Multi-stage &#8220;coarse-to-fine&#8221; clustering to create tokens of the item. 
This is similar to an e-commerce product catalog, or the Yahoo.com homepage in the 2000s!</p></li><li><p><em>Note that if your recsys has fewer than 100K items, you can skip this step and just use ids as tokens.</em></p></li></ol></li><li><p>How to summarize user history (a.k.a. the &#8220;Encoder&#8221;)</p><ol><li><p>There is no use of tokens in OneRec&#8217;s encoder<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. In some ways it is actually simpler than <a href="https://arxiv.org/abs/2402.17152">HSTU</a> by Zhai et al., a state-of-the-art user-history encoder from Meta.</p></li><li><p>What I appreciate is how they have designed it from the standard LLM block of Self-Attention &#8212;&gt; FeedForward, enabling common Triton kernels and infra optimizations from LLMs to be reused.</p></li></ol></li><li><p>How to generate (a.k.a. the &#8220;Decoder&#8221;)</p><ol><li><p>The decoder attends to the summary from step 2 via Cross-Attention and generates the tokens of the recommended item one token at a time.</p></li></ol></li></ol><p>Note that this alone is not enough to produce high-quality recommendations. This quote from OneRec</p><blockquote><p>&#8220;The pre-trained model only fits the distribution of the exposed item space through next token prediction, and the exposed items are obtained from the past traditional recommendation system.&#8221;</p></blockquote><p>indicates that reward modeling and preference alignment are critical. 
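</p><p>To make the quoted pre-training objective concrete: it is ordinary next-token cross-entropy, except the &#8220;vocabulary&#8221; is the codebook of semantic-ID tokens rather than words. Below is a minimal, framework-free sketch; the level count and codebook size are illustrative assumptions, not OneRec&#8217;s published configuration.</p>

```python
import numpy as np

# Illustrative sizes only (assumptions, not OneRec's published config):
# 3 coarse-to-fine codebook levels, 256 codes per level.
NUM_LEVELS, CODEBOOK_SIZE = 3, 256

def next_token_loss(logits, target_codes):
    """Mean cross-entropy of the ground-truth semantic-ID tokens.

    logits:       (batch, NUM_LEVELS, CODEBOOK_SIZE) decoder outputs
    target_codes: (batch, NUM_LEVELS) integer ground-truth codes
    """
    # Numerically stable log-softmax over the codebook dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Pick out the log-probability of the correct code at every level.
    b, l = np.indices(target_codes.shape)
    return -log_probs[b, l, target_codes].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, NUM_LEVELS, CODEBOOK_SIZE))
targets = rng.integers(0, CODEBOOK_SIZE, size=(4, NUM_LEVELS))
loss = next_token_loss(logits, targets)
```

<p>A handy sanity check when wiring this up: with all-zero logits the loss reduces to log(CODEBOOK_SIZE), the uniform-guessing baseline.</p><p>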
We are not going further into them in this post because they stay largely the same in Friend-Recs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-RCD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-RCD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png 424w, https://substackcdn.com/image/fetch/$s_!-RCD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png 848w, https://substackcdn.com/image/fetch/$s_!-RCD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png 1272w, https://substackcdn.com/image/fetch/$s_!-RCD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-RCD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png" width="1300" height="1114" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1114,&quot;width&quot;:1300,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:172849,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/171806043?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-RCD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png 424w, https://substackcdn.com/image/fetch/$s_!-RCD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png 848w, https://substackcdn.com/image/fetch/$s_!-RCD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png 1272w, https://substackcdn.com/image/fetch/$s_!-RCD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99a660f-9311-4e82-8986-44c44dcbe2a9_1300x1114.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 3: High level components of Generative recommendation</figcaption></figure></div><p></p><h2>How Generative Friend-Recs differs from Generative Video-Recs.</h2><h3>The process of creating Semantic Embeddings is different because the &#8220;item&#8221; is also a &#8220;user&#8221;.</h3><p><strong>Closeness is 3-way and not 2-way</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cmeK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!cmeK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png 424w, https://substackcdn.com/image/fetch/$s_!cmeK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png 848w, https://substackcdn.com/image/fetch/$s_!cmeK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png 1272w, https://substackcdn.com/image/fetch/$s_!cmeK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cmeK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png" width="356" height="308.80691642651294" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:602,&quot;width&quot;:694,&quot;resizeWidth&quot;:356,&quot;bytes&quot;:53431,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/171806043?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cmeK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png 424w, https://substackcdn.com/image/fetch/$s_!cmeK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png 848w, https://substackcdn.com/image/fetch/$s_!cmeK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png 1272w, https://substackcdn.com/image/fetch/$s_!cmeK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fcb86b-60c7-48a6-92b1-6968543ac64b_694x602.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 4: When the actor positively interacts with a new item, we can infer similarity between this new item and a previous item the same actor recently interacted with.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!erLc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!erLc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png 424w, https://substackcdn.com/image/fetch/$s_!erLc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png 848w, https://substackcdn.com/image/fetch/$s_!erLc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png 1272w, https://substackcdn.com/image/fetch/$s_!erLc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!erLc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png" width="368" height="323.04545454545456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:704,&quot;resizeWidth&quot;:368,&quot;bytes&quot;:61180,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/171806043?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!erLc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png 424w, https://substackcdn.com/image/fetch/$s_!erLc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png 848w, https://substackcdn.com/image/fetch/$s_!erLc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png 1272w, https://substackcdn.com/image/fetch/$s_!erLc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22eec902-876e-45cf-a6e2-f2876e3d9c58_704x618.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Fig 5: When the actor makes a new &#8220;friend&#8221;, we can add a similarity loss not just from a previous friend of the actor to the new friend, but also from the actor to the new friend. 
The actor and friends are all the same type of entity.</figcaption></figure></div><p></p><p><strong>Grounding in demographic features / entities relevant to the use case.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YQfV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YQfV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png 424w, https://substackcdn.com/image/fetch/$s_!YQfV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png 848w, https://substackcdn.com/image/fetch/$s_!YQfV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png 1272w, https://substackcdn.com/image/fetch/$s_!YQfV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YQfV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png" width="1252" height="750" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:1252,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:112262,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/171806043?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YQfV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png 424w, https://substackcdn.com/image/fetch/$s_!YQfV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png 848w, https://substackcdn.com/image/fetch/$s_!YQfV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png 1272w, https://substackcdn.com/image/fetch/$s_!YQfV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf241683-4a42-4586-8889-be4f42a6a1b3_1252x750.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 6: Before applying the collaborative loss, OneRec starts with a representation grounded in content understanding (blue part of the image). Green shows that the features and auxiliary losses in social recommendations are likely to be different, requiring the training of a Social Foundation Model for embeddings before applying the 3-way closeness loss of Fig 5.</figcaption></figure></div><p></p><h3>Training losses: more losses per training example, since we have 1000X fewer examples</h3><p><strong>Problems:</strong></p><ol><li><p>Unlike video recs, where platforms have 10+ billion video watches to train on every day, friend-recommendation events are far fewer. </p></li><li><p>We have two embeddings to learn: the item-id embedding and the semantic-id (STU) embedding. 
We need more signal.</p></li></ol><p><strong>Solution for problem #2:</strong> we add two other losses (Fig 7 &#8212;&gt; Fig 8).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q3MQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q3MQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png 424w, https://substackcdn.com/image/fetch/$s_!q3MQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png 848w, https://substackcdn.com/image/fetch/$s_!q3MQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png 1272w, https://substackcdn.com/image/fetch/$s_!q3MQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q3MQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png" width="636" height="621.9188191881918" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1060,&quot;width&quot;:1084,&quot;resizeWidth&quot;:636,&quot;bytes&quot;:79406,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/171806043?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!q3MQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png 424w, https://substackcdn.com/image/fetch/$s_!q3MQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png 848w, https://substackcdn.com/image/fetch/$s_!q3MQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png 1272w, https://substackcdn.com/image/fetch/$s_!q3MQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb228ed39-9277-4e21-8290-a276fa2a8a81_1084x1060.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 7: As in LLMs, the main loss in OneRec is the classification loss on the ground-truth tokens.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xVjG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xVjG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png 424w, 
https://substackcdn.com/image/fetch/$s_!xVjG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png 848w, https://substackcdn.com/image/fetch/$s_!xVjG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png 1272w, https://substackcdn.com/image/fetch/$s_!xVjG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xVjG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png" width="1456" height="1184" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1184,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:156848,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/171806043?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!xVjG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png 424w, https://substackcdn.com/image/fetch/$s_!xVjG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png 848w, https://substackcdn.com/image/fetch/$s_!xVjG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png 1272w, https://substackcdn.com/image/fetch/$s_!xVjG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb78164a6-616c-4a78-87fe-d0ddd2622816_1456x1184.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 8: We can also augment with two other losses which, although easier than the token generation loss, help improve representation learning.</figcaption></figure></div><p><strong>Summary:</strong></p><ol><li><p>The true token generation loss is the classification loss for the target.</p></li><li><p>In-batch softmax loss from the codebook embeddings of the target can help separate true positives from weak negatives.</p></li><li><p>In-batch softmax loss from the id embeddings of the target can provide additional signal for id-representation learning. These embeddings are used in the encoder.</p></li></ol><p><strong>Solution for problem #1</strong>: pretrain from user history splices and compute, potentially, all three losses for each spliced target. 
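</p><p>To make the three losses concrete, here is a minimal pure-Python sketch: token generation cross-entropy (L_AR), in-batch softmax against codebook embeddings (L_code), and in-batch softmax against id embeddings (L_target). All shapes, variable names, the random "weights", and the unweighted sum are illustrative assumptions, not details from OneRec.</p>

```python
import math
import random

def log_softmax(logits):
    # numerically stable log-softmax over a list of floats
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def cross_entropy(logits, target_idx):
    # classification loss for one STU token of the target (L_AR piece)
    return -log_softmax(logits)[target_idx]

def in_batch_softmax(h_batch, emb_batch, tau=0.1):
    # InfoNCE-style loss: row i of h_batch should match row i of emb_batch;
    # the other rows in the batch serve as negatives (L_code / L_target piece)
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    losses = []
    for i, h in enumerate(h_batch):
        sims = [dot(h, e) / tau for e in emb_batch]
        losses.append(-log_softmax(sims)[i])
    return sum(losses) / len(losses)

random.seed(0)
B, d, C = 4, 8, 16                      # batch, model dim, codebook size
rand_vec = lambda n: [random.gauss(0, 1) for _ in range(n)]
h = [rand_vec(d) for _ in range(B)]     # encoder hidden states h_i
z = [rand_vec(d) for _ in range(B)]     # codebook embeddings of the targets
e = [rand_vec(d) for _ in range(B)]     # id-embedding-table rows of the targets

# one STU-token classification head, d -> C logits (weights are illustrative)
W = [rand_vec(C) for _ in range(d)]
logits = [[sum(h[i][k] * W[k][c] for k in range(d)) for c in range(C)]
          for i in range(B)]
targets = [random.randrange(C) for _ in range(B)]

L_AR = sum(cross_entropy(logits[i], targets[i]) for i in range(B)) / B
L_code = in_batch_softmax(h, z)
L_target = in_batch_softmax(h, e)
total = L_AR + L_code + L_target        # per-loss weights omitted for brevity
```

<p>In production these would be batched tensor operations over learned parameters; the sketch only fixes the shapes involved and which parameters each loss trains.</p><p>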
Schematic in Fig 9 and losses below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DOcF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DOcF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png 424w, https://substackcdn.com/image/fetch/$s_!DOcF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png 848w, https://substackcdn.com/image/fetch/$s_!DOcF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png 1272w, https://substackcdn.com/image/fetch/$s_!DOcF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DOcF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png" width="1188" height="596" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:596,&quot;width&quot;:1188,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:100293,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/171806043?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DOcF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png 424w, https://substackcdn.com/image/fetch/$s_!DOcF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png 848w, https://substackcdn.com/image/fetch/$s_!DOcF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png 1272w, https://substackcdn.com/image/fetch/$s_!DOcF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ce121c7-973c-40b3-9299-d32785e64bb2_1188x596.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 9 shows the additional losses we can extract from user history, letting the model learn from each data point in the context of the social graph. While the model trains on a new social connection, it has an incentive to stay grounded in previously learned connections.</figcaption></figure></div><p></p><h4>1. 
Temporal Autoregressive Loss L_{AR} - (true token generation loss for spliced history)</h4><ul><li><p><strong>What it does</strong>: Predicts the next <strong>STU tokens</strong> (social tokenized user IDs) of the <strong>target user</strong> at time t+1, given the actor&#8217;s history up to t.</p></li><li><p><strong>How</strong>:</p><ul><li><p>Encoder produces h_i (hidden state for actor&#8217;s history prefix up to step i).</p></li><li><p>A classification head: <code>Linear(d_model, Codebook_size C)</code> outputs logits for each of the M STU tokens of the target user.</p></li><li><p>Compute cross-entropy between predicted logits and the actual target tokens.</p></li></ul></li></ul><p>&#128073; <strong>Trains</strong>:</p><ul><li><p>Encoder parameters.</p></li><li><p>The classification heads.</p></li></ul><div><hr></div><h4>2. Target&#8217;s codebook embeddings : L_{code}</h4><ul><li><p><strong>What it does</strong>: Encourages the <strong>semantic codebook embeddings Z</strong> to represent users well in a contrastive sense.</p></li><li><p><strong>How</strong>:</p><ul><li><p>For the true target user at step i+1, we take its STU token embeddings z_{t_{i+1}} (e.g., by summing/averaging its M codebook vectors).</p></li><li><p>Compare with h_i using an InfoNCE / sampled softmax style loss against in-batch negatives.</p></li></ul></li></ul><p>&#128073; <strong>Trains</strong>:</p><ul><li><p>Encoder (so h_i is predictive).</p></li><li><p>Codebook embeddings Z directly.</p></li></ul><div><hr></div><h4>3. 
Target&#8217;s id embeddings : L_{target}</h4><ul><li><p><strong>What it does</strong>: Makes the encoder&#8217;s hidden state h_i useful for directly predicting the <strong>continuous target embeddings e_{t_{i+1}}</strong> (from the user embedding table).</p></li><li><p><strong>How</strong>:</p><ul><li><p>Contrastive similarity between h_i and the &#8220;target&#8221; embedding e_{t_{i+1}}, with in-batch negatives.</p></li></ul></li></ul><p>&#128073; <strong>Trains</strong>:</p><ul><li><p>Encoder (so its hidden states align with ground-truth future interactions).</p></li><li><p>Continuous embedding table {e_j} for all users.</p></li></ul><div><hr></div><p><strong>Summary of Solution to Problem #1:</strong></p><ul><li><p><strong>L_{AR}:</strong> trains M classification heads + encoder (discrete token prediction).</p></li><li><p><strong>L_{code}:</strong> trains encoder + codebook embeddings Z.</p></li><li><p><strong>L_{target}:</strong> trains encoder + continuous target embeddings {e_j}.</p></li></ul><p></p><p><strong>Conclusion</strong>: It&#8217;s a no-brainer to invest in generative recs given the results from OneRec and others. This post shows one approach to building generative recommendations in a social recommendation use case. We hope it helps in yours.</p><p></p><p><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions or opinions stated here are theirs alone and not representative of their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company, including ones the author(s) may have worked in or been associated with.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I did not mention &#8220;Reward Modeling&#8221; since it differs between any two recsys, even two video recommenders, because it is a representation of each product&#8217;s market fit. So the difference is not due to this being a friend recommender system.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Why can&#8217;t we use a billion-sized vocabulary? The short answer: even if we could find enough GPUs for it, training would overfit and produce poor recommendations.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Disclaimer: As elaborated in <a href="https://arxiv.org/pdf/2506.13695">Table 12 of OneRec</a>, while generative OneRec is an improvement over a multi-stage recsys, OneRec with a Reward model (that is, using OneRec as a candidate generator) is currently much better. It is still open research to make the generative model competent enough to not need a Reward model layer after it.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>This is a simplification. 
As shown in section 4.2.2 of <a href="https://arxiv.org/pdf/2506.13695">OneRec</a>, they have also experimented with representing the history in terms of semantic ids rather than item ids. They are seeing improved results, but those will be packaged in a separate publication. We describe the User Encoder as unchanged because, evidently, using semantic ids in the User Encoder is not essential in OneRec, and this helps us hone in on the core innovation.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Using RL to maximize ad revenue without retention tradeoffs]]></title><description><![CDATA[Using Contextual Bandits and Predictive Modeling to Optimize Personalized Ad-Placement Policies]]></description><link>https://recsysml.substack.com/p/reinforcement-learning-for-balancing</link><guid isPermaLink="false">https://recsysml.substack.com/p/reinforcement-learning-for-balancing</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Sun, 24 Aug 2025 14:04:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!I6Pm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;38c8ca7d-a39a-476f-8981-bdb54beea9e7&quot;,&quot;duration&quot;:null}"></div><p><a href="https://www.investors.com/news/advertising-industry-to-hit-1-trillion-dominated-by-the-new-big-5/">Global Ad industry hits $1 trillion in revenue</a>. 
In this blog, we will learn how to maximize ad revenue with minimal impact to retention and engagement.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I6Pm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I6Pm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png 424w, https://substackcdn.com/image/fetch/$s_!I6Pm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png 848w, https://substackcdn.com/image/fetch/$s_!I6Pm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png 1272w, https://substackcdn.com/image/fetch/$s_!I6Pm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I6Pm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png" width="1456" height="978" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:978,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:115716,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I6Pm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png 424w, https://substackcdn.com/image/fetch/$s_!I6Pm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png 848w, https://substackcdn.com/image/fetch/$s_!I6Pm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png 1272w, https://substackcdn.com/image/fetch/$s_!I6Pm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9514f3ef-f317-451d-9d62-b320451fef32_1774x1192.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In a content platform like YouTube, deciding whether to show an ad as the very first piece of content is a subtle but critical problem. The trade-off is intuitive:</p><ul><li><p><strong>If you show an ad first:</strong> You gain immediate ad revenue, but you risk users dropping off or engaging less with the subsequent content.</p></li><li><p><strong>If you don&#8217;t show an ad first:</strong> You keep the user more engaged (likely increasing future content consumption and potentially downstream revenue), but you sacrifice the immediate revenue opportunity of that first impression.</p></li></ul><p>In this post, we&#8217;ll explore how to frame this decision using a data-driven approach. We&#8217;ll start from a simplified viewpoint&#8212;just deciding <strong>whether</strong> to show an ad first&#8212;and build up to a strategy that integrates both engagement-based revenue modeling and personalized ad revenue predictions. 
We&#8217;ll then discuss how a contextual bandit approach can be applied to learn these policies from historical data.</p><h2><strong>#1: Blending Approach &#8212;&gt; Engagement Loss improvement</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zazr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zazr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png 424w, https://substackcdn.com/image/fetch/$s_!zazr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png 848w, 
https://substackcdn.com/image/fetch/$s_!zazr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png 1272w, https://substackcdn.com/image/fetch/$s_!zazr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zazr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png" width="1428" height="790" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:790,&quot;width&quot;:1428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:100412,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zazr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png 424w, https://substackcdn.com/image/fetch/$s_!zazr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png 848w, 
https://substackcdn.com/image/fetch/$s_!zazr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png 1272w, https://substackcdn.com/image/fetch/$s_!zazr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce46df23-0918-4e20-bd86-abfa3c5a7777_1428x790.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 1 : Blending just treats ads and content as options and chooses the one with the highest 
value</figcaption></figure></div><p>A first, naive approach might consider the expected revenue from showing an ad in the first position versus the expected long-term engagement if we skip that ad. We could imagine we have two signals:</p><ol><li><p><strong>Expected Ad Revenue (E[rev|u, ad]):</strong> A personalized model that, given the user and session context, estimates the immediate revenue we would earn by showing an ad. This might be well-approximated by an advanced personalized ads model trained on historical ad-serving data.</p></li><li><p><strong>Expected Engagement:</strong> A measure or score indicating how much content consumption (views, watch time, etc.) we expect if we place content first. We assume we already have a way to translate engagement into revenue via a function <code>f_eng_to_rev(engagement)</code>, which converts user engagement into an expected revenue value (for example, predicting future watch-time-based monetization).</p></li></ol><p>In a simple scenario, if we had some expected engagement value for showing content and some expected engagement value for not showing content, we might attempt to &#8220;blend&#8221; them with the direct ad revenue to decide. However, it&#8217;s not straightforward: the presence of an initial ad does not necessarily mean all engagement is lost&#8212;it just might reduce it. 
What we really need is the <strong>expected reduction in engagement</strong> caused by showing the ad.</p><blockquote><p>For example, if a deeply satisfied user&#8217;s engagement would be mostly unaffected by the ad, it should be fine to show it.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5rgY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5rgY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png 424w, https://substackcdn.com/image/fetch/$s_!5rgY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png 848w, https://substackcdn.com/image/fetch/$s_!5rgY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png 1272w, https://substackcdn.com/image/fetch/$s_!5rgY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5rgY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png" width="492" height="403.8953722334004" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:994,&quot;resizeWidth&quot;:492,&quot;bytes&quot;:81160,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5rgY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png 424w, https://substackcdn.com/image/fetch/$s_!5rgY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png 848w, https://substackcdn.com/image/fetch/$s_!5rgY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png 1272w, https://substackcdn.com/image/fetch/$s_!5rgY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21950079-0c1b-46f2-91fa-72d5fceaedc2_994x816.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 2: We think it is better to look at the expected engagement loss of the ad and compare it to the revenue gained.</figcaption></figure></div><p></p><p>Let&#8217;s say we have estimated the expected engagement loss if we show the ad first. 
Our decision rule could look like this:</p><blockquote><p>Show Ad if and only if: (engagement_revenue_conversion * expected engagement loss) &lt; (expected revenue of showing an ad)</p></blockquote><p>or mathematically</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Show\\ Ad : f_{eng\\_to\\_rev}(E[eng\\_loss|u, ad]) < E[rev|u, ad]&quot;,&quot;id&quot;:&quot;WTBTOYRFAV&quot;}" data-component-name="LatexBlockToDOM"></div><p>This equation says: &#8220;If the immediate ad revenue gain from showing the ad exceeds the revenue-equivalent cost of reduced engagement, then show the ad.&#8221; It relies on three components that can be learned independently:</p><ul><li><p><strong>E[rev|u, ad]:</strong> The expected incremental ad revenue from showing the ad first for a given user/session.</p></li><li><p><strong>f_eng_to_rev:</strong> The function that converts changes in engagement into revenue terms.</p></li><li><p><strong>Expected engagement loss:</strong> Learned from data, comparing engagement outcomes between sessions where an ad was shown first and sessions where it was not.</p></li></ul><p></p><h2><strong>#2 Bridging Theory and Practice: Two Approaches in Our GitHub Repository</strong></h2><p>In our GitHub repository, <a href="https://github.com/gauravchak/ad-placement-rl">https://github.com/gauravchak/ad-placement-rl</a>, we demonstrate two different approaches to optimizing session revenue directly: a <strong>contextual bandit</strong> method and a <strong>reinforcement learning (policy gradient)</strong> method. In both cases, the setup is the same: at each session (context), we must make a binary decision&#8212;whether or not to show an ad&#8212;and we then observe a numerical reward. 
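</p><p>As a minimal sketch of the decision rule above: assume a simple linear <code>f_eng_to_rev</code>. The conversion rate and function names here are illustrative assumptions, not the repository&#8217;s API.</p>

```python
# Hypothetical sketch of the show-ad decision rule; the conversion rate
# and the function names are illustrative assumptions, not the repo's API.

ENG_TO_REV_RATE = 0.002  # assumed revenue value of one unit of engagement


def f_eng_to_rev(engagement: float) -> float:
    """Convert an engagement quantity into its revenue equivalent."""
    return ENG_TO_REV_RATE * engagement


def should_show_ad_first(expected_eng_loss: float, expected_ad_rev: float) -> bool:
    """Show the ad iff f_eng_to_rev(E[eng_loss|u, ad]) < E[rev|u, ad]."""
    return f_eng_to_rev(expected_eng_loss) < expected_ad_rev


# A deeply satisfied user whose engagement barely drops: show the ad.
print(should_show_ad_first(expected_eng_loss=10.0, expected_ad_rev=0.05))   # True
# A fragile session where the ad would cost a lot of engagement: skip it.
print(should_show_ad_first(expected_eng_loss=500.0, expected_ad_rev=0.05))  # False
```

<p>The same comparison holds whatever form <code>f_eng_to_rev</code> takes, as long as it is monotone in engagement.</p><p>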
This reward is a combined metric that integrates both immediate session-level revenue and the engagement-based revenue equivalent.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;net\\_session\\_reward = session\\_revenue + f_{eng\\_to\\_rev}(session\\_engagement)&quot;,&quot;id&quot;:&quot;STLVSMRRQK&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><p>Notably, these approaches can benefit from additional features in the user context. For example, even before we integrate the personalized ad estimator&#8217;s output (<code>E[rev|u, ad]</code>) into the reward function, we can include it as part of <code>user_features</code>. By doing so, both the contextual bandit and the policy gradient models can leverage this personalized signal at inference time, potentially improving decision-making and anticipating the value of showing an ad first.</p><p><strong>Building steerability into the net reward</strong></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;net\\_session\\_reward = session\\_revenue + \\textbf{dial} * f_{eng\\_to\\_rev}(session\\_engagement)&quot;,&quot;id&quot;:&quot;VIKZWUCVMX&quot;}" data-component-name="LatexBlockToDOM"></div><p>There may be times of the year, such as the holiday season around Thanksgiving, when the business wants to prioritize revenue. Building a dial into the reward makes that trade-off explicit and easy to steer.</p><h3><strong>2a: Optimizing session net reward with Contextual Bandits</strong></h3><p>Code snippet deciding whether to show the ad:</p><pre><code>def should_show_ad(reward_model_ad, reward_model_no_ad, user_features):
    """
    Given trained reward models and user_features, return a decision:
    show_ad = 1 if predicted_reward_if_ad &gt; predicted_reward_if_no_ad else 0
    """
    pred_net_reward_if_ad, pred_net_reward_if_no_ad = expected_reward(
        reward_model_ad=reward_model_ad,
        reward_model_no_ad=reward_model_no_ad,
        user_features=user_features)

    # True if predicted net reward of ad action is higher
    return (pred_net_reward_if_ad &gt; pred_net_reward_if_no_ad)
</code></pre><p>This contextual bandit approach trains separate models to predict the expected reward under each action. By comparing these predictions, the policy picks the action that yields the higher expected combined value at inference time. It&#8217;s conceptually simple and can be trained effectively on logged data, making it a practical first step toward data-driven ad placement decisions.</p><p></p><h3>2b: <strong>REINFORCE and Policy Gradients</strong></h3><p>While contextual bandits are a simple and effective way to learn a policy directly from logged data, they still treat each decision as a one-step problem. Another approach, inspired by reinforcement learning (RL), is to use <strong>policy gradient methods</strong> such as REINFORCE. </p><p>A great tutorial on using deep RL for a binary decision is Andrej Karpathy&#8217;s talk on Pong below:</p><div id="youtube2-tqrcjHuNdmQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;tqrcjHuNdmQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/tqrcjHuNdmQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>Conceptual Overview</strong><br>In the REINFORCE framework, rather than training separate models to predict the reward for each action and then comparing them, you directly parameterize a probabilistic policy that decides how likely it is to show an ad or not. For each training example, you know which action was taken and what the resulting reward was. 
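</p><p>As a minimal sketch (not the repository&#8217;s code), a logistic policy over session features can be trained with the basic REINFORCE update. The feature sizes and the toy reward below are made-up assumptions.</p>

```python
import numpy as np

# Minimal REINFORCE sketch for the binary show-ad decision, assuming a
# logistic policy over session features. Feature sizes and the toy
# reward are illustrative assumptions, not the repository's setup.

rng = np.random.default_rng(0)
theta = np.zeros(4)  # 3 context weights + 1 bias weight


def pi_show_ad(feat, theta):
    """Probability of showing the ad under the current logistic policy."""
    return 1.0 / (1.0 + np.exp(-feat @ theta))


def reinforce_update(feat, action, reward, theta, lr=0.01, baseline=0.0):
    """One REINFORCE step: move theta along (reward - baseline) * grad log pi."""
    p = pi_show_ad(feat, theta)
    grad_log_pi = (action - p) * feat  # gradient of log pi for a Bernoulli policy
    return theta + lr * (reward - baseline) * grad_log_pi


for _ in range(2000):
    ctx = rng.normal(size=3)        # user/session features
    feat = np.append(ctx, 1.0)      # constant bias feature
    # Sampling from the policy gives the built-in exploration described above
    action = int(rng.random() < pi_show_ad(feat, theta))
    # Toy environment: in this sketch, showing the ad always nets +1 reward
    reward = 1.0 if action == 1 else 0.0
    theta = reinforce_update(feat, action, reward, theta)
```

<p>Because the toy reward always favors showing the ad, the learned policy drifts toward showing it; with a real <code>net_session_reward</code> label, the same update would instead balance ad revenue against engagement loss.</p><p>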
The goal is to adjust the policy parameters to increase the probability of actions that led to higher rewards and decrease the probability of actions that led to lower rewards.</p><p>By doing this, REINFORCE naturally fits the problem of deciding whether to show an ad first: you have a distribution over actions, and you tune it to favor whichever action yields better long-term value. If showing ads first consistently produces higher combined revenue (ad revenue plus engagement-based returns), the policy naturally shifts towards showing the ad. If it reduces future engagement too much, the policy learns to refrain from showing the ad.</p><p><strong>Why REINFORCE?</strong></p><ul><li><p><strong>Direct Optimization:</strong> REINFORCE directly optimizes the expected reward of the policy. Instead of separately learning a model for each action and then deriving a policy, you adjust the policy parameters to maximize the observed rewards.</p></li><li><p><strong>Built-in Exploration:</strong><br>By modeling a probability distribution over actions instead of choosing them deterministically, the policy samples actions according to their probabilities rather than always taking the single top-scoring action. This stochasticity means the model inherently tries different actions over time, which can lead to discovering better policies than a purely greedy strategy would.</p></li><li><p><strong>Generalization to RL Settings:</strong> Although we&#8217;re currently working in a single-step contextual bandit setting, policy gradient methods can easily generalize to multi-step reinforcement learning problems. 
This opens the door to modeling scenarios where the consequences of showing (or not showing) an ad extend beyond the first position, or even the current session.</p></li></ul><p><strong>Pros and Cons of REINFORCE vs. Contextual Bandits</strong></p><ul><li><p><strong>Pros (REINFORCE):</strong></p><ul><li><p>Directly learns a stochastic policy that can be generalized to more complex sequential or multi-step decision-making scenarios.</p></li><li><p>Conceptually simple: the update rule just scales the gradient of the log probability of the chosen action by the reward.</p></li></ul></li><li><p><strong>Cons (REINFORCE):</strong></p><ul><li><p>High variance: The basic REINFORCE update can be noisy and may require techniques like baselines or variance reduction to stabilize training.</p></li><li><p>On-policy: The method is conceptually on-policy, meaning it learns best when data is collected from its own evolving policy. Using strictly logged offline data (collected by a different policy) can introduce bias unless steps are taken to correct it.</p></li></ul></li><li><p><strong>Pros (Contextual Bandits):</strong></p><ul><li><p>Simpler, more direct learning: Estimate the reward for each action and pick the best one. Straightforward modeling from logged data.</p></li><li><p>Lower variance estimates: Predictive models for each action can produce more stable estimates with offline data.</p></li></ul></li><li><p><strong>Cons (Contextual Bandits):</strong></p><ul><li><p>Limited to single-step decisions: The approach doesn&#8217;t naturally extend to sequential decision-making.</p></li><li><p>Requires a separate model for each action, or a common model architecture that outputs multiple action values.</p></li></ul></li></ul><h2>#3 - Integrating the Ad Revenue Estimator <strong>(E[rev|u, ad]) </strong>deeply</h2><p>In section #2, we discussed incorporating <code>E[rev|u, ad]</code> as an input feature. This is helpful, but we can push the idea further by explicitly decomposing the reward. 
Instead of lumping all revenue and engagement value into a single session label, we can separate the immediate revenue from showing an ad (predicted by <code>E[rev|u, ad]</code>) from the delayed, engagement-based revenue. In other words, we estimate:</p><ul><li><p><strong>Immediate Reward (if ad is shown):</strong> <code>E[rev|u, ad]</code>&#8212;our personalized ad revenue estimate.</p></li><li><p><strong>Delayed Reward (if ad is shown):</strong> Session value minus the immediate ad revenue, converted into engagement-equivalent terms.</p></li></ul><p>At inference time, this lets us approximate the decision boundary more explicitly. We compare the sum of the immediate and delayed rewards for showing the ad against the expected session value if we do not show the ad:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P(show\\_ad) = \\sigma (\n    (E[rev | u, ad] + E(delayed\\_reward | show\\_ad))\n    - E(net\\_session\\_revenue | no\\_show\\_ad)\n)\n&quot;,&quot;id&quot;:&quot;ZRWPTOCCFF&quot;}" data-component-name="LatexBlockToDOM"></div><p>This approach leverages the fact that our immediate revenue estimates are likely of higher accuracy, and it reduces what we must learn to a &#8220;residual&#8221; (delayed) effect. It ties together the personalized ad estimator with a session-level RL framework, enabling more <strong>accurate</strong> and <strong>modular</strong> policy decisions.</p><h1>Summary</h1><ol><li><p>We show how to use RL to learn when to show an ad.</p></li><li><p>The framework extends to other positions, not just the first position. The RL policy encapsulates exploration to enable continuous improvement.</p></li><li><p>We do so in a way that maximally uses our high-accuracy ad revenue estimator model. Thus we have set up the problem in a modular fashion, with multiple teams working in synergy.</p></li></ol><p></p><p><strong>Disclaimer:</strong> These are the personal opinions of the author(s). 
Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</p>]]></content:encoded></item><item><title><![CDATA[How to implement Generative Retrieval]]></title><description><![CDATA[GenAI meets recommender systems]]></description><link>https://recsysml.substack.com/p/how-to-implement-generative-retrieval</link><guid isPermaLink="false">https://recsysml.substack.com/p/how-to-implement-generative-retrieval</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Thu, 05 Jun 2025 14:30:25 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/164981049/aa24ca5118f90d625067bf026601221c.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h3>Improving Recsys with GenAI</h3><p>We're excited about the potential of Large Language Models (LLMs) in recommender systems, given their high accuracy in multiple domains. Building on this potential, we'll explore how to harness LLMs for recommendation tasks.</p><h3>The Challenge of Using LLMs in RecSys</h3><p>One key challenge is tokenizing billions of recommendable items, which makes it hard to apply LLMs directly. If only we could break each item into a few tokens, the way LLMs break long words like &#8220;happiness&#8221; into subword tokens. Once we have a vocabulary of meaningful tokens that reliably describe interaction probabilities, we can leverage LLM machinery for prediction.</p><h3>Proposed Solution: Generative Retrieval</h3><p>The paper <a href="https://arxiv.org/abs/2305.05065">Generative Retrieval</a> generates "semantic" embeddings using RQVAE (a type of vector quantized variational autoencoder), enabling LLMs to learn meaningful item representations. 
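</p><p>To make the residual-quantization idea concrete, here is a hedged sketch of how a trained RQVAE&#8217;s codebooks would map an item embedding to a tuple of semantic-ID tokens. The codebooks below are random stand-ins; RQVAE learns them jointly with an encoder and decoder.</p>

```python
import numpy as np

# Illustrative sketch of residual quantization, the step behind semantic IDs:
# each level's codebook quantizes the residual left by the previous level.
# The codebooks here are random stand-ins; an RQVAE learns them jointly
# with an encoder/decoder, so treat sizes and values as assumptions.

rng = np.random.default_rng(42)
DIM, CODEBOOK_SIZE, LEVELS = 8, 16, 3
codebooks = rng.normal(size=(LEVELS, CODEBOOK_SIZE, DIM))


def semantic_id(item_embedding):
    """Map an item embedding to a tuple of codeword indices (its semantic ID)."""
    residual = np.asarray(item_embedding, dtype=float).copy()
    ids = []
    for level in range(LEVELS):
        # Pick the codeword closest to the current residual at this level
        dists = np.linalg.norm(codebooks[level] - residual, axis=1)
        idx = int(np.argmin(dists))
        ids.append(idx)
        residual = residual - codebooks[level][idx]
    return tuple(ids)


item = rng.normal(size=DIM)
print(semantic_id(item))  # a LEVELS-length tuple of codeword indices
```

<p>Because nearby embeddings tend to pick the same early codewords, similar items share semantic-ID prefixes, which is what lets an LLM generate them token by token.</p><p>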
By creating semantic embeddings where similar items are closer together, one can generate semantic IDs that capture nuanced relationships between items.</p><h3>Showing You How to Implement It</h3><p>To make this approach more accessible, <a href="https://www.linkedin.com/in/sam-komo-5247a494/">Samson Komo</a> has prepared for you:</p><p>A video tutorial walking through paper code and colab: </p><div id="youtube2-OE5iJcFLS7o" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;OE5iJcFLS7o&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/OE5iJcFLS7o?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>GitHub repo: <a href="https://github.com/komosam/Generative-Retrieval">https://github.com/komosam/Generative-Retrieval</a></p><p></p><h3>Street Cred of the Approach</h3><p>Generative retrieval has already shown impressive results, with over 40% share of retrieval in some state-of-the-art video and ad recommender systems. By implementing this method, you can unlock more accurate and diverse recommendations.</p><p></p><h3>Conclusion</h3><p>By implementing generative retrieval, you can tap into the power of LLMs for recommendation tasks. Explore our resources to get started and discover how this approach can enhance your recommender systems.</p><p></p><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</em></p>]]></content:encoded></item><item><title><![CDATA[Attention Explained: When to use Self, Graph, and Target-Aware Attention]]></title><description><![CDATA[Unlocking the Power of AI: A Beginner's Guide to Attention Architectures]]></description><link>https://recsysml.substack.com/p/attention-explained-when-to-use-self</link><guid isPermaLink="false">https://recsysml.substack.com/p/attention-explained-when-to-use-self</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Sun, 25 May 2025 15:30:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mKpe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0799e93-a753-45f4-894f-ae8673444fd2_1600x957.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>TL;DR:</p><ol><li><p><strong>Self-attention</strong> summarizes information from a list (e.g., recent videos watched or chatbot text) to create a relevant summary.</p></li><li><p><strong>Graph attention</strong> understands relationships within a network (e.g., social circles in a social network).</p></li><li><p><strong>Target-aware attention</strong> evaluates the relevance of items being ranked to a user's history or query.</p></li></ol><p><a href="https://arxiv.org/abs/1706.03762">Attention</a> is a powerful tool in AI, but its applications and types can be confusing. 
In this article, we'll break down three common attention architectures - self-attention, graph attention, and target-aware attention - and explore their use cases and strengths.</p><p></p><h2>Basic building block of attention</h2><p>The unit shown below in Fig 1 finds a weighted sum of the input sequence (aka &#8220;keys&#8221;) using the query embedding. 
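</p><p>As a concrete sketch, this weighted-sum unit can be written in a few lines of NumPy; the shapes and example data below are illustrative assumptions.</p>

```python
import numpy as np

# Minimal sketch of the basic attention unit: the query scores each key,
# softmax turns the scores into weights, and the output is the weighted
# sum of the values. Shapes and data are illustrative assumptions.


def attention_summary(query, keys, values):
    """Return a summary of `values` most relevant to `query` (single head)."""
    scores = keys @ query / np.sqrt(query.shape[0])  # scaled dot products
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                # softmax over the keys
    return weights @ values                          # weighted sum


rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 4))   # e.g. embeddings of 5 recently watched videos
values = keys                    # here the values are the items themselves
query = keys[0]                  # summarize the list relative to the first item
summary = attention_summary(query, keys, values)
print(summary.shape)  # (4,)
```

<p>The output lives in the same embedding space as the values: it is the list&#8217;s content, reweighted toward whatever the query resembles most.</p><p>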
One way to think about it is to find a summary in the keys that is most relevant to the query.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!mKpe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0799e93-a753-45f4-894f-ae8673444fd2_1600x957.png" width="1456" height="871" alt=""><figcaption class="image-caption">Fig 1: Basic building block of attention</figcaption></figure></div><h2>Self attention</h2><p>In self attention we generate an equal number of embeddings as the input. We do that by taking each of the inputs as a query. So after a layer of attention, each item in the list is replaced by a sort of smoothened version of it. 
(Fig 2)</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Drke!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a29331a-57ee-47f5-8847-151672c3212e_1220x436.png" width="1220" height="436" alt=""><figcaption class="image-caption">Fig 2: Each item is used as a query in self attention. Hence the output is the same length as input.</figcaption></figure></div><p>In certain scenarios like language and content recommender systems, where positions matter to the relevance of different items, positional encoding is also useful to find better attention weights. 
(Fig 3)</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!St_m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc05c90e7-3088-4d23-9a47-62cd1d3ce2be_1208x568.png" width="1208" height="568" alt=""><figcaption class="image-caption">Fig 3: Positional encoding is added to the item encoding</figcaption></figure></div><p>Code implementation</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h9hl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h9hl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png 424w, https://substackcdn.com/image/fetch/$s_!h9hl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png 848w, 
https://substackcdn.com/image/fetch/$s_!h9hl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!h9hl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h9hl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png" width="1098" height="1000" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1098,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h9hl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png 424w, https://substackcdn.com/image/fetch/$s_!h9hl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png 848w, 
https://substackcdn.com/image/fetch/$s_!h9hl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!h9hl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F225a4c42-a9c0-460d-b973-7a92c06c9da7_1098x1000.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Graph Attention</h2><p>Graph Attention is similar to Fig 2. 
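</p><p>Before the exact formulation, one such layer can be sketched minimally in NumPy (single head, illustrative names, not the post&#8217;s exact implementation): a masked softmax restricts each node to its neighbors, and a residual connection carries the node&#8217;s previous value forward:</p>

```python
import numpy as np

def graph_attention_layer(h, adj, rng):
    """One single-head graph-attention layer (illustrative).

    h:   (n, d) node embeddings
    adj: (n, n) binary adjacency matrix, self-loops included
    """
    n, d = h.shape
    W = rng.normal(size=(d, d)) / np.sqrt(d)         # shared linear projection
    a_src, a_dst = rng.normal(size=(2, d))           # additive-attention vectors
    z = h @ W
    e = (z @ a_src)[:, None] + (z @ a_dst)[None, :]  # pairwise logits
    e = np.where(e > 0, e, 0.2 * e)                  # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                   # mask out non-neighbors
    w = np.exp(e - e.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # softmax over neighbors
    return w @ z + h                                 # aggregate + residual

rng = np.random.default_rng(0)
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])                       # 4-node path graph + self-loops
h = rng.normal(size=(4, 8))
h_next = graph_attention_layer(h, adj, rng)
print(h_next.shape)                                  # (4, 8)
```

<p>The residual term is what distinguishes this from plain neighborhood averaging: each node keeps part of its own signal across layers, which is one way to counter over-smoothing.</p><p>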
That is, in each attention layer a node&#8217;s embedding is updated using its neighbors&#8217; embeddings together with its own. The formulation, however, differs slightly: Graph Attention Transformers <a href="https://arxiv.org/abs/2305.16102">decelerate over-smoothing</a> by feeding the node&#8217;s previous value as an extra input to the next layer&#8217;s neural network.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pn3R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pn3R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png 424w, https://substackcdn.com/image/fetch/$s_!pn3R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png 848w, https://substackcdn.com/image/fetch/$s_!pn3R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png 1272w, https://substackcdn.com/image/fetch/$s_!pn3R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pn3R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png" width="1456" height="904"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:904,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pn3R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png 424w, https://substackcdn.com/image/fetch/$s_!pn3R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png 848w, https://substackcdn.com/image/fetch/$s_!pn3R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png 1272w, https://substackcdn.com/image/fetch/$s_!pn3R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F989e0d63-db80-4b1a-97f5-0dc7f7bf3e77_1600x993.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 4: Graph Attention</figcaption></figure></div><p>Code implementation</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0LqY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0LqY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png 424w, https://substackcdn.com/image/fetch/$s_!0LqY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png 848w, 
https://substackcdn.com/image/fetch/$s_!0LqY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png 1272w, https://substackcdn.com/image/fetch/$s_!0LqY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0LqY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png" width="1152" height="806" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:806,&quot;width&quot;:1152,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130718,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/164251342?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0LqY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png 424w, 
https://substackcdn.com/image/fetch/$s_!0LqY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png 848w, https://substackcdn.com/image/fetch/$s_!0LqY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png 1272w, https://substackcdn.com/image/fetch/$s_!0LqY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a3efb3b-44af-4355-8e85-c637f586f265_1152x806.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zSdQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zSdQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png 424w, https://substackcdn.com/image/fetch/$s_!zSdQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png 848w, https://substackcdn.com/image/fetch/$s_!zSdQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png 1272w, https://substackcdn.com/image/fetch/$s_!zSdQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zSdQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png" width="1000" height="1108" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1108,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:418446,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://recsysml.substack.com/i/164251342?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zSdQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png 424w, https://substackcdn.com/image/fetch/$s_!zSdQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png 848w, https://substackcdn.com/image/fetch/$s_!zSdQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png 1272w, https://substackcdn.com/image/fetch/$s_!zSdQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9702eff0-8f15-4dea-b7ad-7858629cd4d3_1000x1108.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Target-aware attention</h2><p>In ranking applications, we estimate the probability of a successful outcome for each of several candidate items, aka &#8220;targets&#8221;.
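</p><p>The core idea can be sketched minimally in NumPy (illustrative names, single head, dot-product scoring): the candidate target&#8217;s embedding acts as the query, while the user&#8217;s history provides the keys and values:</p>

```python
import numpy as np

def target_aware_pool(target, history):
    """target:  (d,) embedding of the candidate item being ranked.
    history: (n, d) embeddings of the user's past items (keys = values).
    Returns a (d,) summary of the history, weighted by relevance
    to this particular target."""
    d = target.shape[-1]
    scores = history @ target / np.sqrt(d)         # one score per history item
    w = np.exp(scores - scores.max())
    w /= w.sum()                                   # softmax over the history
    return w @ history

rng = np.random.default_rng(0)
history = rng.normal(size=(6, 8))                  # 6 past items, d = 8
candidate_a, candidate_b = rng.normal(size=(2, 8))
# Different targets pool the same history differently.
pool_a = target_aware_pool(candidate_a, history)
pool_b = target_aware_pool(candidate_b, history)
print(pool_a.shape)
```

<p>The pooled vector is typically concatenated with the target&#8217;s own features and fed to the ranking model, so each candidate sees the history summarized from its own point of view.</p><p>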
A successful application of attention in ranking stems from using attention with the target as the query and user&#8217;s history or query text sequence as keys.</p><p>In contrast to self-attention, target-aware attention uses a specific target as the query to compute attention weights over a sequence of items.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hsye!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hsye!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png 424w, https://substackcdn.com/image/fetch/$s_!hsye!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png 848w, https://substackcdn.com/image/fetch/$s_!hsye!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png 1272w, https://substackcdn.com/image/fetch/$s_!hsye!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hsye!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png" width="1456" height="1143" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1143,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hsye!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png 424w, https://substackcdn.com/image/fetch/$s_!hsye!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png 848w, https://substackcdn.com/image/fetch/$s_!hsye!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png 1272w, https://substackcdn.com/image/fetch/$s_!hsye!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8eadfd-a362-474c-b3c1-81cebe351d2d_1518x1192.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 5: The item being ranked is the query of the attention module. 
For instance, while ranking a video in a video recommender, the target video&#8217;s embedding is the query, the user&#8217;s history is the sequence (or &#8220;keys&#8221;)</figcaption></figure></div><p>Code implementation</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SZI2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SZI2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png 424w, https://substackcdn.com/image/fetch/$s_!SZI2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png 848w, https://substackcdn.com/image/fetch/$s_!SZI2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png 1272w, https://substackcdn.com/image/fetch/$s_!SZI2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SZI2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png" width="1140" height="982" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:982,&quot;width&quot;:1140,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SZI2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png 424w, https://substackcdn.com/image/fetch/$s_!SZI2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png 848w, https://substackcdn.com/image/fetch/$s_!SZI2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png 1272w, https://substackcdn.com/image/fetch/$s_!SZI2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0de8f4fb-8be0-40dc-a325-53d3ad94c5c1_1140x982.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Conclusion</h2><p>Whether it's processing sequences with self-attention, modeling relationships with graph attention, or ranking items with target-aware attention, each mechanism offers unique strengths. Use what is most applicable or a combination as needed. If you want to talk about your use case and what might fit the best, please reach out to us.</p><p>Prior posts on recsys stacks that can use attention:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;eede9593-983b-4a20-9e64-123a70a454e7&quot;,&quot;caption&quot;:&quot;The first tech stack you should build today for personalized recommendations is retrieval using two tower models[1, 2] and ranking on top of it. In this article we will learn about two-tower models and ranking will be covered in a future post. Using two tower models has helped leading tech companies improve the quality of their recommendations, online a&#8230;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Two tower models for retrieval of recommendations&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5668214,&quot;name&quot;:&quot;Gaurav Chakravorty&quot;,&quot;bio&quot;:&quot;- Applied ML in Recommender systems (Facebook / Instagram, Google, Discord)\n- 20 years in Applied 
ML&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f40461d9-d0cc-4a2b-bc68-d46e2c022079_401x401.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2021-02-19T14:16:01.838Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1124e0c7-447f-4d94-8148-5fcb26c25d63_1686x1001.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://recsysml.substack.com/p/two-tower-models-for-retrieval-of&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:32375010,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:16,&quot;comment_count&quot;:1,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Applied ML | Recommender systems&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;c44c1b8a-d4ba-4b2d-ae1d-c3dc44eae1d6&quot;,&quot;caption&quot;:&quot;Why is an Early Ranker needed&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Early (Stage) Ranking in recommender systems&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5668214,&quot;name&quot;:&quot;Gaurav Chakravorty&quot;,&quot;bio&quot;:&quot;- Applied ML in Recommender systems (Facebook / Instagram, Google, Discord)\n- 20 years in Applied 
ML&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f40461d9-d0cc-4a2b-bc68-d46e2c022079_401x401.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-11-26T22:04:01.098Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338dae91-781f-45c4-89d2-7575998a6095_1612x786.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://recsysml.substack.com/p/early-stage-ranking-in-recommender&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:135985725,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:23,&quot;comment_count&quot;:3,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Applied ML | Recommender systems&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p><p><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</p>]]></content:encoded></item><item><title><![CDATA[Scalable Embedding based retrieval for target side value]]></title><description><![CDATA[Addressing Scalability Challenges in Two-Sided Embedding based Recommendations]]></description><link>https://recsysml.substack.com/p/scalable-embedding-based-retrieval</link><guid isPermaLink="false">https://recsysml.substack.com/p/scalable-embedding-based-retrieval</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Sat, 17 May 2025 17:54:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Friending recommendations in a social network have been known to deliver value by both finding targets that drive viewers to visit the app (&#8220;viewer visitation&#8221;) and initiating connections from viewers that lead targets to visit the app (&#8220;target visitation&#8221;).</p><p>In the post <a href="https://recsysml.substack.com/p/friend-recommendation-retrieval-in">friend recommendation retrieval in a social network</a> we have also covered embedding based retrieval. In this post we will focus on scalability and target side value.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://recsysml.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Applied ML | Recommender systems! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Scalability</h2><p>As noted in the <a href="https://recsysml.substack.com/p/friend-recommendation-retrieval-in">previous post</a>, one of the most challenging aspects of embedding-based retrieval is scalability: the query and candidate sets exceed 5 billion for large social networks.</p><p><strong>Key insight:</strong> The retention value of people/friending recommendations is highest for users who are still building meaningful connections, i.e. &#8220;graph builders&#8221;.</p><p>We will use this insight below to engineer a low-capacity, high-ROI retrieval system.</p><h2>Target side value</h2><p>Traditionally, write-ups on <a href="https://recsysml.substack.com/p/two-tower-models-for-retrieval-of">two tower models</a> discuss only viewer &#8594; best recommendations, because most were written to maximize viewer value.
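As a minimal sketch of this viewer &#8594; recommendations lookup (the array names and brute-force dot-product scoring are illustrative assumptions; production systems query an approximate nearest-neighbor index over precomputed candidate embeddings):

```python
import numpy as np

def knn_recommend(viewer_emb, target_embs, k=10):
    """Return indices of the k targets whose embeddings best match the viewer.

    viewer_emb: shape (d,), the query-tower output for one viewer.
    target_embs: shape (n, d), candidate-tower outputs for all targets.
    """
    # Dot-product similarity between the viewer and every target, shape (n,)
    scores = target_embs @ viewer_emb
    # Indices of the k highest-scoring targets, best first
    return np.argsort(-scores)[:k]

# Toy example: 3 targets in a 2-d embedding space
viewer = np.array([1.0, 0.0])
targets = np.array([[0.9, 0.1],
                    [0.1, 0.9],
                    [1.0, 0.2]])
print(knn_recommend(viewer, targets, k=2))  # -> [2 0]
```

In practice the target embeddings are computed offline and served from an ANN index such as FAISS or ScaNN rather than scored exhaustively.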
You will see a diagram like below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M3Rv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M3Rv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png 424w, https://substackcdn.com/image/fetch/$s_!M3Rv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png 848w, https://substackcdn.com/image/fetch/$s_!M3Rv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!M3Rv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M3Rv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png" width="465" height="714.0633245382586" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1164,&quot;width&quot;:758,&quot;resizeWidth&quot;:465,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M3Rv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png 424w, https://substackcdn.com/image/fetch/$s_!M3Rv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png 848w, https://substackcdn.com/image/fetch/$s_!M3Rv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!M3Rv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd922d31-ce9c-4931-8818-9cf5d9711edb_758x1164.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 1: Schematic of two tower inference for viewer value. It uses K-nearest-neighbors to find recommendations using the query embedding.</figcaption></figure></div><p></p><p>Even when accounting for candidate side value like the <a href="https://recsysml.substack.com/p/personalized-short-video-recommender">multi-stage system in short-video</a>, the goal is to handle uncertainty of newer candidates. 
It is not to deliver value to candidates at the same level of importance as viewers.</p><p>If targets are as important, one option is to run a similar query with each target &#8230;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1V9n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1V9n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png 424w, https://substackcdn.com/image/fetch/$s_!1V9n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png 848w, https://substackcdn.com/image/fetch/$s_!1V9n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!1V9n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1V9n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png" width="594" height="677.3452115812918" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:898,&quot;resizeWidth&quot;:594,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1V9n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png 424w, https://substackcdn.com/image/fetch/$s_!1V9n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png 848w, https://substackcdn.com/image/fetch/$s_!1V9n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!1V9n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f8ec41a-f34a-48e4-a177-bcfdd3e4102a_898x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 2: Parallel K-Nearest-Neighbors (KNN) for queries and items to find best recs for both sets.</figcaption></figure></div><p></p><p>&#8230; and flip to add the targets to recommended lists for each viewer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oMPN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oMPN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png 424w, 
https://substackcdn.com/image/fetch/$s_!oMPN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png 848w, https://substackcdn.com/image/fetch/$s_!oMPN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png 1272w, https://substackcdn.com/image/fetch/$s_!oMPN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oMPN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png" width="442" height="443.07542579075425" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:824,&quot;width&quot;:822,&quot;resizeWidth&quot;:442,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oMPN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png 424w, 
https://substackcdn.com/image/fetch/$s_!oMPN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png 848w, https://substackcdn.com/image/fetch/$s_!oMPN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png 1272w, https://substackcdn.com/image/fetch/$s_!oMPN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F062b0fe2-ee66-46ed-9298-4b066e722c02_822x824.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 3: Since viewers receive the recommendations, we still have to flip the output of target KNN so that we can find the viewers to whom we should recommend these targets.</figcaption></figure></div><p></p><p>By now we have the basic argument in place; what is missing is scalability. Within the capacity, latency, and time budgets of modern social networks, it is not feasible to run 5B+ KNN queries against 5B+ candidates. Instead, we can identify the cohorts most in need of good recommendations and run the KNNs only for them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SHtT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SHtT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png 424w, https://substackcdn.com/image/fetch/$s_!SHtT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png 848w, https://substackcdn.com/image/fetch/$s_!SHtT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png 1272w, https://substackcdn.com/image/fetch/$s_!SHtT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SHtT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png" width="1254" height="1166" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1166,&quot;width&quot;:1254,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SHtT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png 424w, https://substackcdn.com/image/fetch/$s_!SHtT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png 848w, https://substackcdn.com/image/fetch/$s_!SHtT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png 1272w, https://substackcdn.com/image/fetch/$s_!SHtT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9623f659-3cbc-4b5d-b2c7-ed7c018d2035_1254x1166.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 4: For scalability, we can independently limit queries to viewers and targets who most need people recommendations.</figcaption></figure></div><p></p><h2>Algorithm: Maximizing target participation.</h2><p>One part of this that we hand waved over above is how to &#8220;flip&#8221; the list of tgt &#8594; [list of recommended viewers] to viewer &#8594; list of targets.</p><p>A naive solution would be to just do it in Presto, etc., but what potential problems could arise with this approach?</p><ol><li><p>Viewer flooding: Some viewers are recommended too many graph-building targets. 
This might degrade their experience and cause recommendation blindness (similar to <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43887.pdf">ad blindness</a>).</p></li><li><p>Target starvation: Some targets might receive little &#8220;attention&#8221; because they appear only in the lists of viewers who already have too many graph-building targets.</p></li></ol><p>This is a variant of the &#8220;Set-Cover&#8221; problem, which is <a href="https://www.cs.cornell.edu/courses/cs482/2007su/NPComplete.pdf">NP-complete</a>. However, below we show an approximation that works well at large scale.</p><p></p><p><strong>Solution:</strong> Let&#8217;s say your target-side KNN produces a table named <em>tgt_to_top_ten_viewers_knn_table</em> with columns &#8220;target_id&#8221; and &#8220;viewers&#8221;, an array of viewer_ids. The algorithm below maximizes the number of targets that appear in some viewer&#8217;s list and receive attention from them. The basic idea is to cap each viewer at roughly 3 graph-building targets in expectation.</p><pre><code>-- Expand tgt_to_top_ten_viewers_knn_table into one row
-- per (target_id, viewer_id) pair
WITH expanded_table AS (
  SELECT
    target_id,
    viewer_id
  FROM tgt_to_top_ten_viewers_knn_table
  CROSS JOIN UNNEST(viewers) AS t(viewer_id)
),
-- Count how many target lists each viewer appears in
viewer_target_count AS (
  SELECT
    viewer_id,
    COUNT(*) AS viewer_app_count
  FROM expanded_table
  GROUP BY viewer_id
),
-- Keep each (target, viewer) pair with probability
-- LEAST(1, 3 / viewer_app_count), so every viewer retains
-- about 3 graph-building targets in expectation
selected_targets AS (
  SELECT
    et.target_id,
    et.viewer_id,
    vtc.viewer_app_count
  FROM expanded_table et
  JOIN viewer_target_count vtc ON et.viewer_id = vtc.viewer_id
  WHERE RAND() &lt; LEAST(1, 3.0 / vtc.viewer_app_count)
)
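-- Note: RAND() makes the selection non-deterministic across runs.
-- For reproducible output, threshold a deterministic hash of
-- (viewer_id, target_id) scaled to [0, 1) instead of RAND().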
-- Group the selected targets by viewer_id and 
-- aggregate the target_ids
SELECT 
  viewer_id,
  ARRAY_AGG(DISTINCT target_id) AS target_ids
FROM selected_targets
GROUP BY viewer_id;</code></pre><p></p><h2>Conclusion</h2><p>Model based retrieval is powerful in social network recommendations like Linkedin, Snap and Facebook. Noting that these recommendations deliver value to both viewers and the targets recommended can maximize retention for the app. In the article we propose approaches to deliver such viewer and target side value while handling scalability challenges of searching large sets of users.</p><p></p><p>We hope this spurs your imagination the next time you are thinking about building model based retrieval.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://recsysml.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Applied ML | Recommender systems! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</em></p>]]></content:encoded></item><item><title><![CDATA[Friend Recommendation Retrieval in a social network]]></title><description><![CDATA[From Graph Search to Deep Neural Two-Tower Models]]></description><link>https://recsysml.substack.com/p/friend-recommendation-retrieval-in</link><guid isPermaLink="false">https://recsysml.substack.com/p/friend-recommendation-retrieval-in</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Sun, 24 Nov 2024 16:31:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mMxl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><strong>Introduction</strong></h3><p>Social networking platforms, e.g. <a href="https://www.linkedin.com/blog/engineering/recommendations/building-a-large-scale-recommendation-system-people-you-may-know">LinkedIn:PYMK</a> and <a href="https://transparency.meta.com/features/explaining-ranking/ig-suggested-accounts/">IG:SA</a>, help users find friends on the platform and forge meaningful connections. Traditionally, an effective approach for friend recommendations has been to suggest "friends of friends" i.e. using mutual connections as a proxy for relevance. This article begins by exploring graph search as a way to implement this baseline and traces the evolution of friend recommendation systems beyond this foundational approach. 
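As a toy illustration of that mutual-connections baseline (the in-memory adjacency dict and function name are assumptions for clarity; at social-network scale this runs as a distributed graph query):

```python
from collections import Counter

def friends_of_friends(user, adj, k=10):
    """Rank non-friends of `user` by their number of mutual connections.

    adj: dict mapping each user id to the set of their friends' ids.
    """
    counts = Counter()
    for friend in adj[user]:
        for fof in adj[friend]:
            # Skip the user themselves and existing friends
            if fof != user and fof not in adj[user]:
                counts[fof] += 1
    # Most mutual connections first
    return [u for u, _ in counts.most_common(k)]

adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d", "e"},
    "d": {"b", "c"},
    "e": {"c"},
}
print(friends_of_friends("a", adj))  # -> ['d', 'e']  (d has 2 mutuals, e has 1)
```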
It concludes with a discussion of <a href="https://github.com/gauravchak/two_tower_models">Two Tower</a> models, which leverage compute power and data to drive greater accuracy in retrieval.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://recsysml.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Applied ML | Recommender systems! Subscribe for free to receive ideas on the state of the art in AI / ML.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h3><strong>Graph Search: The First Principles Approach</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mMxl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mMxl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png 424w, https://substackcdn.com/image/fetch/$s_!mMxl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png 848w, 
https://substackcdn.com/image/fetch/$s_!mMxl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png 1272w, https://substackcdn.com/image/fetch/$s_!mMxl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mMxl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png" width="433" height="362.28391959798995" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:666,&quot;width&quot;:796,&quot;resizeWidth&quot;:433,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mMxl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png 424w, https://substackcdn.com/image/fetch/$s_!mMxl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png 848w, 
https://substackcdn.com/image/fetch/$s_!mMxl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png 1272w, https://substackcdn.com/image/fetch/$s_!mMxl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7ed596d-22f0-42d7-8838-691bb914a94a_796x666.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 1: Showing two hop paths from C. First hop nodes are colored purple and Second hop are colored blue. 
The number of 2-hop paths converging on each node is written alongside the node. Here all paths have been weighted the same. We can consider an extension where paths are weighted lower if they travel through nodes with high degree. For instance, the weight of path C,A,J could be &#8531; and that of C,E,D could be &#8533;, based on the degree of the 1-hop node. There are other approaches to weighting as well; in <a href="https://tkipf.github.io/graph-convolutional-networks/">Kipf et al. 2017</a> they show 1/sqrt(degree) to work well.</figcaption></figure></div><p>To implement &#8220;Friends of Friends&#8221; at scale, one approach, e.g. <a href="https://research.facebook.com/publications/unicorn-a-system-for-searching-the-social-graph/">FB:Unicorn</a>, has been to create a graph datastore of relationships that enables counting the number of two-hop paths between a source user and other nodes. Ranking nodes by the number of two-hop paths is akin to ranking by mutual-friend count, providing a basic but efficient recommendation mechanism.</p><h3><strong>Improving Graph Search with Weighted Paths</strong></h3><p>Besides scaling up, one can use weighted paths in the graph between the viewer and the recommended friend to refine results. By assigning weights to edges (friendship connections) and nodes (users) based on various attributes, platforms can calculate a "path score" for potential connections. These scores are derived from the cumulative weight of paths ending at a given node, allowing the platform to rank potential connections by relevance. (<a href="https://link.springer.com/article/10.1007/s11432-017-9243-7">Gong et al. 
2017</a>)</p><p>For weighting, one might consider incorporating factors like the interaction frequency of the users on an edge, connection recency (how recently that connection was made), common interests, or shared communities. For example, we might boost an edge&#8217;s weight if the two users have shared a conversation in the last 7 days, effectively increasing the likelihood of sourcing that candidate. The hypothesis here is that a user with multiple paths of highly weighted connections is more likely to be a relevant friend recommendation. Different weighting hypotheses can be validated through online experimentation, making this approach more robust. This weighted-path method aligns with traditional graph search while enhancing it with more granular social signals.</p><h3><strong>Embedding-Based Approaches: DeepWalk and Node2Vec</strong></h3><p>Credit where it is due: two seminal papers that demonstrated the feasibility of embedding techniques are <a href="https://arxiv.org/abs/1403.6652">DeepWalk</a> [4] and <a href="https://arxiv.org/abs/1607.00653">Node2Vec</a> [5]. These methods sample paths and learn latent representations (embeddings) of nodes based on their connectivity patterns, capturing the structure of the network more efficiently than traditional graph search. This embedding-based approach effectively reconstructs paths in the graph, with similar embeddings indicating a higher likelihood of connection. 
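</p><p>To make the walk-sampling step concrete, here is a minimal sketch (our illustration, not code from either paper) of DeepWalk-style corpus generation; the resulting walks would then be fed to a skip-gram model to learn node embeddings.</p>

```python
import random

def random_walks(adj, num_walks=10, walk_len=5, seed=0):
    """DeepWalk-style corpus generation: uniform random walks over the
    friendship graph; each walk is treated like a sentence for skip-gram."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            for _step in range(walk_len - 1):
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy friendship graph (hypothetical): D is a friend-of-friend of A via C.
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
corpus = random_walks(adj)
```

<p>In practice one would run many more walks per node and train a skip-gram model such as gensim&#8217;s Word2Vec on this corpus. 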
Sharing two images of DeepWalk below to build intuition.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jUOL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jUOL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png 424w, https://substackcdn.com/image/fetch/$s_!jUOL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png 848w, https://substackcdn.com/image/fetch/$s_!jUOL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png 1272w, https://substackcdn.com/image/fetch/$s_!jUOL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jUOL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png" width="457" height="370.2028776978417" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1126,&quot;width&quot;:1390,&quot;resizeWidth&quot;:457,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jUOL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png 424w, https://substackcdn.com/image/fetch/$s_!jUOL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png 848w, https://substackcdn.com/image/fetch/$s_!jUOL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png 1272w, https://substackcdn.com/image/fetch/$s_!jUOL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26a6f387-a90c-4647-83ec-7f1006749242_1390x1126.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5S8I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5S8I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png 424w, https://substackcdn.com/image/fetch/$s_!5S8I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png 848w, 
https://substackcdn.com/image/fetch/$s_!5S8I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png 1272w, https://substackcdn.com/image/fetch/$s_!5S8I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5S8I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png" width="727" height="304.58104395604397" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:610,&quot;width&quot;:1456,&quot;resizeWidth&quot;:727,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5S8I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png 424w, https://substackcdn.com/image/fetch/$s_!5S8I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png 848w, 
https://substackcdn.com/image/fetch/$s_!5S8I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png 1272w, https://substackcdn.com/image/fetch/$s_!5S8I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb955266f-b7a7-44b6-a09e-e7731496f034_1474x618.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figures 2 &amp; 3 from <a href="https://arxiv.org/abs/1403.6652">DeepWalk paper</a> show how they are building 
intuition from paths to clusters to embeddings.</figcaption></figure></div><p>DeepWalk, which uses random walks on the graph to generate training samples, laid the foundation for this approach. Node2Vec extends it by enabling biased sampling, which helps control the balance between exploring global and local structures. These methods allow platforms to efficiently compute similarity scores between users, which are then used to make friend recommendations.</p><h3><strong>Clustering with Spectral Analysis</strong></h3><p>Spectral clustering on the friendship graph is another approach to friend recommendations, particularly useful for identifying groups of users likely to share interests or connections. In this method, clusters are formed based on dense areas in the graph, often revealing latent communities. Nodes with many paths ending on them, but without direct connections, are likely in the same cluster as the source node, making them promising friend recommendations.</p><p>Though historically computationally complex, some recent approaches to spectral clustering are more efficient: AROPE, NetSMF, and ProNE ([8], [9], [10] respectively). For example, AROPE scales linearly with graph size, and NetSMF reports taking &#8220;only&#8221; 24 hours to train on a network of tens of millions of nodes.</p><p>Spectral clustering can yield high-quality recommendations by grouping users into meaningful social clusters, but due to its high computational complexity it is no longer popular for generating production-scale friending recommendations. It is, however, very useful for reducing network-interference bias in A/B tests of friending recommendation models: the core idea is to use spectral clustering to identify dense user groups and assign each entire group to either treatment or control [<a href="https://arxiv.org/pdf/1903.08755">14</a>]. 
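</p><p>As a small sketch of that idea (the function name and salt are hypothetical), the assignment can simply hash the cluster id so that every member of a cluster lands in the same arm:</p>

```python
import hashlib

def arm_for_cluster(cluster_id: str, salt: str = "friend-rec-exp-1") -> str:
    """Assign a whole user cluster to one experiment arm so that densely
    connected users share an arm, reducing network interference."""
    digest = hashlib.sha256(f"{salt}:{cluster_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

# Every user inherits the arm of their cluster, not of their own user id.
user_to_cluster = {"u1": "c9", "u2": "c9", "u3": "c4"}
arms = {u: arm_for_cluster(c) for u, c in user_to_cluster.items()}
```

<p>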
Running A/B tests at a coarser granularity (at a user-group level instead of at a user level) helps reduce network interference in friending systems.</p><p>Challenges remain, though: at the billion-node scale these approaches can still be problematic, and fine-tuning models with updated data can be challenging.</p><h3><strong>Two Tower Models: An Extension of Clustering Approaches</strong></h3><p>Two Tower models ([<a href="https://github.com/gauravchak/two_tower_models">7</a>], [<a href="https://arxiv.org/abs/2407.13218">1</a>], [<a href="https://www.linkedin.com/blog/engineering/recommendations/candidate-generation-in-a-large-scale-graph-recommendation-system-people-you-may-know">2</a>]), also the workhorse of ads and video recommendations, are a natural evolution from clustering techniques: they learn a representation for each user independently, which is then used to predict likely friend connections. In this architecture, one "tower" processes the features of the viewing user (i.e., the one looking for new friends), while the other tower processes features of the candidate user. 
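</p><p>A minimal numpy sketch of the two-tower scoring path (the shapes and the single dense layer per tower are illustrative assumptions, not a production architecture):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32  # embedding dimension (hypothetical)

def tower(features, weights):
    """One 'tower': map raw features to an L2-normalized embedding.
    A single dense layer stands in for a real multi-layer network."""
    emb = np.tanh(features @ weights)
    return emb / np.linalg.norm(emb, axis=-1, keepdims=True)

w_viewer = rng.normal(size=(8, DIM))   # viewer-tower parameters
w_cand = rng.normal(size=(8, DIM))     # candidate-tower parameters

viewer_emb = tower(rng.normal(size=(1, 8)), w_viewer)   # one viewer
cand_emb = tower(rng.normal(size=(100, 8)), w_cand)     # 100 candidates

scores = (cand_emb @ viewer_emb.T).ravel()  # cosine similarity per candidate
top10 = np.argsort(-scores)[:10]            # retrieve the 10 best candidates
```

<p>In production the top-k step is typically served by an approximate-nearest-neighbor index rather than a full scan over candidates. 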
The model is trained to maximize the similarity between embeddings of users who are friends and minimize it for non-friends.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y0fA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y0fA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png 424w, https://substackcdn.com/image/fetch/$s_!Y0fA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png 848w, https://substackcdn.com/image/fetch/$s_!Y0fA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png 1272w, https://substackcdn.com/image/fetch/$s_!Y0fA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y0fA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png" width="1366" height="358" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:358,&quot;width&quot;:1366,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y0fA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png 424w, https://substackcdn.com/image/fetch/$s_!Y0fA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png 848w, https://substackcdn.com/image/fetch/$s_!Y0fA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png 1272w, https://substackcdn.com/image/fetch/$s_!Y0fA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1f7f89-aa13-4a2a-988a-5ab2191bf0eb_1366x358.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 4: Two Tower models (schematic inspired from <a href="https://www.linkedin.com/pulse/personalized-recommendations-iv-two-tower-models-gaurav-chakravorty/">post</a>)</figcaption></figure></div><p></p><h4><strong>Pros of Two Tower Models over Graph Search</strong></h4><p>Two Tower models offer several advantages over traditional graph search:</p><ul><li><p><strong>Nuanced Relationship Capture</strong>: These models can learn complex, latent relationships by processing a variety of features and considering recency or frequency of interactions, which graph search approaches struggle to incorporate effectively.</p></li><li><p><strong>Multi-Task Potential</strong>: Since Two Tower models capture user relationships in embeddings, they are flexible enough to be used for tasks beyond friend recommendations, like predicting message frequency or engagement with friends' content. 
This multi-task ability leads to enhanced alignment between the sourced candidates and the final ranking model; both can be trained to value the same interactions and training samples.</p></li><li><p><strong>Scalability</strong>: Unlike graph search, which can become computationally intensive at scale, Two Tower models are optimized for distributed computation, making them well-suited for large networks.</p></li><li><p><strong>Compute-Enhanced Accuracy</strong>: Two Tower models allow state-of-the-art architectural techniques, like transformers or mixture-of-experts, to complement increased training data, boosting accuracy in retrieval.</p></li></ul><p>Two Tower models for friend recommendations are a deep area of research, and there is a lot more to say about them. For instance, the multi-task ability above is a rich and impactful vein of exploration.</p><h3><strong>Priming Friend Recommendations with Existing Connections</strong></h3><p>One effective way to enhance a Two Tower model for friend recommendation is to prime the training dataset with existing friends. By including all current connections as positive samples, the model can better understand what a "friendship" looks like, helping it differentiate friends from non-friends more effectively. This approach enriches the embedding space, offering the model a broader understanding of social connections and improving its performance in predicting new friendships. It is even more useful if the embedding table of user and target ids is shared with other sparse features that use user ids. During inference with K-nearest neighbors, you will need to filter current friends out of the new-friend recommendations. Open research question: Can this be avoided? 
Can we directly train embeddings which are close for new friends but aren&#8217;t for existing friends?</p><h3><strong>Using Impact Weighting for Topline Alignment and to Prevent Embedding Collapse</strong></h3><p>A risk in embedding-based approaches is that the embeddings may <a href="https://arxiv.org/abs/2310.04400">collapse</a> into a low-rank representation, meaning they do not adequately capture the uniqueness of individual users. Collapsed embeddings are often overly fixated on high-activity "power users," limiting the model's ability to recommend relevant connections for a broader user base. Put another way, the neural net is learning the simplest solution to a challenging problem.</p><p>To address this, we need to force the model to avoid the simple solution by making the loss depend less on these &#8220;easy&#8221; training examples. Try weighting each example by the inverse square root of the friend counts of the user and the target in the Two Tower training data [<a href="https://arxiv.org/abs/1609.02907">3</a>]. This focuses the model on users who are likely to derive greater value from new friendships. By emphasizing these users, you not only mitigate the risk of embedding collapse but also develop an embedding space that aligns more closely with business goals, improving the quality and diversity of friend recommendations and supporting meaningful connections for a wider range of users.</p><p>Embedding collapse is a deep topic, e.g. [<a href="https://arxiv.org/abs/2310.04400">6</a>], and we will share a separate post on it.</p><h3><strong>Blueprint for Adding Two Tower Models to Graph Search Implementations</strong></h3><p>If you currently have a friend-of-friends implementation and are considering adding Two Tower-based retrieval, it can be a challenging transition. Two Tower model development is a new skill for many teams, and without a proper validation path, delivering top-line results may take time. 
The following trajectory is suggested:</p><ol><li><p><strong>Optimize Offline Hit Rate</strong>: Improve the model to achieve a strong offline hit rate at rank 1, measured with a batch size of 1024, targeting a hit rate of 0.7.</p></li><li><p><strong>Begin with Offline Inference</strong>: Before deploying embedding-based retrieval in production, start with offline computations.</p><ul><li><p>Compute embeddings for both user and candidate towers.</p></li><li><p>For each user, generate the top 100 candidates.</p></li><li><p>For each candidate, generate the top 100 users, and cross-reference in SQL to produce candidate lists for users.</p></li></ul></li><li><p><strong>Validate Recall</strong>: Use the candidate lists to evaluate recall against ground truth (organically added friends) in offline experiments. This recall should be measured at the retrieval stage, as the ranking model has not yet been trained on this distribution.</p></li><li><p><strong>Develop Ranking Features</strong>: Using the lists from the retrieval stage, develop ranking features for friend recommendations. Validate that these features improve performance by reducing offline normalized entropy.</p></li><li><p><strong>Generator efficiency</strong>: After step 4, check whether the impression rate of candidates from your Two Tower model is higher than their share of the retrieval distribution. If it is, the ranking layer considers your generator&#8217;s candidates better than the alternatives.</p></li></ol><p>If that is not happening, it is too soon to expect topline gains, and you should iterate.</p><p>More generally on embedding-based retrieval, Snap&#8217;s retrieval team found ([15]) that adding Embedding-Based Retrieval (EBR) increased friendships made from friend recommendations by 5% to 10%, and that the overlap of these with graph search (Friends-of-Friends) is low. In a follow-up ([16]) they found an 11% improvement in friends-made-with-communication. This can be seen as a hybrid of the two approaches. 
Here they sample (up to) 5 of the friends of the viewer and find nearest neighbors with them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e506!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb94f259c-00b7-488f-912d-7e999c411179_1272x884.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e506!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb94f259c-00b7-488f-912d-7e999c411179_1272x884.png 424w, https://substackcdn.com/image/fetch/$s_!e506!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb94f259c-00b7-488f-912d-7e999c411179_1272x884.png 848w, https://substackcdn.com/image/fetch/$s_!e506!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb94f259c-00b7-488f-912d-7e999c411179_1272x884.png 1272w, https://substackcdn.com/image/fetch/$s_!e506!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb94f259c-00b7-488f-912d-7e999c411179_1272x884.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e506!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb94f259c-00b7-488f-912d-7e999c411179_1272x884.png" width="1272" height="884" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b94f259c-00b7-488f-912d-7e999c411179_1272x884.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:884,&quot;width&quot;:1272,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e506!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb94f259c-00b7-488f-912d-7e999c411179_1272x884.png 424w, https://substackcdn.com/image/fetch/$s_!e506!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb94f259c-00b7-488f-912d-7e999c411179_1272x884.png 848w, https://substackcdn.com/image/fetch/$s_!e506!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb94f259c-00b7-488f-912d-7e999c411179_1272x884.png 1272w, https://substackcdn.com/image/fetch/$s_!e506!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb94f259c-00b7-488f-912d-7e999c411179_1272x884.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 5: Illustration of sampling 1-hop before embedding based retrieval in [<a href="https://dl.acm.org/doi/10.1145/3626772.3661367">16</a>]</figcaption></figure></div><p></p><h3><strong>Conclusion</strong></h3><p>The journey from graph search to Two Tower models represents a significant advancement in friend recommendation systems, enabling platforms to deliver recommendations that are both accurate and meaningful. While graph search techniques provide a solid foundation, the advent of embedding-based and deep neural models like Two Tower architectures has opened new avenues for friend retrieval, allowing social networks to capture the nuances of human connections in increasingly sophisticated ways. 
As these systems continue to evolve, they will foster richer, more personalized experiences that mirror the dynamic nature of social relationships.</p><p>Though we discuss Two Tower models in the Friending use case, this architecture lends itself to efficient, scalable ad &amp; video recommendation, facial recognition, and, more generally, learned embedding database search.</p><p></p><h3><strong>References &amp; Further reading</strong></h3><ol><li><p><a href="https://arxiv.org/abs/2407.13218">[2407.13218] LiNR: Model Based Neural Retrieval on GPUs at LinkedIn</a></p></li><li><p><a href="https://www.linkedin.com/blog/engineering/recommendations/candidate-generation-in-a-large-scale-graph-recommendation-system-people-you-may-know">Candidate Generation in a Large Scale Graph Recommendation System: People You May Know</a></p></li><li><p><a href="https://arxiv.org/abs/1609.02907">[1609.02907] Semi-Supervised Classification with Graph Convolutional Networks</a>: the authors propose a layer-wise propagation rule that includes a normalization factor 1/&#8730;(d<sub>i</sub>d<sub>j</sub>), where d<sub>i</sub> and d<sub>j</sub> are the degrees of the nodes connected by an edge. This normalization helps to stabilize training by scaling the contributions of each node's neighbors according to their degrees.
This motivation was used above in the weighted paths idea.</p></li><li><p><a href="https://arxiv.org/abs/1403.6652">[1403.6652] DeepWalk: Online Learning of Social Representations</a>&nbsp;</p></li><li><p><a href="https://arxiv.org/abs/1607.00653">[1607.00653] node2vec: Scalable Feature Learning for Networks</a></p></li><li><p><a href="https://arxiv.org/abs/2310.04400">[2310.04400] On the Embedding Collapse when Scaling up Recommendation Models</a></p></li><li><p><a href="https://github.com/gauravchak/two_tower_models">GitHub - gauravchak/two_tower_models: Repo to guide implementation of Two Tower models</a></p></li><li><p><a href="https://pengcui.thumedialab.com/papers/NE-ArbitraryProximity.pdf">AROPE - Arbitrary-Order Proximity Preserved Network Embedding&nbsp;</a></p></li><li><p><a href="https://arxiv.org/abs/1906.11156">[1906.11156] NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization</a></p></li><li><p><a href="https://www.ijcai.org/proceedings/2019/0594.pdf">ProNE: Fast and Scalable Network Representation Learning</a></p></li><li><p><a href="https://github.com/facebookresearch/faiss">FAISS</a> - Facebook AI Similarity Search</p></li><li><p><a href="https://link.springer.com/article/10.1007/s11432-017-9243-7">Integrating a weighted-average method into the random walk framework to generate individual friend recommendations - Gong et al. 
2017</a></p></li><li><p><a href="https://tkipf.github.io/graph-convolutional-networks/">Graph Convolutional Networks | Thomas Kipf | Google DeepMind</a></p></li><li><p><a href="https://arxiv.org/pdf/1903.08755">[1903.08755] Using Ego-Clusters to Measure Network Effects at LinkedIn</a></p></li><li><p><a href="https://dl.acm.org/doi/10.1145/3539618.3591848">Embedding Based Retrieval in Friend Recommendation (Snap 2023)</a></p></li><li><p><a href="https://dl.acm.org/doi/10.1145/3626772.3661367">Improving Embedding-Based Retrieval in Friend Recommendation with ANN Query Expansion (Snap 2024)</a></p></li></ol><div id="youtube2-checTInZguM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;checTInZguM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/checTInZguM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</p>]]></content:encoded></item><item><title><![CDATA[Declarative Value-Model Tuning]]></title><description><![CDATA[Code to show a couple of approaches to achieve the desired task importance in value model]]></description><link>https://recsysml.substack.com/p/declarative-value-model-tuning</link><guid isPermaLink="false">https://recsysml.substack.com/p/declarative-value-model-tuning</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Tue, 10 Sep 2024 06:02:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/xMQEFyNzsHc" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Normally Value Models in recommender systems are hand tuned. We provide a couple of utilities to derive VM weights from targeted task importance.</p><p></p><h3><strong>Context</strong></h3><p>Value model (VM) weights are used to combine multiple task predictions in a recommender system. For instance the following could be a config to produce a ranked list by (0.1 * P(watch &gt; 3s) + 0.3 * P(watch &gt; 30s) + 20 * P(watch &amp; share) + 2 * P(watch &amp; like) + 5 * P(watch &amp; follow)).</p><pre><code><code>{
  "weights": {
    "p_watch_3s": 0.1,
    "p_watch_30s": 0.3,
    "p_watch_and_share": 20,
    "p_watch_and_like": 2,
    "p_watch_and_follow": 5
  }
}</code></code></pre><p>More info on VM <a href="https://recsysml.substack.com/p/ranking-model-calibration-in-recommender">here</a> and <a href="https://arxiv.org/abs/2208.04560">here</a>.</p><h3>Normal workflow</h3><p>Normally, VM weights are tuned either by grid search or by multiplying the current task weight by 2 or 1/2 and running experiments.</p><p>It would be empowering if practitioners had a tool to specify the desired importance of each task and compute the VM weights from it. The result could be a great baseline to jump to and then search around.</p><h3>Two approaches to task importance</h3><p>In <a href="https://github.com/gauravchak/value_model_tuning">github.com/gauravchak/value_model_tuning</a>, we look at two approaches to declarative VM tuning.</p><ol><li><p><strong>NDCG Gap Targeting:</strong> This computes a leave-one-out ranking for each task and then computes the NDCG gap of this ranking from the current ranking. This gap, or delta, is the importance of that task. The tool then adjusts weights to achieve the desired relative gaps.</p></li><li><p><strong>Per-task regret targeting:</strong> In the spirit of <a href="https://www.kdd.org/kdd2020/accepted-papers/view/the-nodehopper-enabling-low-latency-ranking-with-constraints-via-a-fast-dua.html">this Google-Deepmind paper</a>, this measures, per task, the regret of the current weights compared to ranking purely by that task, i.e., how much worse the current ranking is doing.
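A minimal sketch of this per-task regret (pure Python; the objective in the repo and the paper is richer, and here regret is simply measured as top-k expected value lost versus ranking purely by that task):

```python
def vm_score(pred, weights):
    """Value-model score: weighted sum of task predictions,
    mirroring the JSON config above."""
    return sum(weights[t] * p for t, p in pred.items())

def per_task_regret(items, weights, k=1):
    """For each task: expected value (sum of that task's probabilities
    over the top-k items) lost by ranking with the current VM weights
    instead of ranking purely by that task."""
    by_vm = sorted(items, key=lambda p: vm_score(p, weights), reverse=True)
    regrets = {}
    for t in weights:
        by_t = sorted(items, key=lambda p: p[t], reverse=True)
        best = sum(p[t] for p in by_t[:k])      # best achievable for task t
        got = sum(p[t] for p in by_vm[:k])      # achieved by current weights
        regrets[t] = best - got
    return regrets
```

A task whose regret is near zero is already dominating the ranking; a large regret means the current weights are sacrificing that task.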
It then adjusts weights so that these per-task regrets have the relative importance specified by the user.</p></li></ol><p></p><div id="youtube2-xMQEFyNzsHc" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;xMQEFyNzsHc&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/xMQEFyNzsHc?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p></p><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of or attributable to their current or any prior employer(s).
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</em></p>]]></content:encoded></item><item><title><![CDATA[Ranking model calibration in recommender systems]]></title><description><![CDATA[We define calibration of ranking models, the benefit that prioritizing calibration can bring, and how to achieve it without affecting normalized cross entropy / AUC metrics.]]></description><link>https://recsysml.substack.com/p/ranking-model-calibration-in-recommender</link><guid isPermaLink="false">https://recsysml.substack.com/p/ranking-model-calibration-in-recommender</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Sun, 09 Jun 2024 00:27:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We show the importance of calibration in ranking models and how to implement it efficiently.</p><h2>Context setting</h2><p>Most recommender systems have a multi-task estimator model that estimates the probability of various user actions on the recommendation. After that, there is usually a &#8220;value model&#8221; (a.k.a. <a href="https://arxiv.org/abs/2208.04560">multi-task fusion</a>) to combine these into a single score to rank by.
However, as we will show below the emitted probabilities might not be calibrated (explained below) with the observed probabilities.</p><blockquote><p>Fixing model calibration can improve the topline metrics of your recsys.<br>(see benefit section)</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6ZjU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6ZjU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png 424w, https://substackcdn.com/image/fetch/$s_!6ZjU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png 848w, https://substackcdn.com/image/fetch/$s_!6ZjU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!6ZjU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6ZjU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png" width="1456" height="1003" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1003,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105687,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6ZjU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png 424w, https://substackcdn.com/image/fetch/$s_!6ZjU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png 848w, https://substackcdn.com/image/fetch/$s_!6ZjU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!6ZjU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ef57cd3-7fb4-4693-9702-f6e8066c9565_1620x1116.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 1: Multi-task fusion / Value model is used often in modern recommender system. See <a href="https://github.com/gauravchak/causal_debiased_ranking/blob/main/src/top_item_selector.py#L73">code:L73</a> for an example implementation. 
When it is used, it basically assumes that your ranking model / estimator is self-calibrated.</figcaption></figure></div><p><strong>Optional Context: </strong>For readers interested in late stage ranking, various aspects have been covered in</p><ul><li><p>[<a href="https://recsysml.substack.com/p/does-your-model-get-better-at-task">ranking_basics</a>]</p></li><li><p>[<a href="https://recsysml.substack.com/p/how-to-reduce-cost-of-ranking-by">interplay-with-ESR</a>]</p></li><li><p>[<a href="https://recsysml.substack.com/p/reducing-selection-bias-popularity">reducing biases</a>]</p></li><li><p>[<a href="https://recsysml.substack.com/p/user-preference-modeling-in-a-recommender">how to model userid</a>]</p></li><li><p>[<a href="https://recsysml.substack.com/p/retention-modeling-at-feed-entry">how to rank for long-term user satisfaction</a>].</p></li></ul><p></p><p><strong>Code &amp; Video</strong></p><p>PTAL code here: <a href="https://github.com/gauravchak/calibration_arch_in_ranking_mtml/">github - gauravchak/calibration_arch_in_ranking_mtml</a></p><div id="youtube2-kVFkEOieqNg" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;kVFkEOieqNg&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/kVFkEOieqNg?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>What will calibration bring you in recsys?</h2><p>But hang on
&#8230; you say: these &#8220;late stage ranking&#8221; models are trained against binary user labels with binary cross-entropy loss, so do we really need them to emit actual probabilities, given there is a value model on top? </p><p>Technically, yes, you would be right: if this isn&#8217;t ads ranking or feed blending, you don&#8217;t strictly need the outputs to be probabilities. Any score that is comparable between two options will work to order them. However, since you have a value model (a.k.a. multi-task fusion model) that combines these scores, calibration provides a sort of decoupling: a contract, if you will, about the scale and distribution between your model and your value model. </p><blockquote><p>Without calibration, many of your ranking model accuracy improvements <strong>will fail to be launched</strong> because they change the scale / distribution, especially for some country or user cohort. This will make your experiments look unintentionally soft in metrics.</p></blockquote><p>In addition, since the goal of calibration is to provide the true likelihood of the predicted outcomes, it can directly help your ranking: wouldn't you prefer to put in the top position the item you believe has the highest likelihood? For example, <a href="https://arxiv.org/pdf/2211.01494">Youtube</a> found that using calibration increased their search CTR by 0.66%. This might seem small, but in the context of a model like Youtube it is no small feat!<br><a href="https://arxiv.org/pdf/2105.04651">Microsoft</a> found similar results in the context of search retrieval. They also argue that calibration helps find a threshold for when not to show results for a query.</p><h2>Prior work &amp; Insights</h2><h3><strong>1.
Post-trained model on logits</strong></h3><p>In the most classical approaches, calibration is applied as a post-training step: training a separate model on the logits of the first model using an independent dataset [<a href="https://scikit-learn.org/stable/modules/calibration.html#calibrating-a-classifier">1</a>]. For example, in <a href="https://arxiv.org/abs/1706.04599">On Calibration of Modern Neural Networks</a>, Guo et al. use a calibration model to improve the reliability of probability estimates from classification models, <strong>using a learned temperature to do the calibration</strong>.<br>An <strong>advantage</strong> of this method is that, by using an independent dataset rather than the training dataset of the model that produced the logits, it should generalize better to new data. One obvious <strong>disadvantage</strong> is that it complicates your overall architecture, and training a second model can be costly. In addition, scalability can quickly become an issue if you wish to ensure calibration across different features.
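A minimal sketch of temperature scaling for a binary task (numpy; a coarse grid search stands in for the gradient-based fit used in the paper, and the function name is illustrative):

```python
import numpy as np

def fit_temperature(logits, labels, grid=None):
    """Post-hoc temperature scaling: pick T minimizing held-out binary
    cross-entropy of sigmoid(logit / T). T > 1 softens an overconfident
    model; dividing by T never changes the ranking order of scores."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if grid is None:
        grid = np.linspace(0.1, 10.0, 200)      # coarse search grid for T

    def nll(t):
        p = np.clip(1.0 / (1.0 + np.exp(-logits / t)), 1e-12, 1 - 1e-12)
        return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

    return grid[int(np.argmin([nll(t) for t in grid]))]
```

Because the held-out set is independent of the ranker's training data, the fitted temperature corrects the scale of the emitted probabilities without touching AUC.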
But don&#8217;t worry as with anything with machine learning there are other ways!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z5H3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z5H3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png 424w, https://substackcdn.com/image/fetch/$s_!Z5H3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png 848w, https://substackcdn.com/image/fetch/$s_!Z5H3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png 1272w, https://substackcdn.com/image/fetch/$s_!Z5H3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z5H3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png" width="1456" height="419" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:419,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z5H3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png 424w, https://substackcdn.com/image/fetch/$s_!Z5H3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png 848w, https://substackcdn.com/image/fetch/$s_!Z5H3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png 1272w, https://substackcdn.com/image/fetch/$s_!Z5H3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9415fadc-5336-4560-bb13-25a1934beb0b_1600x460.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 2: Apply post-training model on logits <a href="https://www.overleaf.com/read/mdftjvtbjwhk#c36916">source</a></figcaption></figure></div><h3><strong>2. Loss function</strong></h3><p>In <a href="https://arxiv.org/pdf/2211.01494">Regression Compatible Listwise Objectives for Calibrated Ranking with Binary Relevance</a>, Google explores how creating a <a href="https://dl.acm.org/doi/pdf/10.1145/3534678.3539072">scale-calibrated</a> multi-objective loss function can not only allow you to increase calibration but also benefit your ranking score in a listwise context. Interestingly, in this paper they argue that using a multi-objective where one is for ranking and the second is for calibration is <strong>not always compatible</strong>. More precisely, they prove that the common loss used (sigmoid for regression and softmax for ranking) actually pushes the gradient scores in different directions. 
Therefore, we will not cover others multi-objectives as a means for calibration here.<br>Also please note that for pointwise there are also <a href="https://arxiv.org/pdf/2209.05310">ways</a> to include calibration as part of the loss.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h-yw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h-yw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png 424w, https://substackcdn.com/image/fetch/$s_!h-yw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png 848w, https://substackcdn.com/image/fetch/$s_!h-yw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png 1272w, https://substackcdn.com/image/fetch/$s_!h-yw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h-yw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png" width="1456" height="432" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:432,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h-yw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png 424w, https://substackcdn.com/image/fetch/$s_!h-yw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png 848w, https://substackcdn.com/image/fetch/$s_!h-yw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png 1272w, https://substackcdn.com/image/fetch/$s_!h-yw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6737d7c3-14f2-4770-8e9e-900d405f11fe_1600x475.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 3: <a href="https://arxiv.org/pdf/2211.01494">Regression Compatible Ranking (RCR) loss</a> for a single query in a logistic-regression ranking task (i.e. ranking with binary relevance labels)</figcaption></figure></div><h3><strong>3. Layer</strong></h3><p>Another way to achieve calibration is by <strong>introducing a last layer into your network</strong> that achieves the calibration for you. For example, <a href="https://arxiv.org/pdf/2402.06859">Linkedin</a> created an Isotonic Calibration Layer for their ranker which <strong>helped increase offline and online metrics</strong>. 
In the repository, we also included a <a href="https://github.com/gauravchak/calibration_arch_in_ranking_mtml/blob/main/src/platt_scaling_calibration.py">layer</a> to represent Platt scaling.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hp86!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hp86!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png 424w, https://substackcdn.com/image/fetch/$s_!Hp86!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png 848w, https://substackcdn.com/image/fetch/$s_!Hp86!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!Hp86!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hp86!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png" width="1044" height="1064" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1064,&quot;width&quot;:1044,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hp86!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png 424w, https://substackcdn.com/image/fetch/$s_!Hp86!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png 848w, https://substackcdn.com/image/fetch/$s_!Hp86!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!Hp86!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1108129c-7a1f-40b4-a15e-b9d515469802_1044x1064.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 4: Isotonic Layer by <a href="https://arxiv.org/pdf/2402.06859">Linkedin LiRank</a></figcaption></figure></div><h3><strong>Which one to pick?</strong></h3><p>As with everything in life, it depends. Every problem is different: do you use a pointwise or a listwise loss? Are you working on a retrieval system? Is scale an issue for you? Do you want to add a Bayesian layer at the end of your model to do some exploration?<br>Sadly, there is no one solution that fits all, and this post does not cover every possible way to do calibration either. What matters, as always, is to start with something simple, test it, and gradually improve it.</p><h2>Defining calibration</h2><h3><strong>1. Overall calibration (per task)</strong></h3><p>For each task, the average predicted probability should match the average observed rate of the label.</p><h3>2. 
Calibration on each prediction bucket/bin</h3><p>It is possible that your model overpredicts or underpredicts in some ranges of the prediction. </p><p>For instance, if you split the eval dataset into 5 equal buckets based on the predicted probabilities and compare the average predicted probability with the average observed label per bucket, do you see buckets with a significant gap between prediction and observation?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yOqo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yOqo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png 424w, https://substackcdn.com/image/fetch/$s_!yOqo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png 848w, https://substackcdn.com/image/fetch/$s_!yOqo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png 1272w, https://substackcdn.com/image/fetch/$s_!yOqo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yOqo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png" 
width="512" height="443.15966386554624" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1236,&quot;width&quot;:1428,&quot;resizeWidth&quot;:512,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Example 1&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Example 1" title="Example 1" srcset="https://substackcdn.com/image/fetch/$s_!yOqo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png 424w, https://substackcdn.com/image/fetch/$s_!yOqo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png 848w, https://substackcdn.com/image/fetch/$s_!yOqo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png 1272w, https://substackcdn.com/image/fetch/$s_!yOqo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338ecd74-c686-4d41-908a-1f72249165c7_1428x1236.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 5: Showing how your prediction could be over-calibrated in certain ranges and under-calibrated in another range while being overall well calibrated.</figcaption></figure></div><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DsDo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DsDo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png 424w, 
https://substackcdn.com/image/fetch/$s_!DsDo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png 848w, https://substackcdn.com/image/fetch/$s_!DsDo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png 1272w, https://substackcdn.com/image/fetch/$s_!DsDo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DsDo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png" width="1456" height="1194" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1194,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:767178,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DsDo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png 424w, 
https://substackcdn.com/image/fetch/$s_!DsDo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png 848w, https://substackcdn.com/image/fetch/$s_!DsDo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png 1272w, https://substackcdn.com/image/fetch/$s_!DsDo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819a4501-8691-4953-8f68-3a13958235a4_1534x1258.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 6: Another way to look at this is to plot the fraction of the positive label by binning your data along your predictions.</figcaption></figure></div><h3>3. Calibrated per user-cohort (or in general based on a feature)</h3><p>Here we are concerned with cases where your multi-task model is overall well calibrated but mis-calibrated for certain segments. For instance:</p><ol><li><p>You are building a music recommender system and your system could be under-calibrated for &#8220;timeless classics&#8221;. That means for timeless classics the predicted probability of listening might be lower than what you observe in the data.</p></li><li><p>You are building a video recommender system where the training data is dominated by short videos, and you find that you could be under-predicting the probability of &#8220;like&#8221; on longer videos.</p></li></ol><h2>Implementation suggestions for calibrated ranking</h2><h3>Platt Scaling</h3><p>In <a href="https://github.com/gauravchak/calibration_arch_in_ranking_mtml/blob/main/src/platt_scaling_calibration.py">src/platt_scaling_calibration.py</a> we show an option of adding Platt Scaling to improve Overall Calibration.</p><pre><code># in init: set the weight and bias for Platt Scaling
self.weights = nn.Parameter(torch.ones(num_tasks))  # identity init so the
self.bias = nn.Parameter(torch.zeros(num_tasks))    # layer starts as a pass-through

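# A possible way to fit these Platt parameters (an illustrative sketch, not
# from the repository): freeze the backbone model and minimize BCE on
# held-out logits and labels. The helper name and hyperparameters below are
# assumptions for illustration.
def fit_platt(weights, bias, val_logits, val_labels, steps=200, lr=0.1):
    # weights/bias are [T] parameters; val_logits/val_labels are [B, T]
    opt = torch.optim.Adam([weights, bias], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(
            weights * val_logits + bias, val_labels.float()
        )
        loss.backward()
        opt.step()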
# during inference: computing task estimates
calibrated_logits = self.weights * ui_logits + self.bias</code></pre><h3>Add a loss to improve calibration per prediction bucket</h3><p>In <a href="https://github.com/gauravchak/calibration_arch_in_ranking_mtml/blob/main/src/prediction_buckets_calibration.py">src/prediction_buckets_calibration.py</a>, we add a loss based on #2 &#8220;Calibration on each prediction bucket/bin&#8221;. To do that, we compute the mean squared error (MSE) between the mean of the label and the mean of the prediction in each equally spaced bucket/bin (like histogram) of the prediction values.</p><pre><code># Compute ECE-MeanSquaredError loss
# These steps have been verified <a href="https://colab.research.google.com/drive/1EkubNvQ3X_fFLOSb6KDbUGCogk02Ae8b#scrollTo=JNzZTr5BoOkg">in this google colab</a>

# Sigmoid to go from logits to predicted probabilities
preds: torch.Tensor = torch.sigmoid(ui_logits)

# Assuming preds and labels are of shape [B, T], sort preds along the batch
# dimension. sorted_indices[i, t] is then the index (from 0 to B-1) of the
# i-th smallest predicted probability for the t-th task.
sorted_indices: torch.Tensor = torch.argsort(preds, dim=0)
# Hence sorted_preds[i, t] is the i-th smallest predicted probability
sorted_preds: torch.Tensor = torch.gather(input=preds, dim=0, index=sorted_indices)
# sorted_labels[i, t] is the corresponding label
sorted_labels: torch.Tensor = torch.gather(input=labels.float(), dim=0, index=sorted_indices)

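# Note: scale_proj_mat (used below) is built in __init__ as a [PB, B] matrix
# whose matmul with a sorted [B, T] tensor yields per-bin means. One possible
# construction, assuming PB equal-count bins (this helper and its name are
# illustrative, not from the repository):
def make_bin_averaging_matrix(num_bins, batch_size):
    # Row pb holds 1/bin_size over the columns of bin pb, so that
    # matmul(mat, sorted_x) is the mean of sorted_x within each bin.
    assert batch_size % num_bins == 0, "assumes equal-count bins"
    bin_size = batch_size // num_bins
    mat = torch.zeros(num_bins, batch_size)
    for pb in range(num_bins):
        mat[pb, pb * bin_size:(pb + 1) * bin_size] = 1.0 / bin_size
    return mat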
# Compute the mean prediction in each bin
pred_mean_per_bin: torch.Tensor = torch.matmul(self.scale_proj_mat, sorted_preds)  # [PB, T]
# Compute label_mean in the bucket
label_mean_per_bin: torch.Tensor = torch.matmul(self.scale_proj_mat, sorted_labels)  # [PB, T]

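# For reference: the standard ECE definition uses the absolute deviation per
# bin rather than the squared error used below. With equal-count bins it can
# be sketched as (function name is illustrative):
def ece_abs(pred_mean_per_bin, label_mean_per_bin):
    return (pred_mean_per_bin - label_mean_per_bin).abs().mean(dim=0)  # [T]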
# Compute MSE between mean label and prediction in the bucket.
# First compute per task. This will allow us to later reuse any task specific weights set by the user for cross_entropy_loss.
mse_per_task: torch.Tensor = ((pred_mean_per_bin - label_mean_per_bin)**2).mean(dim=0)
calibration_loss: torch.Tensor = mse_per_task.mean()</code></pre><p>Note&nbsp;</p><ol><li><p>ECE is defined with absolute deviation but we have chosen to use mean squared error in this implementation.</p></li><li><p><a href="https://arxiv.org/abs/1909.10155">Verified Uncertainty Calibration</a> shows that ECE has some bias. However, we think the drawback is not considerable.</p></li></ol><h3>Making sure the model is calibrated for different user cohorts</h3><p>In <a href="https://github.com/gauravchak/calibration_arch_in_ranking_mtml/blob/main/src/feature_based_calibration.py">src/feature_based_calibration.py</a>, we add a loss that captures calibration for both values of a given feature. Imagine you are building a friend recommendation application and you want to ensure that your ranking model works for both new users and tenured users. By setting a feature &#8220;is_tenured&#8221;, this code shows how to ensure your models are calibrated for both tenured and new users. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PPF6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PPF6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png 424w, https://substackcdn.com/image/fetch/$s_!PPF6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png 848w, 
https://substackcdn.com/image/fetch/$s_!PPF6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png 1272w, https://substackcdn.com/image/fetch/$s_!PPF6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PPF6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png" width="1456" height="665" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:665,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:401998,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PPF6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png 424w, https://substackcdn.com/image/fetch/$s_!PPF6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png 848w, 
https://substackcdn.com/image/fetch/$s_!PPF6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png 1272w, https://substackcdn.com/image/fetch/$s_!PPF6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d0bfdb-0d5f-4154-a1f0-54d8b5139700_1768x808.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Fig 7: Showing how the model could have been miscalibrated for a key feature like say user cohort, and using this 
approach we can fix that. Hence, with the value model above it, we will not be unfair to any user cohort.</figcaption></figure></div><h3>Debiasing model arch against a feature</h3><p>In <a href="https://github.com/gauravchak/calibration_arch_in_ranking_mtml/blob/main/src/feature_bias_capture.py">src/feature_bias_capture.py</a>, we are not adding a loss. Instead, we add a model architecture component, a shallow tower if you will, that computes a per-task logit purely from the single feature, which is then added to the main task logits.</p><h2>Appendix</h2><ol><li><p>Note on nomenclature: calibration here is different from <a href="https://dl.acm.org/doi/pdf/10.1145/3240323.3240372">Steck (2018)</a>.</p><p>What we refer to as calibration is different from what Harald Steck refers to <a href="https://dl.acm.org/doi/pdf/10.1145/3240323.3240372">here</a>. In that paper, he suggests that if you observe the user&#8217;s prior interest in some categories, then by ensuring your current slate of recommendations matches the user&#8217;s prior distribution, you will not be under- or over-representing a category. Here we are talking about matching the rate of the observed true label in your predictions.</p></li></ol><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of or attributable to their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</em></p>]]></content:encoded></item><item><title><![CDATA[Entrypoint retention modeling in recommender systems]]></title><description><![CDATA[Choose/rank items at the entrypoint of a recommended feed to drive retention and not just consumption]]></description><link>https://recsysml.substack.com/p/retention-modeling-at-feed-entry</link><guid isPermaLink="false">https://recsysml.substack.com/p/retention-modeling-at-feed-entry</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Fri, 24 May 2024 13:49:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!h_We!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In our previous post we shared our learnings on how to <a href="https://recsysml.substack.com/p/how-to-account-for-risk-in-recommended">consider the risk of abandonment in a repeated recommender system</a>. In this post, we focus on the initial recommendation, which we call the &#8220;entrypoint&#8221; item. We share our learnings on improved entrypoint recommendations, which in our experience has led to higher daily active users and app-sessions.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://recsysml.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Applied ML | Recommender systems! 
Subscribe for free to receive new posts and please comment / give feedback.</p></div></div></div><h3>What do we mean by &#8220;entrypoint&#8221; recommendations</h3><p><strong>Carousel entrypoint: </strong>In many apps you will find a carousel of options that opens up an immersive UI where you can scroll for more content. For example, the YouTube Shorts carousel shown below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h_We!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h_We!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png 424w, https://substackcdn.com/image/fetch/$s_!h_We!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png 848w, https://substackcdn.com/image/fetch/$s_!h_We!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png 1272w, 
https://substackcdn.com/image/fetch/$s_!h_We!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h_We!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png" width="370" height="500.90252707581226" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a626b5af-bfa1-428e-814a-f813b5599e40_554x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:554,&quot;resizeWidth&quot;:370,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;How to use YouTube Shorts | Mashable&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="How to use YouTube Shorts | Mashable" title="How to use YouTube Shorts | Mashable" srcset="https://substackcdn.com/image/fetch/$s_!h_We!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png 424w, https://substackcdn.com/image/fetch/$s_!h_We!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png 848w, https://substackcdn.com/image/fetch/$s_!h_We!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png 1272w, 
https://substackcdn.com/image/fetch/$s_!h_We!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa626b5af-bfa1-428e-814a-f813b5599e40_554x750.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">An example in a modern recommender system of a carousel of entrypoint recommendations.
(<a href="https://mashable.com/article/how-to-use-youtube-shorts">Image source</a>, image chosen without prejudice as the first on the topic on Google image search)</figcaption></figure></div><p><strong>Feed entrypoint: </strong>We have also had success applying these approaches to the first item in an interface where the user can swipe/scroll up to see more recommendations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Aip4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Aip4!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif 424w, https://substackcdn.com/image/fetch/$s_!Aip4!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif 848w, https://substackcdn.com/image/fetch/$s_!Aip4!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif 1272w, https://substackcdn.com/image/fetch/$s_!Aip4!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Aip4!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif" width="224" height="484.3243243243243" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:222,&quot;resizeWidth&quot;:224,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Scroll Through TikTok Hands-Free on Your iPhone or iPad Using Simple Voice  Commands &#171; iOS &amp; iPhone :: Gadget Hacks&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Scroll Through TikTok Hands-Free on Your iPhone or iPad Using Simple Voice  Commands &#171; iOS &amp; iPhone :: Gadget Hacks" title="Scroll Through TikTok Hands-Free on Your iPhone or iPad Using Simple Voice  Commands &#171; iOS &amp; iPhone :: Gadget Hacks" srcset="https://substackcdn.com/image/fetch/$s_!Aip4!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif 424w, https://substackcdn.com/image/fetch/$s_!Aip4!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif 848w, https://substackcdn.com/image/fetch/$s_!Aip4!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif 1272w, https://substackcdn.com/image/fetch/$s_!Aip4!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40bd1bf4-7491-495e-9b09-5ff807f73d16_222x480.gif 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An example of a continuous feed of content where the first content recommended in the feed is being referred to as &#8220;entrypoint&#8221; in this article. (<a href="https://ios.gadgethacks.com/how-to/scroll-through-tiktok-hands-free-your-iphone-ipad-using-simple-voice-commands-0384514/">image source</a>, image chosen without prejudice as the first GIF on google image search on the topic)</figcaption></figure></div><p>In both these situations the entrypoint item has a high impact on the value derived by the user from the session. 
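</p><p>One way to make that downstream session value concrete is as a discounted sum of the per-item rewards observed after the entrypoint. A minimal sketch (the function name and the example numbers are hypothetical, for illustration only):</p>

```python
# Sketch: build an entrypoint training label from a logged session.
# rewards[i] is the reward of the i-th item shown (e.g. seconds
# watched); position 0 is the entrypoint itself.

def discounted_retention_label(rewards, alpha=0.9):
    """Discounted sum of rewards at positions 1..n, i.e. the value of
    the session downstream of the entrypoint; position 0's own reward
    is kept as a separate pointwise task."""
    return sum((alpha ** i) * r for i, r in enumerate(rewards) if i >= 1)

# Example session: entrypoint watched 30s, then 20s, 10s, 0s (exit).
label = discounted_retention_label([30.0, 20.0, 10.0, 0.0], alpha=0.5)
# 0.5 * 20 + 0.25 * 10 + 0.125 * 0 = 12.5
```

<p>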
Not getting it right can lead to shallow sessions and to fewer current and future sessions.</p><h3>The benefit of entrypoint retention modeling</h3><h4>Retention modeling in carousel entrypoint</h4><p>If done right, it can lead to:</p><ul><li><p>a reduction in the number of single-item sessions, i.e. sessions that don&#8217;t advance beyond the item in the carousel</p></li><li><p>longer app-sessions</p></li><li><p>an increase in app-sessions and daily active users, especially for infrequent (non-power) users.</p></li><li><p>increased revenue caused by the effects above.</p></li></ul><h4>Retention modeling in feed entrypoint</h4><p>If done right, it can lead to:</p><ul><li><p>a reduction in &#8220;skip rate&#8221;, the rate at which users skip the recommended item.</p></li><li><p>longer app-sessions</p></li><li><p>an increase in app-sessions and daily active users, especially for infrequent (non-power) users.</p></li><li><p>increased revenue caused by the effects above.</p></li></ul><h2>Mathematical derivation</h2><p>The cumulative value of a trajectory, given that the user is guaranteed to see the first (&#8220;entrypoint&#8221;) recommendation, is <strong>Eq(1)</strong></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V(\\tau) = V(0) + \\sum_{i=1}^{\\infty} \\left( \\prod_{j=0}^{i-1} (1 - \\text{exit}(j)) \\right) V(i)&quot;,&quot;id&quot;:&quot;VCDZHFMRWM&quot;}" data-component-name="LatexBlockToDOM"></div><ul><li><p>V(i) refers to the value derived from the i-th recommendation</p></li><li><p>exit(i) is 1 if the user exits the feed at the i-th recommendation, and 0 otherwise</p></li></ul><p>We will use this version, <strong>Eq(1)</strong>, in the implementation section, but before we dive into implementation it might help to look at the problem in a couple of other ways, so that you can connect it to what we discussed in <a href="https://recsysml.substack.com/p/how-to-account-for-risk-in-recommended">our previous post on incorporating
P(exit)</a>.</p><p><strong>Eq(2)</strong>: We can also write this in a way that separates out the value derived at every position in the feed.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V(\\tau) = V(0) + (1 - \\text{exit}(0)) V(1) + (1 - \\text{exit}(0))(1 - \\text{exit}(1)) V(2) + (1 - \\text{exit}(0))(1 - \\text{exit}(1))(1 - \\text{exit}(2)) V(3) + \\cdots&quot;,&quot;id&quot;:&quot;ASOGQNSNMN&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><p>Another way to look at this is <strong>Eq(3)</strong>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V(\\tau \\mid 0) = V(0) + (1 - \\text{exit}(0)) \\cdot V(\\tau \\mid 1)&quot;,&quot;id&quot;:&quot;CAQLWUGRLO&quot;}" data-component-name="LatexBlockToDOM"></div><p>and, in general, the value starting at the k-th position is the value from that recommendation plus, conditional on the user not exiting, the value from the rest of the session starting at position (k+1):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V(\\tau \\mid k) = V(k) + (1 - \\text{exit}(k)) \\cdot V(\\tau \\mid (k+1))&quot;,&quot;id&quot;:&quot;MNQVDYFHFG&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><h2>Key Insights</h2><p>Our key insights are:</p><ol><li><p>Optimizing the entrypoint purely by the pointwise reward E[V(0)] ignores the second term in Eq(1).</p></li><li><p>Entrypoint recommendations can affect the value of the full session since, as shown in Eq(3), the recommendation directly affects the probability of exit and hence the second term.</p></li><li><p>Since you expect improved entrypoint recommendations to have a higher impact on users who don&#8217;t yet have entrenched habits, it might help to limit your training data for these tasks to infrequent users.</p></li><li><p>In recommender systems that are responsive and alter what they show based on previous user interactions, entrypoint recommendations with good follow-on
recommendation options can capitalize on the user intent generated by the entrypoint.</p></li><li><p>To account for the weaker causal link between the entrypoint and future terms, we can use a discounted sum of future rewards rather than the plain sum. (See implementation step 2.a below.)</p></li></ol><h2>Recommended Implementation</h2><ol><li><p>Join the user interaction at the entrypoint with the value of the rest of the session, such as time spent, number of items seen by the user, likes, etc.</p></li><li><p>We recommend keeping these downstream tasks separate in the initial iteration, e.g.:</p><ol><li><p>retention_time_spent = <strong>discounted</strong> sum of time spent starting at position 1</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V(\\tau) = V(0) + \\sum_{i=1}^{\\infty} \\left(\\alpha^i * V(i)\\right)&quot;,&quot;id&quot;:&quot;CVCWCJKKMS&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p>retention_items_seen = discounted count of items seen starting at position 1</p></li><li><p>retention_conversions, etc.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3yVu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3yVu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png 424w, https://substackcdn.com/image/fetch/$s_!3yVu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png 848w,
https://substackcdn.com/image/fetch/$s_!3yVu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png 1272w, https://substackcdn.com/image/fetch/$s_!3yVu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3yVu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png" width="1456" height="645" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:645,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:393327,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3yVu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png 424w, https://substackcdn.com/image/fetch/$s_!3yVu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png 848w, 
https://substackcdn.com/image/fetch/$s_!3yVu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png 1272w, https://substackcdn.com/image/fetch/$s_!3yVu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb70eb1d-4dd3-4272-9ec8-2031f0ea4ccb_1562x692.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Fig1: Compute a decayed sum of future rewards in the session, and use it as a label for the first video.
In the example in the image we have used the watch time of the videos as the reward. Similar to the multi-task paper, we recommend experimenting with engagement and number of views as rewards as well.</figcaption></figure></div></li></ol></li><li><p>Add these tasks to your Multi-task estimator (&#8220;ranking&#8221;) model.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Rl8r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Rl8r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png 424w, https://substackcdn.com/image/fetch/$s_!Rl8r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png 848w, https://substackcdn.com/image/fetch/$s_!Rl8r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png 1272w, https://substackcdn.com/image/fetch/$s_!Rl8r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Rl8r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png" width="278" height="234.30100334448161"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:504,&quot;width&quot;:598,&quot;resizeWidth&quot;:278,&quot;bytes&quot;:138656,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Rl8r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png 424w, https://substackcdn.com/image/fetch/$s_!Rl8r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png 848w, https://substackcdn.com/image/fetch/$s_!Rl8r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png 1272w, https://substackcdn.com/image/fetch/$s_!Rl8r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2779d32-4bb3-4849-9bbc-0203aa424642_598x504.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Fig 2: Showing that the new retention tasks are added to the multi task estimator (&#8220;ranking&#8221;) model</figcaption></figure></div></li><li><p>Experiment with conditioning on the right user segment and perhaps on entrypoints where the immediate value to the user was strong enough.</p></li><li><p>Assuming you have a <a 
href="https://arxiv.org/abs/2208.04560">Multi-task fusion</a> (a.k.a. &#8220;value model&#8221;) approach to combining your task estimates to actually pick the entrypoint item to recommend, in this step there will be a fair bit of iteration to use these new tasks. </p></li></ol><p></p><h2>Video explaining the post</h2><div id="youtube2-6sARaA2h1uY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;6sARaA2h1uY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/6sARaA2h1uY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p></p><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</em></p><p></p>]]></content:encoded></item><item><title><![CDATA[Optimal whole page ranking = reward / risk]]></title><description><![CDATA[We show how tech can learn from finance in using risk models for better feed construction of recommender systems.]]></description><link>https://recsysml.substack.com/p/how-to-account-for-risk-in-recommended</link><guid isPermaLink="false">https://recsysml.substack.com/p/how-to-account-for-risk-in-recommended</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Sat, 11 May 2024 14:00:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/dN5UqaT0O48" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Are we looking at risk enough in recommender systems?</p><p>While it is tempting to equate &#8220;risk&#8221; with the absence of &#8220;reward&#8221;, we think we can learn from portfolio construction in finance, where modeling and accounting for risk in action selection leads to an increase in long-term user value in a multi-iteration setting.</p><div id="youtube2-dN5UqaT0O48" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;dN5UqaT0O48&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/dN5UqaT0O48?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Illustration of the idea in finance</h2><p>There are many decades of beautiful mathematics on optimal portfolio construction that factors in both risk and reward.
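</p><p>For context, a standard result from that literature (Markowitz mean-variance optimization, stated here without derivation): if mu is the vector of expected returns, Sigma the covariance matrix of returns, and lambda the investor&#8217;s risk aversion, the unconstrained optimal weights are</p>

```latex
w^{*} \;=\; \arg\max_{w}\; \left( w^{\top}\mu \;-\; \tfrac{\lambda}{2}\, w^{\top}\Sigma w \right) \;=\; \frac{1}{\lambda}\,\Sigma^{-1}\mu
```

<p>so, all else being equal, allocations shrink as estimated (co)variance grows, which is the &#8220;allocate inversely proportional to risk&#8221; intuition.</p><p>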
In <a href="https://github.com/gauravchak/risk_aware_feed_construction/blob/main/SPY_VIX.ipynb">this Colab</a>, we have shown, using a simplistic example, how allocating inversely proportional to risk (rather than holding a fixed full-stocks allocation) leads to: </p><ol><li><p>higher returns, as evidenced by the final portfolio being 1.81 times the normal full-stocks allocation.</p></li><li><p>lower risk, as evidenced by lower drawdowns during times of crisis and hence a lower risk of the investor having to liquidate.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E6B-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E6B-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png 424w, https://substackcdn.com/image/fetch/$s_!E6B-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png 848w, https://substackcdn.com/image/fetch/$s_!E6B-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png 1272w, https://substackcdn.com/image/fetch/$s_!E6B-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!E6B-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png" width="1248" height="934" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:934,&quot;width&quot;:1248,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:534205,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E6B-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png 424w, https://substackcdn.com/image/fetch/$s_!E6B-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png 848w, https://substackcdn.com/image/fetch/$s_!E6B-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png 1272w, https://substackcdn.com/image/fetch/$s_!E6B-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f4ec9df-7863-45bd-9854-b51a31513763_1248x934.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Conclusion from <a href="https://github.com/gauravchak/risk_aware_feed_construction/blob/main/SPY_VIX.ipynb">notebook</a> illustrating the value of factoring risk into long term value optimization in financial portfolio construction.</figcaption></figure></div><p>The above is just an illustration and all the disclaimers that you usually find with things related to financial advice apply here. Based on the personal experience of the authors, there are ways to mess it up and there are ways to deliver 100+ times more value than the above chart as well. 
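</p><p>The allocation rule behind that illustration can be sketched as follows. This is a toy with synthetic numbers, not the notebook&#8217;s actual SPY/VIX data; the function names and the 15% volatility target are assumptions for illustration:</p>

```python
# Toy sketch of inverse-risk allocation: scale exposure to the risky
# asset down as the current volatility estimate rises, capped at 100%.

def risk_scaled_weight(vol_estimate, target_vol=0.15):
    """Fraction of the portfolio held in the risky asset."""
    return min(1.0, target_vol / vol_estimate)

def run_backtest(returns, vols, target_vol=0.15):
    """Compound wealth with a per-period risk-scaled weight; the
    unallocated remainder is assumed to earn zero."""
    wealth = 1.0
    for r, v in zip(returns, vols):
        wealth *= 1.0 + risk_scaled_weight(v, target_vol) * r
    return wealth

# Calm periods get full allocation; the crisis period (vol 0.60) gets
# only a quarter of the exposure, which dampens the -10% drawdown.
wealth = run_backtest(returns=[0.02, -0.10, 0.03], vols=[0.10, 0.60, 0.12])
```

<p>A fully invested portfolio over the same toy path would compound to 1.02 &#215; 0.90 &#215; 1.03 &#8776; 0.946, while the risk-scaled one ends above 1.02, illustrating both the higher ending wealth and the smaller drawdown.</p><p>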
Let&#8217;s take the basic idea and expand on it in the next section.</p><h2>High level idea</h2><div class="pullquote"><p>Take the resources that are limited and make sure to use them optimally.</p></div><p>In financial portfolios, the limited resource is not just money; risk is limited too.</p><ul><li><p>In fact, with leverage, which is largely accessible and cheap, the invested value of the portfolio is not strictly limited to the money that is invested. </p></li><li><p>Risk is limited: an investor can only stomach a certain amount of it. Hence a portfolio that maximizes reward while containing risk is optimal.</p></li></ul><p>In recommender systems, you have a limited amount of <a href="https://www.newyorker.com/magazine/2024/05/06/the-battle-for-attention">attention</a>, or interest, from the user. You are constantly balancing the risk of depleting that resource against trying to deliver value.</p><h2>Reward, Risk and Regret in financial portfolios</h2><ul><li><p>Reward could be the returns, or the increase in the value of the portfolio</p></li><li><p>Risk</p><ul><li><p>short-term risk of negative returns</p></li><li><p>long-term risk of stopping investing altogether. This is all too common. In our experience, if you speak to investors who have been investing personally for around 25 years, more than half of them no longer invest in the stock market, having gone through some period of extreme risk that left them disillusioned with the outcome.</p></li><li><p>medium-term risk of underallocation</p></li></ul></li><li><p>Regret</p><ul><li><p>often people are looking to have some exposure to asset classes or exciting stocks / investments that their friends / peers are invested in. This comes from the fear of missing out.
So if you are a portfolio manager and you are not allocating at all to something like, say, cryptocurrency, you could be incurring the risk of regret from clients whose friends are allocated.</p></li></ul></li></ul><h2>Reward, Risk and Regret in recommended feed construction</h2><ul><li><p>Reward: This is often tied to your definition of business value. For instance, it could be the number of daily active users on your platform, or the total time spent or user activity on your platform.</p></li><li><p>Risk:</p><ul><li><p>users retaining at a lower rate (high severity)</p></li><li><p>users exiting the current session in your app (medium severity)</p></li><li><p>users exiting the feed or skipping the recommendation (low severity)</p></li></ul></li><li><p>Regret</p><ul><li><p>recommendations not capturing some category/topic/creator/job to be done that you consider par for the course. One inspiring way to reduce this risk, we think, is <a href="https://dl.acm.org/doi/10.1145/3240323.3240372">calibrated recommendations</a>.</p></li></ul></li></ul><h2>Final algorithm for feed recommendations</h2><p>Instead of ranking items by expected value / reward, borrowing from the formula in finance, we recommend ranking items in your recommended feed by</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{E[reward]}{E[risk]}&quot;,&quot;id&quot;:&quot;TYYEIBRBOS&quot;}" data-component-name="LatexBlockToDOM"></div><p>A simplistic formulation of this in, say, a video recommendation system could be</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{P[watch > 30s | user, item]}{P[exit\\_app | user, item]}&quot;,&quot;id&quot;:&quot;KOJWVOXQOY&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><p>This is similar to what <a href="https://www.linkedin.com/in/guanfengliang/">Guanfeng</a> et al. show in <a href="https://patents.google.com/patent/WO2017019548A1/en">Improving feeds by modelling scrolling behavior</a>, i.e. 
the optimal solution is to rank by probability of reward / probability of ending session:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UDse!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90446f3e-4912-409f-b796-4f9f19755143_920x246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UDse!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90446f3e-4912-409f-b796-4f9f19755143_920x246.png 424w, https://substackcdn.com/image/fetch/$s_!UDse!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90446f3e-4912-409f-b796-4f9f19755143_920x246.png 848w, https://substackcdn.com/image/fetch/$s_!UDse!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90446f3e-4912-409f-b796-4f9f19755143_920x246.png 1272w, https://substackcdn.com/image/fetch/$s_!UDse!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90446f3e-4912-409f-b796-4f9f19755143_920x246.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UDse!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90446f3e-4912-409f-b796-4f9f19755143_920x246.png" width="920" height="246" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90446f3e-4912-409f-b796-4f9f19755143_920x246.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:246,&quot;width&quot;:920,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114134,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UDse!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90446f3e-4912-409f-b796-4f9f19755143_920x246.png 424w, https://substackcdn.com/image/fetch/$s_!UDse!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90446f3e-4912-409f-b796-4f9f19755143_920x246.png 848w, https://substackcdn.com/image/fetch/$s_!UDse!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90446f3e-4912-409f-b796-4f9f19755143_920x246.png 1272w, https://substackcdn.com/image/fetch/$s_!UDse!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90446f3e-4912-409f-b796-4f9f19755143_920x246.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Optimal ranking from <a href="https://patents.google.com/patent/WO2017019548A1/en">Improving feeds by modelling scrolling behavior</a></figcaption></figure></div><p></p><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions, opinions stated here are theirs and not representative of their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company including ones the author(s) may have worked in or been associated with.</em></p><p></p>]]></content:encoded></item><item><title><![CDATA[User representation in a recommender system | memorization vs generalization]]></title><description><![CDATA[We look at memorization, generalization and mixture of representations based implementations for user preference representation in a recommender system]]></description><link>https://recsysml.substack.com/p/user-preference-modeling-in-a-recommender</link><guid isPermaLink="false">https://recsysml.substack.com/p/user-preference-modeling-in-a-recommender</guid><dc:creator><![CDATA[Gaurav Chakravorty]]></dc:creator><pubDate>Fri, 03 May 2024 16:54:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gJ5E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Representing the preferences of the user is crucial to personalizing recommender systems. In this article, we propose an approach to representing user preferences that we believe is optimal for large scale recommender systems. The approach is inspired by how humans communicate with others. 
We use progressively more memory when there is more historical context to fall back to (see figure below).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gJ5E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gJ5E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png 424w, https://substackcdn.com/image/fetch/$s_!gJ5E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png 848w, https://substackcdn.com/image/fetch/$s_!gJ5E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png 1272w, https://substackcdn.com/image/fetch/$s_!gJ5E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gJ5E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png" width="1456" height="855" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:855,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:434308,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gJ5E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png 424w, https://substackcdn.com/image/fetch/$s_!gJ5E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png 848w, https://substackcdn.com/image/fetch/$s_!gJ5E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png 1272w, https://substackcdn.com/image/fetch/$s_!gJ5E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff929c706-6f18-4edd-bffd-57d896e6cd1c_1590x934.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Similarly the approach presented in the section &#8220;Mixture of representations&#8221; uses table lookup based memory primarily for users for whom we have enough data to specialize, else it relies on a generalized representation.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://recsysml.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support our work. 
We would also love to hear your feedback on what we should write about more in the future.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Introduction: Generalization vs Memorization</h2><p>We have seen successful recsys built following both schools of thought:</p><ol><li><p><strong>Generalization:</strong> Let&#8217;s not have any user-specific memorization and personalize purely based on user features. Look at <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf">this seminal Youtube paper</a> for an example.</p></li><li><p><strong>Memorization:</strong> Let&#8217;s have a large lookup table keyed by user id. This is essentially a way to capture clear causality based specifically on the user&#8217;s preference. One could think of this as a modern recommender system incarnation of collaborative filtering, which even today is quite close to SOTA (see <a href="https://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=yR-ugIoAAAAJ&amp;sortby=pubdate&amp;citation_for_view=yR-ugIoAAAAJ:fPk4N6BV_jEC">here</a>). 
For those who want to learn collaborative filtering from the best, I recommend reading <a href="https://datajobs.com/data-science-repo/Collaborative-Filtering-[Koren-and-Bell].pdf">this chapter</a>.</p></li></ol><h2>Outline</h2><p>In this post, we will discuss various implementations for capturing user preference:</p><ol><li><p>table lookup</p></li><li><p>deep hash embeddings</p></li><li><p>generalization based on user cohort, not specific to the user</p></li><li><p>mixture of representations</p></li></ol><p>We encourage you to try multiple approaches, since the results can vary with scale, with how dynamic user preferences are, and with how asymmetric the split between power users and marginal users is on your platform.</p><h2>Code</h2><p>We share PyTorch code <a href="https://github.com/gauravchak/user_preference_modeling">here</a>. It is tested and freely available. We have also posted a walkthrough of the code on Youtube. See the first of the 6 videos below.</p><div id="youtube2-pbboLjaAe0s" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;pbboLjaAe0s&quot;,&quot;startTime&quot;:&quot;104s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/pbboLjaAe0s?start=104s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Memorization based user representation (Table lookup)</h2><p>In this implementation, we create a (large) embedding table keyed by user id. 
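</p><p>To make the table-lookup idea concrete, here is a minimal plain-Python sketch of hashed id lookup. It is a stand-in for a learnable embedding table (e.g. a PyTorch <code>nn.Embedding</code>); the class name, table size and dimension below are illustrative and not taken from the linked repo.</p>

```python
import random


class UserIdEmbedding:
    """Memorization-based user representation: one vector per hash bucket."""

    def __init__(self, num_buckets: int = 1024, dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.dim = dim
        # In a real model this table would be learnable; here it is a
        # randomly initialized stand-in.
        self.table = [
            [rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_buckets)
        ]

    def lookup(self, user_id: int) -> list:
        # Hash the user id into a fixed number of buckets. Distinct users can
        # collide; that is the trade-off for keeping the table size bounded.
        return self.table[hash(user_id) % self.num_buckets]


emb = UserIdEmbedding()
vec = emb.lookup(12345)  # dim-sized user embedding, stable across lookups
```

<p>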
Notwithstanding hash collisions, this enables us to memorize the user&#8217;s preferences and use them in future recommendations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n6WP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n6WP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png 424w, https://substackcdn.com/image/fetch/$s_!n6WP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png 848w, https://substackcdn.com/image/fetch/$s_!n6WP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png 1272w, https://substackcdn.com/image/fetch/$s_!n6WP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n6WP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png" width="1268" height="604" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:604,&quot;width&quot;:1268,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:187469,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n6WP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png 424w, https://substackcdn.com/image/fetch/$s_!n6WP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png 848w, https://substackcdn.com/image/fetch/$s_!n6WP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png 1272w, https://substackcdn.com/image/fetch/$s_!n6WP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa410d369-724d-4dc0-b517-c55005c0ebb1_1268x604.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Deep Hash Embeddings</h2><p>This is based on <a href="https://arxiv.org/abs/2010.10784">this seemingly magical paper</a>, which achieves performance similar to id lookup without using embedding tables.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-0an!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-0an!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png 424w, 
https://substackcdn.com/image/fetch/$s_!-0an!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png 848w, https://substackcdn.com/image/fetch/$s_!-0an!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png 1272w, https://substackcdn.com/image/fetch/$s_!-0an!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-0an!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png" width="1194" height="660" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1194,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:235039,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-0an!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png 424w, 
https://substackcdn.com/image/fetch/$s_!-0an!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png 848w, https://substackcdn.com/image/fetch/$s_!-0an!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png 1272w, https://substackcdn.com/image/fetch/$s_!-0an!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb724e5-3103-4788-b5ba-247fd2293a30_1194x660.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I believe the intuition is that the stacked neural network layers learn a form of generalization that is competitive with memorization with an order of magnitude fewer parameters.</p><h3>User cohort / cluster based representation</h3><p>In this implementation we only look at user features (not including user id). We have an embedding table of smallish size, say 1024, and we try to find the index in this table that the user should map to, based on features such as location, broad interests, etc. This is especially useful when you have very little information about the user.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d81g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d81g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png 424w, https://substackcdn.com/image/fetch/$s_!d81g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png 848w, https://substackcdn.com/image/fetch/$s_!d81g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png 1272w, 
https://substackcdn.com/image/fetch/$s_!d81g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d81g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png" width="1456" height="1095" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1095,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:481225,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d81g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png 424w, https://substackcdn.com/image/fetch/$s_!d81g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png 848w, https://substackcdn.com/image/fetch/$s_!d81g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png 1272w, 
https://substackcdn.com/image/fetch/$s_!d81g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef504cd1-76a4-494e-b047-28add49cb7c7_1588x1194.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Mixture of representations</h3><p>Now we try to combine these ideas. In the image below:</p><ol><li><p>the &#8220;Table Lookup&#8221; refers to the module in the &#8220;Memorization based user representation (Table lookup)&#8221; section. 
</p></li><li><p>the &#8220;Cohort lookup&#8221; refers to the module in the &#8220;User cohort / cluster based representation&#8221; section.</p></li></ol><p>Then we take a weighted sum of the two. The weight is learned and, hopefully, learns to rely on the best embedding for each user.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fOdo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fOdo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png 424w, https://substackcdn.com/image/fetch/$s_!fOdo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png 848w, https://substackcdn.com/image/fetch/$s_!fOdo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!fOdo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fOdo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png" width="986" height="1098" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1098,&quot;width&quot;:986,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:352156,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fOdo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png 424w, https://substackcdn.com/image/fetch/$s_!fOdo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png 848w, https://substackcdn.com/image/fetch/$s_!fOdo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!fOdo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcf7cc27-251d-4fd3-b3a9-7003047122b4_986x1098.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>A note about sequential recommendation</h2><p>Please note that this article is about understanding the user&#8217;s preferences beyond the current session&#8217;s ephemeral interests. Sequential recommendation modules may be best at capturing those short-lived interests and keeping your recsys responsive. User representation and sequential recommendation modules should be complementary.</p><p><em><strong>Disclaimer:</strong> These are the personal opinions of the author(s). Any assumptions or opinions stated here are theirs and not representative of their current or any prior employer(s). 
Apart from publicly available information, any other information here is not claimed to refer to any company, including ones the author(s) may have worked at or been associated with.</em></p>]]></content:encoded></item></channel></rss>