<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Machine Learning Ops Roundup]]></title><description><![CDATA[Why is machine learning in the real world hard and how do you make it better? This newsletter brings together the best articles, news, and papers highlighting the challenges and opportunities in MLOps]]></description><link>https://mlopsroundup.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!_Ynl!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff72f75fb-0f86-423e-825b-8577e7d5b2ac_1280x1280.png</url><title>Machine Learning Ops Roundup</title><link>https://mlopsroundup.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 11 Apr 2026 19:53:47 GMT</lastBuildDate><atom:link href="https://mlopsroundup.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[ml-ops]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[mlopsroundup@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[mlopsroundup@substack.com]]></itunes:email><itunes:name><![CDATA[ml-ops]]></itunes:name></itunes:owner><itunes:author><![CDATA[ml-ops]]></itunes:author><googleplay:owner><![CDATA[mlopsroundup@substack.com]]></googleplay:owner><googleplay:email><![CDATA[mlopsroundup@substack.com]]></googleplay:email><googleplay:author><![CDATA[ml-ops]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Issue #31: Refuel.ai. LLMs can label data better than humans. Autolabel. ]]></title><description><![CDATA[Back with some news! 
We started a company called Refuel.ai - helping teams create clean, labeled datasets at the speed of thought.]]></description><link>https://mlopsroundup.substack.com/p/issue-31-refuelai-llms-can-label</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-31-refuelai-llms-can-label</guid><dc:creator><![CDATA[Rishabh Bhargava]]></dc:creator><pubDate>Mon, 26 Jun 2023 16:03:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rm-w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rm-w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rm-w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rm-w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!rm-w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rm-w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg" width="1456" height="798" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:798,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rm-w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rm-w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rm-w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!rm-w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e5ddc8d-25e9-4ed9-8078-074800330f27_1600x877.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Welcome to the 31st issue of the MLOps newsletter (hint: it&#8217;s a special one!).&nbsp;</p><p>Nihit and I have been busy building a company &#8211; <a href="https://www.refuel.ai/">Refuel.ai</a>! We&#8217;re building a platform to clean, label and enrich datasets at human-level quality, using Large Language Models (LLMs). 
<a href="https://mlopsroundup.substack.com/p/issue-20-ai-playbook-curating-data">Great data leads to great models</a>, but clean, labeled data is a huge bottleneck for machine learning teams.&nbsp;</p><p>We just launched out of stealth (<a href="https://venturebeat.com/ai/refuel-ai-nabs-5m-to-create-training-ready-datasets-with-llms/">media coverage</a>) and are excited to share more details with you. In this issue, we&#8217;ll cover a benchmark for LLM-powered data labeling and <a href="https://github.com/refuel-ai/autolabel">Autolabel</a>, a library to label, clean, and enrich data with your favorite LLM. Feel free to join us on <a href="https://discord.gg/uEdr8nrMGm">Discord</a> if you want to discuss LLMs or data labeling (or anything else) with us! &#128522;</p><p>Thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://www.refuel.ai/blog-posts/llm-labeling-technical-report">LLMs can label data as well as humans, but faster</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.refuel.ai/blog-posts/llm-labeling-technical-report" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!giKI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png 424w, https://substackcdn.com/image/fetch/$s_!giKI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png 848w, 
https://substackcdn.com/image/fetch/$s_!giKI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png 1272w, https://substackcdn.com/image/fetch/$s_!giKI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!giKI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png" width="1080" height="371" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:371,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.refuel.ai/blog-posts/llm-labeling-technical-report&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!giKI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png 424w, https://substackcdn.com/image/fetch/$s_!giKI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png 848w, 
https://substackcdn.com/image/fetch/$s_!giKI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png 1272w, https://substackcdn.com/image/fetch/$s_!giKI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dcf9e29-c518-4bb9-b11e-e8fc024fd0cd_1080x371.png 1456w" sizes="100vw"></picture></div></a></figure></div><h4><strong>Motivation</strong></h4><p>LLMs are an incredibly powerful piece of technology. 
While it is well known that LLMs can <a href="https://www.learnwitharobot.com/p/cute-robot-poems-written-by-large">write poems</a> and <a href="https://openai.com/research/gpt-4">score well on the bar exam and the SAT</a>, there is growing <a href="https://aclanthology.org/2021.findings-emnlp.354/">evidence</a> that, when supervised by domain experts, LLMs can label datasets at a quality comparable to that of skilled human annotators.</p><h4><strong>What we did</strong></h4><p>We evaluated the performance of LLMs vs. human annotators for labeling text datasets across a range of tasks (classification, question answering, etc.) on three axes: quality, turnaround time, and cost. We tested techniques for boosting LLM accuracy, such as few-shot prompting, chain-of-thought reasoning, and confidence estimation, and proposed <a href="https://www.refuel.ai/blog-posts/llm-labeling-technical-report">a live benchmark</a> for text data annotation tasks, to which <a href="https://discord.gg/fweVnRx6CU">the community</a> can contribute over time.</p><h4><strong>Results</strong></h4><ul><li><p>State-of-the-art LLMs can label text datasets at the same or better quality than skilled human annotators, but ~20x faster and ~7x cheaper.</p></li><li><p>For achieving the highest quality labels, GPT-4 is the best choice among out-of-the-box LLMs (88.4% agreement with ground truth, compared to 86% for skilled human annotators). 
For achieving the best tradeoff between label quality and cost, GPT-3.5-turbo, PaLM-2 and open-source models like FLAN-T5-XXL are compelling.</p></li><li><p>Confidence-based thresholding can be a very effective way to mitigate the impact of hallucinations and ensure high label quality.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.refuel.ai/blog-posts/llm-labeling-technical-report" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5250!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34163410-514b-40c0-81c1-606f63f28ca3_885x551.png 424w, https://substackcdn.com/image/fetch/$s_!5250!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34163410-514b-40c0-81c1-606f63f28ca3_885x551.png 848w, https://substackcdn.com/image/fetch/$s_!5250!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34163410-514b-40c0-81c1-606f63f28ca3_885x551.png 1272w, https://substackcdn.com/image/fetch/$s_!5250!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34163410-514b-40c0-81c1-606f63f28ca3_885x551.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5250!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34163410-514b-40c0-81c1-606f63f28ca3_885x551.png" width="885" height="551" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34163410-514b-40c0-81c1-606f63f28ca3_885x551.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:551,&quot;width&quot;:885,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.refuel.ai/blog-posts/llm-labeling-technical-report&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5250!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34163410-514b-40c0-81c1-606f63f28ca3_885x551.png 424w, https://substackcdn.com/image/fetch/$s_!5250!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34163410-514b-40c0-81c1-606f63f28ca3_885x551.png 848w, https://substackcdn.com/image/fetch/$s_!5250!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34163410-514b-40c0-81c1-606f63f28ca3_885x551.png 1272w, https://substackcdn.com/image/fetch/$s_!5250!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34163410-514b-40c0-81c1-606f63f28ca3_885x551.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>Conclusion</strong></h4><p>We think that LLM-powered labeling is the future for the vast majority of data cleaning, annotation, and enrichment efforts, and we will continue to track the performance of many LLMs across a diverse set of tasks in this ongoing benchmark.&nbsp;</p><p>You can also read this as a tweet thread:&nbsp;<a href="https://twitter.com/nihit_desai/status/1669752203949793281?s=20">https://twitter.com/nihit_desai/status/1669752203949793281?s=20</a></p><h2><a href="https://github.com/refuel-ai/autolabel">Autolabel: Python library to label, clean, and enrich datasets with LLMs</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://github.com/refuel-ai/autolabel" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!VNiV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d326b5-777c-4ef9-b646-cd73a8c8091e_1600x907.png 424w, https://substackcdn.com/image/fetch/$s_!VNiV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d326b5-777c-4ef9-b646-cd73a8c8091e_1600x907.png 848w, https://substackcdn.com/image/fetch/$s_!VNiV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d326b5-777c-4ef9-b646-cd73a8c8091e_1600x907.png 1272w, https://substackcdn.com/image/fetch/$s_!VNiV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d326b5-777c-4ef9-b646-cd73a8c8091e_1600x907.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VNiV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d326b5-777c-4ef9-b646-cd73a8c8091e_1600x907.png" width="1456" height="825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79d326b5-777c-4ef9-b646-cd73a8c8091e_1600x907.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://github.com/refuel-ai/autolabel&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!VNiV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d326b5-777c-4ef9-b646-cd73a8c8091e_1600x907.png 424w, https://substackcdn.com/image/fetch/$s_!VNiV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d326b5-777c-4ef9-b646-cd73a8c8091e_1600x907.png 848w, https://substackcdn.com/image/fetch/$s_!VNiV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d326b5-777c-4ef9-b646-cd73a8c8091e_1600x907.png 1272w, https://substackcdn.com/image/fetch/$s_!VNiV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79d326b5-777c-4ef9-b646-cd73a8c8091e_1600x907.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We have open-sourced <a href="https://github.com/refuel-ai/autolabel">Autolabel</a>, a Python library to label, clean, and enrich text datasets with any Large Language Model (LLM) of your choice. With a few lines of code, you can label data with very high accuracy in a fraction of the time that human annotation takes! Join us on <a href="https://discord.gg/uEdr8nrMGm">Discord</a> if you have any questions, or open an issue on <a href="https://github.com/refuel-ai/autolabel">GitHub</a> to report a bug. If you find this interesting, feel free to give us a star! &#11088;</p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>.&nbsp;</p><p>If you like what we are doing, please spread the word to your friends and colleagues. 
&#8203;&#8203;&#10084;&#65039;</p>]]></content:encoded></item><item><title><![CDATA[Issue #30.5: Special Announcement - A new course on MLOps!]]></title><description><![CDATA[Welcome to the latest issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-31-a-new-course-on-machine</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-31-a-new-course-on-machine</guid><dc:creator><![CDATA[Nihit Desai]]></dc:creator><pubDate>Fri, 18 Mar 2022 16:31:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mhBS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mhBS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mhBS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!mhBS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png 848w, 
https://substackcdn.com/image/fetch/$s_!mhBS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!mhBS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mhBS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png" width="1000" height="400" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mhBS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!mhBS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png 848w, 
https://substackcdn.com/image/fetch/$s_!mhBS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!mhBS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ff37cfd-2cdb-427f-af4f-fc2bec8cd61a_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Welcome to the latest issue of the MLOps newsletter. 
In this special (mini) edition, we have some exciting news to share!</p><p><a href="https://corise.com/course/mlops">MLOps: From Models to Production</a>, a new course that Nihit has been developing in partnership with the team at <a href="https://corise.com/">co:rise</a>, is almost ready to go live! This will be a 4-week course, with the first iteration starting on July 11, 2022. More details below &#128071;</p><h2><a href="https://corise.com/course/mlops?utm_source=nihit">New Course | MLOps: From Models to Production</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q28P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q28P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png 424w, https://substackcdn.com/image/fetch/$s_!Q28P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png 848w, https://substackcdn.com/image/fetch/$s_!Q28P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Q28P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Q28P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png" width="1298" height="772" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/a2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:772,&quot;width&quot;:1298,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:150645,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Q28P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png 424w, https://substackcdn.com/image/fetch/$s_!Q28P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png 848w, https://substackcdn.com/image/fetch/$s_!Q28P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Q28P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ddffa9-8a99-4863-8232-79bfec2034d1_1298x772.png 1456w" sizes="100vw"></picture></div></a></figure></div><h4>Summary</h4><p>Over the last decade, Machine Learning has become ubiquitous, powering platforms and applications across a variety of domains. And yet, we are likely only at the <a href="https://hai.stanford.edu/research/ai-index-2022">beginning</a> of the ML adoption journey. 
To realize the promise of Machine Learning, we believe it is important to combine knowledge from Machine Learning research with the skills and best practices for building effective real-world ML systems. With that in mind, we&#8217;d like to share a new course @Nihit has been developing in partnership with the team at <a href="https://corise.com/">co:rise</a>.</p><h4><strong>Who is this course for?</strong></h4><ul><li><p>Software engineers who want to build production systems that integrate ML</p></li><li><p>Data scientists who want to learn about the production ML lifecycle&nbsp;</p></li><li><p>Students/recent college grads who want hands-on experience building and shipping ML applications</p></li></ul><p>This course is meant to complement a foundational course in Machine Learning/Deep Learning and assumes familiarity with basic machine learning concepts.&nbsp;</p><h4><strong>Logistics</strong></h4><ul><li><p>This will be a <strong>4-week course</strong>, with the first iteration starting July 11th (07/11/2022). We expect to run the course a few times each year.&nbsp;</p></li><li><p>To enroll or learn more about the course, you can visit the <a href="https://corise.com/course/mlops?utm_source=nihit">course webpage</a>. You can use the code <strong>NIHIT10MLOPS</strong> for a discount.&nbsp;</p></li><li><p>Also, check out other courses in the <a href="https://corise.com/track/ml-foundations">ML track</a> that you might find interesting.</p></li></ul><h4><strong>Scholarships</strong></h4><p>We are also offering a limited number of scholarships! If you're interested in applying, or know someone who is, here is the <a href="https://docs.google.com/forms/d/e/1FAIpQLSclUf4Qyw6One_sY5Ntg5QloeqnR_SrrtNrUvf0pULND9UM9A/viewform">application link</a>.</p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! 
This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>.&nbsp;</p><p>If you like what we are doing, please tell your friends and colleagues to spread the word. &#10084;&#65039; </p>]]></content:encoded></item><item><title><![CDATA[Issue #30: ML Platforms. On Deck Data Science. Explainability in Healthcare. Re:Invent. ML for Content Moderation.]]></title><description><![CDATA[Happy New Year and welcome to the 30th issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-30-ml-platforms-ondeck-data</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-30-ml-platforms-ondeck-data</guid><dc:creator><![CDATA[Rishabh Bhargava]]></dc:creator><pubDate>Mon, 24 Jan 2022 19:45:09 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Happy New Year and welcome to the 30th issue of the MLOps newsletter. 
We have been on a bit of a hiatus from the newsletter (sorry that we can&#8217;t share the reasons just yet &#128516;), but will be back on a regular cadence now!&nbsp;</p><p>In this issue, we cover a collection of learnings from ML platforms at Netflix, DoorDash, and Spotify; share a fantastic new opportunity for data scientists with the On Deck community; discuss challenges in building explainable machine learning models for healthcare; link to a Twitter thread on building hybrid human/machine learning systems for content moderation; and more.</p><p>Thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://towardsdatascience.com/lessons-on-ml-platforms-from-netflix-doordash-spotify-and-more-f455400115c7">Medium | Lessons on ML Platforms &#8212; from Netflix, DoorDash, Spotify, and more</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://towardsdatascience.com/lessons-on-ml-platforms-from-netflix-doordash-spotify-and-more-f455400115c7" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LNZ0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b814ad9-5f23-4ab8-aaa5-4dc5b88f05bd_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!LNZ0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b814ad9-5f23-4ab8-aaa5-4dc5b88f05bd_960x720.png 848w, https://substackcdn.com/image/fetch/$s_!LNZ0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b814ad9-5f23-4ab8-aaa5-4dc5b88f05bd_960x720.png 1272w, 
https://substackcdn.com/image/fetch/$s_!LNZ0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b814ad9-5f23-4ab8-aaa5-4dc5b88f05bd_960x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LNZ0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b814ad9-5f23-4ab8-aaa5-4dc5b88f05bd_960x720.png" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7b814ad9-5f23-4ab8-aaa5-4dc5b88f05bd_960x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://towardsdatascience.com/lessons-on-ml-platforms-from-netflix-doordash-spotify-and-more-f455400115c7&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LNZ0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b814ad9-5f23-4ab8-aaa5-4dc5b88f05bd_960x720.png 424w, https://substackcdn.com/image/fetch/$s_!LNZ0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b814ad9-5f23-4ab8-aaa5-4dc5b88f05bd_960x720.png 848w, 
https://substackcdn.com/image/fetch/$s_!LNZ0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b814ad9-5f23-4ab8-aaa5-4dc5b88f05bd_960x720.png 1272w, https://substackcdn.com/image/fetch/$s_!LNZ0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b814ad9-5f23-4ab8-aaa5-4dc5b88f05bd_960x720.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>This is a wonderful article by <a 
href="https://twitter.com/ErnestKLChan">Ernest Chan</a> in which he analyzes the broad components of ML platforms at large companies such as Netflix, DoorDash, and Spotify.</p><h4><strong>Why are we talking about ML Platforms anyway?&nbsp;</strong></h4><blockquote><p>Your data scientists produce wonderful models, but they can only deliver value once the models are integrated into your production systems. How do you make it easy for the data scientists to repeatedly deliver value? What do you build, what do you buy, and what tools do you need to solve your organization&#8217;s problems specifically?</p></blockquote><p>The goal of an ML platform is to accelerate the output of data science teams. Let&#8217;s look at the common components of an ML platform (at least from a large tech company perspective).&nbsp;</p><h4><strong>Components of an ML Platform</strong></h4><ul><li><p>Feature Store</p></li><li><p>Workflow Orchestration</p></li><li><p>Model Registry</p></li><li><p>Model Serving</p></li><li><p>Model Quality Monitoring</p></li></ul><p>It turns out that most of these components have been built in-house so far. 
This is for many different reasons: these companies had to solve their problems before much ML tooling existed (and we are still very much in the early innings of MLOps), organizational reasons (how engineers at large companies are rewarded), etc.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://towardsdatascience.com/lessons-on-ml-platforms-from-netflix-doordash-spotify-and-more-f455400115c7" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MQZR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9cc325b-7070-43dd-9669-951ace971b5f_1400x895.png 424w, https://substackcdn.com/image/fetch/$s_!MQZR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9cc325b-7070-43dd-9669-951ace971b5f_1400x895.png 848w, https://substackcdn.com/image/fetch/$s_!MQZR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9cc325b-7070-43dd-9669-951ace971b5f_1400x895.png 1272w, https://substackcdn.com/image/fetch/$s_!MQZR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9cc325b-7070-43dd-9669-951ace971b5f_1400x895.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MQZR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9cc325b-7070-43dd-9669-951ace971b5f_1400x895.png" width="1400" height="895" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/e9cc325b-7070-43dd-9669-951ace971b5f_1400x895.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:895,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://towardsdatascience.com/lessons-on-ml-platforms-from-netflix-doordash-spotify-and-more-f455400115c7&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MQZR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9cc325b-7070-43dd-9669-951ace971b5f_1400x895.png 424w, https://substackcdn.com/image/fetch/$s_!MQZR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9cc325b-7070-43dd-9669-951ace971b5f_1400x895.png 848w, https://substackcdn.com/image/fetch/$s_!MQZR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9cc325b-7070-43dd-9669-951ace971b5f_1400x895.png 1272w, https://substackcdn.com/image/fetch/$s_!MQZR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9cc325b-7070-43dd-9669-951ace971b5f_1400x895.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h4><strong>What does this mean?&nbsp;</strong></h4><p>Well, first of all, most companies are not one of these large companies, so your situation will likely be different.&nbsp;</p><p>The first thing that stands out is that workflow orchestration is the component where an out-of-the-box (and open-source) solution is most commonly used. This makes sense, since workflow orchestration tools are part of a generic software/data toolkit and are more mature. <a href="https://github.com/apache/airflow">Airflow</a> seems to be dominant here.&nbsp;</p><p>The Model Registry is the next component that has some established tools, such as <a href="https://github.com/mlflow/mlflow">MLflow</a>. 
However, Feature Stores, Model Serving, and Model quality monitoring seem to have been built in-house in pretty much every case. Aside from most tools being new, there are a few reasons for this: all these components have &#8220;stringent production requirements&#8221;, need to support diverse use cases within an organization, and can require tight integrations with the rest of the tech stack in the company.&nbsp;</p><p>If this is interesting to you, check out <a href="https://towardsdatascience.com/lessons-on-ml-platforms-from-netflix-doordash-spotify-and-more-f455400115c7">the full article</a> and the beautiful set of references compiled at the end of it.&nbsp;</p><h2><a href="https://beondeck.com/data-science?utm_source=mlopsroundup&amp;utm_medium=newsletter&amp;utm_campaign=odds1">Community | On Deck Data Science</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://beondeck.com/data-science?utm_source=mlopsroundup&amp;utm_medium=newsletter&amp;utm_campaign=odds1" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8-ix!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6dc1bdac-b1fd-43e3-adb3-d9dd1e8b057a_880x495.png 424w, https://substackcdn.com/image/fetch/$s_!8-ix!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6dc1bdac-b1fd-43e3-adb3-d9dd1e8b057a_880x495.png 848w, https://substackcdn.com/image/fetch/$s_!8-ix!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6dc1bdac-b1fd-43e3-adb3-d9dd1e8b057a_880x495.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8-ix!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6dc1bdac-b1fd-43e3-adb3-d9dd1e8b057a_880x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8-ix!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6dc1bdac-b1fd-43e3-adb3-d9dd1e8b057a_880x495.png" width="880" height="495" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6dc1bdac-b1fd-43e3-adb3-d9dd1e8b057a_880x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:880,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://beondeck.com/data-science?utm_source=mlopsroundup&amp;utm_medium=newsletter&amp;utm_campaign=odds1&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8-ix!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6dc1bdac-b1fd-43e3-adb3-d9dd1e8b057a_880x495.png 424w, https://substackcdn.com/image/fetch/$s_!8-ix!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6dc1bdac-b1fd-43e3-adb3-d9dd1e8b057a_880x495.png 848w, 
https://substackcdn.com/image/fetch/$s_!8-ix!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6dc1bdac-b1fd-43e3-adb3-d9dd1e8b057a_880x495.png 1272w, https://substackcdn.com/image/fetch/$s_!8-ix!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6dc1bdac-b1fd-43e3-adb3-d9dd1e8b057a_880x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We wanted to share an exciting opportunity with all of you. 
Both of us have been members of the <a href="https://beondeck.com/">On Deck</a> community for the past few months, and cannot recommend it highly enough. We have met some incredible people, and learned a lot from the community.</p><p>While we are part of the <a href="https://www.beondeck.com/deeptech">On Deck Deep Tech</a> cohort, they have a new cohort called <a href="https://beondeck.com/data-science?utm_source=mlopsroundup&amp;utm_medium=newsletter&amp;utm_campaign=odds1">On Deck Data Science</a>, and here is a blurb from their team. If you have any questions about our experience, drop us a note.&nbsp;</p><h4><strong>Enter On Deck Data Science (ODDS):</strong></h4><blockquote><p><em>ODDS is a continuous community for ambitious Data Science leaders who want to maximize their impact and accelerate their careers alongside a highly-curated network of peers.</em></p><p><em>The fellowship brings together experienced data science leaders who have delivered results for organizations and customers at the highest level.</em></p><p><em>Members get access to:</em></p><ul><li><p><em>Community: Develop meaningful relationships with peers and mentors. Curated 1:1 connections, mastermind sessions and mentorship matchmaking - the hard work is done for you.</em></p></li><li><p><em>Networking: ODDS is a side door to 10x your network of peers. You can find exciting job opportunities, investors to fund your next idea or your next star hire. You get access to the entire On Deck network (like an internal LinkedIn) instantly when you join.</em></p></li><li><p><em>Professional Development: Acquire specialized frameworks, knowledge and skills to add to your toolkit via live sessions and fireside chats with incredible guests and an extensive library of content.</em></p></li></ul><p><em>Applications close on Feb 13th, so don&#8217;t miss your chance to join some of the most accomplished data scientists in the business. 
<a href="https://beondeck.com/data-science?utm_source=mlopsroundup&amp;utm_medium=newsletter&amp;utm_campaign=odds1">Apply now</a>!</em></p></blockquote><h2><a href="https://www.sciencedirect.com/science/article/pii/S2589750021002089#!">Paper | The false hope of current approaches to explainable artificial intelligence in health care</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.sciencedirect.com/science/article/pii/S2589750021002089#!" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gCCO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png 424w, https://substackcdn.com/image/fetch/$s_!gCCO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png 848w, https://substackcdn.com/image/fetch/$s_!gCCO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png 1272w, https://substackcdn.com/image/fetch/$s_!gCCO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gCCO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png" width="941" height="461" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:461,&quot;width&quot;:941,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.sciencedirect.com/science/article/pii/S2589750021002089#!&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gCCO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png 424w, https://substackcdn.com/image/fetch/$s_!gCCO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png 848w, https://substackcdn.com/image/fetch/$s_!gCCO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png 1272w, https://substackcdn.com/image/fetch/$s_!gCCO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff372ae86-a567-466f-80ce-aa3bea0aaaad_941x461.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>We have covered explainability for ML multiple times (see <a href="https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric">here</a> and <a href="https://mlopsroundup.substack.com/p/issue-8-toronto-ml-summit-gpt-2-ml">here</a>), but this was a thought-provoking paper that talks about some of the pitfalls of these techniques (especially in healthcare).&nbsp;</p><h4><strong>What are the authors saying?</strong></h4><blockquote><p>It has been argued that explainable AI will engender trust with the health-care workforce, provide transparency into the AI decision making process, and potentially mitigate various kinds of bias&#8230; we advocate for rigorous internal and external validation of AI models as a more direct means of achieving the goals often associated with explainability, and we caution against 
having explainability be a requirement for clinically deployed models.</p></blockquote><h4><strong>Current explainability approaches and their gaps</strong></h4><p>Explanations for decisions fall into two categories: those from inherently explainable models (simple models such as linear regression) and those from post-hoc explainability techniques (saliency maps, LIME, SHAP, etc.).&nbsp;</p><p>While inherently explainable models might seem appealing, confounding variables can still be in the mix, and as the number of variables grows, &#8220;information overload&#8221; can make explanations hard to follow.&nbsp;</p><p>Heat maps (or saliency maps) highlight how much each region of an image contributed to a decision in imaging use cases. However, using a pneumonia diagnosis example, the authors point out that:</p><blockquote><p>&#8220;the hottest parts of the map contain both useful and non-useful information (from the perspective of a human expert), and simply localising the region does not reveal exactly what it was in that area that the model considered useful.&#8221;</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.sciencedirect.com/science/article/pii/S2589750021002089#!" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2HL8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0dca9bc4-f57f-4c02-830b-d7e0aba2c4f6_445x759.png 424w, https://substackcdn.com/image/fetch/$s_!2HL8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0dca9bc4-f57f-4c02-830b-d7e0aba2c4f6_445x759.png 848w, https://substackcdn.com/image/fetch/$s_!2HL8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0dca9bc4-f57f-4c02-830b-d7e0aba2c4f6_445x759.png 1272w, https://substackcdn.com/image/fetch/$s_!2HL8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0dca9bc4-f57f-4c02-830b-d7e0aba2c4f6_445x759.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2HL8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0dca9bc4-f57f-4c02-830b-d7e0aba2c4f6_445x759.png" width="445" height="759" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/0dca9bc4-f57f-4c02-830b-d7e0aba2c4f6_445x759.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:759,&quot;width&quot;:445,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.sciencedirect.com/science/article/pii/S2589750021002089#!&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2HL8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0dca9bc4-f57f-4c02-830b-d7e0aba2c4f6_445x759.png 424w, https://substackcdn.com/image/fetch/$s_!2HL8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0dca9bc4-f57f-4c02-830b-d7e0aba2c4f6_445x759.png 848w, https://substackcdn.com/image/fetch/$s_!2HL8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0dca9bc4-f57f-4c02-830b-d7e0aba2c4f6_445x759.png 1272w, https://substackcdn.com/image/fetch/$s_!2HL8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0dca9bc4-f57f-4c02-830b-d7e0aba2c4f6_445x759.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>Why does this happen?</strong>&nbsp;</h4><blockquote><p>This interpretability gap of explainability methods relies on humans to decide what a given explanation might mean. Unfortunately, the human tendency is to ascribe a positive interpretation: we assume that the feature we would find important is the one that was used (this is an example of a famously harmful cognitive error called confirmation bias).</p></blockquote><p>Another issue is that explanations have no performance guarantees. Most tests &#8220;rely on heuristic measures&#8221; and qualitative measures rather than explicit scores. 
Since explanations are often a simplification of the original model, they are very likely a less accurate version of the trained (and hard to explain) model, which makes this process harder.&nbsp;</p><h4><strong>Final Thoughts</strong></h4><blockquote><p>Rather than seeing explainability techniques as producing valid, local explanations to justify the use of model predictions, it is more realistic to view these methods as global descriptions of how a model functions. If, for example, a clinical diagnostic model appears to perform well in a specific test set but the heat maps show that the model is consistently distracted by regions of the images that cannot logically inform the diagnosis, then this finding can indicate that the test set itself is flawed and that further forensic investigation is required.</p></blockquote><p><a href="https://www.sciencedirect.com/science/article/pii/S2589750021002089#!">Here</a> is the full paper if this was an interesting read!</p><h2><a href="https://aws.amazon.com/blogs/aws/top-announcements-of-aws-reinvent-2021/#Artificial_Intelligence">AWS News Blog | Top Announcements of AWS re:Invent 2021&nbsp;</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://aws.amazon.com/blogs/aws/top-announcements-of-aws-reinvent-2021/#Artificial_Intelligence" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KYV0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3a0e58a4-5e91-4a74-b36c-cac7134b35af_737x415.png 424w, https://substackcdn.com/image/fetch/$s_!KYV0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3a0e58a4-5e91-4a74-b36c-cac7134b35af_737x415.png 848w, 
https://substackcdn.com/image/fetch/$s_!KYV0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3a0e58a4-5e91-4a74-b36c-cac7134b35af_737x415.png 1272w, https://substackcdn.com/image/fetch/$s_!KYV0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3a0e58a4-5e91-4a74-b36c-cac7134b35af_737x415.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KYV0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3a0e58a4-5e91-4a74-b36c-cac7134b35af_737x415.png" width="737" height="415" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/3a0e58a4-5e91-4a74-b36c-cac7134b35af_737x415.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:415,&quot;width&quot;:737,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://aws.amazon.com/blogs/aws/top-announcements-of-aws-reinvent-2021/#Artificial_Intelligence&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KYV0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3a0e58a4-5e91-4a74-b36c-cac7134b35af_737x415.png 424w, 
https://substackcdn.com/image/fetch/$s_!KYV0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3a0e58a4-5e91-4a74-b36c-cac7134b35af_737x415.png 848w, https://substackcdn.com/image/fetch/$s_!KYV0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3a0e58a4-5e91-4a74-b36c-cac7134b35af_737x415.png 1272w, https://substackcdn.com/image/fetch/$s_!KYV0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3a0e58a4-5e91-4a74-b36c-cac7134b35af_737x415.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline 
points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><a href="https://aws.amazon.com/blogs/aws/top-announcements-of-aws-reinvent-2021">AWS re:Invent 2021</a> had a ton of interesting announcements, but they were light on AI/ML improvements this time around (compare this with <a href="https://mlopsroundup.substack.com/p/issue-7-trustworthy-ai-in-the-govt">our re:Invent coverage from last year</a>).&nbsp;</p><h4><strong>MLOps Highlights:</strong></h4><ul><li><p><a href="https://aws.amazon.com/blogs/aws/now-in-preview-amazon-sagemaker-studio-lab-a-free-service-to-learn-and-experiment-with-ml/">Amazon SageMaker Studio Lab</a>: This is a free service that gives people access to a working Jupyter instance for experimentation; it appears very similar to Google Colab.&nbsp;</p></li><li><p><a href="https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-inference-recommender/">Amazon SageMaker Inference Recommender</a>: This service recommends which instance types to run your inference workloads on.&nbsp;</p></li><li><p><a href="https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-ground-truth-plus/">Amazon SageMaker Ground Truth Plus</a>: This service (currently in pilot) appears to be a higher-quality labeling service than Mechanical Turk, and seems to compete with the Scale AIs of the world.&nbsp;</p></li><li><p><a href="https://aws.amazon.com/blogs/aws/new-introducing-sagemaker-training-compiler/">Amazon SageMaker Training Compiler</a>: This service optimizes deep learning training code to run faster on SageMaker GPU instances (if you do use this, let us know what your experience is like).</p></li></ul><p>There are a few more, but we&#8217;ll leave it up to you to read about them <a 
href="https://aws.amazon.com/blogs/aws/top-announcements-of-aws-reinvent-2021/#Artificial_Intelligence">here</a>. We do worry about the length of SageMaker service names at this point - we wonder if we will be covering Amazon SageMaker Deep Training Speed-Booster Plus next year&#8230;</p><h2><a href="https://www.washingtonpost.com/transportation/2021/11/09/drunk-driving-technology-infrastructure/">Washington Post | New technology mandate in infrastructure bill could significantly cut drunken driving deaths</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.washingtonpost.com/transportation/2021/11/09/drunk-driving-technology-infrastructure/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2dgA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F49fcd7bf-8266-4baa-842c-756cd5b045f6_916x569.png 424w, https://substackcdn.com/image/fetch/$s_!2dgA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F49fcd7bf-8266-4baa-842c-756cd5b045f6_916x569.png 848w, https://substackcdn.com/image/fetch/$s_!2dgA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F49fcd7bf-8266-4baa-842c-756cd5b045f6_916x569.png 1272w, https://substackcdn.com/image/fetch/$s_!2dgA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F49fcd7bf-8266-4baa-842c-756cd5b045f6_916x569.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!2dgA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F49fcd7bf-8266-4baa-842c-756cd5b045f6_916x569.png" width="916" height="569" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/49fcd7bf-8266-4baa-842c-756cd5b045f6_916x569.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:569,&quot;width&quot;:916,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.washingtonpost.com/transportation/2021/11/09/drunk-driving-technology-infrastructure/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2dgA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F49fcd7bf-8266-4baa-842c-756cd5b045f6_916x569.png 424w, https://substackcdn.com/image/fetch/$s_!2dgA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F49fcd7bf-8266-4baa-842c-756cd5b045f6_916x569.png 848w, https://substackcdn.com/image/fetch/$s_!2dgA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F49fcd7bf-8266-4baa-842c-756cd5b045f6_916x569.png 1272w, 
https://substackcdn.com/image/fetch/$s_!2dgA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F49fcd7bf-8266-4baa-842c-756cd5b045f6_916x569.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We try to track interesting news from the policy world where ML may have a part to play, and this certainly caught our eye.&nbsp;</p><h4><strong>What is this and why is it important?&nbsp;</strong></h4><p>The recent infrastructure bill from the US Congress has a mandate that would require new cars to have technology that would stop 
drunk people from driving. As the article reports:</p><blockquote><p>More than 10,000 people died in crashes involving an alcohol-impaired driver in 2019, according to the National Highway Traffic Safety Administration&#8230;A <a href="https://www.iihs.org/news/detail/alcohol-detection-systems-could-prevent-more-than-a-fourth-of-u-s-road-fatalities">recent study</a> by the Insurance Institute for Highway Safety concluded the technology could reduce deaths by 9,400 people a year if widely deployed.</p></blockquote><h4><strong>How would it work?&nbsp;</strong></h4><p>The technology involved in preventing drunk driving isn&#8217;t finalized yet (and probably won&#8217;t be for a few years), but one of the ideas being floated is to:</p><blockquote><p>&#8220;rely on cameras that monitor drivers for signs they are impaired, building on systems that automakers are using to ensure people relying on driver assistance technologies don&#8217;t lose concentration.&#8221;</p></blockquote><p>Whether it plays out this way or not, this definitely seems like an ML application that has the potential to save a lot of lives.&nbsp;</p><h2><a href="https://twitter.com/nihit_desai/status/1483171836107976705">Twitter | Lessons from ML Systems for Content Moderation</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/nihit_desai/status/1483171836107976705&quot;,&quot;full_text&quot;:&quot;1/ Content Moderation: Reducing harm to the user community and the platform from illegal/undesirable content (typically with hybrid human + <span class=\&quot;tweet-fake-link\&quot;>#MachineLearning</span> systems). 
\n\n3 learnings from having worked on it at <span class=\&quot;tweet-fake-link\&quot;>@Facebook</span> &amp;amp; talking to engineers who've worked on it elsewhere:&quot;,&quot;username&quot;:&quot;nihit_desai&quot;,&quot;name&quot;:&quot;Nihit Desai&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Mon Jan 17 20:18:16 +0000 2022&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:3,&quot;like_count&quot;:12,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>&nbsp;<strong>Nihit</strong>: I worked on systems for content moderation at Facebook. In a recent thread, I shared some learnings &amp; observations around what makes this a challenging problem.&nbsp;</p><p>Detecting &amp; taking enforcement action on illegal/undesirable content is an important problem for most online platforms as they scale. This is typically done with human-in-the-loop machine learning systems. The content moderation domain presents some unique challenges when building machine learning models: bootstrapping labels, subjective annotation guidelines that can lead to label noise, adversarial drift, and the need for adaptive enforcement. 
If some of these sound interesting or relevant, definitely check out the thread!&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mk3a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mk3a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png 424w, https://substackcdn.com/image/fetch/$s_!mk3a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png 848w, https://substackcdn.com/image/fetch/$s_!mk3a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png 1272w, https://substackcdn.com/image/fetch/$s_!mk3a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mk3a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png" width="1456" height="609" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:609,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mk3a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png 424w, https://substackcdn.com/image/fetch/$s_!mk3a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png 848w, https://substackcdn.com/image/fetch/$s_!mk3a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png 1272w, https://substackcdn.com/image/fetch/$s_!mk3a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5315cd58-ad47-4cea-95b0-56f93f9fc61a_1600x669.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>.&nbsp;</p><p>If you like what we are doing, please tell your friends and colleagues to spread the word. &#10084;&#65039;</p>]]></content:encoded></item><item><title><![CDATA[Issue #29: State of AI. Kaggle ML Survey. ML Deployment at Reddit. Inferentia. 
]]></title><description><![CDATA[Welcome to the 29th issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-29-state-of-ai-kaggle-ml-survey</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-29-state-of-ai-kaggle-ml-survey</guid><dc:creator><![CDATA[Rishabh Bhargava]]></dc:creator><pubDate>Wed, 27 Oct 2021 17:05:54 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdf77a61-4c75-4d28-9f21-cd5ed6657d3d_1314x466.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Zmxp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Zmxp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!Zmxp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!Zmxp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Zmxp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Zmxp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png" width="1000" height="400" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Zmxp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!Zmxp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!Zmxp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Zmxp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9345b7f6-ad6a-4408-b9d1-b6b6818d79f5_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Welcome to the 29th issue of the MLOps newsletter. This one is going to be fairly image-rich - hope you enjoy it! 
&#128444;&#65039;</p><p>In this issue, we briefly cover the latest State of AI Report, look into the key trends from Kaggle&#8217;s Data Science and ML survey, explore Reddit&#8217;s model deployment architecture, deep dive into a paper on augmenting human annotations in microscopy, and much more.</p><p>Thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://www.stateof.ai/">State of AI Report 2021</a></h2><p>We covered the State of AI Report 2020 last year <a href="https://mlopsroundup.substack.com/p/issue-3-state-of-ai-behavioral-testing-ml-models-dynamic-benchmarks-data-versioning-madewithml-283540">here</a> and the authors are back with their latest analysis of the most interesting developments in AI. It&#8217;s a whopping 188 slides and ~5 hours of reading and digesting, but probably worth the effort. While it&#8217;s broken up into sections on research, talent, industry, politics, and predictions, we will only focus on the slides on AI in industry.&nbsp;</p><p>It&#8217;s nice to see that the lessons from ML in production have made it back into the world of research -- recently, one of the topics we have covered a lot is the move towards <a href="https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric">data-centric AI</a>.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.stateof.ai/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T1LW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F96d9d37a-9504-4627-b08e-4ed8406cfea4_1600x908.png 424w, 
https://substackcdn.com/image/fetch/$s_!T1LW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F96d9d37a-9504-4627-b08e-4ed8406cfea4_1600x908.png 848w, https://substackcdn.com/image/fetch/$s_!T1LW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F96d9d37a-9504-4627-b08e-4ed8406cfea4_1600x908.png 1272w, https://substackcdn.com/image/fetch/$s_!T1LW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F96d9d37a-9504-4627-b08e-4ed8406cfea4_1600x908.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T1LW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F96d9d37a-9504-4627-b08e-4ed8406cfea4_1600x908.png" width="1456" height="826" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/96d9d37a-9504-4627-b08e-4ed8406cfea4_1600x908.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:826,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.stateof.ai/&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!T1LW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F96d9d37a-9504-4627-b08e-4ed8406cfea4_1600x908.png 424w, https://substackcdn.com/image/fetch/$s_!T1LW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F96d9d37a-9504-4627-b08e-4ed8406cfea4_1600x908.png 848w, https://substackcdn.com/image/fetch/$s_!T1LW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F96d9d37a-9504-4627-b08e-4ed8406cfea4_1600x908.png 1272w, https://substackcdn.com/image/fetch/$s_!T1LW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F96d9d37a-9504-4627-b08e-4ed8406cfea4_1600x908.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" 
width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Similarly, distribution shift between training and prediction time is a common topic in industry, and it was interesting to see the <a href="https://wilds.stanford.edu/">WILDS benchmark</a> featured here.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.stateof.ai/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lHY3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F314d131d-3719-4cd5-af2c-a5d653bf5a58_1600x903.png 424w, https://substackcdn.com/image/fetch/$s_!lHY3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F314d131d-3719-4cd5-af2c-a5d653bf5a58_1600x903.png 848w, https://substackcdn.com/image/fetch/$s_!lHY3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F314d131d-3719-4cd5-af2c-a5d653bf5a58_1600x903.png 1272w, https://substackcdn.com/image/fetch/$s_!lHY3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F314d131d-3719-4cd5-af2c-a5d653bf5a58_1600x903.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!lHY3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F314d131d-3719-4cd5-af2c-a5d653bf5a58_1600x903.png" width="1456" height="822" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/314d131d-3719-4cd5-af2c-a5d653bf5a58_1600x903.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:822,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.stateof.ai/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lHY3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F314d131d-3719-4cd5-af2c-a5d653bf5a58_1600x903.png 424w, https://substackcdn.com/image/fetch/$s_!lHY3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F314d131d-3719-4cd5-af2c-a5d653bf5a58_1600x903.png 848w, https://substackcdn.com/image/fetch/$s_!lHY3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F314d131d-3719-4cd5-af2c-a5d653bf5a58_1600x903.png 1272w, https://substackcdn.com/image/fetch/$s_!lHY3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F314d131d-3719-4cd5-af2c-a5d653bf5a58_1600x903.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A number of other topics that we have covered before made an appearance: <a href="https://mlopsroundup.substack.com/p/issue-3-state-of-ai-behavioral-testing-ml-models-dynamic-benchmarks-data-versioning-madewithml-283540">dynamic benchmarking</a> (and Dynabench in particular) to quickly improve models, <a href="https://mlopsroundup.substack.com/p/issue-7-trustworthy-ai-in-the-govt">underspecification</a>, where the same model with different random seeds behaves differently, and pervasive &#8220;bad data&#8221; issues (highlighted in the research around <a 
href="https://mlopsroundup.substack.com/p/issue-20-ai-playbook-curating-data">ML for Covid</a>).&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.stateof.ai/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E2Mm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab63922-e0fe-4c23-8566-6675ad67a16e_1600x902.png 424w, https://substackcdn.com/image/fetch/$s_!E2Mm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab63922-e0fe-4c23-8566-6675ad67a16e_1600x902.png 848w, https://substackcdn.com/image/fetch/$s_!E2Mm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab63922-e0fe-4c23-8566-6675ad67a16e_1600x902.png 1272w, https://substackcdn.com/image/fetch/$s_!E2Mm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab63922-e0fe-4c23-8566-6675ad67a16e_1600x902.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E2Mm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab63922-e0fe-4c23-8566-6675ad67a16e_1600x902.png" width="1456" height="821" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8ab63922-e0fe-4c23-8566-6675ad67a16e_1600x902.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:821,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.stateof.ai/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E2Mm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab63922-e0fe-4c23-8566-6675ad67a16e_1600x902.png 424w, https://substackcdn.com/image/fetch/$s_!E2Mm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab63922-e0fe-4c23-8566-6675ad67a16e_1600x902.png 848w, https://substackcdn.com/image/fetch/$s_!E2Mm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab63922-e0fe-4c23-8566-6675ad67a16e_1600x902.png 1272w, https://substackcdn.com/image/fetch/$s_!E2Mm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8ab63922-e0fe-4c23-8566-6675ad67a16e_1600x902.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We loved the stories of applied ML: from <a href="https://www.ocadogroup.com/">Ocado</a>&#8217;s deep learning being used for 98% of stock replenishment decisions for online grocers, to <a href="https://www.viz.ai/">Viz.ai&#8217;s</a> stroke detection software helping one patient every 47 seconds in the US, to reinforcement learning used at the Greek border (another story we covered <a href="https://mlopsroundup.substack.com/p/issue-28-mad-landscape-covid-19-border">here</a>) for Covid testing, to computer vision aiding in disaster relief, to transformer models being used to accurately forecast electricity demand, and so much more!</p><p>There is a lot in here that we can&#8217;t fully do justice to -- check out the slides or the Twitter thread from one of the authors:</p><div class="twitter-embed" 
data-attrs="{&quot;url&quot;:&quot;https://twitter.com/nathanbenaich/status/1447805094070792193?s=20&quot;,&quot;full_text&quot;:&quot;The <span class=\&quot;tweet-fake-link\&quot;>@stateofaireport</span> 2021 is live! \n\nEst. 2018, <span class=\&quot;tweet-fake-link\&quot;>@soundboy</span> and I compile the most important work in AI research, industry, talent, and politics to inform conversation about the <span class=\&quot;tweet-fake-link\&quot;>#stateofai</span>. Our report is open-access to all. \n\nHere's a director's cut &#129525;:\n<a class=\&quot;tweet-url\&quot; href=\&quot;https://www.stateof.ai/\&quot;>stateof.ai</a>&quot;,&quot;username&quot;:&quot;nathanbenaich&quot;,&quot;name&quot;:&quot;Nathan Benaich&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Tue Oct 12 06:03:28 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:247,&quot;like_count&quot;:583,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{&quot;url&quot;:&quot;https://www.stateof.ai/&quot;,&quot;image&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/bdf77a61-4c75-4d28-9f21-cd5ed6657d3d_1314x466.png&quot;,&quot;title&quot;:&quot;State of AI Report 2021&quot;,&quot;description&quot;:&quot;A Report on artificial intelligence research, talent, politics, industry, startups, and China.&quot;,&quot;domain&quot;:&quot;stateof.ai&quot;},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h2><a href="https://www.kaggle.com/kaggle-survey-2021">Kaggle | 2021 Data Science and Machine Learning Survey</a></h2><p>Kaggle recently shared results from their annual survey of data scientists and machine learning engineers. The survey focuses on trends in the data science landscape in industry. It is quite insightful and we recommend reading it in its entirety. 
A few key takeaways for us:</p><ul><li><p><strong>Continuing democratization of Data Science: </strong>Advanced degrees (Master's or higher) are still the norm (&gt;60% of respondents), but going into data science after a bachelor's degree is becoming increasingly common (~35% of respondents vs. only 20% in 2018).&nbsp; Geographically, 24.4% of survey respondents reside in India and only 12.2% in the US, followed by a long tail.</p></li><li><p><strong>Online learning continues to be popular: </strong>Given the rapidly changing landscape, a large number of data scientists use one or more online learning resources to stay up to date. Coursera (57.8%) and Kaggle (39%) remain the most popular online learning destinations.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.kaggle.com/kaggle-survey-2021" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F9H_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba02833-e22b-4d15-af31-767ccf03b92c_996x582.png 424w, https://substackcdn.com/image/fetch/$s_!F9H_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba02833-e22b-4d15-af31-767ccf03b92c_996x582.png 848w, https://substackcdn.com/image/fetch/$s_!F9H_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba02833-e22b-4d15-af31-767ccf03b92c_996x582.png 1272w, https://substackcdn.com/image/fetch/$s_!F9H_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba02833-e22b-4d15-af31-767ccf03b92c_996x582.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F9H_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba02833-e22b-4d15-af31-767ccf03b92c_996x582.png" width="598" height="349.43373493975906" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/5ba02833-e22b-4d15-af31-767ccf03b92c_996x582.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:582,&quot;width&quot;:996,&quot;resizeWidth&quot;:598,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.kaggle.com/kaggle-survey-2021&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F9H_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba02833-e22b-4d15-af31-767ccf03b92c_996x582.png 424w, https://substackcdn.com/image/fetch/$s_!F9H_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba02833-e22b-4d15-af31-767ccf03b92c_996x582.png 848w, https://substackcdn.com/image/fetch/$s_!F9H_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba02833-e22b-4d15-af31-767ccf03b92c_996x582.png 1272w, 
https://substackcdn.com/image/fetch/$s_!F9H_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba02833-e22b-4d15-af31-767ccf03b92c_996x582.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>Algorithms and Model architectures</strong>: The use of large, complex models such as transformers grew strongly year over year, but linear models (linear regression, logistic regression) and decision trees are still among the most widely used and deployed ML models.</p></li></ul><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.kaggle.com/kaggle-survey-2021" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AO_K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F267972f2-3a9d-4937-b535-9e5267c70c43_1004x594.png 424w, https://substackcdn.com/image/fetch/$s_!AO_K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F267972f2-3a9d-4937-b535-9e5267c70c43_1004x594.png 848w, https://substackcdn.com/image/fetch/$s_!AO_K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F267972f2-3a9d-4937-b535-9e5267c70c43_1004x594.png 1272w, https://substackcdn.com/image/fetch/$s_!AO_K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F267972f2-3a9d-4937-b535-9e5267c70c43_1004x594.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AO_K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F267972f2-3a9d-4937-b535-9e5267c70c43_1004x594.png" width="646" height="382.195219123506" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/267972f2-3a9d-4937-b535-9e5267c70c43_1004x594.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:594,&quot;width&quot;:1004,&quot;resizeWidth&quot;:646,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.kaggle.com/kaggle-survey-2021&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AO_K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F267972f2-3a9d-4937-b535-9e5267c70c43_1004x594.png 424w, https://substackcdn.com/image/fetch/$s_!AO_K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F267972f2-3a9d-4937-b535-9e5267c70c43_1004x594.png 848w, https://substackcdn.com/image/fetch/$s_!AO_K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F267972f2-3a9d-4937-b535-9e5267c70c43_1004x594.png 1272w, https://substackcdn.com/image/fetch/$s_!AO_K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F267972f2-3a9d-4937-b535-9e5267c70c43_1004x594.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>Machine Learning Frameworks</strong>: It&#8217;s all Python. Scikit-learn is still the most widely used framework for building and deploying ML models (&gt;80% of respondents). 
PyTorch continues to grow in popularity (&gt;33% of respondents).&nbsp;</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.kaggle.com/kaggle-survey-2021" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B7JD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc7d8a90-da59-4bdc-b044-8126f0ce72da_1002x602.png 424w, https://substackcdn.com/image/fetch/$s_!B7JD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc7d8a90-da59-4bdc-b044-8126f0ce72da_1002x602.png 848w, https://substackcdn.com/image/fetch/$s_!B7JD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc7d8a90-da59-4bdc-b044-8126f0ce72da_1002x602.png 1272w, https://substackcdn.com/image/fetch/$s_!B7JD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc7d8a90-da59-4bdc-b044-8126f0ce72da_1002x602.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B7JD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc7d8a90-da59-4bdc-b044-8126f0ce72da_1002x602.png" width="632" height="379.7045908183633" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/dc7d8a90-da59-4bdc-b044-8126f0ce72da_1002x602.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:602,&quot;width&quot;:1002,&quot;resizeWidth&quot;:632,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.kaggle.com/kaggle-survey-2021&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B7JD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc7d8a90-da59-4bdc-b044-8126f0ce72da_1002x602.png 424w, https://substackcdn.com/image/fetch/$s_!B7JD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc7d8a90-da59-4bdc-b044-8126f0ce72da_1002x602.png 848w, https://substackcdn.com/image/fetch/$s_!B7JD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc7d8a90-da59-4bdc-b044-8126f0ce72da_1002x602.png 1272w, https://substackcdn.com/image/fetch/$s_!B7JD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc7d8a90-da59-4bdc-b044-8126f0ce72da_1002x602.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>ML experimentation tools</strong>: Repeatable and well-documented experimentation tracking has received a lot of attention in the ML community. But according to the survey, &gt;57% of respondents do not use any ML experimentation tools. This shows that we are in the early innings of ML tooling ecosystem maturity.</p></li><li><p><strong>AutoML usage growing steadily: </strong>AutoML is becoming an increasingly common part of the ML development lifecycle. 
Google Cloud AutoML is the most widely used AutoML framework at 23.4% of respondents.</p></li></ul><h2><a href="https://www.reddit.com/r/RedditEng/comments/q14tsw/evolving_reddits_ml_model_deployment_and_serving/">Reddit Engineering Blog | Evolving Reddit&#8217;s ML Model Deployment and Serving Architecture</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.reddit.com/r/RedditEng/comments/q14tsw/evolving_reddits_ml_model_deployment_and_serving/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F4o0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa467e1a1-9a73-47ae-b5d2-755f2f02a4fc_512x350.png 424w, https://substackcdn.com/image/fetch/$s_!F4o0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa467e1a1-9a73-47ae-b5d2-755f2f02a4fc_512x350.png 848w, https://substackcdn.com/image/fetch/$s_!F4o0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa467e1a1-9a73-47ae-b5d2-755f2f02a4fc_512x350.png 1272w, https://substackcdn.com/image/fetch/$s_!F4o0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa467e1a1-9a73-47ae-b5d2-755f2f02a4fc_512x350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F4o0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa467e1a1-9a73-47ae-b5d2-755f2f02a4fc_512x350.png" width="512" height="350" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/a467e1a1-9a73-47ae-b5d2-755f2f02a4fc_512x350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:512,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.reddit.com/r/RedditEng/comments/q14tsw/evolving_reddits_ml_model_deployment_and_serving/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F4o0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa467e1a1-9a73-47ae-b5d2-755f2f02a4fc_512x350.png 424w, https://substackcdn.com/image/fetch/$s_!F4o0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa467e1a1-9a73-47ae-b5d2-755f2f02a4fc_512x350.png 848w, https://substackcdn.com/image/fetch/$s_!F4o0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa467e1a1-9a73-47ae-b5d2-755f2f02a4fc_512x350.png 1272w, https://substackcdn.com/image/fetch/$s_!F4o0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa467e1a1-9a73-47ae-b5d2-755f2f02a4fc_512x350.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>Problem</strong></h4><p>Machine learning powers many of the product experiences on Reddit, from personalizing users&#8217; content feeds to optimizing push notifications, moderating content, and more. ML systems at Reddit serve thousands of inference requests every second. 
The engineering team at Reddit redesigned the production ML system to keep up with usage growth and improve scalability, and this article shares the goals and details of that effort.</p><h4><strong>Legacy ML Stack</strong></h4><p>Reddit&#8217;s legacy ML stack was based on <a href="https://github.com/reddit/baseplate.py">baseplate.py</a> (Reddit&#8217;s Python web services framework): machine learning models were deployed as model classes inside their application service called Minsky. When launching a new model, a machine learning engineer would typically end up touching the application code to download the new model at application start, load it into memory, and implement the feature transformation and inference logic for serving inference traffic.&nbsp;</p><p>This process has several limitations, which the article highlights. One that especially stands out is the tight coupling across models: if a single model encounters an exception, the entire service instance crashes. Moreover, coupling model inference to the application layer limits the ability to monitor online model characteristics.</p><h4><strong>Gazette Inference Service</strong></h4><p>The team at Reddit addressed some of the limitations of the previous architecture with the Gazette Inference Service. This is a dedicated service for online model inference with a single endpoint: Predict. A request simply needs to specify the model, its version, and how to fetch the features that go into the model. The service is Dockerized and deployed with Kubernetes. While it currently supports TensorFlow models, the team is working on expanding support to other frameworks.&nbsp;</p><p>The article highlights how Gazette overcomes the limitations of the previous setup, especially scalability and isolation between different models. 
In general, this progression of locally coupled models in the application layer &#8594; dedicated inference service sounds quite familiar. We look forward to future learnings and improvements shared by the team as they build out Gazette.</p><h2><a href="https://www.nature.com/articles/s41746-021-00520-6.pdf">Paper | Biological data annotation via a human-augmenting AI-based labeling system</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.nature.com/articles/s41746-021-00520-6.pdf" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!10K5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0e00e524-70e1-4e6f-9cf8-97162e3900dc_854x676.png 424w, https://substackcdn.com/image/fetch/$s_!10K5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0e00e524-70e1-4e6f-9cf8-97162e3900dc_854x676.png 848w, https://substackcdn.com/image/fetch/$s_!10K5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0e00e524-70e1-4e6f-9cf8-97162e3900dc_854x676.png 1272w, https://substackcdn.com/image/fetch/$s_!10K5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0e00e524-70e1-4e6f-9cf8-97162e3900dc_854x676.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!10K5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0e00e524-70e1-4e6f-9cf8-97162e3900dc_854x676.png" width="854" height="676" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/0e00e524-70e1-4e6f-9cf8-97162e3900dc_854x676.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:676,&quot;width&quot;:854,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.nature.com/articles/s41746-021-00520-6.pdf&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!10K5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0e00e524-70e1-4e6f-9cf8-97162e3900dc_854x676.png 424w, https://substackcdn.com/image/fetch/$s_!10K5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0e00e524-70e1-4e6f-9cf8-97162e3900dc_854x676.png 848w, https://substackcdn.com/image/fetch/$s_!10K5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0e00e524-70e1-4e6f-9cf8-97162e3900dc_854x676.png 1272w, https://substackcdn.com/image/fetch/$s_!10K5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0e00e524-70e1-4e6f-9cf8-97162e3900dc_854x676.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is an interesting paper that showcases some of the challenges in acquiring high-quality annotations in domains where large datasets can be generated easily, but the right tools for annotating don&#8217;t currently exist.&nbsp;</p><h4><strong>Background</strong></h4><blockquote><p>Supervised learning&#8212;in which computational models are trained using data points (e.g., histopathology image; raw microscopy image) and data annotations (e.g., &#8220;cancerous&#8221; vs &#8220;benign&#8221;; stained microscopy image)&#8212;have been central to the success of CV in biology. Biologists have the distinct advantage of being able to generate massive amounts of data&#8212;a single microscopy image can yield a gigabyte of visual data for algorithms to learn from. 
A disadvantage, however, is the difficulty and cost of obtaining complete annotations for datasets.</p></blockquote><p>While practitioners use computational techniques for augmenting data (rotations, distortions, changes in color), these are often a poor substitute for human-quality annotations. This led the authors to present a human-augmenting AI-based labeling system, or HALS.&nbsp;</p><h4><strong>What is it?</strong></h4><p>HALS provides a data annotation interface with three deep learning models (a segmentation model, a classifier model, and an active learner) that work in tandem to:</p><blockquote><ol><li><p>learn the labels provided by an annotator</p></li><li><p>provide recommendations to that annotator designed to increase their speed, and</p></li><li><p>determine the next best data to label to increase the overall quality of annotations while minimizing total labeling burden.</p></li></ol></blockquote><p>Every microscopy image (these are very large images) is first pre-processed by running it through the segmentation model. Then, as human annotators start providing labels for small regions, the classifier starts to learn a model for the regions being labeled. After a small number of labels have been acquired for each class, the classifier starts to suggest labels for regions that annotators haven&#8217;t seen yet. The annotator can choose to accept or reject such proposed labels. 
Simultaneously, the active learner breaks the entire image into square patches, converts them into feature vectors, and recommends new regions for the annotator to look at.&nbsp;</p><p>If you&#8217;re interested in further details on the specific datasets, model pretraining, the active learning algorithm used, etc., we recommend going through the paper.&nbsp;</p><h4><strong>Results</strong></h4><blockquote><p>Using four highly repetitive binary use-cases across two stain types, and working with expert pathologist annotators, we demonstrate a 90.6% average labeling workload reduction and a 4.34% average improvement in labeling effectiveness.</p></blockquote><p>&#8220;Workload reduction&#8221; has an interesting definition: the fraction of AI-generated labels that were accepted immediately by the annotators. It would be interesting to explore the actual time saved (something the authors mention in the paper as well).&nbsp;</p><p>&#8220;Labeling effectiveness&#8221; was measured by the AUC score on a validation set, and it was nice to see a modest bump in the true performance of the final ML model as well.</p><h2>New Resources for Machine Learning</h2><h4><strong><a href="https://mad-data.simplecast.com/">MAD Podcast</a></strong>&nbsp;</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PBI4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PBI4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png 424w, 
https://substackcdn.com/image/fetch/$s_!PBI4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png 848w, https://substackcdn.com/image/fetch/$s_!PBI4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png 1272w, https://substackcdn.com/image/fetch/$s_!PBI4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PBI4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png" width="370" height="370" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1010,&quot;width&quot;:1010,&quot;resizeWidth&quot;:370,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PBI4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png 424w, 
https://substackcdn.com/image/fetch/$s_!PBI4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png 848w, https://substackcdn.com/image/fetch/$s_!PBI4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png 1272w, https://substackcdn.com/image/fetch/$s_!PBI4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d34dcba-161d-4888-bcf3-e8177c844903_1010x1010.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We recommend checking out the <a href="https://mad-data.simplecast.com/">MAD Podcast</a> (Machine Learning, AI, and Data), co-hosted by Michael Harper and Honor Chan, which features industry experts and focuses on data quality. </p><h4><strong><a href="https://github.com/featureform/embeddinghub">Embeddinghub by Featureform</a></strong>&nbsp;</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://github.com/featureform/embeddinghub" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xshX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F205dabf9-1b50-4ee0-b9e4-6e026adb8a42_1600x1177.png 424w, https://substackcdn.com/image/fetch/$s_!xshX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F205dabf9-1b50-4ee0-b9e4-6e026adb8a42_1600x1177.png 848w, https://substackcdn.com/image/fetch/$s_!xshX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F205dabf9-1b50-4ee0-b9e4-6e026adb8a42_1600x1177.png 1272w, https://substackcdn.com/image/fetch/$s_!xshX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F205dabf9-1b50-4ee0-b9e4-6e026adb8a42_1600x1177.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xshX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F205dabf9-1b50-4ee0-b9e4-6e026adb8a42_1600x1177.png" width="406" height="298.6442307692308" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/205dabf9-1b50-4ee0-b9e4-6e026adb8a42_1600x1177.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1071,&quot;width&quot;:1456,&quot;resizeWidth&quot;:406,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://github.com/featureform/embeddinghub&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xshX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F205dabf9-1b50-4ee0-b9e4-6e026adb8a42_1600x1177.png 424w, https://substackcdn.com/image/fetch/$s_!xshX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F205dabf9-1b50-4ee0-b9e4-6e026adb8a42_1600x1177.png 848w, https://substackcdn.com/image/fetch/$s_!xshX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F205dabf9-1b50-4ee0-b9e4-6e026adb8a42_1600x1177.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xshX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F205dabf9-1b50-4ee0-b9e4-6e026adb8a42_1600x1177.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Embeddings (dense vector representations) are a fundamental building block for machine learning. <a href="https://docs.featureform.com/">Embeddinghub</a> is an open-source database, built by Featureform, for storing and searching embeddings. 
Embeddinghub uses RocksDB to durably store embeddings and metadata, and <a href="https://github.com/nmslib/hnswlib">HNSWLib</a> to build approximate nearest neighbor indices. It also supports versioning and rollbacks for embeddings, which can be very useful for production applications. You can learn more about the project on <a href="https://github.com/featureform/embeddinghub">GitHub</a>.</p><h2><a href="https://twitter.com/sudeeppillai/status/1446642656797347854?s=20">Twitter | Inferentia chips</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/sudeeppillai/status/1446642656797347854?s=20&quot;,&quot;full_text&quot;:&quot;1/ Spent the last day and a half running benchmarks on the Inferentia (inf1) chips from <span class=\&quot;tweet-fake-link\&quot;>@awscloud</span>, and the results are &#128293;&#128293;&#128293;! A &#129525; below:&quot;,&quot;username&quot;:&quot;sudeeppillai&quot;,&quot;name&quot;:&quot;Sudeep Pillai&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Sat Oct 09 01:04:21 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:20,&quot;like_count&quot;:92,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p><a href="https://twitter.com/sudeeppillai">Sudeep Pillai</a> has a great thread testing the <a href="https://aws.amazon.com/machine-learning/inferentia/">AWS Inferentia</a> chips, which are custom-designed for ML inference. He finds that many ResNet- and Transformer-style models perform well, with some models returning sub-10 ms inference latencies. We look forward to reading a more detailed post on this in the future!</p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! 
This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>.&nbsp;</p><p>If you like what we are doing, please tell your friends and colleagues to spread the word. &#8203;&#8203;&#10084;&#65039;</p>]]></content:encoded></item><item><title><![CDATA[Issue #28: MAD Landscape. Covid-19 Border Testing. Blocking Spam@Slack. Applying ML. Scikit-learn 1.0.]]></title><description><![CDATA[Welcome to the 28th issue of the MLOps newsletter. We really enjoyed writing this one, hope you enjoy it too! In this issue, we briefly cover Nihit&#8217;s interview with Eugene Yan, discuss Matt Turck&#8217;s ML, AI, and Data landscape, share a cool ML-based invite spam detection from Slack, and dive into a fascinating reinforcement learning system for COVID testing.]]></description><link>https://mlopsroundup.substack.com/p/issue-28-mad-landscape-covid-19-border</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-28-mad-landscape-covid-19-border</guid><dc:creator><![CDATA[Rishabh Bhargava]]></dc:creator><pubDate>Mon, 04 Oct 2021 17:06:55 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!WLG3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WLG3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!WLG3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!WLG3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!WLG3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WLG3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png" width="1000" height="400" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/de513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WLG3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!WLG3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!WLG3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!WLG3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde513a64-41b9-4d6a-a2da-96fdd15d565a_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Welcome to the 28th issue of the MLOps newsletter. We really enjoyed writing this one, hope you enjoy it too!&nbsp;</p><p>In this issue, we briefly cover Nihit&#8217;s interview with Eugene Yan, discuss Matt Turck&#8217;s ML, AI, and Data landscape, share a cool ML-based invite spam detection from Slack, and dive into a fascinating reinforcement learning system for COVID testing.</p><p>Thank you for subscribing. 
If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://applyingml.com/mentors/nihit-desai/">Eugene Yan | Applying ML</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://applyingml.com/mentors/nihit-desai/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mLXM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F448c0590-1a5c-4ceb-9d33-ee79f3343120_491x250.png 424w, https://substackcdn.com/image/fetch/$s_!mLXM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F448c0590-1a5c-4ceb-9d33-ee79f3343120_491x250.png 848w, https://substackcdn.com/image/fetch/$s_!mLXM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F448c0590-1a5c-4ceb-9d33-ee79f3343120_491x250.png 1272w, https://substackcdn.com/image/fetch/$s_!mLXM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F448c0590-1a5c-4ceb-9d33-ee79f3343120_491x250.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mLXM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F448c0590-1a5c-4ceb-9d33-ee79f3343120_491x250.png" width="491" height="250" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/448c0590-1a5c-4ceb-9d33-ee79f3343120_491x250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:491,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://applyingml.com/mentors/nihit-desai/&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mLXM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F448c0590-1a5c-4ceb-9d33-ee79f3343120_491x250.png 424w, https://substackcdn.com/image/fetch/$s_!mLXM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F448c0590-1a5c-4ceb-9d33-ee79f3343120_491x250.png 848w, https://substackcdn.com/image/fetch/$s_!mLXM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F448c0590-1a5c-4ceb-9d33-ee79f3343120_491x250.png 1272w, https://substackcdn.com/image/fetch/$s_!mLXM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F448c0590-1a5c-4ceb-9d33-ee79f3343120_491x250.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Nihit: I recently did an interview with <a href="https://twitter.com/eugeneyan">Eugene Yan</a> as part of his <a href="http://applyingml.com/">ApplyingML</a> initiative (which is a fantastic collection of resources - papers, tools, playbooks, mentor interviews - for building machine learning systems). In the interview, I share my thoughts on the machine learning infra and tooling landscape, and my experience building ML systems at Facebook. 
You can check out the transcript <a href="https://applyingml.com/mentors/nihit-desai/">here</a> (or read about interviews with <a href="https://applyingml.com/mentors/">other mentors</a>)</p><p>As Eugene put it, his main motivation is to collect &#8220;ghost knowledge&#8221; that resides within the community but is seldom officially documented:</p><blockquote><p>Knowledge that is present somewhere in the epistemic community, and is perhaps readily accessible to some central member of that community, but it is not really written down anywhere and it's not clear how to access it.</p></blockquote><h2><a href="https://mattturck.com/data2021/">Matt Turck | Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://mattturck.com/data2021/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mw-L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Mw-L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Mw-L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Mw-L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mw-L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg" width="1456" height="753" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:753,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://mattturck.com/data2021/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mw-L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Mw-L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!Mw-L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Mw-L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6d5d8933-582a-483a-a2a9-70bbb3d1c6ff_2048x1059.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The <a href="https://mattturck.com/data2021/">Data Landscape</a> (now the 
MAD Landscape) post from <a href="https://twitter.com/mattturck">Matt Turck</a> is a great read every year. We covered the 2020 post <a href="https://mlopsroundup.substack.com/p/issue-4-data-landscape-ml-stack-imbalanced">last year</a>, and it was fun to look back on it and compare it to 2021.&nbsp;</p><h4><strong>How has this last year been?</strong></h4><p>As the authors write:</p><blockquote><p>It&#8217;s been a hot, hot year in the world of data, machine learning and AI.&nbsp;</p></blockquote><p>This is evident from the denser landscape chart (and our coverage of the ecosystem over the past year), and also from the recent, successful IPOs in the space (Snowflake, Confluent, UIPath, C3.ai, Sentinel One) and the increased funding for AI startups ($38B in the first half of 2021 compared to $36B in all of 2020).&nbsp;</p><p>We have seen tremendous growth in the number of startups and tools, leading to a dizzying set of options for companies and practitioners (which is mostly good news). 
As the authors put it:</p><blockquote><p>...we believe that companies will continue to work with multiple vendors, platforms and tools, in whichever combination best suits their needs.&nbsp;&nbsp;</p><p>The key reason: the pace of innovation is just too explosive in the space for things to remain static for too long.&nbsp; Founders launch new startups, Big Tech companies create internal data/AI tools and then open source them, and for every established technology or product, a new one seems to emerge weekly.&nbsp;</p></blockquote><p>For newer categories such as MLOps, this growth has also led to a crowded market:</p><blockquote><p>as VCs aggressively invested in emerging sectors up and down the data stack, often betting on future growth over existing commercial traction, some categories went from nascent to crowded very rapidly &#8211; reverse ETL, data quality, data catalogs, data annotation and MLOps.&nbsp;&nbsp;</p></blockquote><h4><strong>2021 for MLOps</strong></h4><p>It&#8217;s rare that we get to quote ourselves, but here was one of our predictions from last year:</p><blockquote><p>we believe there will be many more categories (which will be much more crowded) in the coming years.</p></blockquote><p>That&#8217;s certainly proven to be true. This year, we saw new categories for:</p><ul><li><p>Feature Stores</p></li><li><p>Model Building</p></li><li><p>Deployment and Monitoring</p></li><li><p>Synthetic Media</p></li><li><p>Data Quality &amp; Observability</p></li></ul><p>The older categories for Data Generation &amp; Labelling, Data Science Notebooks, Data Science Platforms, ML Platforms, Speech, Computer Vision, and NLP continued to grow in the number of companies. 
We aren&#8217;t going to reference any specific companies here -- there are simply far too many for us to do them justice.&nbsp;</p><h4><strong>Final Thoughts</strong></h4><p>There are many interesting and important problems to be solved in the world of data and ML, and the density of talent trying to address these challenges is the most exciting aspect to us!&nbsp;</p><p>We definitely recommend giving the entire article a read, especially to get a sense of the broader data ecosystem.</p><h2><a href="https://www.nature.com/articles/s41586-021-04014-z">Paper | Efficient and targeted COVID-19 border testing via reinforcement learning</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.nature.com/articles/s41586-021-04014-z" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ic5B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d3b742-29bf-41f2-8c2d-dc81e48d4040_2048x911.png 424w, https://substackcdn.com/image/fetch/$s_!Ic5B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d3b742-29bf-41f2-8c2d-dc81e48d4040_2048x911.png 848w, https://substackcdn.com/image/fetch/$s_!Ic5B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d3b742-29bf-41f2-8c2d-dc81e48d4040_2048x911.png 1272w, https://substackcdn.com/image/fetch/$s_!Ic5B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d3b742-29bf-41f2-8c2d-dc81e48d4040_2048x911.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ic5B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d3b742-29bf-41f2-8c2d-dc81e48d4040_2048x911.png" width="1456" height="648" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/b1d3b742-29bf-41f2-8c2d-dc81e48d4040_2048x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:648,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.nature.com/articles/s41586-021-04014-z&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ic5B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d3b742-29bf-41f2-8c2d-dc81e48d4040_2048x911.png 424w, https://substackcdn.com/image/fetch/$s_!Ic5B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d3b742-29bf-41f2-8c2d-dc81e48d4040_2048x911.png 848w, https://substackcdn.com/image/fetch/$s_!Ic5B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d3b742-29bf-41f2-8c2d-dc81e48d4040_2048x911.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Ic5B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d3b742-29bf-41f2-8c2d-dc81e48d4040_2048x911.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is a fascinating paper that describes a Reinforcement Learning system deployed at the border of Greece.&nbsp;</p><h4><strong>What&#8217;s the Problem?</strong>&nbsp;</h4><blockquote><p>Throughout the COVID-19 pandemic, countries relied on a variety of ad-hoc border control protocols to allow for non-essential travel while 
safeguarding public health: from quarantining all travelers to restricting entry from select nations based on population-level epidemiological metrics such as cases, deaths or testing positivity rates.</p></blockquote><p>This has been a difficult problem for many countries: whether to allow travelers in, which passengers to allow in, and what testing requirements to place on them. The circumstances change rapidly; outbreaks can occur in countries much faster than rules and policies can be created. The team built a reinforcement learning system that was deployed across all 40 ports of entry to Greece, from airports to land crossings to seaports.&nbsp;</p><p>Without going into details, the system consisted of the following components:</p><ol><li><p>Asking incoming travelers to fill out a form per household containing information about their travel and demographics.&nbsp;</p></li><li><p>Estimating the prevalence of COVID using recent test results across a discrete set of interpretable traveler types (based on country, region, age, and gender)</p></li><li><p>Allocating a limited number of tests among the travelers while balancing two objectives: maximizing the number of infected asymptomatic travelers identified (exploitation) and better learning the prevalence of COVID for traveler types for which the system does not currently have precise estimates (exploration)</p></li><li><p>Conducting tests and logging data as quickly as possible to get better estimates for Step 2</p></li></ol><h4><strong>What were the Results?&nbsp;</strong></h4><p>The results are amazing. 
The authors compared the system&#8217;s performance against counterfactual modeling and found that it identified 1.85 times as many asymptomatic, infected travelers as random testing (with up to 2-4 times as many during peak travel), and 1.25-1.45 times as many asymptomatic, infected travelers as testing policies that only utilized epidemiological metrics.&nbsp;</p><h4><strong>Lessons from Design and Deployment</strong></h4><ul><li><p>Data minimization: The algorithm was designed to require minimal data about travelers in order to comply with GDPR (even when there is a &#8220;tradeoff between privacy and effectiveness&#8221;)</p></li><li><p>Prioritize interpretability: For example, they used <a href="https://en.wikipedia.org/wiki/Empirical_Bayes_method">empirical Bayes</a> to communicate that large confidence intervals suggest higher risk. Similarly, their use of <a href="https://en.wikipedia.org/wiki/Gittins_index">Gittins indices</a> provided a simple metric of risk for each traveler type, which made decisions more intuitive.&nbsp;</p></li><li><p>Design for flexibility: The system required substantial financial and technical investment, and it needed to be flexible to accommodate unexpected changes. For example, they were able to quickly define new traveler types when vaccine distribution started without altering any other components.&nbsp;</p></li></ul><h4><strong>Our Thoughts</strong></h4><p>Problems such as this are fraught with complications of bias, data privacy, and other real-world concerns. 
We have covered extensively when ML goes wrong (see our coverage of ML + Covid <a href="https://mlopsroundup.substack.com/p/issue-25-tesla-ai-day-feature-stores">here</a> and <a href="https://mlopsroundup.substack.com/p/issue-13-feature-stores-information">here</a>), but this is a fantastic example of how we can build better systems when ML is deployed appropriately.</p><p>Some more reactions to this work:</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/EricTopol/status/1440744922668158982&quot;,&quot;full_text&quot;:&quot;\&quot;will be remembered as one of the best examples of using data in the fight against COVID-19\&quot;\n<a class=\&quot;tweet-url\&quot; href=\&quot;https://www.nature.com/articles/d41586-021-02556-w\&quot;>nature.com/articles/d4158&#8230;</a> <span class=\&quot;tweet-fake-link\&quot;>@NatureNV</span> by <span class=\&quot;tweet-fake-link\&quot;>@oziadias</span>\nAgree &#128175; &quot;,&quot;username&quot;:&quot;EricTopol&quot;,&quot;name&quot;:&quot;Eric Topol&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Wed Sep 22 18:28:51 +0000 2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/E_6NhIFVEAU9srS.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/Is8vA8exrH&quot;,&quot;alt_text&quot;:null},{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/E_6NimJUcAcppKZ.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/Is8vA8exrH&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:27,&quot;like_count&quot;:94,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h2><a href="https://slack.engineering/blocking-slack-invite-spam-with-machine-learning/">Slack Engineering Blog | Blocking Slack Invite Spam With Machine Learning</a></h2><div class="captioned-image-container"><figure><a 
class="image-link image2 is-viewable-img" target="_blank" href="https://slack.engineering/blocking-slack-invite-spam-with-machine-learning/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hRIe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9ace6667-23e8-49fd-8a89-51bc66c37578_557x455.png 424w, https://substackcdn.com/image/fetch/$s_!hRIe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9ace6667-23e8-49fd-8a89-51bc66c37578_557x455.png 848w, https://substackcdn.com/image/fetch/$s_!hRIe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9ace6667-23e8-49fd-8a89-51bc66c37578_557x455.png 1272w, https://substackcdn.com/image/fetch/$s_!hRIe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9ace6667-23e8-49fd-8a89-51bc66c37578_557x455.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hRIe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9ace6667-23e8-49fd-8a89-51bc66c37578_557x455.png" width="557" height="455" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/9ace6667-23e8-49fd-8a89-51bc66c37578_557x455.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:455,&quot;width&quot;:557,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://slack.engineering/blocking-slack-invite-spam-with-machine-learning/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hRIe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9ace6667-23e8-49fd-8a89-51bc66c37578_557x455.png 424w, https://substackcdn.com/image/fetch/$s_!hRIe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9ace6667-23e8-49fd-8a89-51bc66c37578_557x455.png 848w, https://substackcdn.com/image/fetch/$s_!hRIe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9ace6667-23e8-49fd-8a89-51bc66c37578_557x455.png 1272w, https://substackcdn.com/image/fetch/$s_!hRIe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9ace6667-23e8-49fd-8a89-51bc66c37578_557x455.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: <a href="https://slack.engineering/blocking-slack-invite-spam-with-machine-learning/">Slack Engineering blog</a>. An example of invite spam</figcaption></figure></div><p>In this article, <a href="https://twitter.com/aronMaurer">Aaron Maurer</a> shares some learnings and insights from building invite spam detection models at Slack. We especially liked the journey described in this article (no spam detection &#8594; rules-based &#8594; ML model-based), which many teams and companies will find relatable as they go through their own ML adoption journey.&nbsp;</p><h4><strong>Problem</strong></h4><p>One of the first things you do after creating a new Slack team is to invite people from your community/research group/company, etc. to join. 
This is done by entering the emails of the people you&#8217;d like to invite, and Slack then sends out invites on your behalf. However, in some cases, spammers abuse this system to send out spam invite emails.</p><h4><strong>Rule-based spam detection</strong></h4><p>As with many real-world ML applications, Slack&#8217;s early approach to detecting and blocking invite spam was rules-based, relying on attributes like IP addresses and the occurrence of certain phrases or words in the invite. As can be expected, this worked reasonably well in preventing spam invites, but it was labor-intensive (new rules had to be created on an ongoing basis) and the rules produced many false positives.</p><h4><strong>Model-based spam detection</strong></h4><ul><li><p><strong>Feature set</strong>: The team reused the feature set behind the existing hand-tuned rules. This is an intuitive starting point for most ML applications that aim to replace hand-tuned rules: oftentimes, the rules (even though they are noisy) can serve as a good set of features for your first model.</p></li><li><p><strong>Labels</strong>: As highlighted in the post, ground truth labels (&#8220;is the invite spam or not&#8221;) are not available for all invite emails. Even though it would be possible to get these labels via human review, doing so costs time and money. The team instead chose to train the model on a proxy observation: whether an invite would be accepted within 4 days (90% of invites are accepted in that timeframe).</p></li><li><p><strong>Model</strong>: A sparse logistic regression model that outputs a score proportional to the probability that the invite is spammy.&nbsp;</p></li><li><p><strong>Online serving</strong>: Slack has an in-house service for serving model predictions. To serve predictions from a machine learning model online, an ML engineer implements a lightweight Python class. 
The service deploys it as a microservice through Kubernetes, where it can be queried from the rest of Slack&#8217;s tech stack.</p></li></ul><h4><strong>Impact</strong></h4><p>As outlined in the post, the switch to model-based detection resulted in a big decrease in false positives, while maintaining high recall:&nbsp;</p><blockquote><p>The machine learning model was much better at preventing false positives. Only 3% of the invites it flagged ended up being accepted when allowed through, while around 70% of the invites flagged by the old model actually ended up being accepted</p></blockquote><h2><a href="https://towardsdatascience.com/the-6-minute-guide-to-scikit-learns-version-1-0-changes-91b739d99f71">Towards Data Science | The 6-Minute Guide to Scikit-learn&#8217;s Version 1.0 Changes &#128526;</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TKqH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TKqH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png 424w, https://substackcdn.com/image/fetch/$s_!TKqH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png 848w, 
https://substackcdn.com/image/fetch/$s_!TKqH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png 1272w, https://substackcdn.com/image/fetch/$s_!TKqH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TKqH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png" width="558" height="300.39" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:646,&quot;width&quot;:1200,&quot;resizeWidth&quot;:558,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TKqH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png 424w, https://substackcdn.com/image/fetch/$s_!TKqH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png 848w, 
https://substackcdn.com/image/fetch/$s_!TKqH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png 1272w, https://substackcdn.com/image/fetch/$s_!TKqH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7ceef2a0-8fd9-469f-b57e-9360dfaae0fc_1200x646.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Scikit-learn, the widely used Python machine learning library, recently <a 
href="https://scikit-learn.org/stable/whats_new/v1.0.html">released version 1.0</a>. <a href="https://towardsdatascience.com/the-6-minute-guide-to-scikit-learns-version-1-0-changes-91b739d99f71">This article</a> by Jeff Hale highlights the changes, including bug fixes, new features, and API cleanups. We summarize the key takeaways below:</p><ol><li><p><strong>OneHotEncoder: </strong>This feature preprocessing step now supports using handle_unknown='ignore' together with dropping categories. With handle_unknown='ignore', the encoder accepts values it hasn&#8217;t seen before (especially helpful in production, where it can encounter values it has no encoding for), while dropping lets you leave out categories within a feature that you might want to ignore.</p></li><li><p><strong>Pandas: </strong>When a dataframe is passed as input to a transformer during fitting, the transformer stores the column names in feature_names_in_. Additionally,&nbsp;get_feature_names_out has been added to the transformer API to return the names of the output features.&nbsp;</p></li><li><p><strong>SGDOneClassSVM: </strong>This <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDOneClassSVM.html#sklearn.linear_model.SGDOneClassSVM">new class</a> is an SGD implementation of one-class SVM (commonly used for outlier or out-of-distribution detection).&nbsp;</p></li><li><p><strong>Normalization in Linear models</strong>: The &#8220;normalize&#8221; option in <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression">linear_model.LinearRegression</a> is deprecated and will be removed in version 1.2. This behavior can now be reproduced with a scikit-learn pipeline that chains the StandardScaler and the linear model.</p></li><li><p><strong>Sklearn metrics: </strong>Display classes such as ConfusionMatrixDisplay and PrecisionRecallDisplay now expose two class methods, from_estimator and from_predictions, which create the plot either from a fitted estimator or from predictions. 
Equivalent methods such as <a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_confusion_matrix.html#sklearn.metrics.plot_confusion_matrix">metrics.plot_confusion_matrix</a>&nbsp; and&nbsp; <a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_precision_recall_curve.html#sklearn.metrics.plot_precision_recall_curve">metrics.plot_precision_recall_curve</a> are deprecated and will be removed in version 1.2</p></li></ol><h2><a href="https://twitter.com/karlhigley/status/1443977231915798528">Twitter | Lessons for Recommender Systems</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/karlhigley/status/1443977231915798528&quot;,&quot;full_text&quot;:&quot;Recommender systems lessons I wish someone had told me years ago:\n\n1. Any rock you lift up and peek under&#8212;models, datasets, loss functions, evaluation methods&#8212;will have a bunch of creepy crawly biases underneath. (If you haven&#8217;t spotted the popularity bias yet, keep looking.)&quot;,&quot;username&quot;:&quot;karlhigley&quot;,&quot;name&quot;:&quot;&#128123; Cold-Start Sparsity &#128123;&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Fri Oct 01 16:32:54 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:56,&quot;like_count&quot;:280,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>&nbsp;</p><p>In this Twitter thread, <a href="https://www.linkedin.com/in/karlhigley/">Karl Higley</a> shares some great insights about building large-scale recommendation systems. Worth clicking and reading through it!</p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! 
This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>. </p><p>If you like what we are doing, please tell your friends and colleagues to spread the word. &#10084;&#65039;</p>]]></content:encoded></item><item><title><![CDATA[Issue #27: Medical Imaging Challenges. Machine Unlearning. Managing Supply and Demand. AI Sandbox. ]]></title><description><![CDATA[Welcome to the 27th issue of the MLOps newsletter. It is officially one year since we started writing this newsletter, and we are incredibly grateful for your support. We are excited for many more years to come! &#127881; In this issue, we cover ... Thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;]]></description><link>https://mlopsroundup.substack.com/p/issue-27</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-27</guid><dc:creator><![CDATA[Rishabh Bhargava]]></dc:creator><pubDate>Mon, 20 Sep 2021 17:08:34 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fee37-f269-41f8-b886-fef412cd3906_592x451.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y2i1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y2i1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!y2i1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!y2i1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!y2i1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y2i1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png" width="1000" height="400" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:43471,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y2i1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!y2i1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!y2i1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!y2i1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F122733bb-5cd1-4555-a98b-9df3cafd8257_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Welcome to the 27th issue of the MLOps newsletter. It is officially one year since we started writing this newsletter, and we are incredibly grateful for your support. We are excited for the next year to come!&nbsp;&#127881;</p><p>In this issue, we cover a paper on the machine learning challenges in medical imaging, discuss interesting research regarding machine unlearning, share Doordash&#8217;s strategies for matching supply and demand, and much more. </p><p>Thank you for subscribing. 
If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://www.arxiv-vanity.com/papers/2103.10292/">Paper | How I failed machine learning in medical imaging - shortcomings, and recommendations</a></h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.arxiv-vanity.com/papers/2103.10292/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xD1X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fba76-2bd0-496a-9ddf-e9e0fba18247_677x226.png 424w, https://substackcdn.com/image/fetch/$s_!xD1X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fba76-2bd0-496a-9ddf-e9e0fba18247_677x226.png 848w, https://substackcdn.com/image/fetch/$s_!xD1X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fba76-2bd0-496a-9ddf-e9e0fba18247_677x226.png 1272w, https://substackcdn.com/image/fetch/$s_!xD1X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fba76-2bd0-496a-9ddf-e9e0fba18247_677x226.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xD1X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fba76-2bd0-496a-9ddf-e9e0fba18247_677x226.png" width="677" height="226" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/288fba76-2bd0-496a-9ddf-e9e0fba18247_677x226.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:226,&quot;width&quot;:677,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.arxiv-vanity.com/papers/2103.10292/&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xD1X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fba76-2bd0-496a-9ddf-e9e0fba18247_677x226.png 424w, https://substackcdn.com/image/fetch/$s_!xD1X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fba76-2bd0-496a-9ddf-e9e0fba18247_677x226.png 848w, https://substackcdn.com/image/fetch/$s_!xD1X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fba76-2bd0-496a-9ddf-e9e0fba18247_677x226.png 1272w, https://substackcdn.com/image/fetch/$s_!xD1X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fba76-2bd0-496a-9ddf-e9e0fba18247_677x226.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>This is a fascinating paper that highlights some of the challenges when applying machine learning in the world of medical imaging.&nbsp;</p><p>As the authors put it:</p><blockquote><p>there is a staggering 
amount of research on machine learning for medical images, as many recent surveys show. This growth does not inherently lead to clinical progress. The higher volume of research can be aligned with the academic incentives rather than the needs of clinicians and patients. As an example, there can be an oversupply of papers showing state-of-the-art performance on benchmark data, but no practical improvement for the clinical problem.</p></blockquote><h4><strong>It&#8217;s Not All About Larger Datasets</strong></h4><p>Large labeled datasets have solved many problems in computer vision. However, very few clinical questions can be easily posed as discrimination tasks, and even when they can be, larger datasets often fail to &#8220;solve&#8221; such questions.&nbsp;</p><p>One example is the early diagnosis of Alzheimer&#8217;s Disease. Even after considerable effort and the collection of a lot of data:</p><blockquote><p>the increase in data size did not come with better diagnostic accuracy, in particular for the most clinically-relevant question, distinguishing pathological versus stable evolution for patients with symptoms of prodromal Alzheimer&#8217;s. Rather, studies with larger sample sizes tend to report worse prediction accuracy.</p></blockquote><h4><strong>Datasets Reflect An Application Only Partly</strong></h4><blockquote><p>Available datasets only partially reflect the clinical situation for a particular medical condition, leading to dataset bias... Dataset bias occurs when the data used to build the decision model (the training data), has a different distribution than the data representing the population on which it should be applied (the test data).&nbsp;</p></blockquote><p>There are many sources of dataset bias, from a cohort that does not represent the range of possible patients to imaging procedures that introduce biases. A particularly harmful bias arises when spurious correlations appear in clinical images, for example, when dermatologists place a mark next to lesions. 
Labeling errors can also introduce biases. Expert human annotators may assign different labels with systematic biases, but multiple annotators are seldom available.</p><h4><strong>Metrics That Do Not Reflect What We Want</strong></h4><p>Suitable metrics for reporting performance on a dataset can change over time, and important metrics, such as <a href="https://machinelearningmastery.com/calibrated-classification-model-in-scikit-learn/">calibration</a>, can be missing from research. Also, the metrics that are used may not be synonymous with practical improvement.&nbsp;</p><p>Similarly, the baselines that these metrics are compared against may be poorly chosen. For example, choosing an underpowered baseline may create an &#8220;illusion of progress&#8221;. The opposite problem, not reporting a simple baseline that would have been effective on a dataset, also leaves out critical information.&nbsp;</p><p>Finally, the evaluation error on a dataset might be higher than the performance gains, even as more and more effort is spent on diminishing returns.</p><h4><strong>More Than Beating The Benchmark</strong></h4><blockquote><p>Good machine-learning benchmarks are more difficult than they may seem...we want to look at more than just outperforming the benchmark, even if this is done with proper validation and statistical significance testing. One point of view is that rejecting a null is not sufficient, and that a method should be accepted based on evidence that it brings a sizable improvement upon the existing solutions.</p></blockquote><h4><strong>Conclusions</strong></h4><p>This paper was a great (although not the most cheerful) read. It&#8217;s important to keep in mind some of the challenges with machine learning research, especially in complex domains like medical imaging. 
We remain super excited about what ML technology can bring to us in this discipline!</p><h2><a href="https://www.wired.com/story/machines-can-learn-can-they-unlearn/">Wired | Now That Machines Can Learn, Can They Unlearn?</a></h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.wired.com/story/machines-can-learn-can-they-unlearn/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QgYA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F44c1720d-67e3-4eec-baf3-82226a474c8e_275x183.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QgYA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F44c1720d-67e3-4eec-baf3-82226a474c8e_275x183.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QgYA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F44c1720d-67e3-4eec-baf3-82226a474c8e_275x183.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QgYA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F44c1720d-67e3-4eec-baf3-82226a474c8e_275x183.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QgYA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F44c1720d-67e3-4eec-baf3-82226a474c8e_275x183.jpeg" width="363" height="241.56" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/44c1720d-67e3-4eec-baf3-82226a474c8e_275x183.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:183,&quot;width&quot;:275,&quot;resizeWidth&quot;:363,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.wired.com/story/machines-can-learn-can-they-unlearn/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QgYA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F44c1720d-67e3-4eec-baf3-82226a474c8e_275x183.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QgYA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F44c1720d-67e3-4eec-baf3-82226a474c8e_275x183.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QgYA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F44c1720d-67e3-4eec-baf3-82226a474c8e_275x183.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QgYA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F44c1720d-67e3-4eec-baf3-82226a474c8e_275x183.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4><strong>What is Machine Unlearning?&nbsp;</strong></h4><p>Machine Unlearning is a new area of research that aims to make models selectively &#8220;forget&#8221; specific 
data points they were trained on, along with the &#8220;learning&#8221; derived from them, i.e., the influence of these training instances on model parameters. The goal is to remove all traces of a particular data point from a machine learning system, without affecting the aggregate model performance.</p><h4><strong>Why is it important?&nbsp;</strong></h4><p>Work in this area is motivated in part by growing concerns around privacy and by regulations like GDPR and the &#8220;Right to be Forgotten&#8221;. While multiple companies today allow users to request that their private data be deleted, there is no way to request that all context learned by algorithms from this data be deleted as well. Furthermore, as we have <a href="https://mlopsroundup.substack.com/p/issue-13-feature-stores-information">covered previously</a>, ML models suffer from information leakage, and machine unlearning can be an important lever to combat this.&nbsp;</p><p>In our view, there is another important reason why machine unlearning matters: it can help make recurring model training more efficient by making models forget training examples that are outdated or no longer matter.</p><h4><strong>Initial Explorations</strong></h4><ul><li><p>This <a href="https://arxiv.org/abs/1912.03817">paper</a> by researchers from the universities of Toronto and Wisconsin-Madison introduces a framework to expedite the unlearning process by limiting the influence of a data point in the training procedure.&nbsp;</p></li></ul><ul><li><p>A more recent <a href="https://arxiv.org/abs/2103.03279">paper</a> by researchers at Cornell, University of Waterloo, and Google studies the generalizability of various machine unlearning approaches. It also proposes a new unlearning algorithm that improves &#8220;deletion capacity&#8221; (fraction of the training dataset that can be deleted with a bounded loss to performance) for convex loss functions.</p></li></ul><p>This is a new area of research. 
The article highlights a couple of papers that have been published recently around this (that we shared above). Our general take is that we need a lot more exploration and rigorous testing before concluding the real-world applicability of machine unlearning.</p><h2><a href="https://doordash.engineering/2021/06/29/managing-supply-and-demand-balance-through-machine-learning/">Doordash Engineering Blog | Managing Supply and Demand Balance Through Machine Learning</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://doordash.engineering/2021/06/29/managing-supply-and-demand-balance-through-machine-learning/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uLtT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fee37-f269-41f8-b886-fef412cd3906_592x451.png 424w, https://substackcdn.com/image/fetch/$s_!uLtT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fee37-f269-41f8-b886-fef412cd3906_592x451.png 848w, https://substackcdn.com/image/fetch/$s_!uLtT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fee37-f269-41f8-b886-fef412cd3906_592x451.png 1272w, https://substackcdn.com/image/fetch/$s_!uLtT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fee37-f269-41f8-b886-fef412cd3906_592x451.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!uLtT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fee37-f269-41f8-b886-fef412cd3906_592x451.png" width="592" height="451" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/ec0fee37-f269-41f8-b886-fef412cd3906_592x451.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:451,&quot;width&quot;:592,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://doordash.engineering/2021/06/29/managing-supply-and-demand-balance-through-machine-learning/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uLtT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fee37-f269-41f8-b886-fef412cd3906_592x451.png 424w, https://substackcdn.com/image/fetch/$s_!uLtT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fee37-f269-41f8-b886-fef412cd3906_592x451.png 848w, https://substackcdn.com/image/fetch/$s_!uLtT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fee37-f269-41f8-b886-fef412cd3906_592x451.png 1272w, 
https://substackcdn.com/image/fetch/$s_!uLtT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fee37-f269-41f8-b886-fef412cd3906_592x451.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Doordash recently shared their approach to managing supply and demand balance on their platform with machine learning. The article does a good job of explaining the problem, challenges, and their approach and we recommend reading it entirely. 
We highlight some salient takeaways below.</p><h4><strong>Delivery-level vs Market-level balance</strong></h4><p>The goal of the system is to balance supply (of Dashers) and demand (of delivery orders). But at what granularity should this balance be optimized (and measured)? The ideal scenario would be for the system to balance supply and demand at the delivery/order level, i.e. every order has a Dasher available at the optimal time, and every Dasher meets or exceeds their targeted pay per hour. However, a more tractable problem is to balance at the market level, i.e. ensure that roughly as many Dashers are available as are needed to meet aggregate order demand, even though the outcome for an individual delivery or Dasher might be less than optimal.</p><h4><strong>Problem Formulation</strong></h4><p><strong>Forecasting</strong>: As shared in the article, the primary metric used to measure this balance is &#8220;number of delivery hours&#8221;, i.e. the system tries to balance the number of Dasher-hours required to make deliveries while keeping delivery time low and Dasher utilization high. In this way, the demand forecasting problem is mapped to a regression problem. The team used <a href="https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting">gradient boosting</a> as the underlying model architecture, through the <a href="https://lightgbm.readthedocs.io/en/latest/index.html">LightGBM framework</a>.</p><p><strong>Optimization</strong>: Predictions from the forecasting model are fed into a <a href="https://en.wikipedia.org/wiki/Integer_programming">mixed-integer programming</a> (MIP) optimizer whose objective is to choose incentive rewards that minimize the undersupply of Dashers, subject to certain configurable constraints.&nbsp;</p><h4><strong>Reliability Considerations</strong></h4><p>The article shares some interesting and valuable learnings on tradeoffs between model performance and reliability. 
Their forecasting model, for example, relies on a minimal set of features by design, which reduces the complexity of the ETL pipelines and the likelihood of feature drift, and improves transparency:</p><blockquote><p>We could encode hundreds of features to build a model that has high performance. Although that choice is very appealing and it does help with creating a model that performs better than one that has a simple data pipeline, in practice it creates a system that lacks reliability and generates a high surface area for <a href="https://doordash.engineering/2021/05/20/monitor-machine-learning-model-drift/">feature drift</a></p></blockquote><p>Another point highlighted in the article concerns long-chain ETL dependencies, something we have observed in our own work. ETL jobs can fail or be delayed for any number of reasons. When the inputs to your models are the final result of a long chain of such dependencies, the models can become very unreliable because of the increased surface area for failures. 
Designing your input feature computations to have minimal intermediate dependencies can go a long way to improve reliability.</p><h2><a href="https://www.morningbrew.com/emerging-tech/stories/2021/05/26/regulate-ai-just-play-sandbox">Emerging Tech Brew Blog | To regulate AI, try playing in a sandbox</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.morningbrew.com/emerging-tech/stories/2021/05/26/regulate-ai-just-play-sandbox" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d06a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d3fafc-1149-4d01-9e73-44fa5ea49667_1500x1000.png 424w, https://substackcdn.com/image/fetch/$s_!d06a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d3fafc-1149-4d01-9e73-44fa5ea49667_1500x1000.png 848w, https://substackcdn.com/image/fetch/$s_!d06a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d3fafc-1149-4d01-9e73-44fa5ea49667_1500x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!d06a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d3fafc-1149-4d01-9e73-44fa5ea49667_1500x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d06a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d3fafc-1149-4d01-9e73-44fa5ea49667_1500x1000.png" width="502" height="334.7815934065934" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/b6d3fafc-1149-4d01-9e73-44fa5ea49667_1500x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:502,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.morningbrew.com/emerging-tech/stories/2021/05/26/regulate-ai-just-play-sandbox&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d06a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d3fafc-1149-4d01-9e73-44fa5ea49667_1500x1000.png 424w, https://substackcdn.com/image/fetch/$s_!d06a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d3fafc-1149-4d01-9e73-44fa5ea49667_1500x1000.png 848w, https://substackcdn.com/image/fetch/$s_!d06a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d3fafc-1149-4d01-9e73-44fa5ea49667_1500x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!d06a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6d3fafc-1149-4d01-9e73-44fa5ea49667_1500x1000.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The last couple of years have been especially news-filled on the AI regulation front.&nbsp;</p><p>Last May, Norway announced plans to create an AI regulatory sandbox, and <a href="https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric">the recently proposed EU legislation</a> mentions the word &#8220;sandbox&#8221; 38 times. 
The goal of such regulatory sandboxes is to:</p><blockquote><p>&#8230;allow organizations to develop and test new technologies in a low-stakes, monitored environment before rolling them out to the general public.&nbsp;</p></blockquote><h4><strong>What happened?</strong></h4><p>After Norway&#8217;s data protection authority, Datatilsynet, announced plans to create an AI regulatory sandbox, it accepted four participants (from both the public and private sectors) to work with it in engagements lasting three to six months. These participants were facing regulatory conundrums such as &#8220;applying principles like transparency, fairness, and data minimization to AI systems&#8221;.&nbsp;</p><blockquote><p>Generally, the goal of Norway&#8217;s AI regulatory sandbox is to facilitate compliance with some of these trickier provisions of the GDPR. It&#8217;s not looking to create a process that all tech developers go through, but rather to produce helpful precedent in fuzzy legal areas, and communicate those findings to organizations building AI systems. Its approach relies on a series of hands-on workshops and extended conversation and negotiation between tech developers and regulators.</p></blockquote><h4><strong>Our Thoughts</strong></h4><p>These are still very early days for both the use of AI in industry and the regulation of such technology. 
As different countries (and regions such as the EU) try different strategies to tackle the very real risk that AI systems pose, we will get to learn from the approaches that deliver the best results.</p><h2><a href="https://docs.snowflake.com/en/release-notes/2021-09.html#unstructured-data-support-preview">Snowflake | Unstructured Data Support</a></h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rw66!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rw66!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png 424w, https://substackcdn.com/image/fetch/$s_!rw66!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png 848w, https://substackcdn.com/image/fetch/$s_!rw66!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png 1272w, https://substackcdn.com/image/fetch/$s_!rw66!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!rw66!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png" width="538" height="128.615625" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/df67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:306,&quot;width&quot;:1280,&quot;resizeWidth&quot;:538,&quot;bytes&quot;:42827,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rw66!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png 424w, https://substackcdn.com/image/fetch/$s_!rw66!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png 848w, https://substackcdn.com/image/fetch/$s_!rw66!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png 1272w, https://substackcdn.com/image/fetch/$s_!rw66!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf67f9d8-2eee-46e8-a64b-c3f8e2784874_1280x306.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a></figure></div><p>Snowflake recently announced support for unstructured data: users can now access, share, load, and process unstructured data files in Snowflake.&nbsp;</p><p>The feature was first previewed at the <a href="https://www.snowflake.com/news/snowflake-announces-new-features-to-bring-together-the-worlds-data-in-the-data-cloud/">Snowflake Summit</a> earlier this year as being in &#8216;private preview&#8217;; you can read more about it in this <a href="https://docs.snowflake.com/en/user-guide/unstructured-intro.html">documentation</a>. The release notes and documentation are a little light on what can be done with unstructured data beyond storing and accessing it in Snowflake, but there is an upcoming webinar on September 22 (<a href="https://www.snowflake.com/webinar/thought-leadership/7-ways-to-start-using-unstructured-data-in-snowflake-apac-2021-09-22/">signup link</a>) that might be relevant if you wish to learn more.</p><h2><a href="https://github.com/online-ml/river">GitHub | Python Package for Online ML</a></h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://github.com/online-ml/river" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6k3f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4c031-735b-4bd6-911d-029893e0c515_841x200.svg 424w, https://substackcdn.com/image/fetch/$s_!6k3f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4c031-735b-4bd6-911d-029893e0c515_841x200.svg 848w, 
https://substackcdn.com/image/fetch/$s_!6k3f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4c031-735b-4bd6-911d-029893e0c515_841x200.svg 1272w, https://substackcdn.com/image/fetch/$s_!6k3f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4c031-735b-4bd6-911d-029893e0c515_841x200.svg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6k3f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4c031-735b-4bd6-911d-029893e0c515_841x200.svg" width="584" height="139.36954585930542" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/e0b4c031-735b-4bd6-911d-029893e0c515_841x200.svg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:268,&quot;width&quot;:1123,&quot;resizeWidth&quot;:584,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;river_logo&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://github.com/online-ml/river&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="river_logo" title="river_logo" srcset="https://substackcdn.com/image/fetch/$s_!6k3f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4c031-735b-4bd6-911d-029893e0c515_841x200.svg 424w, 
https://substackcdn.com/image/fetch/$s_!6k3f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4c031-735b-4bd6-911d-029893e0c515_841x200.svg 848w, https://substackcdn.com/image/fetch/$s_!6k3f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4c031-735b-4bd6-911d-029893e0c515_841x200.svg 1272w, https://substackcdn.com/image/fetch/$s_!6k3f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4c031-735b-4bd6-911d-029893e0c515_841x200.svg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><a href="https://riverml.xyz/latest/">River</a> looks like a neat Python library for <a href="https://www.wikiwand.com/en/Online_machine_learning">online machine learning</a>. As they put it:</p><blockquote><p>River's ambition is to be the go-to library for doing machine learning on streaming data.<br>Machine learning is often done in a batch setting, whereby a model is fitted to a dataset in one go. This results in a static model which has to be retrained in order to learn from new data&#8230; With River, we encourage a different approach, which is to continuously learn a stream of data. This means that the model process one observation at a time, and can therefore be updated on the fly. This allows to learn from massive datasets that don't fit in main memory.&nbsp;</p></blockquote><p>You can learn more by reading <a href="https://arxiv.org/pdf/2012.04740.pdf">this paper</a> or by watching <a href="https://www.youtube.com/watch?v=P3M6dt7bY9U&amp;list=PLGVZCDnMOq0q7_6SdrC2wRtdkojGBTAht&amp;index=12&amp;ab_channel=PyData">this video</a>.</p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! 
This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>. If you like what we are doing please tell your friends and colleagues to spread the word.</p>]]></content:encoded></item><item><title><![CDATA[Issue #26: Concept Drift. Anomaly Detection with Self-Supervision. NLP in Legal Applications. Models Per Customer?]]></title><description><![CDATA[Welcome to the 26th issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-26-concept-drift-anomaly-detection</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-26-concept-drift-anomaly-detection</guid><dc:creator><![CDATA[Rishabh Bhargava]]></dc:creator><pubDate>Tue, 07 Sep 2021 17:25:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wL-_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 26th issue of the MLOps newsletter.&nbsp;</p><p>In this issue, we deep-dive into inferring concept drift, share a paper on outlier detection using self-supervised learning, discuss NLP applications to summarize legal documents, cover a recent article about customization in B2B ML applications, and more.</p><p>Thank you for subscribing. 
If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://concept-drift.fastforwardlabs.com/">Fast Forward Labs | Inferring Concept Drift Without Labeled Data</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://concept-drift.fastforwardlabs.com/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wL-_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png 424w, https://substackcdn.com/image/fetch/$s_!wL-_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png 848w, https://substackcdn.com/image/fetch/$s_!wL-_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png 1272w, https://substackcdn.com/image/fetch/$s_!wL-_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wL-_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png" width="563" height="376" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:563,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://concept-drift.fastforwardlabs.com/&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wL-_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png 424w, https://substackcdn.com/image/fetch/$s_!wL-_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png 848w, https://substackcdn.com/image/fetch/$s_!wL-_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png 1272w, https://substackcdn.com/image/fetch/$s_!wL-_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F10d8b5f2-cad5-4206-ad35-4b45c1a01f69_563x376.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>This is a wonderful introduction to the problem of concept drift (and how to infer it) from the team at <a href="https://www.cloudera.com/products/fast-forward-labs-research.html">Fast Forward Labs</a>. We have covered concept drift <a href="https://mlopsroundup.substack.com/p/issue-22-github-copilot-cicd-for">before</a>, but we recommend going through this post for a detailed overview, replete with experiments and nice visualizations. In the meantime, here&#8217;s a summary.&nbsp;</p><h4><strong>Motivation</strong></h4><p>The authors put it best:</p><blockquote><p>After iterations of development and testing, deploying a well-fit machine learning model often feels like the final hurdle for an eager data science team. 
In practice however, a trained model is never final, and this milestone marks just the beginning of the perpetual maintenance race that is production machine learning. This is because most machine learning models are static, but the world we live in is dynamic.</p></blockquote><h4><strong>What is Concept Drift?&nbsp;</strong></h4><blockquote><p>This phenomenon in which the statistical properties of a target domain change over time is considered concept drift.</p></blockquote><p>Concept drift has two parts: feature drift and &#8220;real concept drift&#8221;.&nbsp;</p><p>Feature drift refers to changes in the input variables (i.e. changes in P(X)). Such changes to input data over time may or may not affect the actual performance of the learned ML model, as seen in the image below.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://concept-drift.fastforwardlabs.com/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vFwp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4df2dfad-efc6-4975-8f88-a4521a86b4b6_564x376.png 424w, https://substackcdn.com/image/fetch/$s_!vFwp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4df2dfad-efc6-4975-8f88-a4521a86b4b6_564x376.png 848w, https://substackcdn.com/image/fetch/$s_!vFwp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4df2dfad-efc6-4975-8f88-a4521a86b4b6_564x376.png 1272w, 
https://substackcdn.com/image/fetch/$s_!vFwp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4df2dfad-efc6-4975-8f88-a4521a86b4b6_564x376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vFwp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4df2dfad-efc6-4975-8f88-a4521a86b4b6_564x376.png" width="482" height="321.3333333333333" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/4df2dfad-efc6-4975-8f88-a4521a86b4b6_564x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:564,&quot;resizeWidth&quot;:482,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://concept-drift.fastforwardlabs.com/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vFwp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4df2dfad-efc6-4975-8f88-a4521a86b4b6_564x376.png 424w, https://substackcdn.com/image/fetch/$s_!vFwp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4df2dfad-efc6-4975-8f88-a4521a86b4b6_564x376.png 848w, 
https://substackcdn.com/image/fetch/$s_!vFwp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4df2dfad-efc6-4975-8f88-a4521a86b4b6_564x376.png 1272w, https://substackcdn.com/image/fetch/$s_!vFwp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4df2dfad-efc6-4975-8f88-a4521a86b4b6_564x376.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Real concept drift refers to changes in the learned relationships between the 
inputs and the target variables (i.e. changes in P(y|X)). This type of drift always causes a drop in model performance. &#8220;Real concept drift&#8221; can happen at the same time as feature drift, as seen in the image below.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://concept-drift.fastforwardlabs.com/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IUUb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5765082d-a172-4852-8276-06fd7c7601a3_564x376.png 424w, https://substackcdn.com/image/fetch/$s_!IUUb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5765082d-a172-4852-8276-06fd7c7601a3_564x376.png 848w, https://substackcdn.com/image/fetch/$s_!IUUb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5765082d-a172-4852-8276-06fd7c7601a3_564x376.png 1272w, https://substackcdn.com/image/fetch/$s_!IUUb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5765082d-a172-4852-8276-06fd7c7601a3_564x376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IUUb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5765082d-a172-4852-8276-06fd7c7601a3_564x376.png" width="468" height="312" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/5765082d-a172-4852-8276-06fd7c7601a3_564x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:564,&quot;resizeWidth&quot;:468,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://concept-drift.fastforwardlabs.com/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IUUb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5765082d-a172-4852-8276-06fd7c7601a3_564x376.png 424w, https://substackcdn.com/image/fetch/$s_!IUUb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5765082d-a172-4852-8276-06fd7c7601a3_564x376.png 848w, https://substackcdn.com/image/fetch/$s_!IUUb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5765082d-a172-4852-8276-06fd7c7601a3_564x376.png 1272w, https://substackcdn.com/image/fetch/$s_!IUUb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5765082d-a172-4852-8276-06fd7c7601a3_564x376.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h4><strong>Supervised Drift Detection</strong></h4><p>Often, teams will:</p><blockquote><p>monitor a task-dependent performance metric like accuracy, F-score, or precision/recall. If the metric of interest deviates from an acceptable level (as determined during training evaluation on the reference window), a drift is signaled.</p></blockquote><p>At this point, teams will retrain the model with fresh data, and performance levels will improve. However, there is a flawed assumption here: that true labels are instantaneously available after inference. Annotations can be expensive to obtain, sometimes requiring hired domain expertise. 
Aside from the cost, labels can sometimes take a long time to become available -- for example, it can take days to months for fraud to be reported or for a loan default to occur.&nbsp;</p><p>In such cases, detecting drift without ground-truth labels can be very helpful for ML teams.&nbsp;</p><h4><strong>Unsupervised Drift Detection</strong></h4><p>Without ground truth labels, any drift detection will be prone to some errors - both false positives (signaling drift when there is no impact on model performance) and false negatives (missing crucial concept drift problems). However, some techniques might prove helpful:</p><ul><li><p><strong>Statistical test for change in feature space</strong>: Compare a recent time window of each input feature against a reference window (for example, from the training data) to see if they come from the same distribution (using tests like the <a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test">Kolmogorov&#8211;Smirnov test</a>).</p></li><li><p><strong>Statistical test for change in response variable</strong>: As with input features, we can run the KS test to compare the predicted probabilities on fresh data with predicted probabilities from the past.</p></li><li><p><strong>Statistical test for change in margin density of response variable</strong>: In the previous techniques, we were looking at changes in the entire probability distribution. However, changes in the probability values might not matter too much when the model is very confident (p &lt; 0.1 or p &gt; 0.9). Instead, we might look only for changes within a margin of probability values close to the decision threshold.&nbsp;</p></li></ul><p>For more details (and their experimental setup on a toy problem), we recommend reading the entire post <a href="https://concept-drift.fastforwardlabs.com/">here</a>. This is a really important topic that is going to gather more mindshare as more production ML systems are built. 
</p><h2><a href="http://ai.googleblog.com/2021/09/discovering-anomalous-data-with-self.html">Google AI: Discovering Anomalous Data with Self-Supervised Learning</a></h2><p>Anomaly detection (or outlier detection) is a common application of Machine Learning with use cases across multiple domains - detecting fraudulent financial transactions, manufacturing defects, tumors in X-rays, and so on. Researchers at Google recently published a novel approach (subsequently <a href="https://arxiv.org/pdf/2011.02578.pdf">published</a> in ICLR &#8216;21) for anomaly detection based on self-supervised representation learning.</p><h4><strong>The Approach</strong></h4><p>Google&#8217;s approach is based on a two-stage framework for deep one-class classification:</p><ol><li><p>In the first stage, the model learns self-supervised representations from one-class data.</p></li><li><p>In the second stage, a one-class classifier, such as <a href="https://papers.nips.cc/paper/1999/file/8725fb777f25776ffa9076e44fcfd776-Paper.pdf">OC-SVM</a> or <a href="https://en.wikipedia.org/wiki/Kernel_density_estimation">kernel density estimator</a>, is trained using the learned representations from the first stage.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://ai.googleblog.com/2021/09/discovering-anomalous-data-with-self.html" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rwHy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c38efe3-40d1-4db4-ac1a-207cbbea11aa_1728x424.png 424w, https://substackcdn.com/image/fetch/$s_!rwHy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c38efe3-40d1-4db4-ac1a-207cbbea11aa_1728x424.png 848w, 
https://substackcdn.com/image/fetch/$s_!rwHy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c38efe3-40d1-4db4-ac1a-207cbbea11aa_1728x424.png 1272w, https://substackcdn.com/image/fetch/$s_!rwHy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c38efe3-40d1-4db4-ac1a-207cbbea11aa_1728x424.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rwHy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c38efe3-40d1-4db4-ac1a-207cbbea11aa_1728x424.png" width="1456" height="357" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7c38efe3-40d1-4db4-ac1a-207cbbea11aa_1728x424.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:357,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://ai.googleblog.com/2021/09/discovering-anomalous-data-with-self.html&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rwHy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c38efe3-40d1-4db4-ac1a-207cbbea11aa_1728x424.png 424w, 
https://substackcdn.com/image/fetch/$s_!rwHy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c38efe3-40d1-4db4-ac1a-207cbbea11aa_1728x424.png 848w, https://substackcdn.com/image/fetch/$s_!rwHy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c38efe3-40d1-4db4-ac1a-207cbbea11aa_1728x424.png 1272w, https://substackcdn.com/image/fetch/$s_!rwHy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c38efe3-40d1-4db4-ac1a-207cbbea11aa_1728x424.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In addition to the above framework, their work also includes a distribution-augmented contrastive learning algorithm to learn self-supervised representations that are better suited to downstream outlier detection tasks.</p><h4><strong>The Results</strong></h4><p>The researchers experimented with two representative self-supervised representation learning algorithms, <a href="https://arxiv.org/abs/1803.07728">rotation prediction</a> and <a href="https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html">contrastive learning</a>, and evaluated one-class classification performance on datasets commonly used in computer vision, including <a href="https://www.cs.toronto.edu/~kriz/cifar.html">CIFAR-10 and CIFAR-100</a>, <a href="https://arxiv.org/pdf/1708.07747.pdf">Fashion MNIST</a>, and <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2007/10/CCS2007.pdf">Cat vs Dog</a>. 
Images from one class are given as inliers and those from remaining classes are given as outliers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://ai.googleblog.com/2021/09/discovering-anomalous-data-with-self.html" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jsnV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6bc9737-485c-4830-bf58-96ac59dc25fe_1370x498.png 424w, https://substackcdn.com/image/fetch/$s_!jsnV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6bc9737-485c-4830-bf58-96ac59dc25fe_1370x498.png 848w, https://substackcdn.com/image/fetch/$s_!jsnV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6bc9737-485c-4830-bf58-96ac59dc25fe_1370x498.png 1272w, https://substackcdn.com/image/fetch/$s_!jsnV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6bc9737-485c-4830-bf58-96ac59dc25fe_1370x498.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jsnV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6bc9737-485c-4830-bf58-96ac59dc25fe_1370x498.png" width="1370" height="498" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/b6bc9737-485c-4830-bf58-96ac59dc25fe_1370x498.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:498,&quot;width&quot;:1370,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://ai.googleblog.com/2021/09/discovering-anomalous-data-with-self.html&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jsnV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6bc9737-485c-4830-bf58-96ac59dc25fe_1370x498.png 424w, https://substackcdn.com/image/fetch/$s_!jsnV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6bc9737-485c-4830-bf58-96ac59dc25fe_1370x498.png 848w, https://substackcdn.com/image/fetch/$s_!jsnV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6bc9737-485c-4830-bf58-96ac59dc25fe_1370x498.png 1272w, https://substackcdn.com/image/fetch/$s_!jsnV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6bc9737-485c-4830-bf58-96ac59dc25fe_1370x498.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>For a more in-depth technical overview, refer to their ICLR &#8216;21 paper <a href="https://arxiv.org/pdf/2011.02578.pdf">Learning and Evaluating Representations for Deep One-class Classification</a>. 
Additionally, the accompanying code for this paper can be found on <a href="https://github.com/google-research/deep_representation_one_class">GitHub</a>.</p><h2><a href="https://hai.stanford.edu/news/natural-language-processing-ready-take-legal-hearings">Stanford HAI | Is Natural Language Processing ready to take on legal hearings?</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://hai.stanford.edu/news/natural-language-processing-ready-take-legal-hearings" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4xSW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe437e0f5-e7bd-4e68-ac49-35bf9e2a08f9_960x638.png 424w, https://substackcdn.com/image/fetch/$s_!4xSW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe437e0f5-e7bd-4e68-ac49-35bf9e2a08f9_960x638.png 848w, https://substackcdn.com/image/fetch/$s_!4xSW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe437e0f5-e7bd-4e68-ac49-35bf9e2a08f9_960x638.png 1272w, https://substackcdn.com/image/fetch/$s_!4xSW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe437e0f5-e7bd-4e68-ac49-35bf9e2a08f9_960x638.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4xSW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe437e0f5-e7bd-4e68-ac49-35bf9e2a08f9_960x638.png" width="960" height="638" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/e437e0f5-e7bd-4e68-ac49-35bf9e2a08f9_960x638.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:638,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://hai.stanford.edu/news/natural-language-processing-ready-take-legal-hearings&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4xSW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe437e0f5-e7bd-4e68-ac49-35bf9e2a08f9_960x638.png 424w, https://substackcdn.com/image/fetch/$s_!4xSW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe437e0f5-e7bd-4e68-ac49-35bf9e2a08f9_960x638.png 848w, https://substackcdn.com/image/fetch/$s_!4xSW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe437e0f5-e7bd-4e68-ac49-35bf9e2a08f9_960x638.png 1272w, https://substackcdn.com/image/fetch/$s_!4xSW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe437e0f5-e7bd-4e68-ac49-35bf9e2a08f9_960x638.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>We&#8217;ve covered Stanford HAI&#8217;s work <a href="https://mlopsroundup.substack.com/p/issue-9-mlops-tooling-landscape-ai">previously</a> in our newsletter. In a recent article, Stanford researchers <a href="https://profiles.stanford.edu/catalin-voss">Catalin Voss</a> and <a href="https://profiles.stanford.edu/yun-hong">Jenny Hong</a> discussed the opportunity and challenges associated with applying natural language understanding techniques in reviewing legal hearings.</p><h4><strong>Problem</strong></h4><p>A <a href="https://www.justice.gov/uspc/parole-hearings">parole hearing</a> is a hearing to determine whether an inmate should be released from prison to parole supervision to serve the remainder of the sentence. 
During this hearing, a parole commissioner and a deputy review all the relevant history and life circumstances of the candidate, and then decide whether or not to grant parole. As outlined in the article, each such hearing generates a ~150-page transcript of the entire conversation so that it can be reviewed later if needed. In California alone, there are on average 5,000-6,000 parole hearings each year.&nbsp;</p><blockquote><p>The governor&#8217;s office and parole review unit are tasked with checking parole decisions, but they lack the resources to read every transcript, so as a matter of practicality, they generally only read transcripts for parole approvals. If parole is denied, unless an appellate attorney or another influential stakeholder pushes for a review, the transcript is usually just archived.</p></blockquote><h4><strong>Recon Approach</strong></h4><p>While manually reading the transcripts in their entirety is not feasible, the authors propose using NLP to &#8220;read&#8221; and summarize these transcripts (flagging important factors and data points for each case). This can help scale parole review, and analyzing the outcomes of parole reviews at scale can help reveal whether the process is fair and what kinds of biases exist. Further, it can help flag individual cases that seem like outliers. This proposal is discussed in more detail in a forthcoming paper <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3834710">available here</a>.</p><h4><strong>NLP Challenges</strong></h4><p>While massive language models like BERT and GPT-3 have shown performance improvements across a large variety of tasks, the article outlines some existing challenges that would need to be solved before applying NLP to this problem:</p><ol><li><p>Ability to maintain &#8220;state&#8221; over long texts. 
Legal transcripts can often be in the range of tens of thousands of words.</p></li><li><p>Synthesizing information across various pieces of text to answer concrete questions.</p></li><li><p>Multi-step (multi-hop) reasoning and query planning to answer questions.</p></li></ol><h4><strong>Our view</strong></h4><p>It is far more common to read about the potential dangers and biases of applying machine learning and AI to new real-world use cases. It is refreshing, in our view, to read about this proposal, which aims to reduce bias and improve the fairness of a parole process that currently has limited transparency and oversight. There was a conversation about this on Twitter:</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/chrmanning/status/1430676436596117504&quot;,&quot;full_text&quot;:&quot;This reaction strikes me as knee-jerk not considered. How are parole decisions made now in CA? A parole officer and their deputy interview the candidate, briefly confer and then give a judgment. It&#8217;s a very human system, but if they deny parole is there any review for fairness? &quot;,&quot;username&quot;:&quot;chrmanning&quot;,&quot;name&quot;:&quot;Christopher Manning&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Wed Aug 25 23:40:17 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{&quot;full_text&quot;:&quot;What on earth? No. 
https://t.co/3jGJ1081vv&quot;,&quot;username&quot;:&quot;mer__edith&quot;,&quot;name&quot;:&quot;Meredith Whittaker&quot;},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:9,&quot;like_count&quot;:120,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h2><a href="https://towardsdatascience.com/should-i-train-a-model-for-each-customer-or-use-one-model-for-all-of-my-customers-f9e8734d991">Towards Data Science | Should I Train a Model for Each Customer or Use One Model for All of My Customers?</a></h2><p>In <a href="https://towardsdatascience.com/should-i-train-a-model-for-each-customer-or-use-one-model-for-all-of-my-customers-f9e8734d991">this article</a>, <a href="https://www.linkedin.com/in/yonatan-hadar-5842a365/">Yonatan Hadar</a> discusses an interesting challenge faced by many ML teams building B2B software: what are the pros and cons of training a single model across all customers vs training a model per customer (and all strategies that lie in between)?&nbsp;</p><p>Here, we dissect the lessons from the post and add some of our own thoughts.&nbsp;</p><h4><strong>What&#8217;s the goal?</strong></h4><p>At a high level, the goal for any ML team is to create business value through models that perform well (i.e. 
generalize well) when deployed to production.&nbsp;</p><p>Some desirable properties of such ML systems:</p><ul><li><p>Models should work well for all customers</p></li><li><p>Engineering complexity should be minimized -- training, optimization, deployment, and monitoring should be as easy as possible</p></li><li><p>Onboarding new customers should be easy (no &#8220;cold start&#8221;)</p></li><li><p>Legal and security constraints should be met&nbsp;</p></li></ul><h4><strong>One Model Per Customer</strong></h4><p>This approach often performs best for each customer, since the training and test data distributions match most closely when dealing with just one customer. In certain domains, this might be the only option available to teams, especially when mixing data across customers isn&#8217;t allowed.&nbsp;</p><p>However, there is less data per customer, and teams have to be more careful when adding a new customer (is there enough data available, should simple heuristics be used while data is collected, etc.). The engineering complexity might also be much higher since many more models need to be trained (and complexity only rises if model types and hyperparameter tuning strategies can vary across customers).&nbsp;</p><h4><strong>One Global Model</strong></h4><p>From an engineering perspective, this is usually the simplest approach. The model is typically trained on a much larger dataset (comprising data from many, if not all, customers) and there is no additional training to be done when onboarding a new customer (the model is always ready to go).&nbsp;</p><p>However, the model is now trained on data drawn from potentially different distributions (from different customers), which can reduce overall accuracy. 
There might also be subtle problems in the model that affect only a small subset of customers (thus not impacting overall KPIs), and without proper monitoring, these will be hard to debug and fix.&nbsp;</p><h4><strong>One Model Per Customer Segment</strong></h4><p>This has the potential to offer the best of both worlds. If segments are chosen appropriately, the data distributions will be relatively consistent across customers in a segment, and each segment may also end up with a large training dataset. Choosing the number of segments also lets teams tune the engineering complexity of the system.&nbsp;</p><p>On the flip side, choosing segments correctly is difficult since there are many ways to segment customers (geography, industry, etc.). If done poorly, one might end up with the worst of both worlds.&nbsp;</p><h4><strong>Global Model + Transfer Learning</strong></h4><p>This is only applicable to deep learning models, but it has the potential for high performance on all customers with far fewer data points.&nbsp;</p><p>However, deep learning might not be the right solution in many cases (for example, when you need very low serving latency or easy explainability). This method is still fairly new, so some exploration might be needed for your use cases.&nbsp;</p><h4><strong>Our Take</strong></h4><p>ML teams often operate under constraints that are unique to their industry and company. While there are many good ways to build and deploy ML systems, understanding the trade-offs involved is useful.</p><h2><a href="https://twitter.com/fishnets88/status/1434067077124562945">Twitter | Bad Labels in Public Datasets</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/fishnets88/status/1434067077124562945&quot;,&quot;full_text&quot;:&quot;Last week I figured I'd try out some techniques to find bad labels in public datasets meant for benchmarking. 
\n\nIt led me to find plenty of bad labels in the Google Emotions dataset. I'm sad to say it wasn't hard to find them either.\n\n&quot;,&quot;username&quot;:&quot;fishnets88&quot;,&quot;name&quot;:&quot;Vincent D. Warmerdam&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Sat Sep 04 08:13:29 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:56,&quot;like_count&quot;:263,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{&quot;url&quot;:&quot;https://koaning.io/posts/labels/&quot;,&quot;image&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/9ed791aa-342c-4bac-83f3-0a750ceb2edf_1076x346.png&quot;,&quot;title&quot;:&quot;koaning.io: Bad Labels&quot;,&quot;description&quot;:&quot;GridSearch is Not Enough: Part Six&quot;,&quot;domain&quot;:&quot;koaning.io&quot;},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>In this Twitter thread (and <a href="https://koaning.io/posts/labels/">associated blog post</a>), <a href="https://twitter.com/fishnets88">Vincent Warmerdam</a> explores how easy it is to find incorrect labels in a publicly available dataset, even one that was curated by researchers from Stanford and Google.&nbsp;</p><p>On this <a href="https://arxiv.org/abs/2005.00547">Google Emotions (GoEmotions) dataset</a>, the author trains a simple high-bias, low-variance model and then surfaces the examples where the model assigns a very low confidence score to the labeled class. 
This resulted in plenty of mislabeled examples, such as in the image below:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://koaning.io/posts/labels/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0KGj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2265ca83-4aeb-4658-b767-f23abf476e08_1492x350.png 424w, https://substackcdn.com/image/fetch/$s_!0KGj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2265ca83-4aeb-4658-b767-f23abf476e08_1492x350.png 848w, https://substackcdn.com/image/fetch/$s_!0KGj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2265ca83-4aeb-4658-b767-f23abf476e08_1492x350.png 1272w, https://substackcdn.com/image/fetch/$s_!0KGj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2265ca83-4aeb-4658-b767-f23abf476e08_1492x350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0KGj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2265ca83-4aeb-4658-b767-f23abf476e08_1492x350.png" width="1456" height="342" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2265ca83-4aeb-4658-b767-f23abf476e08_1492x350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:342,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://koaning.io/posts/labels/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0KGj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2265ca83-4aeb-4658-b767-f23abf476e08_1492x350.png 424w, https://substackcdn.com/image/fetch/$s_!0KGj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2265ca83-4aeb-4658-b767-f23abf476e08_1492x350.png 848w, https://substackcdn.com/image/fetch/$s_!0KGj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2265ca83-4aeb-4658-b767-f23abf476e08_1492x350.png 1272w, https://substackcdn.com/image/fetch/$s_!0KGj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2265ca83-4aeb-4658-b767-f23abf476e08_1492x350.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>This is similar to (and inspired by) the work of the team behind <a href="https://labelerrors.com/">labelerrors.com</a>, where they show problems with many popular datasets such as CIFAR, MNIST, 
etc.&nbsp;If you&#8217;re directly using publicly available data, you might want to use some simple heuristics to ensure that label quality is at an acceptable level.&nbsp;</p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>. If you like what we are doing please tell your friends and colleagues to spread the word.</p>]]></content:encoded></item><item><title><![CDATA[Issue #25: Tesla AI Day. Feature Stores. NIST on AI bias. Model monitoring tips. AI and COVID.   ]]></title><description><![CDATA[Welcome to the 25th issue of the MLOps newsletter. In this issue, we cover Tesla AI Day, a short paper about gaps in feature stores, updates about the NIST proposal to reduce bias in AI, tips on ML monitoring, and a tech review about the challenges of deploying AI tools for diagnosing COVID.]]></description><link>https://mlopsroundup.substack.com/p/issue-25-tesla-ai-day-feature-stores</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-25-tesla-ai-day-feature-stores</guid><dc:creator><![CDATA[Nihit Desai]]></dc:creator><pubDate>Mon, 23 Aug 2021 17:05:42 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!oqXR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oqXR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!oqXR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!oqXR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!oqXR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oqXR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png" width="1000" height="400" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/b4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oqXR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!oqXR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!oqXR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!oqXR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d4e4b8-7cff-4ee5-b8d5-2875b14d3d16_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Welcome to the 25th issue of the MLOps newsletter. </p><p>In this issue, we cover Tesla AI Day, a short paper about gaps in feature stores, updates about the NIST proposal to reduce bias in AI, tips on ML monitoring, and a tech review about the challenges of deploying AI tools for diagnosing COVID.</p><p>Thank you for subscribing. 
If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://www.youtube.com/watch?v=j0z4FweCy4M">Tesla | AI Day</a></h2><div id="youtube2-j0z4FweCy4M" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;j0z4FweCy4M&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/j0z4FweCy4M?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>We&#8217;ve covered updates about Tesla&#8217;s AI for self-driving cars in a <a href="https://mlopsroundup.substack.com/p/issue-15-ai-for-self-driving-at-tesla">prior</a> edition of the newsletter. Tesla recently held an AI Day event to share its research and innovations across the stack for solving self-driving and the general real-world robotics perception &amp; planning problem. It was a fascinating event and highlighted the scope and scale of Tesla&#8217;s AI efforts. We recommend watching it in its entirety. A brief summary of major innovations presented:&nbsp;</p><h4><strong>Network Architecture for Perception &amp; Planning</strong></h4><ul><li><p><a href="https://twitter.com/karpathy">Andrej Karpathy</a> gave an overview of how deep learning models for perception have evolved over the last four years, starting from residual nets for lane-keeping to a fairly complex multi-task architecture trained to predict in the vector space (as opposed to static images)</p></li><li><p>Fusion of sensor data across all 8 car cameras and training the network to do the final prediction on the joint data. 
This brought a step-function improvement compared to the earlier (arguably more straightforward) approach of making predictions on the stream from each camera and then combining the predictions.</p></li><li><p>Perception on 4D video for modeling time progression (e.g. which objects are moving and which ones are stationary; what is the speed and trajectory of moving objects)</p></li><li><p>Tesla has started using neural networks not just for perception but also to improve the efficiency of searching through the action space for better planning and control.</p></li></ul><h4><strong>Data Labeling &amp; Augmentation</strong></h4><ul><li><p>Annotating in the vector space means you annotate once, simultaneously creating labels for streams from all 8 input cameras across many timestamps.</p></li><li><p>Autolabeling is used to massively scale up the amount of labeled data that the neural net can learn from. The details of how autolabeling is done (by leveraging video clips across different cars in the same physical location) are shared in <a href="https://youtu.be/j0z4FweCy4M?t=5291">this part</a> of the talk.</p></li><li><p>Tesla uses simulation to construct training data for rare edge cases that are very unlikely to be encountered in the real world.</p></li><li><p>Interesting note: Tesla employs more than 1,000 human annotators in-house who work full-time on data labeling.&nbsp;</p></li></ul><h4><strong>Dojo</strong></h4><ul><li><p>Dojo is Tesla&#8217;s next-generation chip &amp; data center architecture for fast training of large neural networks. 
It is still under development and expected to be rolled out next year.</p></li><li><p>The &#8220;exapod&#8221; setup, which brings together 3000 of these D1 chips, should enable 1.1 exaflops of configurable-FP8 compute.</p></li><li><p>From an ML engineer&#8217;s point of view, training on Dojo would involve just a one-line change to the code which makes it super easy to switch over.&nbsp;</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tyTB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tyTB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png 424w, https://substackcdn.com/image/fetch/$s_!tyTB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png 848w, https://substackcdn.com/image/fetch/$s_!tyTB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png 1272w, https://substackcdn.com/image/fetch/$s_!tyTB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!tyTB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png" width="1456" height="753" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/aecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:753,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tyTB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png 424w, https://substackcdn.com/image/fetch/$s_!tyTB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png 848w, https://substackcdn.com/image/fetch/$s_!tyTB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png 1272w, https://substackcdn.com/image/fetch/$s_!tyTB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faecdf3fc-1cf4-4b1b-aeda-4949e1e385f1_1600x827.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><h2><a href="https://arxiv.org/pdf/2108.05053.pdf">Paper | Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://arxiv.org/pdf/2108.05053.pdf" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!sJ31!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png 424w, https://substackcdn.com/image/fetch/$s_!sJ31!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png 848w, https://substackcdn.com/image/fetch/$s_!sJ31!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png 1272w, https://substackcdn.com/image/fetch/$s_!sJ31!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sJ31!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png" width="1456" height="483" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:483,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://arxiv.org/pdf/2108.05053.pdf&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sJ31!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png 424w, https://substackcdn.com/image/fetch/$s_!sJ31!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png 848w, https://substackcdn.com/image/fetch/$s_!sJ31!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png 1272w, https://substackcdn.com/image/fetch/$s_!sJ31!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3e868ab3-968c-4d4e-a9b5-c1ac0b815909_1600x531.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is a short paper that discusses one of the shortcomings of today&#8217;s conception of feature stores -- that of dealing with embeddings as features, given that feature stores have typically been built for tabular data.&nbsp;</p><h4><strong>What is a Feature Store?&nbsp;</strong></h4><p>Feature Stores provide a centralized repository of reusable features during ML training and inference (you can also read our discussion <a href="https://mlopsroundup.substack.com/p/issue-13-feature-stores-information">here</a>, <a href="https://mlopsroundup.substack.com/p/issue-17-ftc-guidance-on-ai-feature">here</a>, and <a href="https://mlopsroundup.substack.com/p/issue-6-mlops-resources-feature-stores">here</a>). Features can be constructed using tabular data or stream data and are saved for training and serving time:</p><blockquote><p>Users provide simple definitional metadata, e.g., the feature update cadence and a definition SQL query, and upload the definition to the FS. When the underlying data changes, the FS orchestrates the updates to the features based on the user-defined cadence.</p><p>For streaming features, users provide aggregation functions that are applied on the raw streaming features. The aggregated features are persisted to the online store and logged to the offline store&#8230;</p><p>Once a model is deployed, features need to be continuously provided to deployed models even as the feature data is updated over time. 
To provide low latency feature serving, FSs are typically a dual datastore: one for offline training (e.g., SQL warehouse) and for online serving (e.g., in-memory DBMS).</p></blockquote><h4><strong>Where&#8217;s the gap?&nbsp;</strong></h4><p>Many ML systems today rely on embeddings that are derived from raw data (see examples from <a href="https://blog.twitter.com/engineering/en_us/topics/insights/2018/embeddingsattwitter">Twitter</a> and <a href="https://doordash.engineering/2018/04/02/personalized-store-feed-with-vector-embeddings/">DoorDash</a>). These embeddings are learned in a self-supervised manner; there are no labels, so traditional data-quality metrics don&#8217;t apply. These embeddings also need to be stored for both training and inference, and the dual-datastore design might not be an ideal fit for them. Finally, monitoring embeddings and detecting drift in them is a very different challenge from doing so for tabular data.&nbsp;</p><h4><strong>Solutions</strong></h4><p>In the past, we have covered vector databases (such as the <a href="https://cloud.google.com/blog/products/ai-machine-learning/vertex-matching-engine-blazing-fast-and-massively-scalable-nearest-neighbor-search">Google Vertex Matching Engine</a>, discussed <a href="https://mlopsroundup.substack.com/p/issue-23-ai-regulations-efficient">here</a>). 
These datastores are optimized for storage, retrieval, and search over embeddings and are a more natural fit for them.&nbsp;</p><p>We are excited to see new technologies develop in this space to address the variety of data types used by the ML community!&nbsp;</p><h2><a href="https://www.nist.gov/news-events/news/2021/06/nist-proposes-approach-reducing-risk-bias-artificial-intelligence">NIST Proposal: Reducing Risk of Bias in AI</a></h2><p>Potential bias in machine learning models is a <a href="https://www.mckinsey.com/featured-insights/artificial-intelligence/tackling-bias-in-artificial-intelligence-and-in-humans">well-recognized problem</a> in academia and industry today. In an effort to counter this, the National Institute of Standards and Technology (NIST) is advancing a proposal to identify and reduce the downstream risk of bias in AI.&nbsp;</p><p>The draft proposal is shared in <a href="https://doi.org/10.6028/NIST.SP.1270-draft">NIST Special Publication 1270</a>. The approach proposed in this report consists of three distinct stages, with input from the accompanying stakeholders at each stage:&nbsp;</p><blockquote><p>1. PRE-DESIGN: where the technology is devised, defined and elaborated&nbsp;</p><p>2. DESIGN AND DEVELOPMENT: where the technology is constructed&nbsp;</p><p>3. DEPLOYMENT: where technology is used by, or applied to, various individuals or groups.&nbsp;</p></blockquote><p>NIST is accepting comments on the document until Sept. 10, 2021. The authors seek to use these responses to help shape the agenda of collaborative virtual events in the coming months. 
Comments can be submitted by downloading and completing this <a href="https://www.nist.gov/document/draft-nist-sp-1270-public-comment-template-excel">template form</a> and sending it to ai-bias@list.nist.gov</p><h2><a href="https://building.nubank.com.br/ml-model-monitoring-9-tips-from-the-trenches/">Nubank Blog | ML Model Monitoring &#8211; 9 Tips From the Trenches</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://building.nubank.com.br/ml-model-monitoring-9-tips-from-the-trenches/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mYB5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8dad7b3-6e26-4d45-82d5-cc832b9a87a7_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!mYB5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8dad7b3-6e26-4d45-82d5-cc832b9a87a7_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!mYB5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8dad7b3-6e26-4d45-82d5-cc832b9a87a7_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!mYB5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8dad7b3-6e26-4d45-82d5-cc832b9a87a7_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mYB5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8dad7b3-6e26-4d45-82d5-cc832b9a87a7_1200x675.png" width="1200" height="675" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/c8dad7b3-6e26-4d45-82d5-cc832b9a87a7_1200x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://building.nubank.com.br/ml-model-monitoring-9-tips-from-the-trenches/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mYB5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8dad7b3-6e26-4d45-82d5-cc832b9a87a7_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!mYB5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8dad7b3-6e26-4d45-82d5-cc832b9a87a7_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!mYB5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8dad7b3-6e26-4d45-82d5-cc832b9a87a7_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!mYB5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8dad7b3-6e26-4d45-82d5-cc832b9a87a7_1200x675.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We always enjoy reading articles from ML teams in industry -- their lessons are hard-won and extremely applicable. This week we cover tips on ML monitoring from the Latin American company Nubank (now the <a href="https://techcrunch.com/2021/06/08/fintech-all-star-nubank-raises-a-750m-mega-round/">largest digital bank in the world</a> by number of customers: 40 million).</p><h4><strong>Why ML Monitoring?</strong></h4><blockquote><p>&#8220;Machine learning (ML) models are very sensitive pieces of software; their successful use needs careful monitoring to make sure they are working correctly.</p><p>This is especially true when business decisions are made automatically using the outputs of said models. 
This means that faulty models will very often have a real impact on the end-customer experience.</p><p>Models are only as good as the data they consume, so monitoring the input data (and the outputs) is critical for the model to fulfill its real objective: to be useful to drive good decisions and help the business reach its objectives.&#8221;</p></blockquote><h4><strong>Lessons</strong></h4><ul><li><p><strong>Averages don&#8217;t tell the full story</strong>: Monitoring only the average values of numerical features can give an incomplete picture. Tools often don&#8217;t know how to deal with missing (or null) data and assume that data problems will be large enough to move averages significantly. It&#8217;s better to monitor percentiles (99th, 95th, 90th, 10th, 5th, 1st, etc.) along with missing-value rates for all features.&nbsp;</p></li><li><p><strong>Break monitoring into subpopulations for better insights</strong>: It can be easier to understand data by breaking it up into subpopulations that are monitored separately.&nbsp;</p></li><li><p><strong>Consistency reduces the mental burden of monitoring</strong>: Monitoring tools and dashboards should be consistent and standardized. A single monitoring tool is best, with consistent naming of files, datasets, and dashboards, and these artifacts should be ordered by priority to end users.&nbsp;</p></li><li><p><strong>Patterns for alerting</strong>: Real-time alerts (emails, Slack messages) can easily end up being &#8220;too noisy&#8221; (going off so often that people stop taking them seriously) or &#8220;not sensitive at all&#8221; (not going off even when they should). 
The authors recommend always including the time frame and specific data points, for example: </p><p><em>&#8220;Average value of Feature X in model Y for the past 15 minutes was too high (expected between 0.4 and 0.5, but got 100.0 instead)&#8221;</em></p></li><li><p><strong>Monitor monitoring jobs/routines themselves (meta-monitoring)</strong>: Model monitoring tools are just another piece of software and they can stop working from time to time. It&#8217;s good to monitor execution times for monitoring jobs and have heartbeat-style alerts.&nbsp;</p></li></ul><h4><strong>Conclusion</strong></h4><p>As the authors say:</p><blockquote><p>&#8220;These are a couple of tips we found useful for monitoring several ML models here at Nubank.&nbsp;</p><p>They are used in a variety of business contexts (credit, fraud, CX, Operations, etc) and we believe they are general enough to be applicable in other companies too.&#8221;</p></blockquote><h2><a href="https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/">MIT Technology Review | Hundreds of AI tools have been built to catch covid. 
None of them helped.</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QuiO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F98760723-53c7-4948-a4ff-51f07deae46b_1600x1069.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QuiO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F98760723-53c7-4948-a4ff-51f07deae46b_1600x1069.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QuiO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F98760723-53c7-4948-a4ff-51f07deae46b_1600x1069.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QuiO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F98760723-53c7-4948-a4ff-51f07deae46b_1600x1069.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QuiO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F98760723-53c7-4948-a4ff-51f07deae46b_1600x1069.jpeg" width="1456" height="973" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/98760723-53c7-4948-a4ff-51f07deae46b_1600x1069.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:973,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QuiO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F98760723-53c7-4948-a4ff-51f07deae46b_1600x1069.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QuiO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F98760723-53c7-4948-a4ff-51f07deae46b_1600x1069.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QuiO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F98760723-53c7-4948-a4ff-51f07deae46b_1600x1069.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QuiO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F98760723-53c7-4948-a4ff-51f07deae46b_1600x1069.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A recent article published in MIT Technology Review highlights the challenges of building AI-powered diagnosis and treatment tools for COVID-19, but many of the observations and learnings can be generalized to healthcare.</p><h4><strong>The Problem&nbsp;</strong></h4><p>Early in March/April 2020, hospitals around the world scrambled to treat potential COVID-19 cases. In order to help hospitals scale and better allocate resources, many <a href="https://www.technologyreview.com/2020/04/23/1000410/ai-triage-covid-19-patients-health-care/">AI efforts</a> were initiated to diagnose the disease. 
A recent <a href="https://www.turing.ac.uk/sites/default/files/2021-06/data-science-and-ai-in-the-age-of-covid_full-report_2.pdf">report</a> by the Turing Institute in the UK, however, concluded that AI tools had little to no positive impact in helping hospitals fight COVID.</p><p>This conclusion is largely in line with a similar one reached by Derek Driggs, a machine-learning researcher at the University of Cambridge. Driggs and colleagues looked at deep learning models for diagnosing COVID from medical images (X-rays, CT scans). Of the 415 such published tools, they <a href="https://www.nature.com/articles/s42256-021-00307-0">concluded</a> that none were fit for deployment in clinics (we covered this briefly <a href="https://mlopsroundup.substack.com/p/issue-20-ai-playbook-curating-data">here</a>).</p><h4><strong>What caused it?&nbsp;</strong></h4><p>In a nutshell, data quality. The article illustrates some failure cases caused by the data quality and data skew issues that seemed to systematically plague these models. A couple of examples:</p><ul><li><p>Some approaches unknowingly used datasets that contained medical images of children (where the prevalence of COVID is much lower). As a result, the models inadvertently learned to distinguish children from adults.</p></li><li><p>The source of ground truth labels was the doctor&#8217;s diagnosis of whether the chest scans showed signs of COVID (versus actual ground truth from test results such as PCR). This introduced noise and bias in the data.</p></li></ul><h4><strong>What can we learn?&nbsp;</strong></h4><p>The article summarizes suggestions from healthcare professionals about what we can learn and how to fix this problem:&nbsp;&nbsp;</p><ul><li><p><strong>Transparency and data sharing</strong>: Disclosing the source of your data, and if possible releasing the datasets along with the models. 
While this might not always be possible in healthcare contexts, at the very least sharing the source with doctors and healthcare professionals who intend to use these models in the field can be helpful for them to make informed judgments.</p></li><li><p><strong>Data standardization</strong>: Lack of standardization makes it hard for healthcare data from different sources (hospitals, countries) to be integrated.&nbsp;</p></li><li><p><strong>Model validation</strong>:&nbsp;Validating models from other teams might not be the most glamorous work, but it can help take <em>&#8220;tech from &#8216;lab bench to bedside.&#8217;&#8221;</em></p></li></ul><h2><a href="https://twitter.com/_brohrer_/status/1425770502321283073">Twitter: ML Strategy tip</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/_brohrer_/status/1425770502321283073&quot;,&quot;full_text&quot;:&quot;ML strategy tip\n\nWhen you have a problem, build two solutions - a deep Bayesian transformer running on multicloud Kubernetes and a SQL query built on a stack of egregiously oversimplifying assumptions. Put one on your resume, the other in production. Everyone goes home happy.&quot;,&quot;username&quot;:&quot;_brohrer_&quot;,&quot;name&quot;:&quot;Brandon Rohrer&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Thu Aug 12 10:45:51 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:585,&quot;like_count&quot;:3690,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>We love this tweet by <a href="https://twitter.com/_brohrer_">Brandon Rohrer</a>. We have both built &#8220;very simple models&#8221; (read: heuristics) in the past to solve problems, and cannot agree with the sentiment more. 
As ML practitioners in industry, our job is to create business value -- only <strong>sometimes</strong> does it require ML!&nbsp;</p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>. If you like what we are doing please tell your friends and colleagues to spread the word.</p>]]></content:encoded></item><item><title><![CDATA[Issue #24: AI at Porsche. Efficient Inference. Bootstrapping Labels. Nearest-Neighbor Benchmarks. ]]></title><description><![CDATA[Welcome to the 24th issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-24-ai-at-porsche-efficient</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-24-ai-at-porsche-efficient</guid><dc:creator><![CDATA[Rishabh Bhargava]]></dc:creator><pubDate>Mon, 09 Aug 2021 17:14:02 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 24th issue of the MLOps newsletter.&nbsp;</p><p>In this issue, we cover an interesting approach to autonomous vehicles at Porsche, discuss strategies for the efficient serving of ML models, explore the bootstrapping of labels with weak supervision and share some recent resources. </p><p>Thank you for subscribing. 
If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://newsroom.porsche.com/en/2021/innovation/porsche-engineering-big-data-loop-25029.html">The Big Loop: artificial intelligence and machine learning</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://newsroom.porsche.com/en/2021/innovation/porsche-engineering-big-data-loop-25029.html" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NSFc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2023e1c-6c52-4440-a607-ed8b106a8aad_1257x1600.png 424w, https://substackcdn.com/image/fetch/$s_!NSFc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2023e1c-6c52-4440-a607-ed8b106a8aad_1257x1600.png 848w, https://substackcdn.com/image/fetch/$s_!NSFc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2023e1c-6c52-4440-a607-ed8b106a8aad_1257x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!NSFc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2023e1c-6c52-4440-a607-ed8b106a8aad_1257x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NSFc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2023e1c-6c52-4440-a607-ed8b106a8aad_1257x1600.png" width="678" height="863.0071599045347" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/a2023e1c-6c52-4440-a607-ed8b106a8aad_1257x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1600,&quot;width&quot;:1257,&quot;resizeWidth&quot;:678,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://newsroom.porsche.com/en/2021/innovation/porsche-engineering-big-data-loop-25029.html&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NSFc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2023e1c-6c52-4440-a607-ed8b106a8aad_1257x1600.png 424w, https://substackcdn.com/image/fetch/$s_!NSFc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2023e1c-6c52-4440-a607-ed8b106a8aad_1257x1600.png 848w, https://substackcdn.com/image/fetch/$s_!NSFc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2023e1c-6c52-4440-a607-ed8b106a8aad_1257x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!NSFc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2023e1c-6c52-4440-a607-ed8b106a8aad_1257x1600.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Porsche Engineering recently shared this exploration of how an autonomous vehicle could learn from experience and react intuitively to new circumstances. The article describes a very specific use case -- detecting lane changes in front of a car. While this article doesn&#8217;t talk about any other elements of self-driving, the process they describe is interesting and reminiscent of Tesla, which we discussed <a href="https://mlopsroundup.substack.com/p/issue-15-ai-for-self-driving-at-tesla">earlier</a>.&nbsp;</p><h4><strong>The Problem</strong></h4><p>Their vehicles are equipped with an enhanced Adaptive Cruise Control (ACC) system which ensures that a safe distance is maintained from the vehicle in front, detecting early when other road users are cutting in, etc. 
With ACC: </p><blockquote><p>&#8220;A likely lane change is detected half a second to a second earlier &#8211; the equivalent of 30 metres of driving on the motorway,&#8221; explains Dr. Joachim Schaper, Senior Manager AI and Big Data at Porsche Engineering.</p></blockquote><p>The model needs to be continuously improved, but going through all the data that is being generated is too time-consuming.&nbsp;</p><blockquote><p>&#8220;&#8230;we only want to record the data that really helps the system move forward,&#8221; says project manager Philipp Wustmann, an expert in longitudinal and lateral control at Porsche Engineering. &#8220;That's no easy task, because radar sensors and cameras generate an immense amount of data, most of which is not relevant to the function under consideration.&#8221;</p></blockquote><h4><strong>Solution</strong></h4><p>They use the following process to improve the model:</p><ul><li><p>Use heuristics in the car to find scenes where the ACC is not reacting optimally, and send them to their servers in the cloud</p></li><li><p>Augment these with additional simulated scenes that generate extra training data without more drives</p></li><li><p>Train a new model and validate it on unseen data</p></li><li><p>Push the model to the car and allow the new version of the model to be activated by the driver</p></li></ul><p>They are now using this technical approach for other development projects, and we believe that elements of it are much more broadly useful.&nbsp;</p><h2><a href="https://www.oreilly.com/content/efficient-machine-learning-inference/">Efficient Machine Learning Inference</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.oreilly.com/content/efficient-machine-learning-inference/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!sovx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png 424w, https://substackcdn.com/image/fetch/$s_!sovx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png 848w, https://substackcdn.com/image/fetch/$s_!sovx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png 1272w, https://substackcdn.com/image/fetch/$s_!sovx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sovx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png" width="1048" height="647" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/e18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:647,&quot;width&quot;:1048,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.oreilly.com/content/efficient-machine-learning-inference/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sovx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png 424w, https://substackcdn.com/image/fetch/$s_!sovx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png 848w, https://substackcdn.com/image/fetch/$s_!sovx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png 1272w, https://substackcdn.com/image/fetch/$s_!sovx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe18862fc-bf40-4ab1-aeda-eb27a55e9b66_1048x647.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is a fascinating article that discusses strategies for serving multiple ML models in scenarios where latencies matter.&nbsp;</p><h4><strong>Why is this important?&nbsp;</strong></h4><p>Let&#8217;s imagine your team is responsible for serving multiple ML models, and having low latency inference is important for your business. There could be a wide variation in the kinds of models and use cases they are satisfying.</p><blockquote><p>Some models are large while others are small...Other models get periodic bursts of traffic, while others have consistent load. 
Yet another dimension to think about is the cost per query of a model: some models are expensive to run, others are quite cheap.</p></blockquote><p>Typically, teams provision one model per host -- this gives (relatively) predictable latency, since you only have to track per-host throughput. Each model can then be scaled horizontally for its own peak traffic, so that capacity is always sufficient. However, capacity provisioned for peak traffic sits idle most of the time, and cutting costs by shrinking the VM often hurts latency.&nbsp;</p><h4><strong>So what should one do?</strong></h4><blockquote><p>Multi-model serving, defined as hosting multiple models in the same host (or in the same VM), can help mitigate this waste. Sharing the compute capacity of each server across multiple models can dramatically reduce costs, especially when there is insufficient load to saturate a minimally replicated set of servers. With proper load balancing, a single server could potentially serve many models receiving few queries alongside a few models receiving more queries,&nbsp; taking advantage of idle cycles.</p></blockquote><p>The authors then show numbers from a real-world example. 
They gather the traffic in queries per second (QPS) for 19 models over a period of a week.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.oreilly.com/content/efficient-machine-learning-inference/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4IlW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1fd39da6-7503-428d-929b-9dcf3e67ff1f_1048x648.png 424w, https://substackcdn.com/image/fetch/$s_!4IlW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1fd39da6-7503-428d-929b-9dcf3e67ff1f_1048x648.png 848w, https://substackcdn.com/image/fetch/$s_!4IlW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1fd39da6-7503-428d-929b-9dcf3e67ff1f_1048x648.png 1272w, https://substackcdn.com/image/fetch/$s_!4IlW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1fd39da6-7503-428d-929b-9dcf3e67ff1f_1048x648.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4IlW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1fd39da6-7503-428d-929b-9dcf3e67ff1f_1048x648.png" width="1048" height="648" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/1fd39da6-7503-428d-929b-9dcf3e67ff1f_1048x648.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:648,&quot;width&quot;:1048,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.oreilly.com/content/efficient-machine-learning-inference/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4IlW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1fd39da6-7503-428d-929b-9dcf3e67ff1f_1048x648.png 424w, https://substackcdn.com/image/fetch/$s_!4IlW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1fd39da6-7503-428d-929b-9dcf3e67ff1f_1048x648.png 848w, https://substackcdn.com/image/fetch/$s_!4IlW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1fd39da6-7503-428d-929b-9dcf3e67ff1f_1048x648.png 1272w, https://substackcdn.com/image/fetch/$s_!4IlW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1fd39da6-7503-428d-929b-9dcf3e67ff1f_1048x648.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As can be seen, most models have infrequent traffic (and some spikes), but one model has high constant traffic and large spikes. When these models are deployed on a standard VM (in Google Cloud, which is where the example is from), the 99th percentile latencies vary with the type of model, ranging from 1ms to 1000ms.</p><p>Now, if all the models were to be hosted on a single, much larger VM, the 99th percentile latencies range from 4ms to 40ms (see the article for all the charts). Keeping all 19 models in memory consumes 40GB of RAM, but cloud machines that satisfy that requirement are easy to find. 
What&#8217;s more, this is done while keeping monthly costs 1-2 orders of magnitude lower.&nbsp;</p><p>Your mileage may vary (especially if all of your models have consistently high QPS), but these results are worth keeping in mind.&nbsp;</p><h4><strong>Conclusion</strong></h4><p>The authors put it well:</p><blockquote><p>Multi-model serving enables lower cost while maintaining high availability and acceptable latency, by better using the RAM capacity of large VMs. While it is common and simple to deploy only one model per server, instead load a large number of models on a large VM that offers low latency, which should offer acceptable latency at a lower cost. These cost savings also apply to serving on accelerators such as GPUs.</p></blockquote><p>If you&#8217;re also responsible for deploying a large number of models and need to serve inferences in real time, this advice may be for you.&nbsp;One caveat: this approach couples the models to some extent. If traffic to a few models is correlated, their spikes land on the same host at the same time and can have an outsized impact on the whole system. 
</p><h2><a href="https://eugeneyan.com/writing/bootstrapping-data-labels/">Bootstrapping Labels via ___ Supervision &amp; Human-In-The-Loop</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://eugeneyan.com/writing/bootstrapping-data-labels/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2xro!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc7f8014-d698-4eb3-8361-ca5b154264e2_800x766.png 424w, https://substackcdn.com/image/fetch/$s_!2xro!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc7f8014-d698-4eb3-8361-ca5b154264e2_800x766.png 848w, https://substackcdn.com/image/fetch/$s_!2xro!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc7f8014-d698-4eb3-8361-ca5b154264e2_800x766.png 1272w, https://substackcdn.com/image/fetch/$s_!2xro!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc7f8014-d698-4eb3-8361-ca5b154264e2_800x766.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2xro!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc7f8014-d698-4eb3-8361-ca5b154264e2_800x766.png" width="454" height="434.705" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/fc7f8014-d698-4eb3-8361-ca5b154264e2_800x766.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:766,&quot;width&quot;:800,&quot;resizeWidth&quot;:454,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://eugeneyan.com/writing/bootstrapping-data-labels/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2xro!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc7f8014-d698-4eb3-8361-ca5b154264e2_800x766.png 424w, https://substackcdn.com/image/fetch/$s_!2xro!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc7f8014-d698-4eb3-8361-ca5b154264e2_800x766.png 848w, https://substackcdn.com/image/fetch/$s_!2xro!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc7f8014-d698-4eb3-8361-ca5b154264e2_800x766.png 1272w, https://substackcdn.com/image/fetch/$s_!2xro!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc7f8014-d698-4eb3-8361-ca5b154264e2_800x766.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Human-in-the-loop process for annotation QA</figcaption></figure></div><p>In a recent article, Eugene Yan summarized the prevailing approaches to bootstrapping labeled training data in real-world machine learning applications -- an underappreciated but important problem in our view. 
It discusses semi-supervised, active, and weakly-supervised learning, along with examples from DoorDash, Facebook, Google, and Apple.</p><ul><li><p><strong>Semi-supervised learning: </strong>This approach combines a small amount of labeled data with a larger amount of unlabeled data during training, and aims to improve upon the performance that can be obtained from the labeled data alone (supervised learning) or the unlabeled data alone (unsupervised learning).&nbsp;</p></li><li><p><strong>Active Learning: </strong>Active learning aims to select the most informative unlabeled examples to label in order to improve model performance. While multiple selection criteria can be used, a common one is to pick the examples that are the &#8220;hardest&#8221; for the model (those where the model is most uncertain). Active learning is often used to solve the &#8220;cold start&#8221; problem and set up a feedback loop to iteratively improve the performance of ML models. In <a href="https://doordash.engineering/2020/08/28/overcome-the-cold-start-problem-in-menu-item-tagging/">this blog post</a>, DoorDash shared details of their human-in-the-loop active learning system for menu item tagging (a version of which we covered <a href="https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric">here</a>). In <a href="https://arxiv.org/abs/2007.00077">this paper</a>, researchers at <a href="https://arxiv.org/abs/2007.00077">Facebook AI</a> shared details of an approach that couples active learning with similarity search around positive labels for skewed datasets. This is especially relevant when the underlying detection problem of interest has a low prevalence (e.g. 
integrity problems like bullying, hate speech, or misinformation); we covered this in <a href="https://mlopsroundup.substack.com/p/issue-23-ai-regulations-efficient">our last issue</a>.</p></li><li><p><strong>Weak Supervision</strong>: Related to semi-supervised learning, weak supervision aims to combine multiple (often noisy and imprecise) sources of labels to generate an additional source of information. These sources could include heuristics, regex rules, previously trained models, etc. In this joint <a href="https://arxiv.org/pdf/1812.00417.pdf">paper</a>, researchers at Google and Snorkel share some case studies of Snorkel DryBell, a weak supervision system deployed internally at Google.</p></li></ul><h4><strong>Putting it all together</strong></h4><p>Given the large variety of constraints and priorities in real-world ML scenarios, it&#8217;s unlikely that there will be a single &#8220;globally optimal&#8221; path to acquiring a large volume of good-quality labels over time. Nonetheless, the article shares some heuristics for when to apply which technique: on day 1, heuristics or weak supervision are perhaps a good way to get your product working end to end. 
Afterward, some combination of active learning and label denoising might work well as evidenced in the several examples in the article.</p><h2>New Resources for Machine Learning and MLOps</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://github.com/erikbern/ann-benchmarks#evaluated" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V0c0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F296cdaba-866c-4b7e-a1c1-8fb0e23c371b_1168x778.png 424w, https://substackcdn.com/image/fetch/$s_!V0c0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F296cdaba-866c-4b7e-a1c1-8fb0e23c371b_1168x778.png 848w, https://substackcdn.com/image/fetch/$s_!V0c0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F296cdaba-866c-4b7e-a1c1-8fb0e23c371b_1168x778.png 1272w, https://substackcdn.com/image/fetch/$s_!V0c0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F296cdaba-866c-4b7e-a1c1-8fb0e23c371b_1168x778.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V0c0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F296cdaba-866c-4b7e-a1c1-8fb0e23c371b_1168x778.png" width="1168" height="778" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/296cdaba-866c-4b7e-a1c1-8fb0e23c371b_1168x778.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:778,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://github.com/erikbern/ann-benchmarks#evaluated&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V0c0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F296cdaba-866c-4b7e-a1c1-8fb0e23c371b_1168x778.png 424w, https://substackcdn.com/image/fetch/$s_!V0c0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F296cdaba-866c-4b7e-a1c1-8fb0e23c371b_1168x778.png 848w, https://substackcdn.com/image/fetch/$s_!V0c0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F296cdaba-866c-4b7e-a1c1-8fb0e23c371b_1168x778.png 1272w, https://substackcdn.com/image/fetch/$s_!V0c0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F296cdaba-866c-4b7e-a1c1-8fb0e23c371b_1168x778.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We&#8217;ve recently come across a few useful resources that we&#8217;d like to share:</p><p><strong><a href="https://github.com/erikbern/ann-benchmarks#evaluated">Approximate Nearest-Neighbor search benchmarks</a>:</strong> We&#8217;ve covered applications of approximate nearest-neighbor (ANN) similarity search in a <a href="https://mlopsroundup.substack.com/p/issue-23-ai-regulations-efficient">prior</a> issue of the newsletter. <a href="https://github.com/erikbern/ann-benchmarks#evaluated">This GitHub repo</a> aims to benchmark various implementations of ANN search on a collection of public datasets. We liked the attention to usability in this repository -- datasets are already pre-generated and each implementation has a Docker container. 
If you&#8217;re considering implementing ANN for your use case, we highly recommend checking this out to understand which implementation might best suit your needs.</p><p><strong><a href="https://github.com/MAIF/shapash#readme">Shapash</a>:</strong> A neat Python library for ML interpretability based on SHAP and LIME. This library allows data scientists or end-users of a machine learning model to generate local and global explanations for predictions (feature contributions) and understand feature correlations.&nbsp;</p><p><strong><a href="https://questdb.io/">QuestDB</a>:</strong> QuestDB is an open-source database for time-series or event data with a focus on performance. In <a href="https://questdb.io/time-series-benchmark-suite/">benchmarks</a> shared on their website, QuestDB outperforms other popular alternatives like InfluxDB and might be suitable for your use case, especially if you&#8217;re looking for real-time query-ability on a high volume of data.</p><h2><a href="https://twitter.com/neal_lathia/status/1423265179513630721">Twitter | Is pushing ML models to production really the hard part?&nbsp;</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/neal_lathia/status/1423265179513630721&quot;,&quot;full_text&quot;:&quot;The two worlds of ML in production:\n\nMLOps startup &#127757;: the biggest challenge in ML is shipping models to production\n\nMedium blogger &#127757;: here&#8217;s how easy it is to wrap an ML model in a flask API&quot;,&quot;username&quot;:&quot;neal_lathia&quot;,&quot;name&quot;:&quot;Neal Lathia&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Thu Aug 05 12:50:36 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:71,&quot;like_count&quot;:644,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p><a 
href="https://twitter.com/neal_lathia">Neal Lathia</a>&#8217;s recent Twitter thread highlighted two (one might say somewhat conflicting) points of view in the MLOps world, namely:&nbsp;</p><ol><li><p>The biggest challenge is shipping models to production</p></li><li><p>Shipping models to production is the easy part: wrap the model behind an API and call it from your service (e.g., using Flask or FastAPI)</p></li></ol><p>The thread generated interesting discussions. We recommend reading the various reply threads and conversations to get a complete picture, but we highlight a few that we found especially relevant:&nbsp;</p><ul><li><p>Batch and point predictions might need different approaches (<a href="https://twitter.com/CHARLESNIKOV/status/1423370061159571474?s=20">thread</a>)</p></li><li><p>The biggest question for companies is really the value added by deploying machine learning (<a href="https://twitter.com/eugeneyan/status/1423341385323749385?s=20">thread</a>)</p></li><li><p>While deploying a model behind a Flask API (or equivalent) is easy, questions around scaling, authentication, and reliability still need to be solved (<a href="https://twitter.com/_inc0_/status/1423396450063437826?s=20">thread</a>)</p></li></ul><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>. If you like what we are doing, please tell your friends and colleagues to spread the word.</p>]]></content:encoded></item><item><title><![CDATA[Issue #23: AI Regulations. Efficient Active Learning. Model Health Assurance. 
Vertex Matching Engine.]]></title><description><![CDATA[Welcome to the 23rd edition of the MLOps newsletter. In this issue, we cover updates on AI regulations and frameworks in the EU and the US, a recent paper on web-scale active learning, details about LinkedIn&#8217;s internal platform for model observability, and a summary of Google&#8217;s new scalable vector similarity search.]]></description><link>https://mlopsroundup.substack.com/p/issue-23-ai-regulations-efficient</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-23-ai-regulations-efficient</guid><dc:creator><![CDATA[Rishabh Bhargava]]></dc:creator><pubDate>Mon, 26 Jul 2021 17:02:47 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OrwA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OrwA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!OrwA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png 848w, 
https://substackcdn.com/image/fetch/$s_!OrwA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!OrwA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OrwA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png" width="1000" height="400" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/db48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OrwA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!OrwA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png 848w, 
https://substackcdn.com/image/fetch/$s_!OrwA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!OrwA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb48b261-c419-4d08-8294-2c1fbc9ee547_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Welcome to the 23rd edition of the MLOps newsletter.&nbsp;</p><p>In this 
issue, we cover updates on AI regulations and frameworks in the EU and the US, a recent paper on web-scale active learning, details about LinkedIn&#8217;s internal platform for model observability, and a summary of Google&#8217;s new scalable vector similarity search.</p><p>Thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2>AI Regulations in the EU and the United States</h2><h4><a href="https://osf.io/preprints/socarxiv/38p5f">Demystifying the Draft EU AI Act</a></h4><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/mikarv/status/1412367513321676800&quot;,&quot;full_text&quot;:&quot;New &#128240;: There's more to the EU AI regulation than meets the eye: big loopholes, private rulemaking, powerful deregulatory effects. Analysis needs connection to broad&#8212;sometimes pretty arcane&#8212;EU law\n\n<span class=\&quot;tweet-fake-link\&quot;>@fborgesius</span> &amp;amp; I have done it so you don't have to: long &#129525;\n<a class=\&quot;tweet-url\&quot; href=\&quot;https://osf.io/preprints/socarxiv/38p5f\&quot;>osf.io/preprints/soca&#8230;</a> &quot;,&quot;username&quot;:&quot;mikarv&quot;,&quot;name&quot;:&quot;Michael Veale&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Tue Jul 06 11:07:10 +0000 2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/E5m6n9EXoAUylzl.png&quot;,&quot;link_url&quot;:&quot;https://t.co/7SAeR6bVq9&quot;,&quot;alt_text&quot;:&quot;Demystifying the Draft EU Artificial Intelligence Act\nIn April 2021, the European Commission proposed a Regulation on Artificial Intelligence, known as the AI Act. We present an overview of the Act and analyse its implications, drawing on scholarship ranging from the study of contemporary AI practices to the structure of EU product safety regimes over the last four decades. 
Aspects of the AI Act, such as different rules for different risk-levels of AI, make sense. But we also find that some provisions of the draft AI Act have surprising legal implications, whilst others may be largely ineffective at achieving their stated goals. Several overarching aspects, including the enforcement regime and the effect of maximum harmonisation on the space for AI policy more generally, engender significant concern. These issues should be addressed as a priority in the legislative process.&quot;}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:395,&quot;like_count&quot;:892,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:false}" data-component-name="Twitter2ToDOM"></div><p>A few months ago, <a href="https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric">we shared</a> the EU&#8217;s proposed AI regulations while they were still under consideration (and some of the <a href="https://twitter.com/yoavgo/status/1382759762786390017?s=20">early reactions</a>).&nbsp;</p><p>Now, <a href="https://twitter.com/mikarv">Michael Veale</a> has put together a fascinating deep dive into the Artificial Intelligence Act. We recommend reading through the Twitter thread (or <a href="https://osf.io/preprints/socarxiv/38p5f">their paper</a> for the full analysis). Our overwhelming takeaway is that writing policy is hard!&nbsp;</p><p>Let&#8217;s look at a couple of the specific shortcomings in the AI Act they write about:</p><ul><li><p>The Act prohibits manipulative systems. In the Act&#8217;s formulation, being &#8220;manipulative&#8221; requires &#8220;intent&#8221; from the system, and the system must exert influence through a &#8220;vulnerability&#8221; such as age or disability, or through &#8220;subliminal techniques&#8221;. However, there is no clause addressing whether the user was actually harmed. 
They also explicitly exclude manipulation from systems based on ratings and reputation. The end result is that almost all online systems will be excluded from being judged under this Act (and, of course, no vendor will admit an &#8220;intent&#8221; to manipulate).</p></li><li><p>For high-risk AI systems (biometric identification, law enforcement, etc.), the Act lays out &#8220;essential requirements&#8221; including data quality criteria of accuracy, representativeness, and completeness. However, the European Commission plans to ask two private organizations to create a paid &#8220;harmonized standard&#8221;. Such private organizations are heavily lobbied and could drift away from the &#8220;essential requirements&#8221;, which would likely make this regulation difficult to enforce.&nbsp;</p></li></ul><h4><strong><a href="https://www.gao.gov/products/gao-21-519sp">United States GAO AI Accountability Framework</a></strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.gao.gov/products/gao-21-519sp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nHqx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd654ae9d-61be-4558-af5e-0741203d50e2_1586x1052.png 424w, https://substackcdn.com/image/fetch/$s_!nHqx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd654ae9d-61be-4558-af5e-0741203d50e2_1586x1052.png 848w, https://substackcdn.com/image/fetch/$s_!nHqx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd654ae9d-61be-4558-af5e-0741203d50e2_1586x1052.png 1272w, 
https://substackcdn.com/image/fetch/$s_!nHqx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd654ae9d-61be-4558-af5e-0741203d50e2_1586x1052.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nHqx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd654ae9d-61be-4558-af5e-0741203d50e2_1586x1052.png" width="1456" height="966" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d654ae9d-61be-4558-af5e-0741203d50e2_1586x1052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:966,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.gao.gov/products/gao-21-519sp&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nHqx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd654ae9d-61be-4558-af5e-0741203d50e2_1586x1052.png 424w, https://substackcdn.com/image/fetch/$s_!nHqx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd654ae9d-61be-4558-af5e-0741203d50e2_1586x1052.png 848w, 
https://substackcdn.com/image/fetch/$s_!nHqx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd654ae9d-61be-4558-af5e-0741203d50e2_1586x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!nHqx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd654ae9d-61be-4558-af5e-0741203d50e2_1586x1052.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><a href="https://www.gao.gov/products/gao-21-519sp">This</a> is a report from 
the Government Accountability Office (GAO) that details an accountability framework for federal agencies when it comes to using AI responsibly.&nbsp;</p><p>We won&#8217;t go into the details here, but the framework&#8217;s focus on data quality, governance, monitoring, and performance tracking is spot on!</p><h2><a href="https://arxiv.org/pdf/2007.00077.pdf">Paper | Efficient Active Learning with Similarity Search</a></h2><p>In a <a href="https://arxiv.org/pdf/2007.00077.pdf">recent paper</a>, researchers from Stanford, Facebook AI, and the University of Wisconsin-Madison introduced SEALS (Similarity Search for Efficient Active Learning and Search).</p><h4><strong>Problem</strong></h4><p>For web-scale datasets, active learning approaches are often intractable because of sheer size: billions of unlabeled examples. Existing approaches search globally for the optimal examples to label, scaling linearly or even quadratically with the size of the unlabeled data. In effect, the time complexity of active learning for such use cases is at least O(dataset size * number of sampling iterations).&nbsp;</p><h4><strong>Proposed Solution</strong></h4><p>This paper improves the computational efficiency of active learning and search methods by restricting the candidate pool for labeling to the nearest neighbors of the currently labeled set. The intuition behind this idea is that learned embeddings from general-purpose pre-trained models can cluster examples of many rare concepts together. This structure can then be exploited to improve the efficiency of active learning: considering only the nearest neighbors of the currently labeled examples in each selection round lets us eventually reach all, or most, of the positive examples for a given concept without exhaustively scanning all of the unlabeled data.</p><h4><strong>Results</strong></h4><p>The paper evaluated the proposed selection strategy on three datasets: ImageNet, OpenImages, and an anonymized and aggregated dataset of 10 billion publicly shared images on the web. 
We recommend reading the paper to learn more about the experiment details and parameters for each dataset, but the overall takeaway is that downstream classifiers trained on data labeled with this approach achieved mean average precision and recall similar to those of the traditional global approach (the baseline), while <strong>reducing the computational cost by up to three orders of magnitude</strong>.</p><h4><strong>Risks</strong></h4><p>As the authors note in the paper, this gain in computational efficiency comes at the cost of some added complexity and risk that should not be ignored:</p><ol><li><p>This approach requires a similarity search index built over the entire dataset. Several open-source implementations of <a href="https://github.com/erikbern/ann-benchmarks#evaluated">approximate nearest neighbor search</a> can be considered here.</p></li><li><p>There is a risk that the embeddings used for this task do not effectively cluster examples that belong to a given concept. 
Just restricting labeling selections to nearest neighbors of the currently labeled data might not be an effective strategy.</p></li></ol><h2><a href="https://engineering.linkedin.com/blog/2021/model-health-assurance-at-linkedin">LinkedIn Engineering Blog | Model Health Assurance at LinkedIn</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://engineering.linkedin.com/blog/2021/model-health-assurance-at-linkedin" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0HR1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png 424w, https://substackcdn.com/image/fetch/$s_!0HR1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png 848w, https://substackcdn.com/image/fetch/$s_!0HR1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png 1272w, https://substackcdn.com/image/fetch/$s_!0HR1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0HR1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png" width="1372" height="606" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/e1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:606,&quot;width&quot;:1372,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://engineering.linkedin.com/blog/2021/model-health-assurance-at-linkedin&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0HR1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png 424w, https://substackcdn.com/image/fetch/$s_!0HR1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png 848w, https://substackcdn.com/image/fetch/$s_!0HR1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png 1272w, https://substackcdn.com/image/fetch/$s_!0HR1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1709b70-015c-4fec-b687-e1cb42706c84_1372x606.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This recent <a href="https://engineering.linkedin.com/blog/2021/model-health-assurance-at-linkedin">article</a> by LinkedIn shares some of the design principles and details of Model Health Assurance, LinkedIn&#8217;s in-house platform for ML model monitoring and observability (which is, in turn, part of its <a href="https://engineering.linkedin.com/blog/2019/01/scaling-machine-learning-productivity-at-linkedin">Pro-ML</a> platform for training and serving ML models online). We share some highlights from the article below:</p><ul><li><p><strong>Motivation</strong>: Monitoring ML models post-deployment is a recurring problem for ML engineers at LinkedIn. 
Health Assurance (HA) is a platform that provides engineers with tools and systems to identify issues with production models faster and to debug them.</p></li><li><p><strong>Core functionality: </strong>HA plugs into the online model inference code path to measure a variety of system indicators (e.g. traffic volume, latency, resource usage of hosts, etc.) and data indicators (feature statistics, prediction drift, etc.).&nbsp;</p></li><li><p><strong>Consumption</strong>: A batch job computes feature and prediction statistics for all deployed models and pushes them to <a href="https://github.com/apache/pinot">Pinot</a> (an open-source distributed OLAP data store) and ThirdEye (LinkedIn&#8217;s in-house monitoring &amp; alerting tool), which then alerts the downstream team that owns the model.&nbsp;</p></li><li><p><strong>Easy onboarding</strong>: HA offers two types of functionality: a basic set of metrics &amp; alerts that are autoconfigured and supported out of the box for any model that goes to production; and another set of custom metrics &amp; features to track for a specific model (configured by ML engineers).</p></li></ul><h2><a href="https://cloud.google.com/blog/products/ai-machine-learning/vertex-matching-engine-blazing-fast-and-massively-scalable-nearest-neighbor-search">New Product Alert: Vertex Matching Engine by Google</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://cloud.google.com/blog/products/ai-machine-learning/vertex-matching-engine-blazing-fast-and-massively-scalable-nearest-neighbor-search" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ITH2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fda7ec310-0d69-461b-bedf-12193a7e5063_1070x502.png 424w, 
https://substackcdn.com/image/fetch/$s_!ITH2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fda7ec310-0d69-461b-bedf-12193a7e5063_1070x502.png 848w, https://substackcdn.com/image/fetch/$s_!ITH2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fda7ec310-0d69-461b-bedf-12193a7e5063_1070x502.png 1272w, https://substackcdn.com/image/fetch/$s_!ITH2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fda7ec310-0d69-461b-bedf-12193a7e5063_1070x502.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ITH2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fda7ec310-0d69-461b-bedf-12193a7e5063_1070x502.png" width="1070" height="502" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/da7ec310-0d69-461b-bedf-12193a7e5063_1070x502.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:502,&quot;width&quot;:1070,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://cloud.google.com/blog/products/ai-machine-learning/vertex-matching-engine-blazing-fast-and-massively-scalable-nearest-neighbor-search&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ITH2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fda7ec310-0d69-461b-bedf-12193a7e5063_1070x502.png 424w, https://substackcdn.com/image/fetch/$s_!ITH2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fda7ec310-0d69-461b-bedf-12193a7e5063_1070x502.png 848w, https://substackcdn.com/image/fetch/$s_!ITH2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fda7ec310-0d69-461b-bedf-12193a7e5063_1070x502.png 1272w, https://substackcdn.com/image/fetch/$s_!ITH2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fda7ec310-0d69-461b-bedf-12193a7e5063_1070x502.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Google Cloud recently announced the release of <a href="https://cloud.google.com/blog/products/ai-machine-learning/vertex-matching-engine-blazing-fast-and-massively-scalable-nearest-neighbor-search">Vertex Matching Engine</a>, a fully managed cloud solution for fast and scalable approximate nearest neighbor (vector similarity) search.&nbsp;</p><h4><strong>What is Nearest Neighbor Search?&nbsp;</strong></h4><p>Embedding representations, learned mappings of raw content to a dense n-dimensional space, are a foundational tool for any ML engineer. While embeddings are useful for all kinds of machine learning tasks as general feature representations, one class of powerful applications they enable is embedding-based search.</p><p>Compared to traditional TF-IDF-based retrieval, which searches a database of documents based on token matches, embedding-based search can match queries and documents based on semantic properties learned by the embeddings. To answer a query, we map the query to the embedding space and then find, among all database embeddings, the ones closest to the query (i.e., <a href="https://en.wikipedia.org/wiki/Nearest_neighbor_search">nearest neighbor search</a>).</p><h4><strong>Overview of Vertex Matching Engine</strong></h4><p>While many models used to generate embeddings are open source (e.g. 
<a href="https://www.tensorflow.org/hub/tutorials/semantic_similarity_with_tf_hub_universal_encoder">Universal Sentence Encoder</a> for text or <a href="https://tfhub.dev/tensorflow/resnet_50/feature_vector/1">ResNet</a> for images), using them for search/similarity applications is still hard because traditional databases are not optimized for nearest-neighbor search queries.&nbsp;</p><p>Vertex Matching Engine is a managed solution from Google Cloud for fast and scalable approximate nearest neighbor search. It is powered by <a href="https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html">ScaNN</a> (the same technology that Google uses internally for Google Search, YouTube, etc.). It enables real-time search, with latency on the order of milliseconds, over databases on the order of billions of embeddings. Based on empirical observations from teams within Google, Vertex Matching Engine can achieve 95-98% recall compared to brute force nearest neighbor search. Furthermore, GCP handles all the scaling requirements based on database size and query load and allows for index updates with zero downtime.&nbsp;</p><h4><strong>To Get Started</strong></h4><p>If you&#8217;re interested in experimenting with Vertex Matching Engine or are evaluating it for a potential production use case, you can take a look at the documentation <a href="https://cloud.google.com/vertex-ai/docs/matching-engine">here</a>, or <a href="https://github.com//GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/notebooks/unofficial/matching_engine/matching_engine_for_indexing.ipynb">this</a> sample notebook.</p><h2><a href="https://twitter.com/sarahookr/status/1418592662794481666">Twitter Thread: Dealing with Uncertainty in ML</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/sarahookr/status/1418592662794481666&quot;,&quot;full_text&quot;:&quot;How do you distinguish between sources of uncertainty? 
\n\nThis is important because the downstream remedies for atypical and noisy examples are very different.\n\nTwo of our workshop papers explore this from different perspectives. &quot;,&quot;username&quot;:&quot;sarahookr&quot;,&quot;name&quot;:&quot;Sara Hooker&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Fri Jul 23 15:23:41 +0000 2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/E6_Z62vWUAAWRxs.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/hR31ecRU2v&quot;,&quot;alt_text&quot;:null},{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/E6_Z65bWYAUslxU.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/hR31ecRU2v&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:38,&quot;like_count&quot;:291,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p><a href="https://twitter.com/sarahookr">Sara Hooker</a> shares two interesting workshop papers that were presented at <a href="https://icml.cc/">ICML</a>. Both papers explore what happens when real-world, noisy data is brought to bear on the ML training process.&nbsp;</p><p>The <a href="https://arxiv.org/pdf/2107.07741.pdf">first paper</a> discusses up-weighting samples with a higher loss during training, on the assumption that the model has more to learn from such examples. However, the authors find that when data is corrupted (or the level of noise is high enough), up-weighting can end up hurting the final trained model. High-quality data is a prerequisite for such training acceleration techniques.&nbsp;</p><p>The <a href="http://www.gatsby.ucl.ac.uk/~balaji/udl2021/accepted-papers/UDL2021-paper-101.pdf">second paper</a> breaks down error cases into reducible errors (outliers and edge cases) and irreducible errors (very noisy or mislabeled examples). 
If such errors can be categorized, then model training could be improved by cleaning the noisy examples and augmenting the outliers or atypical examples. </p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>. If you like what we are doing please tell your friends and colleagues to spread the word.</p>]]></content:encoded></item><item><title><![CDATA[Issue #22: Github Copilot. CI/CD for ML at Scale. Concept Drift in Healthcare. Behavioral Testing. ]]></title><description><![CDATA[Welcome to the 22nd issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-22-github-copilot-cicd-for</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-22-github-copilot-cicd-for</guid><dc:creator><![CDATA[Nihit Desai]]></dc:creator><pubDate>Mon, 12 Jul 2021 17:06:36 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ce7T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Ce7T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!Ce7T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!Ce7T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!Ce7T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ce7T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png" width="1000" height="400" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Ce7T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!Ce7T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!Ce7T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!Ce7T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F33880415-c06b-4b5b-b990-ca4a9ccfa8b7_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Welcome to the 22nd issue of the MLOps newsletter.&nbsp;</p><p>In this issue, we cover OpenAI&#8217;s paper introducing Codex (their fine-tuned GPT language model for code generation that powers GitHub Copilot), share Uber&#8217;s CI/CD for deploying production ML models, discuss concept drift challenges with production ML models, and share Andrej Karpathy&#8217;s recent tweets about challenges with designing data labeling workflows.</p><p>Thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://arxiv.org/pdf/2107.03374.pdf">Paper | Evaluating Large Language Models Trained on Code</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://arxiv.org/pdf/2107.03374.pdf" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GQLy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png 424w, https://substackcdn.com/image/fetch/$s_!GQLy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png 848w, 
https://substackcdn.com/image/fetch/$s_!GQLy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png 1272w, https://substackcdn.com/image/fetch/$s_!GQLy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GQLy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png" width="688" height="450" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/bab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:688,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://arxiv.org/pdf/2107.03374.pdf&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GQLy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png 424w, 
https://substackcdn.com/image/fetch/$s_!GQLy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png 848w, https://substackcdn.com/image/fetch/$s_!GQLy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png 1272w, https://substackcdn.com/image/fetch/$s_!GQLy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab67b17-90d0-4db9-baf4-ccc9df70b10f_688x450.png 1456w" sizes="100vw"></picture>
</div></a></figure></div><p>GitHub recently announced Copilot, an AI-powered pair programming tool. Check out the official <a href="https://copilot.github.com/">webpage</a> or this <a href="https://www.youtube.com/watch?v=St2CMvK4hK0">intro video</a> if you want to get the gist (ha!). Copilot is powered by Codex, a GPT language model fine-tuned on code from GitHub and the subject of this paper from OpenAI.</p><h4><strong>Overview</strong></h4><p>This paper makes two important contributions:</p><ol><li><p>It introduces Codex, a GPT language model that is fine-tuned to generate code using publicly available code from GitHub. It evaluates Codex on its Python code-writing capabilities, given a docstring as the &#8220;prompt&#8221;.</p></li><li><p>It introduces HumanEval, a new evaluation dataset and associated code released by OpenAI to measure the functional correctness of programs generated from docstrings.</p></li></ol><h4><strong>Codex: Under the hood</strong></h4><ul><li><p><strong>Training data</strong>: As shared in the paper, Codex is fine-tuned on code publicly available on GitHub. The training dataset for the model was collected in May 2020 from 54 million public software repositories hosted on GitHub. After applying heuristic filters to remove likely auto-generated code files, the total size of the training data was 159 GB.&nbsp;</p></li><li><p><strong>Task</strong>: The paper focuses on the task of generating standalone Python functions from docstrings and evaluating the correctness of code samples automatically through unit tests.</p></li><li><p><strong>Evaluation Methodology:</strong> To benchmark the model, the authors create a dataset of 164 original programming problems with unit tests. As outlined in the paper, the complexity of these problems is comparable to easy interview questions. 
This dataset is released and publicly viewable on their <a href="https://www.github.com/openai/human-eval">GitHub repo</a>. The metric used is &#8220;pass rate @ K&#8221; (pass@k in the paper): K code samples are generated per problem, and a problem is considered solved if at least one sample passes all the unit tests for that problem.</p></li><li><p><strong>Performance:</strong> Codex fine-tuned on GitHub data solves 28.8% of the problems with just 1 sample per problem. Further, the authors find that repeated sampling is an effective strategy for producing working solutions to difficult prompts. Using this method, the model solves 70.2% of problems with 100 samples per problem.</p></li></ul><h4><strong>Limitations &amp; Risks</strong></h4><p>In the latter half of the paper, the authors examine and discuss Codex&#8217;s limitations and risks. We recommend reading the paper for the full discussion, and summarize the major takeaways here:&nbsp;&nbsp;</p><ul><li><p><strong>Sample efficiency</strong>: Codex&#8217;s training dataset is basically all publicly available Python code on GitHub, which is hundreds of millions of lines of code. This is many orders of magnitude more than the amount of code a human software engineer will encounter over their career.</p></li><li><p><strong>Error-prone: </strong>As noted in the paper, programs generated by Codex can contain syntax errors, and can invoke functions or reference class attributes that are undefined.</p></li><li><p><strong>Alignment problem: </strong>Codex, like all language models, generates code output that is drawn from the training distribution. This means that, in some real-world scenarios, the generated code might add negative value, for example when the input prompt itself contains subtle bugs or errors.</p></li><li><p><strong>Bias &amp; representation:</strong> Codex can be prompted in ways that generate denigratory outputs as code comments.</p></li></ul><p>Codex is a fairly new tool in the software engineer&#8217;s toolkit. 
GitHub Copilot, for example, is not yet publicly available. Over the next decade, as the capabilities of tools like Copilot improve, we believe they could act as a multiplier on engineering productivity (especially for tasks below a certain threshold of complexity), assuming that the risks are sufficiently mitigated.</p><h2><a href="https://eng.uber.com/continuous-integration-deployment-ml/">Uber Engineering Blog | Continuous Integration and Deployment for Machine Learning Online Serving and Models</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://eng.uber.com/continuous-integration-deployment-ml/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!japS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3b97bcc4-2414-43ae-92d4-4add7247b014_768x379.png 424w, https://substackcdn.com/image/fetch/$s_!japS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3b97bcc4-2414-43ae-92d4-4add7247b014_768x379.png 848w, https://substackcdn.com/image/fetch/$s_!japS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3b97bcc4-2414-43ae-92d4-4add7247b014_768x379.png 1272w, https://substackcdn.com/image/fetch/$s_!japS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3b97bcc4-2414-43ae-92d4-4add7247b014_768x379.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!japS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3b97bcc4-2414-43ae-92d4-4add7247b014_768x379.png" width="768" height="379" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/3b97bcc4-2414-43ae-92d4-4add7247b014_768x379.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:379,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://eng.uber.com/continuous-integration-deployment-ml/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!japS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3b97bcc4-2414-43ae-92d4-4add7247b014_768x379.png 424w, https://substackcdn.com/image/fetch/$s_!japS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3b97bcc4-2414-43ae-92d4-4add7247b014_768x379.png 848w, https://substackcdn.com/image/fetch/$s_!japS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3b97bcc4-2414-43ae-92d4-4add7247b014_768x379.png 1272w, 
https://substackcdn.com/image/fetch/$s_!japS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3b97bcc4-2414-43ae-92d4-4add7247b014_768x379.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It&#8217;s always fascinating to read about the ML systems at large consumer tech companies -- they are often solving problems that most companies will never face. 
Certain elements from <a href="https://eng.uber.com/continuous-integration-deployment-ml/">this post</a> by the Uber team feel like that -- let&#8217;s jump into it!</p><h4><strong>Problems</strong></h4><p>Uber relies on a large number of ML models to deliver a good customer experience. As they scaled their services, they needed to address the following MLOps challenges:</p><ul><li><p>Developing a CI/CD system for their ML model deployments so that a large volume of model deployments could happen on a daily basis.&nbsp;</p></li><li><p>Dealing with the latencies of fetching and loading models and the memory footprint of old models in their Prediction Service.&nbsp;</p></li><li><p>Rolling out models for a subset of traffic, or in shadow mode (where predictions are not used directly but are captured for experimentation/analysis).</p></li><li><p>Setting up CI/CD for the Prediction Service itself, which is key to preventing issues such as incompatibilities between models and the Prediction Service.</p></li></ul><h4><strong>Model Deployment</strong></h4><p>Historically, the Uber team included model artifacts in the Docker image for their Prediction Service. With the growth in model deployments, they moved to dynamic loading of models, where the Prediction Service periodically compares the local model version against the latest version in the Model Artifact and Config Store, and fetches the latest model as needed.&nbsp;</p><p>Before a model is pushed into the Model Artifact and Config Store, the artifacts are validated and the compiled model file is used to run predictions on some sample data to ensure that the model is working.&nbsp;</p><p>The deployment process can be tracked by ML Engineers and a health check on each model is automatically deployed.&nbsp;</p><h4><strong>Model Auto-Retirement</strong></h4><p>While a model retirement API is provided, engineers can forget to retire old models. This leads to unnecessary memory consumption. 
The Uber team built an expiration date into their model deployment process such that old models, if unused, would be automatically retired with alerts sent to the relevant teams.&nbsp;</p><h4><strong>Model Rollout and Shadowing</strong></h4><p>The Uber team found that different ML engineers chose to roll out models with different strategies -- from a gradual rollout to production traffic to shadowing the current model for a period of time. They ended up building these mechanisms into the Prediction Service directly such that users could easily set up model rollout and shadowing strategies. This also allowed simpler sharing of features between primary and shadow models from their Feature Store, allowing auto-shadowing of models and better infrastructure management during times of high traffic.&nbsp;</p><h4><strong>Continuous Integration and Deployment of Prediction Service</strong></h4><p>It&#8217;s not enough for the models themselves to have CI/CD; the service that runs predictions across all models requires full CI/CD as well. This is important to make sure that code changes or dependency changes don&#8217;t lead to compatibility issues when a model is being loaded or run. The Uber team had a three-step validation process: staging and canary integration tests against a non-production environment, followed by a gradual rollout to production workloads.&nbsp;</p><h4><strong>Our Take</strong></h4><p>There are interesting lessons here for companies that might be scaling to a large number of models across a large amount of traffic. 
We&#8217;d love to hear from you about what resonated with you!</p><h2><a href="https://www.forbes.com/sites/forbestechcouncil/2019/04/03/why-machine-learning-models-crash-and-burn-in-production/">Forbes Technology Council | Why Machine Learning Models Crash And Burn In Production</a></h2><p><a href="https://www.forbes.com/sites/forbestechcouncil/2019/04/03/why-machine-learning-models-crash-and-burn-in-production/">This</a> is a great post from <a href="https://twitter.com/davidtalby">David Talby</a>, the CTO of John Snow Labs, a company that builds ML and NLP solutions for the healthcare world. Even though the post was written in 2019, the lessons remain relevant, especially since they&#8217;re backed by his own experiences.&nbsp;</p><h4><strong>What&#8217;s the problem?</strong></h4><blockquote><p>&#8220;One magical aspect of software is that it just keeps working. If you code a calculator app, it will still correctly add and multiply numbers a month, a year, or 10 years later. The fact that the marginal cost of software approaches zero has been a bedrock of the software industry&#8217;s business model since the 1980s.</p><p>This is no longer the case when you are deploying machine learning (ML) models. 
The moment you put a model in production, it starts degrading.&#8221;</p></blockquote><p>The author says that this performance degradation is largely down to a change in the data in the real world, or concept and data drift (a topic <a href="https://mlopsroundup.substack.com/p/issue-21-selecting-mlops-capabilities">we have</a> <a href="https://mlopsroundup.substack.com/p/issue-9-mlops-tooling-landscape-ai">covered</a> <a href="https://mlopsroundup.substack.com/p/issue-7-trustworthy-ai-in-the-govt">extensively</a>).&nbsp;</p><h4><strong>Real-world example</strong></h4><p>The author was trying to predict 30-day readmission rates, which is a well-studied problem thanks to the <a href="https://www.cms.gov/medicare/medicare-fee-for-service-payment/acuteinpatientpps/readmissions-reduction-program">US Medicare&#8217;s Hospital Readmissions Reduction Program</a>. However, in their specific project, they found that:</p><blockquote><p>&#8220;A predictive readmission model that was trained, optimized and deployed at a hospital would start sharply degrading within two to three months. Models would change in different ways at different hospitals &#8212; or even buildings within the same hospital.&#8221;</p></blockquote><p>There were many reasons for this drop in performance. Changing fields in electronic health records would make some workflow better, but leave old fields blank. 
Important codes would change when a different lab was used, and adopting a new type of insurance would change the type of patients going to the ER.&nbsp;</p><h4><strong>Lessons</strong></h4><ul><li><p><strong>Online measurement of accuracy</strong>: We need a better understanding of the performance of models in production, and logging real-world results is &#8220;an elementary requirement.&#8221;</p></li><li><p><strong>Mind the gap</strong>: We need to watch out for &#8220;gaps between the distributions of your training and online data sets&#8221;.&nbsp;</p></li><li><p><strong>Online data quality alerts: </strong>If the input data distribution changes in a meaningful way, an alert should go to the operations team - it might also be time to retrain your model.&nbsp;</p></li></ul><p><a href="https://www.oreilly.com/content/lessons-learned-building-natural-language-processing-systems-in-health-care/">Here</a> is another great post by the same author on the lessons from deploying ML models in healthcare. 
</p><h2><a href="https://www.youtube.com/watch?v=Cse-3MM7mso">Jay Alammar | Behavioral Testing of ML Models </a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.youtube.com/watch?v=Cse-3MM7mso" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z0QK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6237b0cf-d8bf-4226-8308-11ddba56b3c0_2762x1466.png 424w, https://substackcdn.com/image/fetch/$s_!Z0QK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6237b0cf-d8bf-4226-8308-11ddba56b3c0_2762x1466.png 848w, https://substackcdn.com/image/fetch/$s_!Z0QK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6237b0cf-d8bf-4226-8308-11ddba56b3c0_2762x1466.png 1272w, https://substackcdn.com/image/fetch/$s_!Z0QK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6237b0cf-d8bf-4226-8308-11ddba56b3c0_2762x1466.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z0QK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6237b0cf-d8bf-4226-8308-11ddba56b3c0_2762x1466.png" width="1456" height="773" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6237b0cf-d8bf-4226-8308-11ddba56b3c0_2762x1466.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:773,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1443093,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.youtube.com/watch?v=Cse-3MM7mso&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z0QK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6237b0cf-d8bf-4226-8308-11ddba56b3c0_2762x1466.png 424w, https://substackcdn.com/image/fetch/$s_!Z0QK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6237b0cf-d8bf-4226-8308-11ddba56b3c0_2762x1466.png 848w, https://substackcdn.com/image/fetch/$s_!Z0QK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6237b0cf-d8bf-4226-8308-11ddba56b3c0_2762x1466.png 1272w, https://substackcdn.com/image/fetch/$s_!Z0QK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6237b0cf-d8bf-4226-8308-11ddba56b3c0_2762x1466.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Evaluating models using an aggregate metric on a test set (like accuracy or F1-score), while helpful to benchmark research approaches, is often insufficient for production machine learning applications. This video by <a href="https://twitter.com/JayAlammar">Jay Alammar</a> discusses <a href="https://www.aclweb.org/anthology/2020.acl-main.442/">Beyond Accuracy: Behavioral Testing of NLP Models with CheckList</a>, a paper we covered <a href="https://mlopsroundup.substack.com/p/issue-3-state-of-ai-behavioral-testing-ml-models-dynamic-benchmarks-data-versioning-madewithml-283540">previously in our newsletter</a> as well. The paper (and the video) introduces us to the practice of behavioral testing for ML models (analogous to unit tests for traditional software). 
</p><h4><strong>The Problem</strong></h4><p>The traditional practice of evaluating models using an aggregate standardized metric (like accuracy or F1-score) on a standardized test set is helpful to benchmark research approaches. However, it runs into a few problems in the real world:</p><ol><li><p><strong>Overestimation</strong>: The traditional approach of testing ML models on a held-out subset of the complete labeled dataset tends to overestimate performance, because results on that subset usually do not generalize to examples &#8220;in the wild&#8221;.</p></li><li><p><strong>Resolution</strong>: A single metric produces at best a low-resolution picture of model performance. Given two models, say A and B, with test set accuracies of 65% and 68%, that information alone is often insufficient to tell which one should be launched in production. Some follow-up questions that ML practitioners often care about:</p><ol><li><p>How was the test dataset constructed? Does this look like my production data in any way?</p></li><li><p>What are the examples that model A gets right, but B gets wrong? And vice-versa?</p></li><li><p>There are these 20 examples that any model going to production HAS to get right. 
How do models A and B perform on these?</p></li></ol></li></ol><h4><strong>The Solution</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.youtube.com/watch?v=Cse-3MM7mso" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5rCZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe94fe3b-bd5a-406d-bb7d-044471061ac1_1600x930.png 424w, https://substackcdn.com/image/fetch/$s_!5rCZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe94fe3b-bd5a-406d-bb7d-044471061ac1_1600x930.png 848w, https://substackcdn.com/image/fetch/$s_!5rCZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe94fe3b-bd5a-406d-bb7d-044471061ac1_1600x930.png 1272w, https://substackcdn.com/image/fetch/$s_!5rCZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe94fe3b-bd5a-406d-bb7d-044471061ac1_1600x930.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5rCZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe94fe3b-bd5a-406d-bb7d-044471061ac1_1600x930.png" width="1456" height="846" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/fe94fe3b-bd5a-406d-bb7d-044471061ac1_1600x930.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:846,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.youtube.com/watch?v=Cse-3MM7mso&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5rCZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe94fe3b-bd5a-406d-bb7d-044471061ac1_1600x930.png 424w, https://substackcdn.com/image/fetch/$s_!5rCZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe94fe3b-bd5a-406d-bb7d-044471061ac1_1600x930.png 848w, https://substackcdn.com/image/fetch/$s_!5rCZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe94fe3b-bd5a-406d-bb7d-044471061ac1_1600x930.png 1272w, https://substackcdn.com/image/fetch/$s_!5rCZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe94fe3b-bd5a-406d-bb7d-044471061ac1_1600x930.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Behavioral tests can give us a higher-resolution evaluation of a model's capabilities. While the video focuses specifically on behavioral testing for NLP models, we believe the idea generalizes well to domains beyond NLP.&nbsp;</p><ul><li><p><strong>Minimum Functionality Tests</strong>: Equivalent to unit tests on tiny test datasets. They often test the model&#8217;s performance on obvious test cases or cases where the cost of an incorrect prediction is extremely high.&nbsp;</p></li><li><p><strong>Invariance Tests</strong>: Test the model&#8217;s performance on &#8220;label preserving&#8221; perturbations to the input. 
A variation of this is often used to do adversarial robustness testing&nbsp;</p></li><li><p><strong>Directional Expectation Tests</strong>: Test whether the model&#8217;s predictions move in a manner that is directionally consistent with one&#8217;s expectations.&nbsp;</p></li></ul><p>In addition to this video, which gives a good high-level introduction to behavioral testing, you can also check out a talk by the paper&#8217;s author <a href="https://www.youtube.com/watch?ab_channel=StanfordMLSysSeminars&amp;utm_campaign=Machine%20Learning%20Ops%20Roundup&amp;utm_medium=email&amp;utm_source=Revue%20newsletter&amp;v=VqiTtdY58Ts">here</a> from the <a href="https://mlsys.stanford.edu/">Stanford MLSys Seminar Series</a>.</p><h2><a href="https://huggingface.co/course/chapter1">New ML Resource | Course on Transformers from Hugging Face</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://huggingface.co/course/chapter1" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b5-N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faebba132-e5e7-4794-ab5d-8109a1771e1d_1600x525.png 424w, https://substackcdn.com/image/fetch/$s_!b5-N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faebba132-e5e7-4794-ab5d-8109a1771e1d_1600x525.png 848w, https://substackcdn.com/image/fetch/$s_!b5-N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faebba132-e5e7-4794-ab5d-8109a1771e1d_1600x525.png 1272w, 
https://substackcdn.com/image/fetch/$s_!b5-N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faebba132-e5e7-4794-ab5d-8109a1771e1d_1600x525.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b5-N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faebba132-e5e7-4794-ab5d-8109a1771e1d_1600x525.png" width="1456" height="478" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/aebba132-e5e7-4794-ab5d-8109a1771e1d_1600x525.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:478,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://huggingface.co/course/chapter1&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!b5-N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faebba132-e5e7-4794-ab5d-8109a1771e1d_1600x525.png 424w, https://substackcdn.com/image/fetch/$s_!b5-N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faebba132-e5e7-4794-ab5d-8109a1771e1d_1600x525.png 848w, 
https://substackcdn.com/image/fetch/$s_!b5-N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faebba132-e5e7-4794-ab5d-8109a1771e1d_1600x525.png 1272w, https://substackcdn.com/image/fetch/$s_!b5-N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faebba132-e5e7-4794-ab5d-8109a1771e1d_1600x525.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This looks like a super interesting course from the <a 
href="https://huggingface.co/">Hugging Face</a> team. It focuses on Transformer models, which are slowly becoming the standard in NLP, and delves into both the theory and practical usage with their <a href="https://github.com/huggingface/transformers">transformers library</a>. The first part of the course is out now and will help you get started with either TensorFlow or PyTorch, and more advanced lessons will become available later this year.&nbsp;</p><p>For folks who are thinking about deploying these models on AWS, <a href="https://aws.amazon.com/blogs/compute/hosting-hugging-face-models-on-aws-lambda/">here</a> is a handy tutorial for running a serverless inference service using AWS Lambda.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://aws.amazon.com/blogs/compute/hosting-hugging-face-models-on-aws-lambda/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Nhr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b014be3-78fc-4ae2-ba1a-75627496d0d5_483x552.png 424w, https://substackcdn.com/image/fetch/$s_!0Nhr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b014be3-78fc-4ae2-ba1a-75627496d0d5_483x552.png 848w, https://substackcdn.com/image/fetch/$s_!0Nhr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b014be3-78fc-4ae2-ba1a-75627496d0d5_483x552.png 1272w, 
https://substackcdn.com/image/fetch/$s_!0Nhr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b014be3-78fc-4ae2-ba1a-75627496d0d5_483x552.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Nhr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b014be3-78fc-4ae2-ba1a-75627496d0d5_483x552.png" width="359" height="410.2857142857143" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/5b014be3-78fc-4ae2-ba1a-75627496d0d5_483x552.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:552,&quot;width&quot;:483,&quot;resizeWidth&quot;:359,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://aws.amazon.com/blogs/compute/hosting-hugging-face-models-on-aws-lambda/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0Nhr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b014be3-78fc-4ae2-ba1a-75627496d0d5_483x552.png 424w, https://substackcdn.com/image/fetch/$s_!0Nhr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b014be3-78fc-4ae2-ba1a-75627496d0d5_483x552.png 848w, 
https://substackcdn.com/image/fetch/$s_!0Nhr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b014be3-78fc-4ae2-ba1a-75627496d0d5_483x552.png 1272w, https://substackcdn.com/image/fetch/$s_!0Nhr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5b014be3-78fc-4ae2-ba1a-75627496d0d5_483x552.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><a href="https://twitter.com/karpathy/status/1413242233394929664">Twitter | 
Challenges with annotating data for ML</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/karpathy/status/1413242233394929664&quot;,&quot;full_text&quot;:&quot;Even after 4 years I still haven't \&quot;solved\&quot; labeling workflows. Labeling, QA, Final QA, auto-labeling, error-spotting, diversity massaging, labeling docs + versioning, ppl training, escalations, data cleaning, throughput &amp;amp; quality stats, eval sets + categorization &amp;amp; boosting, ...&quot;,&quot;username&quot;:&quot;karpathy&quot;,&quot;name&quot;:&quot;Andrej Karpathy&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Thu Jul 08 21:02:59 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:176,&quot;like_count&quot;:1853,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>Andrej Karpathy, Director of AI at Tesla, has shared some of the details and (fun) challenges of building Tesla&#8217;s self-driving suite of capabilities on multiple occasions. We covered his talk about the Tesla data engine in a <a href="https://mlopsroundup.substack.com/p/issue-15-ai-for-self-driving-at-tesla">previous edition of the newsletter</a>. His recent tweets about designing good labeling/annotation workflows to collect labeled data for training models are worth reading.&nbsp;</p><p>The challenges he highlighted - writing good labeling docs (aka the &#8220;algorithm&#8221; human labelers should follow), training labelers, designing tools to do this task with good quality and throughput, deciding which data should be labeled - are all relatable, and we have experienced them firsthand at some of the companies we&#8217;ve worked at. 
A couple of products we&#8217;ve come across recently that try to tackle this problem are <a href="https://labelbox.com/?utm_source=google&amp;utm_medium=cpc&amp;utm_term=labelbox&amp;utm_content=517787527859&amp;utm_campaign=10865713093&amp;gclid=Cj0KCQjwraqHBhDsARIsAKuGZeEHZ59-hjNgingwd05eZ6DNP75iEVeNmUvYK2QrFSZiHQZwFHPBSY4aAiA9EALw_wcB">Labelbox</a> and <a href="https://scale.com/nucleus">Scale Nucleus</a>. </p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>. If you like what we are doing, please tell your friends and colleagues to spread the word.</p>]]></content:encoded></item><item><title><![CDATA[Issue #21: Selecting MLOps Capabilities. Continuous Delivery For ML. Static Language Modelling. Sharing Data with AWS. ]]></title><description><![CDATA[Welcome to the 21st issue of the MLOps newsletter. 
In this issue, we cover some tips from Google Cloud on choosing the right MLOps capabilities, share what continuous delivery for ML systems looks like, deep dive into the performance of language models over time, discuss the implications of AWS terms of service, and much more.]]></description><link>https://mlopsroundup.substack.com/p/issue-21-selecting-mlops-capabilities</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-21-selecting-mlops-capabilities</guid><dc:creator><![CDATA[Nihit Desai]]></dc:creator><pubDate>Mon, 28 Jun 2021 17:04:20 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mhxa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mhxa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!mhxa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png 848w, 
https://substackcdn.com/image/fetch/$s_!mhxa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!mhxa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mhxa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png" width="1000" height="400" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/c12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mhxa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!mhxa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png 848w, 
https://substackcdn.com/image/fetch/$s_!mhxa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!mhxa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc12c4ae2-a101-477a-80e1-cad58b878081_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Welcome to the 21st issue of the MLOps newsletter.&nbsp;</p><p>In this 
issue, we cover some tips from Google Cloud on choosing the right MLOps capabilities, share what continuous delivery for ML systems looks like, deep dive into the performance of language models over time, discuss the implications of AWS terms of service, and much more.&nbsp;</p><p>Thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://cloud.google.com/blog/products/ai-machine-learning/select-the-right-mlops-capabilities-for-your-ml-use-case">Google Cloud Blog | Getting started with MLOps: Selecting the right capabilities for your use case</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://cloud.google.com/blog/products/ai-machine-learning/select-the-right-mlops-capabilities-for-your-ml-use-case" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oaQ4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2f2e5d7-5ede-476e-89d4-1bda5594880c_1600x1150.jpeg 424w, https://substackcdn.com/image/fetch/$s_!oaQ4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2f2e5d7-5ede-476e-89d4-1bda5594880c_1600x1150.jpeg 848w, https://substackcdn.com/image/fetch/$s_!oaQ4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2f2e5d7-5ede-476e-89d4-1bda5594880c_1600x1150.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!oaQ4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2f2e5d7-5ede-476e-89d4-1bda5594880c_1600x1150.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oaQ4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2f2e5d7-5ede-476e-89d4-1bda5594880c_1600x1150.jpeg" width="1456" height="1047" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/b2f2e5d7-5ede-476e-89d4-1bda5594880c_1600x1150.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1047,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://cloud.google.com/blog/products/ai-machine-learning/select-the-right-mlops-capabilities-for-your-ml-use-case&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oaQ4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2f2e5d7-5ede-476e-89d4-1bda5594880c_1600x1150.jpeg 424w, https://substackcdn.com/image/fetch/$s_!oaQ4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2f2e5d7-5ede-476e-89d4-1bda5594880c_1600x1150.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!oaQ4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2f2e5d7-5ede-476e-89d4-1bda5594880c_1600x1150.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!oaQ4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2f2e5d7-5ede-476e-89d4-1bda5594880c_1600x1150.jpeg 1456w" sizes="100vw"></picture></div></a></figure></div><p>In a previous edition of the newsletter, we covered <a 
href="https://mlopsroundup.substack.com/p/issue-19-mlops-tooling-vertex-ai">Vertex AI</a>, a managed machine learning platform on top of the Google Cloud Platform for companies to train, deploy, and maintain their AI models. As a follow-up, Google has released a structured framework to think through what a mature <a href="https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf">production MLOps stack</a> can look like. In this article, <a href="https://www.linkedin.com/in/aniftos/">Christos Aniftos</a> shares valuable insights about how to map the various capabilities on offer to your company&#8217;s use cases.&nbsp;</p><ul><li><p><strong>Pilots: </strong>These use cases are about rapid iteration and testing a proof of concept. The most helpful capabilities for such applications are experimentation tracking and data processing &amp; transformation.</p></li><li><p><strong>Production/Mission Critical: </strong>These applications are on the critical path of serving customer needs and creating business value. Failures can have a significant negative impact (legal, ethical, reputational, or financial risks). In such cases, robust offline evaluation is important (to identify poor performance or potential bias before a model goes live). Additionally, production monitoring is critical to keep an eye on post-launch performance.</p></li><li><p><strong>Reusability &amp; Collaboration: </strong>In cases where the same data source, feature sets, or model architecture powers multiple production ML use cases, it is helpful to standardize and have a single source of truth for these assets. For example, a feature store helps with registering, storing, and serving features for multiple ML models.&nbsp;</p></li><li><p><strong>Ad-hoc vs Frequent retraining: </strong>For use cases where ML models are not retrained on a recurring basis, production monitoring is critical to detect drifts, outliers, and anomalies. 
For use cases where recurring training is part of the ML workflow, it is important to have end-to-end, reproducible ML workflows that stitch together the various parts of the ML development and evaluation lifecycle. Depending on your use case, online experimentation might also be required.</p></li></ul><p>The article goes into a few more examples and we recommend reading it. Our takeaway from the article, which we have also expressed previously when covering the MLOps ecosystem, is that there isn&#8217;t a &#8220;one-size-fits-all&#8221; solution stack for production machine learning. Given the many companies and open source tools that each specialize in one or a few aspects of the ML development cycle, and the need for flexibility, we think initiatives like the <a href="https://ai-infrastructure.org/">AI Infrastructure Alliance</a> can help define a common standard of interoperability among these products in the long term.</p><h2><a href="https://martinfowler.com/articles/cd4ml.html">Martin Fowler Blog | Continuous Delivery for Machine Learning</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://martinfowler.com/articles/cd4ml.html" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8ml3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png 424w, https://substackcdn.com/image/fetch/$s_!8ml3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png 848w, 
https://substackcdn.com/image/fetch/$s_!8ml3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png 1272w, https://substackcdn.com/image/fetch/$s_!8ml3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8ml3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png" width="1456" height="750" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/ec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://martinfowler.com/articles/cd4ml.html&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8ml3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png 424w, 
https://substackcdn.com/image/fetch/$s_!8ml3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png 848w, https://substackcdn.com/image/fetch/$s_!8ml3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png 1272w, https://substackcdn.com/image/fetch/$s_!8ml3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9ee7f6-bfd4-49da-853b-a833bc9135a8_1600x824.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The process for developing, deploying, and improving ML applications is more complex than it is for traditional software. <a href="https://martinfowler.com/bliki/ContinuousDelivery.html">Continuous Delivery</a> is the established approach for bringing automation and standardization to software releases, creating a reliable process for getting changes into production. In this article on Martin Fowler&#8217;s site, Danilo Sato, Arif Wider, and Christoph Windheuser discuss Continuous Delivery for Machine Learning (CD4ML), the discipline of bringing Continuous Delivery principles and practices to Machine Learning applications.</p><h4><strong>Why is it hard?</strong></h4><p>ML applications have three underlying components that can cause the end output to change: the code (application code), the model (learned parameters), and the data (the training and validation datasets used to build the model). Their interactions are often complex and hard to predict, which is what makes the problem harder than in traditional software.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://martinfowler.com/articles/cd4ml.html" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DEGl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F560d0b8f-1710-48d8-9580-8f68f2d9f7ae_1266x540.png 424w, https://substackcdn.com/image/fetch/$s_!DEGl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F560d0b8f-1710-48d8-9580-8f68f2d9f7ae_1266x540.png 848w, 
https://substackcdn.com/image/fetch/$s_!DEGl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F560d0b8f-1710-48d8-9580-8f68f2d9f7ae_1266x540.png 1272w, https://substackcdn.com/image/fetch/$s_!DEGl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F560d0b8f-1710-48d8-9580-8f68f2d9f7ae_1266x540.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DEGl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F560d0b8f-1710-48d8-9580-8f68f2d9f7ae_1266x540.png" width="1266" height="540" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/560d0b8f-1710-48d8-9580-8f68f2d9f7ae_1266x540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:1266,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://martinfowler.com/articles/cd4ml.html&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DEGl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F560d0b8f-1710-48d8-9580-8f68f2d9f7ae_1266x540.png 424w, 
https://substackcdn.com/image/fetch/$s_!DEGl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F560d0b8f-1710-48d8-9580-8f68f2d9f7ae_1266x540.png 848w, https://substackcdn.com/image/fetch/$s_!DEGl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F560d0b8f-1710-48d8-9580-8f68f2d9f7ae_1266x540.png 1272w, https://substackcdn.com/image/fetch/$s_!DEGl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F560d0b8f-1710-48d8-9580-8f68f2d9f7ae_1266x540.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h4><strong>Components of CD4ML</strong></h4><ol><li><p><strong>Discoverable and Accessible Data: </strong>It is important that data is easy to discover and access. The harder it is for data scientists to find the data they need, the longer it will take them to build useful models. This means having well-defined, functioning infrastructure that moves logging/production data into a warehouse, data lake, or traditional database.</p></li><li><p><strong>Reproducible Model Training: </strong>It is critical to be able to reproduce and version artifacts in the ML training workflow (e.g. data cleaning &amp; transformation, splitting into training vs validation sets, etc.). Tools like <a href="https://dvc.org/">DVC</a> (Data Version Control) and <a href="https://www.pachyderm.io/open_source.html">Pachyderm</a>, which we have covered previously, are possible solutions to this problem.&nbsp;</p></li><li><p><strong>Model Serving: </strong>There are three possible ways to deploy a model for online predictions: embedded in the end application, deployed as a service (exposed as an API that is called by the consuming application), or published as data (which is ingested as a data stream by the consuming application).&nbsp;</p></li><li><p><strong>Testing:</strong> This step is the equivalent of the quality checks and unit &amp; integration tests run for software releases. However, for ML applications, it is important to test not just the code but also the model and the data, e.g. data validation checks for schema consistency or expected statistical properties, and model quality checks against predefined test cases or offline performance thresholds.&nbsp;</p></li><li><p><strong>Experiments Tracking: </strong>Experiment tracking makes it possible to compare the various experimental models that might be running online, along with their stability and performance. Tools like <a href="https://mlflow.org/docs/latest/tracking.html">MLflow</a>, which we have covered previously, can be a good option for this.</p></li><li><p><strong>Model Monitoring:</strong> Once the model is deployed in production, we need to understand how it performs and close the data feedback loop. This involves logging model inputs and predictions, creating relevant alerts, detecting outliers and drift, and monitoring various sub-populations of the traffic for possible model bias.&nbsp;</p></li></ol><p>The article walks through a possible CD4ML setup using a sales forecasting application as the example, and we highly recommend checking it out in detail. 
You can also check out the associated <a href="https://github.com/ThoughtWorksInc/cd4ml-workshop">code repository on GitHub</a>.</p><h2><a href="https://arxiv.org/pdf/2102.01951.pdf">Paper | Pitfalls of Static Language Modelling</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://arxiv.org/pdf/2102.01951.pdf" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V7aB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb3c33b-ad22-4900-8c3d-680802ba08c2_904x832.png 424w, https://substackcdn.com/image/fetch/$s_!V7aB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb3c33b-ad22-4900-8c3d-680802ba08c2_904x832.png 848w, https://substackcdn.com/image/fetch/$s_!V7aB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb3c33b-ad22-4900-8c3d-680802ba08c2_904x832.png 1272w, https://substackcdn.com/image/fetch/$s_!V7aB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb3c33b-ad22-4900-8c3d-680802ba08c2_904x832.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V7aB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb3c33b-ad22-4900-8c3d-680802ba08c2_904x832.png" width="616" height="566.9380530973451" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8fb3c33b-ad22-4900-8c3d-680802ba08c2_904x832.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:904,&quot;resizeWidth&quot;:616,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://arxiv.org/pdf/2102.01951.pdf&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V7aB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb3c33b-ad22-4900-8c3d-680802ba08c2_904x832.png 424w, https://substackcdn.com/image/fetch/$s_!V7aB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb3c33b-ad22-4900-8c3d-680802ba08c2_904x832.png 848w, https://substackcdn.com/image/fetch/$s_!V7aB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb3c33b-ad22-4900-8c3d-680802ba08c2_904x832.png 1272w, https://substackcdn.com/image/fetch/$s_!V7aB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8fb3c33b-ad22-4900-8c3d-680802ba08c2_904x832.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>This is a really interesting paper from researchers at DeepMind (one of whom is a subscriber to this newsletter!) that discusses the performance degradation of large language models over time. Given the explosion of large language models in the past few years (BERT, GPT-3, etc.), and how eagerly they&#8217;re being adopted by practitioners, it&#8217;s important to understand their performance characteristics and potential failure points.&nbsp;</p><h4><strong>The Problem</strong></h4><p>As the authors put it:</p><blockquote><p>Our world is open-ended, non-stationary and constantly evolving; thus what we talk about and how we talk about it changes over time. 
This inherent dynamic nature of language comes in stark contrast to the current static language modelling paradigm, which constructs training and evaluation sets from overlapping time periods.</p></blockquote><p>A language model is a probability distribution over utterances (such as words or characters) that is learned from a set of observations. A good language model assigns a high probability to utterances it has not yet seen, for example the word that actually comes next in a sentence.&nbsp;</p><p>Many NLP applications (such as question answering systems) rely on language model pretraining and &#8220;require up-to-date factual knowledge of our ever-changing world.&#8221; Now imagine answering questions like &#8220;How many people in the world have died from COVID-19?&#8221; or &#8220;Who is the current President of the United States?&#8221; and you&#8217;ll see that the answer depends on when the question was asked. Answering them with a language model that was trained on data from the past might be a recipe for failure.&nbsp;</p><h4><strong>Findings</strong></h4><p>The authors trained language models on news and scientific datasets, and tested the models in two ways: first on documents from the same time period as the training data, and then on documents from after the training period. 
For all datasets, perplexity was significantly higher on the future documents (higher perplexity means the model found the text harder to predict).&nbsp;</p><p>Further, they observed that:</p><blockquote><p>the model performs increasingly badly when it is asked to make predictions about test documents that are further away from the training period, demonstrating that model performance degrades more substantially with time.</p></blockquote><p>They also noticed that the drop in performance was much larger when predicting nouns, both proper and common (imagine the name of a newly elected politician), when the documents covered rapidly changing topics like sports, and when they discussed emerging topics like Covid or 5G.&nbsp;</p><p>They also found that training even larger language models doesn&#8217;t address this problem.&nbsp;</p><h4><strong>Solutions?</strong></h4><p>One way to solve this problem is to retrain the language models frequently, but this is an expensive proposition (as an example, training a model like GPT-3 is estimated to cost several million dollars).&nbsp;</p><p>The authors report that a technique such as &#8220;<a href="http://proceedings.mlr.press/v80/krause18a.html">dynamic evaluation</a>&#8221; could help. Dynamic evaluation is a form of online learning that updates the parameters of an already trained model as new data becomes available. They find that it helps the model predict newly emerging words and topics. However, this might not be good enough: dynamic evaluation might lead to the language model forgetting important concepts from the past (&#8220;<a href="https://towardsdatascience.com/forgetting-in-deep-learning-4672e8843a7f">catastrophic forgetting</a>&#8221;).&nbsp;</p><p>It&#8217;s best to be careful when using these language models, especially if they were trained a while ago. 
</p><h2><a href="https://techmonitor.ai/techonology/cloud/aws-user-data">Techmonitor Blog | AWS Customers are Opting in to Sharing AI Data Sets with Amazon Outside their Chosen Regions and Many Didn&#8217;t Know</a></h2><p>This is a post from July 2020 that remains quite relevant today. It gets into the weeds of AWS service terms and agreements, but the legal implications are super interesting.&nbsp;</p><h4><strong>What happened?</strong></h4><p>As they report:</p><blockquote><p>The cloud provider is using customers&#8217; &#8220;AI content&#8221; for its own product development purposes. It also reserves the right in its small print to store this material outside the geographic regions that AWS customers have explicitly selected.</p></blockquote><p>Digging into the <a href="https://aws.amazon.com/service-terms/">service terms of AWS</a>, &#8220;AI Content&#8221; is &#8220;Your Content that is processed by an AI Service&#8221;. AI Services include AWS services such as Amazon Comprehend (which provides NLP algorithms such as sentiment analysis), Amazon Polly (text to speech), Amazon Transcribe (speech to text), Amazon Translate (machine translation), etc.&nbsp;</p><p>AWS&#8217;s service terms allow AWS not only to store the data that folks run through such AI services, but also to move it to different AWS regions (which has data privacy implications) and to use it to improve AWS&#8217;s AI services. It&#8217;s unlikely that most users of such AI services are aware that their data is being used in this way, and AWS&#8217;s terms also say that AWS&#8217;s customers are responsible for informing their End Users of such usage! Sneaky!</p><h4><strong>What can you do?</strong></h4><p>Well, you can <a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_ai-opt-out_create.html">opt out</a> of such usage by AI services. 
If you are using AWS&#8217;s AI services on any sensitive data, you might want to ask your DevOps team to enable this policy. </p><h2>Resources</h2><h4><a href="https://kevin-hanselman.github.io/dud/">New Library Alert: Dud</a></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://kevin-hanselman.github.io/dud/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!polJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0b508022-5ab7-4e1f-b224-e7705bc48703_1440x1156.png 424w, https://substackcdn.com/image/fetch/$s_!polJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0b508022-5ab7-4e1f-b224-e7705bc48703_1440x1156.png 848w, https://substackcdn.com/image/fetch/$s_!polJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0b508022-5ab7-4e1f-b224-e7705bc48703_1440x1156.png 1272w, https://substackcdn.com/image/fetch/$s_!polJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0b508022-5ab7-4e1f-b224-e7705bc48703_1440x1156.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!polJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0b508022-5ab7-4e1f-b224-e7705bc48703_1440x1156.png" width="470" height="377.30555555555554" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/0b508022-5ab7-4e1f-b224-e7705bc48703_1440x1156.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1156,&quot;width&quot;:1440,&quot;resizeWidth&quot;:470,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://kevin-hanselman.github.io/dud/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!polJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0b508022-5ab7-4e1f-b224-e7705bc48703_1440x1156.png 424w, https://substackcdn.com/image/fetch/$s_!polJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0b508022-5ab7-4e1f-b224-e7705bc48703_1440x1156.png 848w, https://substackcdn.com/image/fetch/$s_!polJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0b508022-5ab7-4e1f-b224-e7705bc48703_1440x1156.png 1272w, https://substackcdn.com/image/fetch/$s_!polJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0b508022-5ab7-4e1f-b224-e7705bc48703_1440x1156.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Dud is a tool for storing, versioning and reproducing large files alongside source code (and it has cool design principles). As the creators say:</p><blockquote><p>Dud is heavily inspired by DVC. If DVC is Django, Dud aims to be Flask. Dud is much faster, it has a smaller feature set, and it is distributed as a single executable.</p></blockquote><p>If you&#8217;re dealing with large datasets, and want a quick and easy tool, this might be it! 
</p><h4><a href="https://engineering.fb.com/2021/06/21/open-source/kats/">New Library Alert: Kats</a></h4><p>In an <a href="https://mlopsroundup.substack.com/p/issue-18-mlops-on-coursera-datalift">earlier issue</a>, we shared a Python library for time series forecasting called <a href="https://engineering.linkedin.com/blog/2021/greykite--a-flexible--intuitive--and-fast-forecasting-library">Greykite</a>. Kats is a new Python library from Facebook that looks really neat: it provides forecasting, seasonality and outlier detection, and feature extraction capabilities for time series data. </p><h4><a href="https://https-deeplearning-ai.github.io/data-centric-comp/">Competition for data-centric AI by Andrew Ng</a></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://https-deeplearning-ai.github.io/data-centric-comp/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MRgr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2fe234ef-2d75-4675-af86-0b79e2396938_1296x338.png 424w, https://substackcdn.com/image/fetch/$s_!MRgr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2fe234ef-2d75-4675-af86-0b79e2396938_1296x338.png 848w, https://substackcdn.com/image/fetch/$s_!MRgr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2fe234ef-2d75-4675-af86-0b79e2396938_1296x338.png 1272w, 
https://substackcdn.com/image/fetch/$s_!MRgr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2fe234ef-2d75-4675-af86-0b79e2396938_1296x338.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MRgr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2fe234ef-2d75-4675-af86-0b79e2396938_1296x338.png" width="1296" height="338" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2fe234ef-2d75-4675-af86-0b79e2396938_1296x338.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:338,&quot;width&quot;:1296,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://https-deeplearning-ai.github.io/data-centric-comp/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MRgr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2fe234ef-2d75-4675-af86-0b79e2396938_1296x338.png 424w, https://substackcdn.com/image/fetch/$s_!MRgr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2fe234ef-2d75-4675-af86-0b79e2396938_1296x338.png 848w, 
https://substackcdn.com/image/fetch/$s_!MRgr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2fe234ef-2d75-4675-af86-0b79e2396938_1296x338.png 1272w, https://substackcdn.com/image/fetch/$s_!MRgr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2fe234ef-2d75-4675-af86-0b79e2396938_1296x338.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We have <a 
href="https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric">previously covered</a> Andrew Ng&#8217;s talk about striking the right balance between data and modeling improvements in real-world ML applications. <a href="https://landing.ai/">Landing.AI</a> and DeepLearning.AI, two initiatives cofounded by Andrew Ng, recently announced the Data-Centric AI Competition. Traditionally, ML competitions invite participants to innovate on model architectures while keeping the dataset fixed. This competition inverts that format, aiming to incentivize innovation in the creation and curation of high-quality data for ML. We welcome this competition and look forward to covering the progress and innovations that come out of it in future editions of the newsletter.</p><blockquote><p>&#8220;Machine learning has matured to the point that high-performance model architectures are widely available, while approaches to engineering datasets have lagged. The Data-Centric AI Competition inverts the traditional format and instead asks you to improve a dataset given a fixed model.&#8221;</p></blockquote><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>. If you like what we are doing, please tell your friends and colleagues to spread the word.</p>]]></content:encoded></item><item><title><![CDATA[Issue #20: AI Playbook. Curating Data. MLOps Resources. ML for Covid. ]]></title><description><![CDATA[Welcome to the 20th issue of the MLOps newsletter. 
In this issue, we will cover a McKinsey report on an AI playbook for executives, analyze a recent paper on curating data for NLP research, share some recent MLOps resources we&#8217;ve come across, and discuss ML for COVID detection.]]></description><link>https://mlopsroundup.substack.com/p/issue-20-ai-playbook-curating-data</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-20-ai-playbook-curating-data</guid><dc:creator><![CDATA[Nihit Desai]]></dc:creator><pubDate>Mon, 14 Jun 2021 17:08:00 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9174887-9022-4a09-84e1-827f5fc41982_1600x566.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WRSI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2521ef52-567d-4fad-8507-0da913b978da_1000x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WRSI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2521ef52-567d-4fad-8507-0da913b978da_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!WRSI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2521ef52-567d-4fad-8507-0da913b978da_1000x400.png 848w, 
https://substackcdn.com/image/fetch/$s_!WRSI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2521ef52-567d-4fad-8507-0da913b978da_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!WRSI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2521ef52-567d-4fad-8507-0da913b978da_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WRSI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2521ef52-567d-4fad-8507-0da913b978da_1000x400.png" width="1000" height="400" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2521ef52-567d-4fad-8507-0da913b978da_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WRSI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2521ef52-567d-4fad-8507-0da913b978da_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!WRSI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2521ef52-567d-4fad-8507-0da913b978da_1000x400.png 848w, 
https://substackcdn.com/image/fetch/$s_!WRSI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2521ef52-567d-4fad-8507-0da913b978da_1000x400.png 1272w, https://substackcdn.com/image/fetch/$s_!WRSI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2521ef52-567d-4fad-8507-0da913b978da_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Welcome to the 20th issue of the MLOps newsletter.&nbsp;</p><p>In this 
issue, we will cover a McKinsey report on an AI playbook for executives, analyze a recent paper on curating data for NLP research, share some recent MLOps resources we&#8217;ve come across, and discuss ML for COVID detection.</p><p>Thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-executives-ai-playbook">McKinsey Report | The Executive AI Playbook</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-executives-ai-playbook" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rjOa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9174887-9022-4a09-84e1-827f5fc41982_1600x566.png 424w, https://substackcdn.com/image/fetch/$s_!rjOa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9174887-9022-4a09-84e1-827f5fc41982_1600x566.png 848w, https://substackcdn.com/image/fetch/$s_!rjOa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9174887-9022-4a09-84e1-827f5fc41982_1600x566.png 1272w, https://substackcdn.com/image/fetch/$s_!rjOa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9174887-9022-4a09-84e1-827f5fc41982_1600x566.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!rjOa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9174887-9022-4a09-84e1-827f5fc41982_1600x566.png" width="1456" height="515" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/c9174887-9022-4a09-84e1-827f5fc41982_1600x566.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:515,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-executives-ai-playbook&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rjOa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9174887-9022-4a09-84e1-827f5fc41982_1600x566.png 424w, https://substackcdn.com/image/fetch/$s_!rjOa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9174887-9022-4a09-84e1-827f5fc41982_1600x566.png 848w, https://substackcdn.com/image/fetch/$s_!rjOa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9174887-9022-4a09-84e1-827f5fc41982_1600x566.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rjOa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9174887-9022-4a09-84e1-827f5fc41982_1600x566.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We recently came across <a href="https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-executives-ai-playbook">this interesting resource</a> from <a href="https://www.mckinsey.com/">McKinsey</a> for executives who are thinking about AI and the impact it can have on organizations.&nbsp;While we would take the $$ numbers for 
AI opportunities with a grain of salt, there are some compelling arguments you may want to use in your organization.&nbsp;</p><p>For example, &#8220;breakaway&#8221; companies (organizations achieving better scale and value than others) are 13X more likely to spend &gt;25% of their budget on IT. (Let us know if you&#8217;re able to get the budget for a new MLOps tool using this! &#128515;)</p><p>Similarly, &#8220;breakaway&#8221; companies are 3X more likely to have well-defined analytics roles and career paths.&nbsp;</p><p>If you&#8217;d like to see more in-depth articles from McKinsey, read <a href="https://www.mckinsey.com/featured-insights/artificial-intelligence/notes-from-the-ai-frontier-applications-and-value-of-deep-learning">this article</a> on market sizing for AI in different industries, <a href="https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/breaking-away-the-secrets-to-scaling-analytics">this article</a> on how to scale analytics and ML at your company, and <a href="https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/ten-red-flags-signaling-your-analytics-program-will-fail">this one</a> on red flags with analytics in your organization. 
</p><h2><a href="https://aws.amazon.com/blogs/aws/amazon-sagemaker-named-as-the-outright-leader-in-enterprise-mlops-platforms">AWS Blog | &#8220;Amazon SageMaker Named as the Outright Leader in Enterprise MLOps Platforms&#8221;</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://aws.amazon.com/blogs/aws/amazon-sagemaker-named-as-the-outright-leader-in-enterprise-mlops-platforms" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DzcX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F583b0069-1bb2-45c5-a151-c9e6f2d58b68_1512x1110.png 424w, https://substackcdn.com/image/fetch/$s_!DzcX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F583b0069-1bb2-45c5-a151-c9e6f2d58b68_1512x1110.png 848w, https://substackcdn.com/image/fetch/$s_!DzcX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F583b0069-1bb2-45c5-a151-c9e6f2d58b68_1512x1110.png 1272w, https://substackcdn.com/image/fetch/$s_!DzcX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F583b0069-1bb2-45c5-a151-c9e6f2d58b68_1512x1110.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DzcX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F583b0069-1bb2-45c5-a151-c9e6f2d58b68_1512x1110.png" width="614" height="450.8008241758242" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/583b0069-1bb2-45c5-a151-c9e6f2d58b68_1512x1110.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1069,&quot;width&quot;:1456,&quot;resizeWidth&quot;:614,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://aws.amazon.com/blogs/aws/amazon-sagemaker-named-as-the-outright-leader-in-enterprise-mlops-platforms&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DzcX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F583b0069-1bb2-45c5-a151-c9e6f2d58b68_1512x1110.png 424w, https://substackcdn.com/image/fetch/$s_!DzcX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F583b0069-1bb2-45c5-a151-c9e6f2d58b68_1512x1110.png 848w, https://substackcdn.com/image/fetch/$s_!DzcX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F583b0069-1bb2-45c5-a151-c9e6f2d58b68_1512x1110.png 1272w, https://substackcdn.com/image/fetch/$s_!DzcX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F583b0069-1bb2-45c5-a151-c9e6f2d58b68_1512x1110.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Well, it&#8217;s not uncommon for companies to toot their own horn. However, we don&#8217;t often see AWS do it this blatantly, so we thought we&#8217;d share it. </p><p>AWS claims (well, it&#8217;s actually a marketing research company called Omdia) that Amazon SageMaker is the top enterprise MLOps platform today. It&#8217;s no secret that SageMaker is a great ML platform, and the speed at which they&#8217;ve added functionality is breathtaking.&nbsp;Over the course of four years, they&#8217;ve managed to add all the features you can see below. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://aws.amazon.com/blogs/aws/amazon-sagemaker-named-as-the-outright-leader-in-enterprise-mlops-platforms" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Crim!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0027c6c7-aee5-44dd-afd6-02f3ba414b3a_1024x544.png 424w, https://substackcdn.com/image/fetch/$s_!Crim!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0027c6c7-aee5-44dd-afd6-02f3ba414b3a_1024x544.png 848w, https://substackcdn.com/image/fetch/$s_!Crim!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0027c6c7-aee5-44dd-afd6-02f3ba414b3a_1024x544.png 1272w, https://substackcdn.com/image/fetch/$s_!Crim!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0027c6c7-aee5-44dd-afd6-02f3ba414b3a_1024x544.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Crim!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0027c6c7-aee5-44dd-afd6-02f3ba414b3a_1024x544.png" width="1024" height="544" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/0027c6c7-aee5-44dd-afd6-02f3ba414b3a_1024x544.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:544,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://aws.amazon.com/blogs/aws/amazon-sagemaker-named-as-the-outright-leader-in-enterprise-mlops-platforms&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Crim!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0027c6c7-aee5-44dd-afd6-02f3ba414b3a_1024x544.png 424w, https://substackcdn.com/image/fetch/$s_!Crim!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0027c6c7-aee5-44dd-afd6-02f3ba414b3a_1024x544.png 848w, https://substackcdn.com/image/fetch/$s_!Crim!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0027c6c7-aee5-44dd-afd6-02f3ba414b3a_1024x544.png 1272w, https://substackcdn.com/image/fetch/$s_!Crim!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0027c6c7-aee5-44dd-afd6-02f3ba414b3a_1024x544.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is what the marketing research report had to say:</p><blockquote><p>&#8220;AWS is the outright leader in the Omdia comparative review of enterprise MLOps platforms. Across almost every measure, the company significantly outscored its rivals, delivering consistent value across the entire ML lifecycle. AWS delivers highly differentiated functionality that targets highly impactful areas of concern for enterprise AI practitioners seeking to not just operationalize but also scale AI across the business.&#8221;</p></blockquote><h4><strong>Our Take</strong>:</h4><p>This report only compares the cloud ML platforms and the end-to-end ML platforms. 
We are seeing a lot of new, highly specialized MLOps tools (just for labeling, or deployment, or monitoring), and we expect teams to make the best decisions for their own circumstances. This might naturally lead to teams picking the best-of-breed solution for each part of the ML workflow, rather than a generic ML platform. We are excited to see how this market develops. </p><h2><a href="https://arxiv.org/pdf/2105.13947.pdf">Paper | Changing the World by Changing the Data</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/annargrs/status/1399290146495860739&quot;,&quot;full_text&quot;:&quot;&#127880; <span class=\&quot;tweet-fake-link\&quot;>#NLPaperAlert</span>: Changing the World &#127757;  by Changing the Data &#128451;\n<a class=\&quot;tweet-url\&quot; href=\&quot;https://arxiv.org/abs/2105.13947\&quot;>arxiv.org/abs/2105.13947</a>\nA soul-searching piece that made it to ACL 2021:\n- how NLP resources affect the world\n- what does it even mean to 'work in NLP'\n- how we can make better use of our subcommunities.\n/1&quot;,&quot;username&quot;:&quot;annargrs&quot;,&quot;name&quot;:&quot;Anna Rogers @ NAACL2021&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Mon May 31 09:02:23 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:91,&quot;like_count&quot;:394,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{&quot;url&quot;:&quot;https://arxiv.org/abs/2105.13947&quot;,&quot;image&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/0c869b92-4cae-4270-913b-b69ed0720564_16x16.png&quot;,&quot;title&quot;:&quot;Changing the World by Changing the Data&quot;,&quot;description&quot;:&quot;NLP community is currently investing a lot more research and resources into\ndevelopment of deep learning models than training data. 
While we have made a\nlot of progress, it is now clear that our models learn all kinds of spurious\npatterns, social biases, and annotation artifacts. Algorithmic solutio&#8230;&quot;,&quot;domain&quot;:&quot;arxiv.org&quot;},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>We have <a href="https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric">previously covered</a> Andrew Ng&#8217;s talk about striking the right balance between data and modeling improvements in real-world ML applications. Here we cover a related paper by <a href="https://twitter.com/annargrs">Anna Rogers</a> arguing for the importance of data curation for robust, inclusive, and secure NLP models.&nbsp;</p><blockquote><p>NLP community is currently investing a lot more research and resources into the development of deep learning models than training data. While we have made a lot of progress, it is now clear that our models learn all kinds of spurious patterns, social biases, and annotation artifacts.&nbsp;</p></blockquote><h4><strong>Arguments for curating data</strong></h4><ul><li><p><strong>Bias</strong>: Human-written text contains all kinds of social biases based on gender, race, religion, class, age, etc. Models can learn these biases and even amplify them if we don&#8217;t curate the datasets that models learn from.</p></li><li><p><strong>Privacy</strong>: Memorization of training data, possibly including personally identifiable information, is a known issue in machine learning. This can be a privacy risk.</p></li><li><p><strong>Performance gap with respect to human-level NLU</strong>: The distributions of data in the current NLP resources such as web texts do not seem to provide enough signal for current models to do human-level language understanding.</p></li><li><p><strong>Security</strong>: Certain concerns from adversarial attacks could be mitigated by having greater control over datasets. 
</p></li></ul><h4><strong>Arguments against curating data</strong></h4><ul><li><p><strong>Faithfully representing the world as it is</strong>: A language model should reflect how the language is used in the real world. Any data curation means that the input distribution to the model does not faithfully reflect the real world. </p></li><li><p><strong>Dataset is already the entire data universe</strong>: There are cases where the training data is not drawn from some distribution but represents the entirety of the data universe.</p></li><li><p><strong>An algorithmic approach to correct biases</strong>: Perhaps the way to tackle models learning biases is not to curate the data but to curate model training.</p></li><li><p><strong>Against the long-term direction of AI</strong>: As the paper describes it, &#8220;The great promise of DL (Deep Learning) was to stop trying to define everything, and let the machine to identify and leverage patterns from huge datasets&#8221;. Explicitly curating data seems to go against this ethos.</p></li></ul><h4><strong>Our take</strong></h4><p>Irrespective of how one might feel about the role of data curation in NLP (and ML in general), let&#8217;s first agree on the desired outcome: we do want more robust, capable, and generalizable models, and we do not want models to learn or amplify stereotypes and biases. At the end of the day, most ML practitioners are pragmatists and will adopt any technique so long as it helps solve these problems at a reasonable cost. We share the author&#8217;s view: in most real-world ML applications, the constraints are about data quality &amp; quantity. 
</p><h2>New Resources for MLOps</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.eventbrite.ca/e/mlops-world-machine-learning-in-production-2021-tickets-141471026649" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4qC2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0216b177-02fb-48c3-96dc-c10b7f8be1f8_1402x680.png 424w, https://substackcdn.com/image/fetch/$s_!4qC2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0216b177-02fb-48c3-96dc-c10b7f8be1f8_1402x680.png 848w, https://substackcdn.com/image/fetch/$s_!4qC2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0216b177-02fb-48c3-96dc-c10b7f8be1f8_1402x680.png 1272w, https://substackcdn.com/image/fetch/$s_!4qC2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0216b177-02fb-48c3-96dc-c10b7f8be1f8_1402x680.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4qC2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0216b177-02fb-48c3-96dc-c10b7f8be1f8_1402x680.png" width="640" height="310.41369472182595" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/0216b177-02fb-48c3-96dc-c10b7f8be1f8_1402x680.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:680,&quot;width&quot;:1402,&quot;resizeWidth&quot;:640,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.eventbrite.ca/e/mlops-world-machine-learning-in-production-2021-tickets-141471026649&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4qC2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0216b177-02fb-48c3-96dc-c10b7f8be1f8_1402x680.png 424w, https://substackcdn.com/image/fetch/$s_!4qC2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0216b177-02fb-48c3-96dc-c10b7f8be1f8_1402x680.png 848w, https://substackcdn.com/image/fetch/$s_!4qC2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0216b177-02fb-48c3-96dc-c10b7f8be1f8_1402x680.png 1272w, https://substackcdn.com/image/fetch/$s_!4qC2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0216b177-02fb-48c3-96dc-c10b7f8be1f8_1402x680.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>We&#8217;ve recently come across a few useful resources that we&#8217;d like to share with our readers:</p><ul><li><p><a href="https://madewithml.com/courses/mlops/">MLOps course</a>: We shared <a href="https://madewithml.com/">MadeWithML</a> in a <a href="https://mlopsroundup.substack.com/p/issue-3-state-of-ai-behavioral-testing-ml-models-dynamic-benchmarks-data-versioning-madewithml-283540">previous</a> newsletter, and here we want to highlight a new course on their website that covers all aspects of &#8220;ML in production&#8221;. It is fairly hands-on, comes with tutorials and code snippets, and relies on open-source tools.
We&#8217;d recommend checking it out if you are looking to get your hands dirty with anything MLOps.</p></li><li><p><a href="https://www.oreilly.com/library/view/practical-mlops/9781098103002/">Practical MLOps Book</a>: This book, written by <a href="https://noahgift.com/">Noah Gift</a> and <a href="https://www.linkedin.com/in/alfredodeza/">Alfredo Deza</a>, takes you through what MLOps is, how it differs from DevOps, and how to put it into practice to operationalize your machine learning models.</p></li><li><p><a href="https://www.eventbrite.ca/e/mlops-world-machine-learning-in-production-2021-tickets-141471026649">MLOps World 2021 Event</a>: The second annual MLOps World event is being held virtually this year from June 14 to 17. In addition to the invited talks, we are looking forward to the interactive workshops. You can check out the full list of workshops <a href="https://mlopsworld.com/#workshops">here</a>.</p></li></ul><h2><a href="https://www.bodyworkml.com/">New Product Alert | Bodywork</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.bodyworkml.com/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S68U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cf241a-575e-453d-b28c-bb295433ddf1_1092x576.png 424w, https://substackcdn.com/image/fetch/$s_!S68U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cf241a-575e-453d-b28c-bb295433ddf1_1092x576.png 848w,
https://substackcdn.com/image/fetch/$s_!S68U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cf241a-575e-453d-b28c-bb295433ddf1_1092x576.png 1272w, https://substackcdn.com/image/fetch/$s_!S68U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cf241a-575e-453d-b28c-bb295433ddf1_1092x576.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S68U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cf241a-575e-453d-b28c-bb295433ddf1_1092x576.png" width="1092" height="576" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d2cf241a-575e-453d-b28c-bb295433ddf1_1092x576.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:576,&quot;width&quot;:1092,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.bodyworkml.com/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!S68U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cf241a-575e-453d-b28c-bb295433ddf1_1092x576.png 424w, 
https://substackcdn.com/image/fetch/$s_!S68U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cf241a-575e-453d-b28c-bb295433ddf1_1092x576.png 848w, https://substackcdn.com/image/fetch/$s_!S68U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cf241a-575e-453d-b28c-bb295433ddf1_1092x576.png 1272w, https://substackcdn.com/image/fetch/$s_!S68U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2cf241a-575e-453d-b28c-bb295433ddf1_1092x576.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><a href="https://www.bodyworkml.com/">Bodywork</a> looks like a really useful product for deploying ML projects to Kubernetes. Many of the challenges involved in bringing ML projects to production are DevOps-related, and tools that can automate the deployment process with simple APIs or commands are very handy.</p><p><a href="https://www.bodyworkml.com/posts/serving-uncertainty">Here</a> is a very well-written article by Alex Ioannides, one of the creators of Bodywork, that shows how a probabilistic model can be &#8220;trained&#8221; using Bayesian inference and then easily deployed using Bodywork. We learned some fun, new ML concepts by reading through this post!</p><h2><a href="https://twitter.com/haltakov/status/1402717293801508867">Twitter: Can you detect COVID-19 using Machine Learning?</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/haltakov/status/1402717293801508867&quot;,&quot;full_text&quot;:&quot;Can you detect COVID-19 using Machine Learning? &#129300;\n\nYou have an X-ray or CT scan and the task is to detect if the patient has COVID-19 or not. Sounds doable, right?\n\nNone of the 415 ML papers published on the subject in 2020 was usable.
Not a single one!\n\nLet's see why &#128071; &quot;,&quot;username&quot;:&quot;haltakov&quot;,&quot;name&quot;:&quot;Vladimir Haltakov&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Wed Jun 09 20:00:38 +0000 2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/E3dz8ASWEAYpba4.png&quot;,&quot;link_url&quot;:&quot;https://t.co/Vrd91ZpXy3&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:1154,&quot;like_count&quot;:3502,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>We covered this story in an <a href="https://mlopsroundup.substack.com/p/issue-14-ai-index-report-2021-multimodal">earlier issue</a>, but it&#8217;s important enough to discuss again.</p><p>Here is an insightful thread by <a href="https://twitter.com/haltakov">Vladimir Haltakov</a>, a machine learning engineer based in Germany, about the challenges of applying machine learning to detect COVID-19 from data such as X-rays and CT scans (many of these lessons generalize to healthcare more broadly). Three key takeaways for us were:</p><ol><li><p>Researchers from Cambridge considered 415 papers on the topic published from January to October 2020. <strong>None had any potential for clinical use!</strong></p></li><li><p>Most of the papers examined had problems with the quality and quantity of their data: datasets were small, unbalanced, and in some cases biased (e.g.,
some papers used a dataset that combined non-COVID images from children with COVID images from adults).</p></li><li><p>For ML practitioners and medical professionals alike, if you&#8217;re interested in exploring the applicability of machine learning to COVID detection, this Kaggle <a href="https://www.kaggle.com/c/siim-covid19-detection/data">challenge</a> is a good place to start.</p></li></ol><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>. If you like what we are doing, please spread the word to your friends and colleagues.</p>]]></content:encoded></item><item><title><![CDATA[Issue #19: MLOps Tooling. Vertex AI. Explainable ML in Deployment.
Algorithmic Justice.]]></title><description><![CDATA[Welcome to the 19th issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-19-mlops-tooling-vertex-ai</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-19-mlops-tooling-vertex-ai</guid><dc:creator><![CDATA[Nihit Desai]]></dc:creator><pubDate>Mon, 31 May 2021 17:03:19 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DDms!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DDms!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!DDms!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!DDms!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DDms!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DDms!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png" width="462" height="184.8" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:462,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DDms!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!DDms!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!DDms!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DDms!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ffc0a8-0a48-4410-bbb0-818cf3b8fd2f_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>Welcome to the 19th issue of the MLOps newsletter.&nbsp;</p><p>In this issue, we will cover an insightful perspective on the MLOps tooling landscape, dive into a recent announcement from Google, discuss explainability in real deployments, share the news on a proposed piece of legislation, and much more.</p><p>Thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://ljvmiranda921.github.io/notebook/2021/05/30/navigating-the-mlops-landscape-part-3/">Lj Miranda Blog | Navigating the MLOps tooling landscape</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://ljvmiranda921.github.io/notebook/2021/05/30/navigating-the-mlops-landscape-part-3/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-3kY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png 424w, https://substackcdn.com/image/fetch/$s_!-3kY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png 848w, 
https://substackcdn.com/image/fetch/$s_!-3kY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png 1272w, https://substackcdn.com/image/fetch/$s_!-3kY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-3kY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png" width="672" height="466.15384615384613" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1010,&quot;width&quot;:1456,&quot;resizeWidth&quot;:672,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://ljvmiranda921.github.io/notebook/2021/05/30/navigating-the-mlops-landscape-part-3/&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-3kY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png 424w, 
https://substackcdn.com/image/fetch/$s_!-3kY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png 848w, https://substackcdn.com/image/fetch/$s_!-3kY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png 1272w, https://substackcdn.com/image/fetch/$s_!-3kY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F27754ddd-13f9-49ac-ba00-f67029e7f409_1600x1110.png 1456w" sizes="100vw"></picture></div></a></figure></div>
<p>Here, we will be summarizing <a href="https://twitter.com/ljvmiranda921">Lj Miranda&#8217;s</a> wonderful 3-part series on navigating the MLOps tooling landscape (<a href="https://ljvmiranda921.github.io/notebook/2021/05/10/navigating-the-mlops-landscape/">Part 1</a>, <a href="https://ljvmiranda921.github.io/notebook/2021/05/15/navigating-the-mlops-landscape-part-2/">Part 2</a>, <a href="https://ljvmiranda921.github.io/notebook/2021/05/30/navigating-the-mlops-landscape-part-3/">Part 3</a>). He starts by looking into who these tools are for, then categorizes the tools into a neat 2x2, and finally lays out a framework for deciding what tools to use and why.&nbsp;</p><h4><strong>Who are these tools for and what is the eventual goal?</strong></h4><p>Miranda focuses on two personas: the ML Researcher and the Software Engineer. The ML Researcher wants to focus on training models and creating new features, which is what she is trained for. The software engineer, on the other hand, wants to have a seamless process for getting models into production.&nbsp;</p><p>The software engineer is dealing with the loop on the left, while the ML Researcher is dealing with the loop on the right.
Together, these two loops make up the ML lifecycle, and the aim of MLOps tooling is to improve it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://ljvmiranda921.github.io/notebook/2021/05/10/navigating-the-mlops-landscape/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bP0j!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa8bb9b7-81fb-47d2-aef1-97fe27279439_1600x938.png 424w, https://substackcdn.com/image/fetch/$s_!bP0j!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa8bb9b7-81fb-47d2-aef1-97fe27279439_1600x938.png 848w, https://substackcdn.com/image/fetch/$s_!bP0j!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa8bb9b7-81fb-47d2-aef1-97fe27279439_1600x938.png 1272w, https://substackcdn.com/image/fetch/$s_!bP0j!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa8bb9b7-81fb-47d2-aef1-97fe27279439_1600x938.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bP0j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa8bb9b7-81fb-47d2-aef1-97fe27279439_1600x938.png" width="534" height="313.21153846153845"
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/fa8bb9b7-81fb-47d2-aef1-97fe27279439_1600x938.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:854,&quot;width&quot;:1456,&quot;resizeWidth&quot;:534,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://ljvmiranda921.github.io/notebook/2021/05/10/navigating-the-mlops-landscape/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bP0j!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa8bb9b7-81fb-47d2-aef1-97fe27279439_1600x938.png 424w, https://substackcdn.com/image/fetch/$s_!bP0j!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa8bb9b7-81fb-47d2-aef1-97fe27279439_1600x938.png 848w, https://substackcdn.com/image/fetch/$s_!bP0j!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa8bb9b7-81fb-47d2-aef1-97fe27279439_1600x938.png 1272w, https://substackcdn.com/image/fetch/$s_!bP0j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa8bb9b7-81fb-47d2-aef1-97fe27279439_1600x938.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h4><strong>How do you categorize these MLOps tools?&nbsp;</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://ljvmiranda921.github.io/notebook/2021/05/15/navigating-the-mlops-landscape-part-2/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7qca!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7597a1e-1f5a-4b40-bb12-0b7f67b2ab2f_1600x1081.png 424w,
https://substackcdn.com/image/fetch/$s_!7qca!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7597a1e-1f5a-4b40-bb12-0b7f67b2ab2f_1600x1081.png 848w, https://substackcdn.com/image/fetch/$s_!7qca!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7597a1e-1f5a-4b40-bb12-0b7f67b2ab2f_1600x1081.png 1272w, https://substackcdn.com/image/fetch/$s_!7qca!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7597a1e-1f5a-4b40-bb12-0b7f67b2ab2f_1600x1081.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7qca!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7597a1e-1f5a-4b40-bb12-0b7f67b2ab2f_1600x1081.png" width="530" height="358.1868131868132" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/b7597a1e-1f5a-4b40-bb12-0b7f67b2ab2f_1600x1081.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:984,&quot;width&quot;:1456,&quot;resizeWidth&quot;:530,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://ljvmiranda921.github.io/notebook/2021/05/15/navigating-the-mlops-landscape-part-2/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!7qca!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7597a1e-1f5a-4b40-bb12-0b7f67b2ab2f_1600x1081.png 424w, https://substackcdn.com/image/fetch/$s_!7qca!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7597a1e-1f5a-4b40-bb12-0b7f67b2ab2f_1600x1081.png 848w, https://substackcdn.com/image/fetch/$s_!7qca!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7597a1e-1f5a-4b40-bb12-0b7f67b2ab2f_1600x1081.png 1272w, https://substackcdn.com/image/fetch/$s_!7qca!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7597a1e-1f5a-4b40-bb12-0b7f67b2ab2f_1600x1081.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg 
xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Miranda uses two axes to divide up all MLOps tools. First, what type of artifact do they help produce -- software or models? Second, what is the scope of the tool -- piecemeal (affecting only one or two processes in the ML lifecycle) or all-in-one (providing an end-to-end ML solution)?&nbsp;</p><p>This leads to four quadrants:</p><ul><li><p><strong>Cloud Platforms</strong>: This includes general-purpose cloud providers such as AWS, GCP, and Azure, along with other Big Data-focused platforms such as Cloudera and Paperspace. Miranda also includes cloud-based ML platforms such as AWS SageMaker here, although we would choose to include them in the next category.&nbsp;</p></li><li><p><strong>ML Platforms</strong>: This includes tools that address multiple components in the ML lifecycle, such as ClearML and Valohai, and orchestration frameworks that specifically address the ML process, such as Kubeflow and Metaflow.&nbsp;</p></li><li><p><strong>Specialized ML tools</strong>: These are tools that address a very specific component in the ML lifecycle. 
Examples include Weights and Biases for experiment management, DVC for data versioning, Prodigy for data annotation, etc.&nbsp;</p></li><li><p><strong>Standard SWE tools</strong>: This includes orchestration tools such as Airflow and CI/CD tools such as Jenkins.&nbsp;</p></li></ul><h4><strong>How to pick MLOps tools?</strong></h4><p>Miranda introduces the <a href="https://www.thoughtworks.com/radar">Thoughtworks Technology Radar</a> as a methodology to decide which tools to adopt, which ones to trial, which to assess, and which to hold off on. We&#8217;ll let you read the <a href="https://ljvmiranda921.github.io/notebook/2021/05/30/navigating-the-mlops-landscape-part-3/">post</a> if you&#8217;re interested in the full analysis but will share his build-vs-buy criteria here.&nbsp;</p><ul><li><p>Don&#8217;t build MLOps tooling if it isn&#8217;t your core business -- leave it to the companies focusing 100% of their effort on the problem.&nbsp;</p></li><li><p>Build integrations and connectors between tools -- this allows you to tailor your tooling choices to your business use cases and make the most of this nascent industry.&nbsp;</p></li><li><p>Buy specialized ML tools first -- they are easier to plug in and out and work best with the existing workflows in your organization. 
This is sage advice!</p></li></ul><h2><a href="https://cloud.google.com/vertex-ai">Google Cloud launches Vertex AI, a managed Machine Learning platform</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://techcrunch.com/2021/05/18/google-cloud-launches-vertex-a-new-managed-machine-learning-platform/?guccounter=1" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mByp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9a03c1ec-c83c-4f10-8c7d-1cb9571f66b7_1600x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mByp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9a03c1ec-c83c-4f10-8c7d-1cb9571f66b7_1600x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mByp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9a03c1ec-c83c-4f10-8c7d-1cb9571f66b7_1600x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mByp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9a03c1ec-c83c-4f10-8c7d-1cb9571f66b7_1600x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mByp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9a03c1ec-c83c-4f10-8c7d-1cb9571f66b7_1600x900.jpeg" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/9a03c1ec-c83c-4f10-8c7d-1cb9571f66b7_1600x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://techcrunch.com/2021/05/18/google-cloud-launches-vertex-a-new-managed-machine-learning-platform/?guccounter=1&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mByp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9a03c1ec-c83c-4f10-8c7d-1cb9571f66b7_1600x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mByp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9a03c1ec-c83c-4f10-8c7d-1cb9571f66b7_1600x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mByp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9a03c1ec-c83c-4f10-8c7d-1cb9571f66b7_1600x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mByp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9a03c1ec-c83c-4f10-8c7d-1cb9571f66b7_1600x900.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>At Google I/O this year, Google announced Vertex AI, a new managed machine learning platform on top of GCP for companies to train, deploy and maintain their AI models. 
Andrew Moore, vice president and general manager of Cloud AI, told <a href="https://techcrunch.com/2021/05/18/google-cloud-launches-vertex-a-new-managed-machine-learning-platform/?guccounter=1">TechCrunch</a>:&nbsp;</p><blockquote><p>&#8220;We had two guiding lights while building Vertex AI: get data scientists and engineers out of the orchestration weeds, and create an industry-wide shift that would make everyone get serious about moving AI out of pilot purgatory and into full-scale production&#8221;</p></blockquote><p>The official Vertex AI documentation on Google Cloud provides a good introduction to Vertex AI&#8217;s feature set, pricing, etc., and we recommend reading it if you are considering building new ML models or migrating your ML models to the cloud. We share some key highlights below:&nbsp;</p><ul><li><p><strong>Training with minimal code</strong>: With the help of <a href="https://cloud.google.com/automl">AutoML</a>, ML engineers and data scientists can build models in less time. Additionally, companies can take advantage of a centrally managed registry for all datasets across data types (vision, natural language, and structured data).</p></li><li><p><strong>Data quality</strong>: <a href="https://cloud.google.com/vertex-ai/docs/datasets/data-labeling-job">Vertex Data Labeling</a> is a data labeling service that generates custom labels for the data you collect to train models.</p></li><li><p><strong>Support for open source ML frameworks</strong>: Vertex AI integrates with frameworks such as TensorFlow, PyTorch, and scikit-learn via custom containers for training and prediction.</p></li><li><p><strong>MLOps</strong>: With services like <a href="https://cloud.google.com/vertex-ai/docs/pipelines/introduction">Vertex Pipelines</a> (for orchestrating ML workflows) and <a href="https://cloud.google.com/vertex-ai/docs/featurestore/overview">Vertex Feature Store</a> (for serving and sharing ML features across models), companies can deploy and maintain ML models in 
production.</p></li></ul><p>GCP already supported most of the features announced as part of Vertex AI (e.g., <a href="https://cloud.google.com/ai-platform/pipelines/docs">AI Platform Pipelines</a>, <a href="https://cloud.google.com/ai-platform/data-labeling/docs">AI Platform Data Labeling</a>). In part, this announcement is a rebranding exercise to bring these disparate services under the same umbrella and improve interoperability among them. Overall, though, we welcome Google&#8217;s announcement, especially the parts related to MLOps. It suggests to us that model monitoring and maintenance are an important priority for Google Cloud AI. </p><h2><a href="https://dl.acm.org/doi/pdf/10.1145/3351095.3375624">Paper | Explainable Machine Learning in Deployment</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dl.acm.org/doi/pdf/10.1145/3351095.3375624" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vLxo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F255d5dec-7d84-4b99-9428-680bd6b81608_1330x350.png 424w, https://substackcdn.com/image/fetch/$s_!vLxo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F255d5dec-7d84-4b99-9428-680bd6b81608_1330x350.png 848w, https://substackcdn.com/image/fetch/$s_!vLxo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F255d5dec-7d84-4b99-9428-680bd6b81608_1330x350.png 1272w, 
https://substackcdn.com/image/fetch/$s_!vLxo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F255d5dec-7d84-4b99-9428-680bd6b81608_1330x350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vLxo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F255d5dec-7d84-4b99-9428-680bd6b81608_1330x350.png" width="1330" height="350" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/255d5dec-7d84-4b99-9428-680bd6b81608_1330x350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:1330,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://dl.acm.org/doi/pdf/10.1145/3351095.3375624&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vLxo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F255d5dec-7d84-4b99-9428-680bd6b81608_1330x350.png 424w, https://substackcdn.com/image/fetch/$s_!vLxo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F255d5dec-7d84-4b99-9428-680bd6b81608_1330x350.png 848w, 
https://substackcdn.com/image/fetch/$s_!vLxo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F255d5dec-7d84-4b99-9428-680bd6b81608_1330x350.png 1272w, https://substackcdn.com/image/fetch/$s_!vLxo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F255d5dec-7d84-4b99-9428-680bd6b81608_1330x350.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We have discussed explainability in previous issues, such as <a 
href="https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric">here</a> when we talked about using k-NNs for explaining and improving model behavior. This week, we cover a paper that explores how organizations use explainability in their day-to-day operations.&nbsp;</p><h4><strong>What is this Paper?&nbsp;</strong></h4><blockquote><p>In this paper, we explore how organizations have deployed local explainability techniques so that we can observe which techniques work best in practice, report on the shortcomings of existing techniques, and recommend paths for future research.</p></blockquote><p>The authors did this by interviewing fifty people from approximately thirty organizations.</p><h4><strong>How is Explainability Used Today?&nbsp;</strong></h4><p>There are different sets of stakeholders who require explainability:</p><ul><li><p><strong>Executives</strong>: High-level executives often desire explainability as a goal for their systems, but for their teams it can end up being just an item to cross off.&nbsp;</p></li><li><p><strong>ML Engineers</strong>: They end up using explainability to debug what the model has learned or to inspect how it performs on certain data points.&nbsp;</p></li><li><p><strong>End Users</strong>: These are the most intuitive consumers of an explanation (since they are directly impacted by the model&#8217;s predictions), and more explainable models lead to a more transparent product.&nbsp;</p></li><li><p><strong>Other stakeholders</strong>: This could include regulators, domain experts, annotators, etc.</p></li></ul><p>Some of the needs being fulfilled by explainability today:</p><ul><li><p><strong>Model debugging</strong>: Data scientists want to understand why a model performs poorly on certain inputs, or whether adding new features or removing old ones would improve performance.&nbsp;</p></li><li><p><strong>Model monitoring</strong>: Teams often want to understand whether drift in certain features would impact their model 
performance (and knowing feature importance is key).&nbsp;</p></li><li><p><strong>Model transparency</strong>: Explanations increase model transparency, giving end users more trust in the product while satisfying regulations. This can help communicate with business teams and respond to customer complaints.&nbsp;</p></li><li><p><strong>Model audit</strong>: In financial organizations, all deployed ML models must go through an internal audit to satisfy regulatory requirements such as <a href="https://www.federalreserve.gov/supervisionreg/srletters/sr1107.htm">SR 11-7</a>. Explainability can provide guidance on &#8220;conceptual soundness&#8221; by evaluating the model on multiple data points.&nbsp;</p></li></ul><h4><strong>Takeaways and Concerns</strong></h4><p>The judgment of domain experts (i.e., labels from experts) is still considered ground truth. Explanation-based methods have a long way to go -- they can suffer from spurious correlations (something we covered previously <a href="https://mlopsroundup.substack.com/p/issue-8-toronto-ml-summit-gpt-2-ml">here</a>), and causal understanding remains challenging.</p><p>ML Engineers mostly use explainability techniques to:</p><blockquote><p>identify and reconcile inconsistencies between the model&#8217;s explanations and their intuition or that of domain experts, rather than for directly providing explanations to end users.</p></blockquote><p>There are other technical challenges -- computation of explanations can be slow (exponential in the number of input features, in the case of exact Shapley values), it can be tricky to produce feasible <a href="https://christophm.github.io/interpretable-ml-book/counterfactual.html">counterfactual explanations</a>, and certain explanations can lead to privacy risks (by exposing details about the model behavior or training data).&nbsp;</p><p>Finally, because explainability is a new discipline, organizations have yet to figure out the right frameworks for deciding how they&#8217;re going to use explainability, who it is useful 
for, and when. We expect to see continued research in this field and will keep covering stories of adoption when we can. </p><h2><a href="https://www.markey.senate.gov/news/press-releases/senator-markey-rep-matsui-introduce-legislation-to-combat-harmful-algorithms-and-create-new-online-transparency-regime">Regulation | The Algorithmic Justice and Online Platform Transparency Act of 2021</a></h2><p>Senator Edward J. Markey and Congresswoman Doris Matsui recently introduced the Algorithmic Justice and Online Platform Transparency Act of 2021, a copy of which can be found <a href="https://www.markey.senate.gov/download/ajopta">here</a>. The legislation specifically relates to algorithms used by companies like Facebook and Twitter to determine which content and advertisements to show to users. Congresswoman Matsui noted in the press release linked above:&nbsp;</p><blockquote><p>&#8220;For far too many Americans, long-held biases and systemic injustices contained within certain algorithms are perpetuating inequalities and barriers to access. The Algorithmic Justice and Online Platform Transparency Act is an essential roadmap for digital justice to move us forward on the path to online equity and stop these discriminatory practices. I look forward to working with Senator Markey and urge all of my colleagues to join us in this effort.&#8221;</p></blockquote><h4><strong>Highlights</strong></h4><p>The key highlights of the legislation fall into two buckets. 
Note that in the bill and in the highlights described below, the term &#8220;algorithm&#8221; really refers to a machine learning model (or ensemble of models), typically combined with some business rules, of the kind that powers large-scale data products such as search engines and news feeds.</p><p>(1) Preventing Harm to Users:</p><ul><li><p>Prohibit platforms from using algorithms that discriminate on the basis of protected categories such as race, age, and gender.</p></li><li><p>Platforms may not employ algorithms if they fail to take reasonable steps to ensure these algorithms achieve their intended purposes.</p></li><li><p>Create an inter-agency task force in the government to investigate potentially discriminatory algorithms used online.</p></li></ul><p>(2) Transparency:&nbsp;</p><ul><li><p>Platforms will be required to explain to users how they use algorithms and what data is used to power them.</p></li><li><p>Platforms will be required to maintain details of how they build algorithms for review by the FTC.</p></li><li><p>Platforms will be required to publish annual reports of their content moderation practices.</p></li></ul><h4><strong>Connection to Section 230</strong></h4><p>As noted in this <a href="https://www.cnbc.com/2021/05/28/markey-matsui-bill-would-force-social-media-content-algorithm-transparency.html">article</a>, this legislation is a fresh approach to bringing more accountability to platforms&#8217; policies and processes around content moderation without touching Section 230. While there is agreement across the political spectrum about Section 230 needing reform, the details surrounding the reform have become a contested partisan issue. By not touching or seeking to reform Section 230, this bill might have a better shot at getting support across the aisle. 
</p><h2><a href="https://www.anthropic.com/news/announcement">Anthropic: An effort to build reliable, interpretable and steerable AI systems</a></h2><p>Anthropic, an AI safety research company, recently came out of stealth with an announcement about their mission &amp; funding (a $124M Series A!!).</p><h4><strong>Problem</strong></h4><p>Large AI systems today can be unpredictable, unreliable, and opaque. As stated in the announcement above, Anthropic&#8217;s goal is to &#8220;make progress on these issues&#8221;, primarily focusing on research. Some areas that are within the scope of their research include natural language understanding, reinforcement learning, and interpretability.</p><h4><strong>Team</strong></h4><p>While the announcement was light on Anthropic&#8217;s research and product plans, they have a <a href="https://twitter.com/AnthropicAI/following">stellar team</a> of scientists and engineers, many of whom have worked at OpenAI and Google Brain previously on initiatives like <a href="https://arxiv.org/abs/2005.14165">GPT-3</a>, <a href="https://distill.pub/2020/circuits/">Circuit-Based Interpretability</a>, <a href="https://distill.pub/2021/multimodal-neurons/">Multimodal Neurons</a>, and <a href="https://arxiv.org/abs/2010.14701">Scaling Laws</a>. We wish the team success on this journey and look forward to covering their research in the coming months. </p><h2><a href="https://twitter.com/hima_lakkaraju/status/1390754121322467330">Learn about Explainability</a></h2><p>Here&#8217;s a great compilation of resources on explainability from <a href="https://twitter.com/hima_lakkaraju">Hima Lakkaraju</a>, Assistant Professor at Harvard University. </p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/hima_lakkaraju/status/1390754121322467330&quot;,&quot;full_text&quot;:&quot;If you have less than 3 hours to spare &amp;amp; want to learn (almost) everything about state-of-the-art explainable ML, this thread is for you! 
Below, I am sharing info about 4 of our recent tutorials on explainability presented at NeurIPS, AAAI, FAccT, and CHIL conferences. [1/n]&quot;,&quot;username&quot;:&quot;hima_lakkaraju&quot;,&quot;name&quot;:&quot;&#120439;&#120466;&#120470;&#120458; &#120443;&#120458;&#120468;&#120468;&#120458;&#120475;&#120458;&#120467;&#120478;&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Fri May 07 19:43:16 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:891,&quot;like_count&quot;:3965,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by&nbsp;<a href="https://twitter.com/nihit_desai">Nihit Desai</a>&nbsp;and&nbsp;<a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us&nbsp;<a href="https://twitter.com/mlopsroundup">@mlopsroundup</a>&nbsp;(open to DMs as well) or email us at&nbsp;<a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a></p><p>If you like what we are doing please tell your friends and colleagues to spread the word.</p>]]></content:encoded></item><item><title><![CDATA[Issue #18: MLOps on Coursera. #datalift. Long-tail events. Teaching AI to Forget. 
]]></title><description><![CDATA[Welcome to the 18th issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-18-mlops-on-coursera-datalift</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-18-mlops-on-coursera-datalift</guid><dc:creator><![CDATA[Nihit Desai]]></dc:creator><pubDate>Mon, 17 May 2021 17:03:53 GMT</pubDate><enclosure url="https://cdn.substack.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YiEq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YiEq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!YiEq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!YiEq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YiEq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YiEq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png" width="498" height="199.2" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:1000,&quot;resizeWidth&quot;:498,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YiEq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png 424w, https://substackcdn.com/image/fetch/$s_!YiEq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png 848w, https://substackcdn.com/image/fetch/$s_!YiEq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YiEq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4ff36787-0a10-4b32-893f-259f6c885d8b_1000x400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>Welcome to the 18th issue of the MLOps newsletter. First things first, we are excited to show you some design changes to the newsletter. We have moved away from the default Substack orange, and have a newly designed logo (which you can see above) with its own dark version on <a href="https://twitter.com/mlopsroundup">Twitter</a>. Let us know how you feel about it by replying to this email or commenting on the Substack post. &#9999;&#65039;</p><p>In this issue, we share a new MLOps course and a virtual event where you can meet us,   discuss the effects of long-tail events on ML systems, deep dive into research on teaching ML models to forget, and more.</p><p>Thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://www.coursera.org/specializations/machine-learning-engineering-for-production-mlops">Andrew Ng on Coursera | </a><strong><a href="https://www.coursera.org/specializations/machine-learning-engineering-for-production-mlops">Machine Learning Engineering for Production (MLOps) Specialization</a></strong></h2><p>In a <a href="https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric">previous issue</a> of our newsletter, we covered <a href="https://youtu.be/06-AZXmwHjo">Andrew Ng&#8217;s talk</a> about MLOps where he outlines the importance of striking the right balance between data and modeling improvements in real-world ML applications. In partnership with Coursera and DeepLearning.AI, Andrew Ng is teaching a course focused on MLOps. 
We are very happy to learn about this development and highly recommend checking it out:</p><blockquote><p>The Machine Learning Engineering for Production (MLOps) Specialization covers how to conceptualize, build, and maintain integrated systems that continuously operate in production. In striking contrast with standard machine learning modeling, production systems need to handle relentless evolving data. Moreover, the production system must run non-stop at the minimum cost while producing the maximum performance. In this Specialization, you will learn how to use well-established tools and methodologies for doing all of this effectively and efficiently. </p></blockquote><h2><a href="https://hopin.com/events/datalift-no-5-productionize-data-analytics-and-machine-learning?ref=49aee2b637b0">Event Reminder | #datalift No 5 - Productionize data analytics and machine learning</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://hopin.com/events/datalift-no-5-productionize-data-analytics-and-machine-learning?ref=49aee2b637b0" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MqNk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F80d5d49d-a30e-4990-a013-38c6b51863d4_1500x600.png 424w, https://substackcdn.com/image/fetch/$s_!MqNk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F80d5d49d-a30e-4990-a013-38c6b51863d4_1500x600.png 848w, https://substackcdn.com/image/fetch/$s_!MqNk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F80d5d49d-a30e-4990-a013-38c6b51863d4_1500x600.png 1272w, 
https://substackcdn.com/image/fetch/$s_!MqNk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F80d5d49d-a30e-4990-a013-38c6b51863d4_1500x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MqNk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F80d5d49d-a30e-4990-a013-38c6b51863d4_1500x600.png" width="1456" height="582" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/80d5d49d-a30e-4990-a013-38c6b51863d4_1500x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:582,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://hopin.com/events/datalift-no-5-productionize-data-analytics-and-machine-learning?ref=49aee2b637b0&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MqNk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F80d5d49d-a30e-4990-a013-38c6b51863d4_1500x600.png 424w, https://substackcdn.com/image/fetch/$s_!MqNk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F80d5d49d-a30e-4990-a013-38c6b51863d4_1500x600.png 848w, 
https://substackcdn.com/image/fetch/$s_!MqNk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F80d5d49d-a30e-4990-a013-38c6b51863d4_1500x600.png 1272w, https://substackcdn.com/image/fetch/$s_!MqNk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F80d5d49d-a30e-4990-a013-38c6b51863d4_1500x600.png 1456w" sizes="100vw"></picture></div></a></figure></div><p><a href="https://www.thedatalift.eu/">#datalift</a> is an initiative from the <a 
href="https://www.theguild.ai/">AI Guild</a> that brings together a community of AI practitioners and business leaders to help bridge the gap between proof-of-concept and productionization. They&#8217;re hosting an event on May 28 with interesting speakers and opportunities to network with folks interested in ML deployments.&nbsp;</p><p><strong>We&#8217;re excited to announce that the MLOps Roundup will have a virtual booth at the event -- come say hi to us if you&#8217;re attending!</strong></p><h2><a href="https://doordash.engineering/2021/04/28/improving-eta-prediction-accuracy-for-long-tail-events/">DoorDash Blog | Improving ETA Prediction Accuracy for Long-tail Events</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://doordash.engineering/2021/04/28/improving-eta-prediction-accuracy-for-long-tail-events/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dKP_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F58234047-54af-4ad9-a1cd-7e8273edb60e_1593x745.png 424w, https://substackcdn.com/image/fetch/$s_!dKP_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F58234047-54af-4ad9-a1cd-7e8273edb60e_1593x745.png 848w, https://substackcdn.com/image/fetch/$s_!dKP_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F58234047-54af-4ad9-a1cd-7e8273edb60e_1593x745.png 1272w, 
https://substackcdn.com/image/fetch/$s_!dKP_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F58234047-54af-4ad9-a1cd-7e8273edb60e_1593x745.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dKP_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F58234047-54af-4ad9-a1cd-7e8273edb60e_1593x745.png" width="1456" height="681" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/58234047-54af-4ad9-a1cd-7e8273edb60e_1593x745.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:681,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:288327,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://doordash.engineering/2021/04/28/improving-eta-prediction-accuracy-for-long-tail-events/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dKP_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F58234047-54af-4ad9-a1cd-7e8273edb60e_1593x745.png 424w, https://substackcdn.com/image/fetch/$s_!dKP_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F58234047-54af-4ad9-a1cd-7e8273edb60e_1593x745.png 848w, 
https://substackcdn.com/image/fetch/$s_!dKP_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F58234047-54af-4ad9-a1cd-7e8273edb60e_1593x745.png 1272w, https://substackcdn.com/image/fetch/$s_!dKP_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F58234047-54af-4ad9-a1cd-7e8273edb60e_1593x745.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is a really interesting <a 
href="https://doordash.engineering/2021/04/28/improving-eta-prediction-accuracy-for-long-tail-events/">blog post</a> from the DoorDash Engineering team that discusses the impact of long-tail events on one of DoorDash&#8217;s ML systems and how the team improved the system to address the problem.&nbsp;</p><h4><strong>What are long-tail events?</strong></h4><blockquote><p>Long-tail events are often problematic for businesses because they occur somewhat frequently but are difficult to predict. We define long-tail events as large deviations from the average that nevertheless happen with some regularity. Given the severity and frequency of long-tail events, being able to predict them accurately can greatly improve the customer experience.&nbsp;</p></blockquote><p>In the context of DoorDash, customers see an ETA before their food is delivered. This ETA sets an expectation for when the order will arrive, and while the system works well in most cases, a small number of late deliveries can have an outsized negative impact (we all know what it&#8217;s like to be &#8220;hangry&#8221; - remember the <a href="https://www.youtube.com/watch?v=yYBrFBWCzbU">Snickers commercials</a>?).&nbsp;</p><p>The post makes a distinction between tail events and outliers:</p><blockquote><p>Outliers tend to be extreme values that occur very infrequently. Typically they are less than 1% of the data...On the other hand, tail events represent occurrences that happen with some amount of regularity (typically 5-10%), such that they should be predictable to some degree.</p></blockquote><h4><strong>Challenges with tail events</strong></h4><p>Given that tail events happen 5-10% of the time, there is a sizable opportunity in improving the ML system for these events.&nbsp;</p><p>However, this is challenging because there is often too little ground-truth data on tail events for an ML model to learn generalizable patterns. 
It can also be difficult to obtain leading indicators that are correlated with the occurrence of a tail event. For example, a social media post from an influencer might cause a sudden spike in demand for a product at an online retailer, which is almost impossible to predict.&nbsp;</p><p>DoorDash could choose to always overestimate the ETA as a safeguard against tail events, but this hurts revenue: many people will choose not to order if the ETA crosses some threshold. So the only recourse is to improve predictions on tail events.&nbsp;</p><h4><strong>Solutions</strong></h4><p>The DoorDash team tried a few different options to address these challenges. First, they used bucketing and target encoding (read this good intro to <a href="https://maxhalford.github.io/blog/target-encoding/">target encoding</a>) for certain continuous-valued features (like marketplace health). This gave the model an easier path to learning the effect of such features on the target value.&nbsp;</p><p>Second, they introduced real-time features that capture live signals about the outcome they care about.&nbsp;</p><blockquote><p>For example, we look at average delivery durations over the past 20 minutes at a store level and sub-region level. If anything, from an unexpected rainstorm to road construction, causes elevated delivery times, our ETAs model will be able to detect it through these real-time features and update accordingly.</p></blockquote><p>Finally, they switched from a linear loss function to a quadratic one, which penalizes large deviations much more strongly.&nbsp;</p><h4><strong>Results</strong></h4><blockquote><p>Based on the experiment results, we were able to improve long-tail ETA accuracy by 10% (while maintaining constant average quotes). 
This led to significant improvements in the customer experience by reducing the frequency of very late orders, particularly during critical peak meal times when markets were supply-constrained.</p></blockquote><p>The lessons they share:</p><blockquote><p>First, investments in feature engineering tend to have the biggest returns.&nbsp;</p><p>Secondly, it&#8217;s helpful to curate a loss function that closely represents the business tradeoffs.</p></blockquote><h2><a href="https://ai.facebook.com/blog/teaching-ai-how-to-forget-at-scale/">Facebook AI Blog | Teaching AI to Forget at Scale</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://ai.facebook.com/blog/teaching-ai-how-to-forget-at-scale/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p_jI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png 424w, https://substackcdn.com/image/fetch/$s_!p_jI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png 848w, https://substackcdn.com/image/fetch/$s_!p_jI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png 1272w, https://substackcdn.com/image/fetch/$s_!p_jI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!p_jI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png" width="1234" height="632" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:632,&quot;width&quot;:1234,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://ai.facebook.com/blog/teaching-ai-how-to-forget-at-scale/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p_jI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png 424w, https://substackcdn.com/image/fetch/$s_!p_jI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png 848w, https://substackcdn.com/image/fetch/$s_!p_jI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png 1272w, 
https://substackcdn.com/image/fetch/$s_!p_jI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a3efb16-6464-487f-9d3f-35ed32f07407_1234x632.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A <a href="https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)">Transformer</a> is a type of deep learning model architecture that utilizes the mechanism of attention to selectively focus on certain parts of the input that the model thinks are relevant for making the final prediction. 
While transformers have revolutionized language understanding, they struggle to scale to long pieces of text, primarily because of computational costs. Facebook recently published new research (<a href="https://scontent-sjc3-1.xx.fbcdn.net/v/t39.8562-6/185356217_1177665109329269_6669883335010742565_n.pdf?_nc_cat=111&amp;ccb=1-3&amp;_nc_sid=ae5e01&amp;_nc_eui2=AeFrqSO7ETOC_fNSL2v0HGwW16G5gk-Zy-bXobmCT5nL5nuFniku66QIJj2vWDRAws4&amp;_nc_ohc=j1veruiOF2wAX-LKhyv&amp;_nc_ht=scontent-sjc3-1.xx&amp;oh=a00c06e5bfbeeaf797a0db171df3be93&amp;oe=60C84E1E">paper</a>, <a href="https://github.com/facebookresearch/transformer-sequential?fbclid=IwAR3nWDXNJHt0L-ieqk-OEnPSboL-i83x-3tl0nJCCZeMyQfPFDJ4dzwIDqQ">code</a>) that allows models to &#8220;forget&#8221;, or expire, information they have learned in the past that might no longer be relevant, thus allowing attention mechanisms to scale to much longer input sequences.</p><h4><strong>What is it?</strong></h4><p>In the paper above, the authors introduce Expire-Span, a new technique that allows neural networks to expire information they have learned in the past that might no longer be relevant. While the problem this paper addresses has long been acknowledged, the main challenge so far has been that &#8220;expiring a given piece of information&#8221; is a discrete operation, i.e. not differentiable. Expire-Span assigns an expiration value to each hidden state the model has stored and recomputes this value at each time step. In this way, how long information is retained becomes something the model learns.</p><h4><strong>Our take</strong></h4><p>As mentioned in the paper and in the article above, Expire-Span is a promising way to scale attention to long pieces of text (or sequential inputs more generally - e.g. image frames in a video). It is also quite interesting to consider potential use cases of &#8220;memory expiration&#8221; for problems such as models encoding the biases of their underlying datasets. 
For example, is it possible to set up the training loss so that models learn to forget spurious (and often biased) correlations they have learned from data? Can we improve generalizability and tackle data/concept drift by ensuring models forget information that might no longer be relevant? This is an exciting and novel idea, and we look forward to future research directions that build upon this work.</p><h2><a href="https://engineering.linkedin.com/blog/2021/greykite--a-flexible--intuitive--and-fast-forecasting-library">New Tool Alert | Greykite from LinkedIn</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://engineering.linkedin.com/blog/2021/greykite--a-flexible--intuitive--and-fast-forecasting-library" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TcDY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7d0f3e-0cd7-4ba2-ba5d-6caaba3d24f8_1060x656.png 424w, https://substackcdn.com/image/fetch/$s_!TcDY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7d0f3e-0cd7-4ba2-ba5d-6caaba3d24f8_1060x656.png 848w, https://substackcdn.com/image/fetch/$s_!TcDY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7d0f3e-0cd7-4ba2-ba5d-6caaba3d24f8_1060x656.png 1272w, https://substackcdn.com/image/fetch/$s_!TcDY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7d0f3e-0cd7-4ba2-ba5d-6caaba3d24f8_1060x656.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!TcDY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7d0f3e-0cd7-4ba2-ba5d-6caaba3d24f8_1060x656.png" width="448" height="277.25283018867924" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2b7d0f3e-0cd7-4ba2-ba5d-6caaba3d24f8_1060x656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:656,&quot;width&quot;:1060,&quot;resizeWidth&quot;:448,&quot;bytes&quot;:156186,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://engineering.linkedin.com/blog/2021/greykite--a-flexible--intuitive--and-fast-forecasting-library&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TcDY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7d0f3e-0cd7-4ba2-ba5d-6caaba3d24f8_1060x656.png 424w, https://substackcdn.com/image/fetch/$s_!TcDY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7d0f3e-0cd7-4ba2-ba5d-6caaba3d24f8_1060x656.png 848w, https://substackcdn.com/image/fetch/$s_!TcDY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7d0f3e-0cd7-4ba2-ba5d-6caaba3d24f8_1060x656.png 1272w, 
https://substackcdn.com/image/fetch/$s_!TcDY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7d0f3e-0cd7-4ba2-ba5d-6caaba3d24f8_1060x656.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>LinkedIn recently open-sourced <a href="https://engineering.linkedin.com/blog/2021/greykite--a-flexible--intuitive--and-fast-forecasting-library">Greykite</a> (<a href="https://github.com/linkedin/greykite">link</a> to GitHub), a Python library for time series forecasting. 
As part of the library, they are also open-sourcing &#8216;Silverkite&#8217;, the core algorithm used for time series predictions (<a href="https://arxiv.org/pdf/2105.01098.pdf">link</a> to the paper if you&#8217;re interested). The authors have shown this algorithm to work well for cases like time-varying trends and seasonality, which we assume LinkedIn has to deal with quite frequently. The blog post linked above and the <a href="https://github.com/linkedin/greykite">GitHub repo</a> are quite comprehensive and dive deep into multiple use cases. We recommend checking them out if you&#8217;re dealing with time-series data!</p><h2>Twitter Fun | Types of ML/NLP papers</h2><p>&#8220;All memes are wrong, but some are interesting&#8221; (or was it &#8220;<a href="https://www.lacan.upc.edu/admoreWeb/2018/05/all-models-are-wrong-but-some-are-useful-george-e-p-box/">All models are wrong, but some are useful</a>&#8221;?). This one was too real to not share! &#128515;</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/seb_ruder/status/1387886948438708224&quot;,&quot;full_text&quot;:&quot;Types of ML / NLP Papers &quot;,&quot;username&quot;:&quot;seb_ruder&quot;,&quot;name&quot;:&quot;Sebastian Ruder&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Thu Apr 29 21:50:08 +0000 2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/E0LDpBHWYAMxvWD.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/mdPMGUXL70&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:818,&quot;like_count&quot;:3178,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><div class="twitter-embed" 
data-attrs="{&quot;url&quot;:&quot;https://twitter.com/natashajaques/status/1387859601555554304&quot;,&quot;full_text&quot;:&quot;I couldn't resist. With contributions from <span class=\&quot;tweet-fake-link\&quot;>@maxhkw</span> &quot;,&quot;username&quot;:&quot;natashajaques&quot;,&quot;name&quot;:&quot;Natasha Jaques&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Thu Apr 29 20:01:28 +0000 2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/E0Kq8KhVcAIOVdp.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/3J7fz6dQut&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:1606,&quot;like_count&quot;:6496,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> (open to DMs as well) or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a></p><p>If you like what we are doing please tell your friends and colleagues to spread the word.</p>]]></content:encoded></item><item><title><![CDATA[Issue #17: FTC Guidance on AI. Feature Store vs Data Warehouse. #datalift. 
Unsupervised Dataset Generation.]]></title><description><![CDATA[Welcome to the 17th issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-17-ftc-guidance-on-ai-feature</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-17-ftc-guidance-on-ai-feature</guid><dc:creator><![CDATA[Nihit Desai]]></dc:creator><pubDate>Mon, 03 May 2021 17:04:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Y-s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 17th issue of the MLOps newsletter.</p><p>In this issue, we cover some recent developments on AI regulations, discuss the similarities and differences between data warehouses and feature stores, and share our thoughts on a recent paper about dataset generation using language models.</p><p>Thank you for subscribing. 
If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai">A Quick Update on the EU AI Regulations</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Y-s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Y-s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png 424w, https://substackcdn.com/image/fetch/$s_!7Y-s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png 848w, https://substackcdn.com/image/fetch/$s_!7Y-s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png 1272w, https://substackcdn.com/image/fetch/$s_!7Y-s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Y-s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png" width="524" 
height="350.015625" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/b244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1024,&quot;resizeWidth&quot;:524,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Y-s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png 424w, https://substackcdn.com/image/fetch/$s_!7Y-s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png 848w, https://substackcdn.com/image/fetch/$s_!7Y-s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png 1272w, https://substackcdn.com/image/fetch/$s_!7Y-s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb244a16f-afff-43e8-ac55-25f341f93fcf_1024x684.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In <a href="https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric">our last issue</a>, we had discussed the AI regulations that the EU had been considering. This week, the EU Commission released its <a href="https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai">proposed AI Framework</a> to address the risks associated with AI. It still needs to be adopted by the European Parliament and member states, but it is likely to pass through successfully. 
However, if this follows a path similar to GDPR, it will be a few years before it is enforceable (the earliest estimate is the second half of 2024).&nbsp;</p><p>From <a href="https://www.reuters.com/world/china/eu-aims-set-global-standards-ai-fines-violations-2021-04-21/">Reuters</a>, European tech chief Margrethe Vestager:</p><blockquote><p>"On artificial intelligence, trust is a must, not a nice to have. With these landmark rules, the EU is spearheading the development of new global norms to make sure AI can be trusted"</p></blockquote><h4><strong>Our very short take</strong></h4><p>AI risks are very real and need good regulation. That being said, we do need to be careful about poorly written regulation, and we hope that there will be significant discussion in the next few months and years about AI risks.&nbsp;</p><h2><a href="https://www.ftc.gov/news-events/blogs/business-blog/2021/04/aiming-truth-fairness-equity-your-companys-use-ai">FTC | Aiming for truth, fairness, and equity in your company&#8217;s use of AI</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t0_T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t0_T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png 424w, https://substackcdn.com/image/fetch/$s_!t0_T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png 848w, 
https://substackcdn.com/image/fetch/$s_!t0_T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png 1272w, https://substackcdn.com/image/fetch/$s_!t0_T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t0_T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png" width="878" height="328" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:328,&quot;width&quot;:878,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!t0_T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png 424w, https://substackcdn.com/image/fetch/$s_!t0_T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png 848w, 
https://substackcdn.com/image/fetch/$s_!t0_T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png 1272w, https://substackcdn.com/image/fetch/$s_!t0_T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6fcce2e-3c15-456b-a9e5-8f1c3cb0f3b2_878x328.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The FTC recently provided guidance to companies on building AI applications with 
an aim for truth, fairness, and equity, which we wanted to share with our readers. This guidance comes almost a year after a previous <a href="https://www.ftc.gov/news-events/blogs/business-blog/2020/04/using-artificial-intelligence-algorithms">FTC recommendation</a> about the use of AI technologies and perhaps shows that the topic continues to be a high priority.&nbsp;</p><h4><strong>Existing laws</strong></h4><p>The guidance includes a reminder of the three laws that are important to this conversation:</p><blockquote><p>(1) <strong>Section 5 of the FTC Act</strong>. The FTC Act prohibits unfair or deceptive practices. That would include the sale or use of &#8211; for example &#8211; racially biased algorithms.</p><p>(2) <strong>Fair Credit Reporting Act</strong>. The FCRA comes into play in certain circumstances where an algorithm is used to deny people employment, housing, credit, insurance, or other benefits.</p><p>(3) <strong>Equal Credit Opportunity Act</strong>. The ECOA makes it illegal for a company to use a biased algorithm that results in credit discrimination on the basis of race, color, religion, national origin, sex, marital status, age, or because a person receives public assistance.</p></blockquote><h4><strong>Recommendations</strong></h4><p>In the guidance, the FTC also issued recommendations for companies to follow when building AI applications. 
For ML researchers and practitioners, these might not be very new, but it is great to see these practices and recommendations get more visibility.&nbsp; You can check out the full list of recommendations on the FTC&#8217;s website, but we share our thoughts below:&nbsp;</p><ul><li><p><strong>Focus on foundations</strong>: The guidance focuses on the importance of starting with datasets that are unbiased and representative of the world in which the models will operate.</p></li></ul><blockquote><p>If a data set is missing information from particular populations, using that data to build an AI model may yield results that are unfair or inequitable to legally protected groups.</p></blockquote><ul><li><p><strong>Focus on transparency and practices</strong>: The guidance focuses on the importance of transparency in how datasets and models are used, including third-party audits, unambiguous terms of service to inform users, and not making exaggerated claims about the model&#8217;s real-world performance.&nbsp;</p></li></ul><blockquote><p>As your company develops and uses AI, think about ways to embrace transparency and independence &#8230; by conducting and publishing the results of independent audits, and by opening your data or source code to outside inspection.</p></blockquote><ul><li><p><strong>Cited examples</strong>: One thing that struck us while reading the article was the citations of bias in AI and of past actions against these violations: racial bias in <a href="https://www.ftc.gov/system/files/documents/public_events/1548288/privacycon-2020-ziad_obermeyer.pdf">healthcare algorithms</a>, a <a href="https://www.ftc.gov/system/files/documents/cases/182_3109_facebook_complaint_filed_7-24-19.pdf">complaint against Facebook</a>, and a recent action against the <a href="https://www.ftc.gov/news-events/press-releases/2020/05/bronx-honda-to-pay-over-1-million-to-settle-charges">Bronx Honda</a> car dealership. 
To us, these examples and the strong language associated with them indicate that the FTC is serious about enforcing these laws in the context of AI applications.&nbsp;</p></li></ul><h4><strong>Reactions</strong></h4><p>This series of tweets by University of Washington School of Law professor Ryan Calo is a good read.&nbsp;</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/rcalo/status/1384276880602238976&quot;,&quot;full_text&quot;:&quot;Woah, woah, WOAH. An official <span class=\&quot;tweet-fake-link\&quot;>@FTC</span> blog post by a staff attorney noting that \&quot;The FTC Act prohibits unfair or deceptive practices. That would include the sale or use of &#8211; for example &#8211; racially biased algorithms.\&quot; &quot;,&quot;username&quot;:&quot;rcalo&quot;,&quot;name&quot;:&quot;Ryan Calo&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Mon Apr 19 22:45:01 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:649,&quot;like_count&quot;:1523,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{&quot;url&quot;:&quot;https://www.ftc.gov/news-events/blogs/business-blog/2021/04/aiming-truth-fairness-equity-your-companys-use-ai&quot;,&quot;image&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/c36b6714-a58d-4776-9218-7ac5c02e2c5e_1200x630.jpeg&quot;,&quot;title&quot;:&quot;Aiming for truth, fairness, and equity in your company&#8217;s use of AI&quot;,&quot;description&quot;:&quot;Advances in artificial intelligence (AI) technology promise to revolutionize our approach to medicine, finance, business operations, media, and more.&quot;,&quot;domain&quot;:&quot;ftc.gov&quot;},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h2><a href="https://medium.com/data-for-ai/feature-store-vs-data-warehouse-306d1567c100">Medium | Feature Stores vs Data 
Warehouse</a></h2><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qcsp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qcsp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Qcsp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Qcsp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Qcsp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qcsp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg" width="481" height="440.07810320781033" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:656,&quot;width&quot;:717,&quot;resizeWidth&quot;:481,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qcsp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Qcsp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Qcsp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Qcsp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66aeec23-fc2c-46bb-9102-b37f4ff5ebd2_717x656.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In a recent post, <a href="https://twitter.com/jim_dowling">Jim Dowling</a> highlights the similarities and differences between data warehouses and feature stores. The feature store is a data warehouse of features, used for training and inference of ML models. Both data warehouses and feature stores are central stores of data, and both have pipelines to ingest data from one or more sources. However, as noted in the post, there are some important differences in terms of architecture and use-cases.</p><h4><strong>Dual Use-Cases of Feature Stores</strong></h4><p>Unlike data warehouses, which are typically used for offline batch processing of data (gathering insights, reporting, etc.), feature stores support a dual use-case:</p><ul><li><p>Offline use-case (column-oriented) for training models and offline batch inference</p></li><li><p>Online use-case (row-oriented) for online inference i.e. 
individual predictions.</p></li></ul><h4><strong>Data Validation in Feature Stores&nbsp;</strong></h4><p>A data warehouse stores data in tables along with predefined schemas and column constraints. By contrast, feature stores typically have more relaxed constraints and, at the same time, can store more abstract data (e.g., an array of floating-point values as an input feature vector of a model). For feature stores, it is best to do data validation and quality checks before ingestion. The article mentions data validation tools such as <a href="https://greatexpectations.io/">Great Expectations</a>, which we have <a href="https://mlopsroundup.substack.com/p/issue-2-nuts-and-bolts-of-ml-unfriendly-comments-great-expectations-common-ml-misconceptions-279968">covered earlier</a> in our newsletter as well.&nbsp;</p><blockquote><p>As model training is very sensitive to bad data (null values, outliers cause numerical instability, missing values), feature data should be validated before ingestion</p></blockquote><h4><strong>Feature Statistics Monitoring</strong></h4><p>As we have covered previously in our newsletter, data and feature drift are important to monitor for real-world ML applications. 
Many feature stores can facilitate comparisons of online traffic against the training or validation sets for the detection of such drifts.</p><h2><a href="https://hopin.com/events/datalift-no-5-productionize-data-analytics-and-machine-learning?ref=49aee2b637b0">Event | #datalift No 5 - Productionize data analytics and machine learning</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3bxO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3bxO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png 424w, https://substackcdn.com/image/fetch/$s_!3bxO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png 848w, https://substackcdn.com/image/fetch/$s_!3bxO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png 1272w, https://substackcdn.com/image/fetch/$s_!3bxO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3bxO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png" width="1456" height="582" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:582,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3bxO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png 424w, https://substackcdn.com/image/fetch/$s_!3bxO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png 848w, https://substackcdn.com/image/fetch/$s_!3bxO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png 1272w, https://substackcdn.com/image/fetch/$s_!3bxO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7b6a84f8-c46d-4649-9537-b6c27357b67c_1500x600.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p><a href="https://www.thedatalift.eu/">#datalift</a> is an initiative from the <a href="https://www.theguild.ai/">AI Guild</a> that brings together a community of AI practitioners and business leaders to help bridge the gap between proof-of-concept and productionization. 
They&#8217;re hosting an event on May 28 with many interesting speakers and opportunities to network with folks who are interested in ML deployments.&nbsp;We&#8217;ll be attending and hope to see you there!</p><h2><a href="https://arxiv.org/pdf/2104.07540.pdf">Paper | Generating Datasets with Pretrained Language Models</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R1N0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R1N0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png 424w, https://substackcdn.com/image/fetch/$s_!R1N0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png 848w, https://substackcdn.com/image/fetch/$s_!R1N0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png 1272w, https://substackcdn.com/image/fetch/$s_!R1N0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!R1N0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png" width="498" height="441.7741935483871" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:550,&quot;width&quot;:620,&quot;resizeWidth&quot;:498,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R1N0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png 424w, https://substackcdn.com/image/fetch/$s_!R1N0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png 848w, https://substackcdn.com/image/fetch/$s_!R1N0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png 1272w, https://substackcdn.com/image/fetch/$s_!R1N0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F076fbe22-35be-4a49-bbb1-920f21b7cd56_620x550.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><h4><strong>What&#8217;s the problem?&nbsp;</strong></h4><p>One of the major challenges in machine learning (especially when working with unstructured data like text, images, videos) is getting good representations for data. In NLP, this has typically meant <a href="https://en.wikipedia.org/wiki/Word_embedding">embeddings</a>, which were popularised by the <a href="https://arxiv.org/pdf/1301.3781.pdf">Word2Vec paper</a> from 2013. 
While there have been many innovations (such as large pre-trained language models like GPT-3) that have changed the NLP landscape over the last few years, generating good sentence-level embeddings for downstream tasks remains difficult.&nbsp;</p><p>As the authors say:</p><blockquote><p>To obtain high-quality sentence embeddings from pretrained language models (PLMs), they must either be augmented with additional pretraining objectives or finetuned on a large set of labeled text pairs. While the latter approach typically outperforms the former, it requires great human effort to generate suitable datasets of sufficient size.</p></blockquote><h4><strong>How do they solve this problem?</strong></h4><p>This paper introduces a method of leveraging pre-trained language models to:</p><blockquote><p>generate entire datasets of labeled text pairs from scratch, which can then be used for regular finetuning of much smaller models.</p></blockquote><p>This is done by prompting the language model (LM) in a very specific way (we&#8217;ll let you read the <a href="https://arxiv.org/pdf/2104.07540.pdf">paper</a> for full details) and generating tokens until full sentences have been produced. For example, as seen in the image above, the input to the LM could be the Task + Sentence 1, with Sentence 2 being the output.&nbsp;</p><p>This is neat because the LM produces example pairs of sentences in a completely unsupervised fashion. The pairs stay closely tailored to your dataset (if Sentence 1 is from your dataset), while Sentence 2 is produced using the knowledge of the LM.&nbsp;</p><h4><strong>What does this mean?&nbsp;</strong></h4><p>While we won&#8217;t focus on the results from this paper too much, we believe that this is an interesting direction for the future. Companies often deal with large amounts of unlabeled data where human labeling would be expensive. 
If large LMs can somehow be leveraged to create datasets or provide labels with reduced human intervention, we could radically shorten the time-to-production for many projects.</p><h2><a href="https://twitter.com/fchollet/status/1373112777519230977?s=20">Crossing the Deep Learning Chasm: Why the prototype to production journey is hard</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/fchollet/status/1373112777519230977?s=20&quot;,&quot;full_text&quot;:&quot;Deep learning excels at unlocking the creation of impressive early demos of new applications using very little development resources.\n\nThe part where it struggles is reaching the level of consistent usefulness and reliability required by production usage.&quot;,&quot;username&quot;:&quot;fchollet&quot;,&quot;name&quot;:&quot;Fran&#231;ois Chollet&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Sat Mar 20 03:22:52 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:263,&quot;like_count&quot;:1301,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>We came across this series of tweets by <a href="https://twitter.com/fchollet/">Fran&#231;ois Chollet</a>, creator of <a href="https://keras.io/">Keras</a>, about why the prototype-to-production journey for deep learning applications is hard.</p><p>This struck a chord with us:</p><blockquote><p>Every app demo based on GPT-3 follows this pattern. You can build the demo in a weekend, but if you invest $20M and 3 years fleshing out the app, it's unlikely it will still be using GPT-3 at all, and it may never meet customer requirements</p></blockquote><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! 
This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> (open to DMs as well) or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a></p><p>Thanks again, and if you like what we are doing please tell your friends and colleagues to spread the word.</p>]]></content:encoded></item><item><title><![CDATA[Issue #16: EU AI regulations. Data-centric AI. Scale Transform 2021. Explainability via Memorization. ]]></title><description><![CDATA[Welcome to the 16th issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-16-eu-ai-regulations-data-centric</guid><dc:creator><![CDATA[Nihit Desai]]></dc:creator><pubDate>Mon, 19 Apr 2021 17:16:52 GMT</pubDate><enclosure url="https://cdn.substack.com/image/youtube/w_728,c_limit/06-AZXmwHjo" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 16th issue of the MLOps newsletter.&nbsp;</p><p>In this issue, we begin with some news out of the EU on AI regulation, followed by a fascinating paper on explainability using nearest neighbours. Next, we discuss a few talks from Scale&#8217;s Transform conference and deep dive into an insightful Andrew Ng talk on taking a data-centric view of AI.&nbsp;</p><p>Thank you for subscribing. 
If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><strong><a href="https://www.theverge.com/2021/4/14/22383301/eu-ai-regulation-draft-leak-surveillance-social-credit">The Verge | The EU is considering a ban on AI for mass surveillance and social credit scores</a></strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6QYd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6QYd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png 424w, https://substackcdn.com/image/fetch/$s_!6QYd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png 848w, https://substackcdn.com/image/fetch/$s_!6QYd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png 1272w, https://substackcdn.com/image/fetch/$s_!6QYd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!6QYd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png" width="468" height="312.609375" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1024,&quot;resizeWidth&quot;:468,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6QYd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png 424w, https://substackcdn.com/image/fetch/$s_!6QYd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png 848w, https://substackcdn.com/image/fetch/$s_!6QYd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png 1272w, https://substackcdn.com/image/fetch/$s_!6QYd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F21a25948-9dfd-4d89-b2b8-58eba531d575_1024x684.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><h4><strong>What happened?</strong></h4><p>From the <a href="https://www.theverge.com/2021/4/14/22383301/eu-ai-regulation-draft-leak-surveillance-social-credit">Verge</a>:</p><blockquote><p>&#8220;The European Union is considering banning the use of artificial intelligence for a number of purposes, including mass surveillance and social credit scores. 
This is according to a leaked proposal that is circulating online&#8230;&#8221;</p></blockquote><p>This would lead to certain use cases being regulated (similar to privacy rights under <a href="https://gdpr-info.eu/">GDPR</a>) by member states of the EU, and if companies sell prohibited software, they &#8220;could be fined up to 4 percent of their global revenue&#8221; (and not European revenue). <a href="https://www.politico.eu/wp-content/uploads/2021/04/14/AI-Draft.pdf">Here</a> is the draft of the regulation, if you&#8217;re interested.&nbsp;</p><p><strong>What do the regulations include?</strong></p><blockquote><ul><li><p>A ban on AI for &#8220;indiscriminate surveillance,&#8221; including systems that directly track individuals in physical environments or aggregate data from other sources</p></li><li><p>A ban on AI systems that create social credit scores, which means judging someone&#8217;s trustworthiness based on social behaviour or predicted personality traits</p></li><li><p>Special authorization for using &#8220;remote biometric identification systems&#8221; like facial recognition in public spaces</p></li><li><p>The creation of a &#8220;European Artificial Intelligence Board,&#8221; consisting of representatives from every nation-state, to help the commission decide which AI systems count as &#8220;high-risk&#8221; and to recommend changes to prohibitions</p></li></ul></blockquote><h4><strong>Reactions</strong></h4><p>As mentioned in the Verge article, Daniel Leufer, Europe policy analyst at Access Now says:</p><blockquote><p>&#8220;The descriptions of AI systems to be prohibited are vague, and full of language that is unclear and would create serious room for loopholes&#8221;.</p></blockquote><p>From Twitter:</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/yoavgo/status/1382759762786390017?s=20&quot;,&quot;full_text&quot;:&quot;gotta love this.\n\nitem (1a) prohibits pretty much any HCI+AI, smart interface, decision 
support system, etc.\n\nitem (2) cements existing power structures in a way i am sure the activists who pushed for these regulations didn't intend. &quot;,&quot;username&quot;:&quot;yoavgo&quot;,&quot;name&quot;:&quot;(((&#1604;()(&#1604;() 'yoav))))&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Thu Apr 15 18:16:32 +0000 2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/EzCMrN0WgAYY3T9.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/MCtzTayEgw&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:16,&quot;like_count&quot;:93,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h4><strong>Our Take</strong></h4><p>We understand how difficult it is to write good regulation, especially in domains that involve new technologies, such as artificial intelligence. And while it is critical that AI/ML companies are regulated appropriately to protect consumers of AI software from harm, we do believe that it needs to be done carefully and thoughtfully. 
Poorly written regulation can itself have many ill effects: inconsistent enforcement, loopholes, companies being pushed away from deploying in the EU, a market that becomes anti-competitive for smaller companies, etc.&nbsp;</p><p>We will be following this closely to see what changes are made to this draft proposal, with a potential announcement on April 21st.&nbsp;</p><h2><a href="https://youtu.be/06-AZXmwHjo">Andrew Ng Talk | MLOps: From Model-centric to Data-centric AI</a></h2><div id="youtube2-06-AZXmwHjo" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;06-AZXmwHjo&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/06-AZXmwHjo?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><a href="https://twitter.com/AndrewYNg">Andrew Ng</a> probably needs no introduction for the readers of this newsletter. He recently gave a talk about the importance of striking the right balance between data and modelling improvements in real-world ML applications. It is an hour long and filled with practical insights, and we recommend watching it in full. 
Overall, the message and theme of this talk is music to our ears and one of our motivations for writing this newsletter.</p><h4><strong>Model v/s data-centric approaches to AI</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MZ0G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MZ0G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png 424w, https://substackcdn.com/image/fetch/$s_!MZ0G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png 848w, https://substackcdn.com/image/fetch/$s_!MZ0G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png 1272w, https://substackcdn.com/image/fetch/$s_!MZ0G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MZ0G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png" width="897" height="293" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/cceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:293,&quot;width&quot;:897,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MZ0G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png 424w, https://substackcdn.com/image/fetch/$s_!MZ0G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png 848w, https://substackcdn.com/image/fetch/$s_!MZ0G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png 1272w, https://substackcdn.com/image/fetch/$s_!MZ0G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcceafb5c-dabc-40e6-8e96-b1f4fb26c578_897x293.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>Much of the high-visibility work, especially in research and academia, goes into improving model architectures, learning schemes, etc., whereas the benchmark training and test sets are standardized and static (think ImageNet, MNIST, SNLI, etc.). 
This is the opposite of most real-world settings, where the data is messy, noisy, and ever-changing.&nbsp;</p></li><li><p>In Ng&#8217;s view, this has driven great progress in model-centric AI, but we don&#8217;t place nearly enough importance on data collection &amp; quality in most applications.&nbsp;</p></li></ul><h4><strong>How is MLOps different from DevOps?</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AZy3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AZy3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png 424w, https://substackcdn.com/image/fetch/$s_!AZy3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png 848w, https://substackcdn.com/image/fetch/$s_!AZy3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png 1272w, https://substackcdn.com/image/fetch/$s_!AZy3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!AZy3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png" width="1456" height="559" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/dcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:559,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AZy3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png 424w, https://substackcdn.com/image/fetch/$s_!AZy3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png 848w, https://substackcdn.com/image/fetch/$s_!AZy3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png 1272w, https://substackcdn.com/image/fetch/$s_!AZy3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcb8cc3d-0ccb-4708-bb8c-179df1d7b603_1600x614.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><ul><li><p>Real-world AI systems (Software 2.0) are a combination of software and data.</p></li><li><p>In Andrew Ng&#8217;s view, the emerging field of MLOps can play a role in AI systems that&#8217;s equivalent to DevOps in the traditional software world.&nbsp;</p></li><li><p>Unlike Software 1.0, which is mostly feed-forward (build &#8594; package &#8594; deploy), AI systems have feedback loops, so MLOps plays a critical role in establishing best practices and processes for routing post-deployment feedback back to model training and data 
collection.&nbsp;</p></li></ul><p><strong>MLOps as a way to systematize best processes around data quality</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PDtA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PDtA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png 424w, https://substackcdn.com/image/fetch/$s_!PDtA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png 848w, https://substackcdn.com/image/fetch/$s_!PDtA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png 1272w, https://substackcdn.com/image/fetch/$s_!PDtA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PDtA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png" width="1456" height="741" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:741,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PDtA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png 424w, https://substackcdn.com/image/fetch/$s_!PDtA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png 848w, https://substackcdn.com/image/fetch/$s_!PDtA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png 1272w, https://substackcdn.com/image/fetch/$s_!PDtA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F0c6167b7-6edc-48d2-ad48-2869909d0a04_1600x814.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>In Andrew Ng&#8217;s view, the biggest value an MLOps team can provide is to ensure consistently high-quality training data.</p></li><li><p>In this sense, MLOps plays an important role across the model development and deployment lifecycle: how to get enough training data? How to ensure label consistency? How to clean and transform data to improve model performance? 
How to track concept and data drifts in deployed models?&nbsp;</p></li></ul><h2><a href="https://jameskle.com/writes/scale-transform-2021">James Le | Learnings from Scale Transform 2021</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/le_james94/status/1382456805901496320&quot;,&quot;full_text&quot;:&quot;1/ My notes from <span class=\&quot;tweet-fake-link\&quot;>@scale_AI</span> Transform&#9997;&#65039;It covers:\n- Building good data\n- Future of ML frameworks\n- Challenges for scalable deployment\n- How to assess ML maturity\n\nThanks <span class=\&quot;tweet-fake-link\&quot;>@alexandr_wang</span> and team for organizing the best AI conference of 2021 thus far. Enjoy!\n\n&quot;,&quot;username&quot;:&quot;le_james94&quot;,&quot;name&quot;:&quot;James&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Wed Apr 14 22:12:41 +0000 2021&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:19,&quot;like_count&quot;:65,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{&quot;url&quot;:&quot;https://jameskle.com/writes/scale-transform-2021&quot;,&quot;image&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8a53184e-3af2-4ecd-b05e-8792bae1f5e7_1500x1208.png&quot;,&quot;title&quot;:&quot;What I Learned From Attending Scale Transform 2021 &#8212; James Le&quot;,&quot;description&quot;:&quot;<p><strong>Vietnamese. Tech addict. Coffee fanatic. Book sidekick. Amateur Writer. Seasoned Traveler. Continuous Learner. 
All Things Entrepreneurship</strong></p>&quot;,&quot;domain&quot;:&quot;jameskle.com&quot;},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>James Le always does an incredible job of capturing notes from the conferences that he attends (last year we included <a href="https://mlopsroundup.substack.com/p/issue-8-toronto-ml-summit-gpt-2-ml">his post on the Toronto ML Summit</a>), and his <a href="https://jameskle.com/writes/scale-transform-2021">latest post</a> from <a href="https://scale.com/events/transform">Transform</a>, <a href="https://scale.com/">Scale&#8217;s</a> conference, doesn&#8217;t disappoint.&nbsp;</p><p>We recommend reading the entire post, but here we cover a couple of the talks that were interesting from an MLOps perspective.&nbsp;</p><h4><strong><a href="https://scale.com/events/transform/videos/ai-at-facebook-scale?validation=ai-at-facebook-scale">AI at Facebook Scale</a></strong></h4><p><a href="https://scale.com/events/transform/videos/ai-at-facebook-scale?validation=ai-at-facebook-scale">This</a> was a talk by <a href="https://twitter.com/snsf">Srinivas Narayanan</a>, the Director of Applied Research at Facebook AI. 
The talk goes into a few different challenges faced by a company at Facebook&#8217;s scale.</p><p><strong>Data Challenge: Scaling Training Data</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://scale.com/events/transform/videos/ai-at-facebook-scale?validation=ai-at-facebook-scale" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q1Zx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F40d5b648-6b66-4a2e-a7b3-6f667b073d8f_1500x906.png 424w, https://substackcdn.com/image/fetch/$s_!q1Zx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F40d5b648-6b66-4a2e-a7b3-6f667b073d8f_1500x906.png 848w, https://substackcdn.com/image/fetch/$s_!q1Zx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F40d5b648-6b66-4a2e-a7b3-6f667b073d8f_1500x906.png 1272w, https://substackcdn.com/image/fetch/$s_!q1Zx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F40d5b648-6b66-4a2e-a7b3-6f667b073d8f_1500x906.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q1Zx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F40d5b648-6b66-4a2e-a7b3-6f667b073d8f_1500x906.png" width="1456" height="879" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/40d5b648-6b66-4a2e-a7b3-6f667b073d8f_1500x906.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:879,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://scale.com/events/transform/videos/ai-at-facebook-scale?validation=ai-at-facebook-scale&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!q1Zx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F40d5b648-6b66-4a2e-a7b3-6f667b073d8f_1500x906.png 424w, https://substackcdn.com/image/fetch/$s_!q1Zx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F40d5b648-6b66-4a2e-a7b3-6f667b073d8f_1500x906.png 848w, https://substackcdn.com/image/fetch/$s_!q1Zx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F40d5b648-6b66-4a2e-a7b3-6f667b073d8f_1500x906.png 1272w, https://substackcdn.com/image/fetch/$s_!q1Zx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F40d5b648-6b66-4a2e-a7b3-6f667b073d8f_1500x906.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>One of the biggest challenges in building AI systems is getting the right and sufficient training data. Getting labeled data for supervised learning can be difficult, expensive, and in some cases, impossible. Facebook approaches this by focusing on techniques beyond supervised learning.</p></blockquote><p>For example, at Instagram, Facebook was able to train an object-classification system using 3.5B images along with the hashtags they were shared with, a technique called weak supervision. 
This allowed them to leverage a much larger volume of training data, leading to state-of-the-art results.&nbsp;</p><p>Similarly, they were able to use self-supervision (where the system learns to fill in the blanks for inputs with missing pieces) to train cross-lingual language models and audio models. Again, with a smaller amount of labelled data, they are able to achieve good results.&nbsp;</p><p><strong>Tooling Challenge: Building the AI Platforms</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://scale.com/events/transform/videos/ai-at-facebook-scale?validation=ai-at-facebook-scale" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sWqA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb552b6-1de6-4b71-b1bc-d6d101c57f16_1500x796.png 424w, https://substackcdn.com/image/fetch/$s_!sWqA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb552b6-1de6-4b71-b1bc-d6d101c57f16_1500x796.png 848w, https://substackcdn.com/image/fetch/$s_!sWqA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb552b6-1de6-4b71-b1bc-d6d101c57f16_1500x796.png 1272w, https://substackcdn.com/image/fetch/$s_!sWqA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb552b6-1de6-4b71-b1bc-d6d101c57f16_1500x796.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!sWqA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb552b6-1de6-4b71-b1bc-d6d101c57f16_1500x796.png" width="1456" height="773" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2cb552b6-1de6-4b71-b1bc-d6d101c57f16_1500x796.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:773,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://scale.com/events/transform/videos/ai-at-facebook-scale?validation=ai-at-facebook-scale&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sWqA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb552b6-1de6-4b71-b1bc-d6d101c57f16_1500x796.png 424w, https://substackcdn.com/image/fetch/$s_!sWqA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb552b6-1de6-4b71-b1bc-d6d101c57f16_1500x796.png 848w, https://substackcdn.com/image/fetch/$s_!sWqA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb552b6-1de6-4b71-b1bc-d6d101c57f16_1500x796.png 1272w, 
https://substackcdn.com/image/fetch/$s_!sWqA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2cb552b6-1de6-4b71-b1bc-d6d101c57f16_1500x796.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As can be seen in the image, Facebook has built out a rich roster of internal tooling to support their AI efforts. 
As James writes:</p><blockquote><ul><li><p>On the left, you&#8217;ll see tools for preparing data in the right format.</p></li><li><p>In the middle, you&#8217;ll see the pieces for building and training models &#8212; going bottom up all the way from hardware, whether it&#8217;s CPUs or GPU&#8217;s. These include frameworks like PyTorch that ease the model building environment, libraries that are specific to each domain, and the models that are used in products.</p></li><li><p>And on the right, once you have the trained models, you have the right tools and systems for deploying them in production, whether it&#8217;s in a data center or locally on the device.</p></li></ul></blockquote><p>To smooth the path from research to production, Facebook invested heavily in PyTorch as a single framework.&nbsp;</p><p><strong>Bias Challenge: Creating AI Responsibly</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://scale.com/events/transform/videos/ai-at-facebook-scale?validation=ai-at-facebook-scale" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PnKH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8dea5c17-8eb7-4a3d-b77e-b1eec2072cca_1500x768.png 424w, https://substackcdn.com/image/fetch/$s_!PnKH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8dea5c17-8eb7-4a3d-b77e-b1eec2072cca_1500x768.png 848w, https://substackcdn.com/image/fetch/$s_!PnKH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8dea5c17-8eb7-4a3d-b77e-b1eec2072cca_1500x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PnKH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8dea5c17-8eb7-4a3d-b77e-b1eec2072cca_1500x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PnKH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8dea5c17-8eb7-4a3d-b77e-b1eec2072cca_1500x768.png" width="1456" height="745" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8dea5c17-8eb7-4a3d-b77e-b1eec2072cca_1500x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:745,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://scale.com/events/transform/videos/ai-at-facebook-scale?validation=ai-at-facebook-scale&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PnKH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8dea5c17-8eb7-4a3d-b77e-b1eec2072cca_1500x768.png 424w, https://substackcdn.com/image/fetch/$s_!PnKH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8dea5c17-8eb7-4a3d-b77e-b1eec2072cca_1500x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!PnKH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8dea5c17-8eb7-4a3d-b77e-b1eec2072cca_1500x768.png 1272w, https://substackcdn.com/image/fetch/$s_!PnKH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8dea5c17-8eb7-4a3d-b77e-b1eec2072cca_1500x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As James writes:</p><blockquote><p>Not introducing or amplifying bias and 
creating unfair systems is a challenge because it&#8217;s not as simple as using the right tools. Fairness is a process. At each step of the implementation process, there is a risk of bias creeping in &#8212; in data and labels, in algorithms, in the predictions, and in the resulting actions based on those predictions.</p></blockquote><p>At each step, Facebook attempts to surface the fairness risks, resolve questions, and document the decisions that were made.&nbsp;</p><h4><strong><a href="https://scale.com/events/transform/videos/ai-at-doordash?validation=ai-at-doordash">Applied ML at DoorDash</a></strong></h4><p><a href="https://twitter.com/andyfang">Andy Fang</a>, co-founder and CTO at DoorDash, discussed a couple of case studies of how DoorDash uses AI to improve its business. The one that we&#8217;ll cover is their process for creating a rich item taxonomy.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://scale.com/events/transform/videos/ai-at-doordash?validation=ai-at-doordash" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yPn8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a64d634-f6b5-4ac0-89f1-5c2c1b0afe99_604x566.png 424w, https://substackcdn.com/image/fetch/$s_!yPn8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a64d634-f6b5-4ac0-89f1-5c2c1b0afe99_604x566.png 848w, https://substackcdn.com/image/fetch/$s_!yPn8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a64d634-f6b5-4ac0-89f1-5c2c1b0afe99_604x566.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yPn8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a64d634-f6b5-4ac0-89f1-5c2c1b0afe99_604x566.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yPn8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a64d634-f6b5-4ac0-89f1-5c2c1b0afe99_604x566.png" width="604" height="566" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2a64d634-f6b5-4ac0-89f1-5c2c1b0afe99_604x566.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:566,&quot;width&quot;:604,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://scale.com/events/transform/videos/ai-at-doordash?validation=ai-at-doordash&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yPn8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a64d634-f6b5-4ac0-89f1-5c2c1b0afe99_604x566.png 424w, https://substackcdn.com/image/fetch/$s_!yPn8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a64d634-f6b5-4ac0-89f1-5c2c1b0afe99_604x566.png 848w, 
https://substackcdn.com/image/fetch/$s_!yPn8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a64d634-f6b5-4ac0-89f1-5c2c1b0afe99_604x566.png 1272w, https://substackcdn.com/image/fetch/$s_!yPn8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a64d634-f6b5-4ac0-89f1-5c2c1b0afe99_604x566.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>DoorDash has tens of millions of items across all its restaurants and tens of 
thousands of new items are added every day. These have to be sorted and categorized so that users can quickly understand the offerings on the DoorDash platform and make the best (i.e., tasty) decisions for themselves.</p><p>DoorDash achieves this with three pieces: a rich taxonomy, models that categorize each item into this taxonomy, and a human-in-the-loop system that reduces annotation costs while allowing the taxonomy to grow efficiently.&nbsp;</p><p>There are three critical rules for defining annotation tags:</p><blockquote><ul><li><p>Make sure that there are different levels of item tagging specificity that don&#8217;t overlap. Let&#8217;s say for coffee, you can say it&#8217;s a drink, you can say it&#8217;s non-alcoholic, or you can say it&#8217;s caffeinated. Those are three separate labels that don&#8217;t overlap and categorization with each other.</p></li><li><p>Allow annotators to pick &#8220;others&#8221; as an option at each level. Having &#8220;others&#8221; is a great catch-all option that allows DoorDash to process items tagged in this bucket to see further how they can add new tags to enrich their taxonomy.</p></li><li><p>Make tags as objective as possible. They want to avoid popular or convenient tags &#8212; things that would require subjectivity for an annotator to determine.</p></li></ul></blockquote><p>Also, when defining tasks for human annotators, it is important for the tasks to be high-precision and high-throughput.&nbsp;</p><blockquote><p>High precision is critical for accurate tags, while high throughput is critical to ensure that the human tasks are cost-efficient.</p></blockquote><p>DoorDash&#8217;s taxonomy, combined with simple binary or multiple-choice questions, helps them achieve high precision for their models even with less experienced annotators and less detailed instructions. 
Finally, a separate QA loop on the annotations maintains high annotation quality.&nbsp;</p><h2><a href="https://arxiv.org/pdf/2010.09030.pdf">Paper | Explaining and Improving Model Behavior with k Nearest Neighbor<br>Representations</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://arxiv.org/pdf/2010.09030.pdf" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8zyu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8525308c-b35f-4217-8619-8ec3b3b5eaa1_1094x608.png 424w, https://substackcdn.com/image/fetch/$s_!8zyu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8525308c-b35f-4217-8619-8ec3b3b5eaa1_1094x608.png 848w, https://substackcdn.com/image/fetch/$s_!8zyu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8525308c-b35f-4217-8619-8ec3b3b5eaa1_1094x608.png 1272w, https://substackcdn.com/image/fetch/$s_!8zyu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8525308c-b35f-4217-8619-8ec3b3b5eaa1_1094x608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8zyu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8525308c-b35f-4217-8619-8ec3b3b5eaa1_1094x608.png" width="1094" height="608" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8525308c-b35f-4217-8619-8ec3b3b5eaa1_1094x608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:608,&quot;width&quot;:1094,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://arxiv.org/pdf/2010.09030.pdf&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8zyu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8525308c-b35f-4217-8619-8ec3b3b5eaa1_1094x608.png 424w, https://substackcdn.com/image/fetch/$s_!8zyu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8525308c-b35f-4217-8619-8ec3b3b5eaa1_1094x608.png 848w, https://substackcdn.com/image/fetch/$s_!8zyu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8525308c-b35f-4217-8619-8ec3b3b5eaa1_1094x608.png 1272w, https://substackcdn.com/image/fetch/$s_!8zyu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8525308c-b35f-4217-8619-8ec3b3b5eaa1_1094x608.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This recent paper from Salesforce Research explores the idea of improving model explainability by searching for similar examples from the model&#8217;s training data.&nbsp;</p><h4><strong>Highlights</strong></h4><ul><li><p>The paper proposes using k-nearest neighbors in some representation space to identify training examples that are most similar to the example that the model is currently producing a prediction for. These examples from the training set (and the labels associated with them) can be interpreted as the ones most responsible for the model&#8217;s prediction.</p></li><li><p>In addition, the paper presents a generic implementation of the space in which k-NN should be computed for this application: the deep learned model&#8217;s output hidden layer representation. 
This makes intuitive sense: the representation is learned specifically for the training task, so the meaning of &#8220;nearness&#8221; is learned in the context of the downstream task for which the model is doing inference.&nbsp;</p></li><li><p>As an additional benefit, the paper shows that looking at the nearest neighbors of misclassified examples can often surface label noise (incorrectly labelled examples) in the training data.</p></li></ul><h4><strong>Our take</strong></h4><p>Explainability is a problem that has received a lot of attention and visibility in the last few years from researchers and ML practitioners alike. And for good reason: increasingly, we&#8217;re seeing applications of machine learning move out of their digital-only sandboxes and into the physical world (think self-driving cars, healthcare, autonomous drones). The technique presented in this paper appeals to us a lot: it is conceptually simple and offers a way to explain model predictions by analogy. A couple of open questions and interesting directions to explore:</p><ol><li><p>Can k-NN based explanation techniques be extended to tasks beyond classification, for instance, question answering, textual entailment or machine translation?</p></li><li><p>Can the representations learned for this k-NN search have other applications in the domain of MLOps and monitoring? 
For instance, measuring data drift?&nbsp;</p></li></ol><h2><a href="https://twitter.com/ai_memes/status/1382374419666976771">Fun | Machine Learning Pipelines are hard</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/ai_memes/status/1382374419666976771&quot;,&quot;full_text&quot;:&quot;Machine learning pipelines &quot;,&quot;username&quot;:&quot;ai_memes&quot;,&quot;name&quot;:&quot;AI Memes for Artificially Intelligent Teens&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Wed Apr 14 16:45:19 +0000 2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://cdn.substack.com/image/upload/w_728,c_limit/l_twitter_play_button_rvaygk,w_120/nokm6uivt2wt2ndg7zmv&quot;,&quot;link_url&quot;:&quot;https://t.co/5FpG3HrdW0&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:8863,&quot;like_count&quot;:33158,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> (open to DMs as well) or email us at <a href="mailto:mlmonitoringnews@gmail.com">mlmonitoringnews@gmail.com</a>. </p>]]></content:encoded></item><item><title><![CDATA[Issue #15: AI for self-driving at Tesla. HuggingFace meets AWS. Embedding Stores. ML and Databases. 
]]></title><description><![CDATA[Welcome to the 15th issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-15-ai-for-self-driving-at-tesla</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-15-ai-for-self-driving-at-tesla</guid><dc:creator><![CDATA[Rishabh Bhargava]]></dc:creator><pubDate>Mon, 05 Apr 2021 17:15:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iDWq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb755676f-e82d-433d-8444-484733837b19_1600x857.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 15th issue of the MLOps newsletter.&nbsp;</p><p>In this issue, we highlight a talk on self-driving cars at Tesla, discuss a partnership between Hugging Face and AWS, share a post on embedding stores, dive into a paper on ML-in-databases and more.&nbsp;</p><p>Thank you for subscribing. 
If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://www.youtube.com/watch?v=hx7BXih7zx8">AI for Self-driving Cars at Tesla: The importance of feedback loops</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.youtube.com/watch?v=hx7BXih7zx8" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iDWq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb755676f-e82d-433d-8444-484733837b19_1600x857.png 424w, https://substackcdn.com/image/fetch/$s_!iDWq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb755676f-e82d-433d-8444-484733837b19_1600x857.png 848w, https://substackcdn.com/image/fetch/$s_!iDWq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb755676f-e82d-433d-8444-484733837b19_1600x857.png 1272w, https://substackcdn.com/image/fetch/$s_!iDWq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb755676f-e82d-433d-8444-484733837b19_1600x857.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iDWq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb755676f-e82d-433d-8444-484733837b19_1600x857.png" width="1456" height="780" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/b755676f-e82d-433d-8444-484733837b19_1600x857.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:780,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.youtube.com/watch?v=hx7BXih7zx8&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iDWq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb755676f-e82d-433d-8444-484733837b19_1600x857.png 424w, https://substackcdn.com/image/fetch/$s_!iDWq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb755676f-e82d-433d-8444-484733837b19_1600x857.png 848w, https://substackcdn.com/image/fetch/$s_!iDWq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb755676f-e82d-433d-8444-484733837b19_1600x857.png 1272w, https://substackcdn.com/image/fetch/$s_!iDWq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb755676f-e82d-433d-8444-484733837b19_1600x857.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><a href="https://twitter.com/karpathy">Andrej Karpathy</a>, Director of AI at Tesla, gave a talk last year at ScaledML Conference on how Tesla is solving the ML challenges to provide Full Self Driving in their cars. It is a fascinating talk, both in terms of concepts in deep learning and vision and the systems and engineering aspects of making this work at Tesla&#8217;s scale. We recommend listening to the talk. For this article, we want to especially focus on the data feedback loops that help Tesla collect what is arguably the best data on self-driving, and how it helps their models improve over time.&nbsp;</p><h4><strong>Highlights</strong></h4><ul><li><p>Over the past decade, neural networks have repeatedly shown their ability to solve complex computer vision problems. 
However, when the same set of model architectures and training techniques is available to everyone, data is the most important variable for improving visual recognition in self-driving cars.</p></li><li><p>In the limit, this means that if the &#8220;right datasets&#8221; can be collected, the models will successfully learn to recognize the things we need. The right data is not just about quantity but also quality:&nbsp;</p></li></ul><blockquote><p>&#8220;What&#8217;s important is not just the scale of the dataset, but covering all possible use cases&#8221;</p></blockquote><ul><li><p>Because of the size of the Tesla fleet driving in the wild, Tesla has the ability to collect this data at scale to solve the long tail of corner cases (occluded Stop signs, construction sites, potholes, etc.) that are critical to solve but hard to get data for.</p></li><li><p>Andrej mentioned that the platform allows the team not only to collect data when a driver&#8217;s actions disagree with the self-driving model&#8217;s predictions, but also to fan out to the fleet to collect images &#8220;similar&#8221; to the ones in a seed dataset:&nbsp;</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.youtube.com/watch?v=hx7BXih7zx8" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kB6f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf6449f-06d7-4025-bc91-fd910f193728_1600x794.png 424w, https://substackcdn.com/image/fetch/$s_!kB6f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf6449f-06d7-4025-bc91-fd910f193728_1600x794.png 848w, 
https://substackcdn.com/image/fetch/$s_!kB6f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf6449f-06d7-4025-bc91-fd910f193728_1600x794.png 1272w, https://substackcdn.com/image/fetch/$s_!kB6f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf6449f-06d7-4025-bc91-fd910f193728_1600x794.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kB6f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf6449f-06d7-4025-bc91-fd910f193728_1600x794.png" width="1456" height="723" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8cf6449f-06d7-4025-bc91-fd910f193728_1600x794.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.youtube.com/watch?v=hx7BXih7zx8&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kB6f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf6449f-06d7-4025-bc91-fd910f193728_1600x794.png 424w, 
https://substackcdn.com/image/fetch/$s_!kB6f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf6449f-06d7-4025-bc91-fd910f193728_1600x794.png 848w, https://substackcdn.com/image/fetch/$s_!kB6f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf6449f-06d7-4025-bc91-fd910f193728_1600x794.png 1272w, https://substackcdn.com/image/fetch/$s_!kB6f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf6449f-06d7-4025-bc91-fd910f193728_1600x794.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline 
points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.youtube.com/watch?v=hx7BXih7zx8" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HYVO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b17d97-61c9-48a2-ba37-4ee65ed669bf_1600x853.png 424w, https://substackcdn.com/image/fetch/$s_!HYVO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b17d97-61c9-48a2-ba37-4ee65ed669bf_1600x853.png 848w, https://substackcdn.com/image/fetch/$s_!HYVO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b17d97-61c9-48a2-ba37-4ee65ed669bf_1600x853.png 1272w, https://substackcdn.com/image/fetch/$s_!HYVO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b17d97-61c9-48a2-ba37-4ee65ed669bf_1600x853.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HYVO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b17d97-61c9-48a2-ba37-4ee65ed669bf_1600x853.png" width="1456" height="776" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/c9b17d97-61c9-48a2-ba37-4ee65ed669bf_1600x853.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:776,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.youtube.com/watch?v=hx7BXih7zx8&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HYVO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b17d97-61c9-48a2-ba37-4ee65ed669bf_1600x853.png 424w, https://substackcdn.com/image/fetch/$s_!HYVO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b17d97-61c9-48a2-ba37-4ee65ed669bf_1600x853.png 848w, https://substackcdn.com/image/fetch/$s_!HYVO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b17d97-61c9-48a2-ba37-4ee65ed669bf_1600x853.png 1272w, https://substackcdn.com/image/fetch/$s_!HYVO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b17d97-61c9-48a2-ba37-4ee65ed669bf_1600x853.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>This &#8220;Data Engine&#8221; is a feedback loop that lets Tesla identify exactly where the models currently perform poorly, collect data at scale to fix those cases, and move on to the next problem in the long tail on the way to 99.999+% reliability.</p></li></ul><h4><strong>Our take</strong></h4><p>We&#8217;re big fans of the approach outlined by Andrej and the Tesla team in this talk. We believe that the data collection &lt;&gt; model improvement feedback problem is not unique to cars or Tesla, but something that should ideally be solved for all real-world ML applications. 
We think this is broadly an unsolved problem (to the extent that it is even recognized as a problem), which is why we&#8217;re excited and optimistic about the role MLOps and monitoring tools can play.</p><h2><a href="https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face">Hugging Face Blog | The Partnership: Amazon SageMaker and Hugging Face</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HQi2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HQi2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png 424w, https://substackcdn.com/image/fetch/$s_!HQi2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png 848w, https://substackcdn.com/image/fetch/$s_!HQi2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png 1272w, https://substackcdn.com/image/fetch/$s_!HQi2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HQi2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png" width="800" height="250" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;hugging-face-and-aws-logo&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="hugging-face-and-aws-logo" title="hugging-face-and-aws-logo" srcset="https://substackcdn.com/image/fetch/$s_!HQi2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png 424w, https://substackcdn.com/image/fetch/$s_!HQi2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png 848w, https://substackcdn.com/image/fetch/$s_!HQi2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png 1272w, 
https://substackcdn.com/image/fetch/$s_!HQi2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8680f50c-b4e9-4948-a75e-3d8fc09133e1_800x250.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>What happened?</strong></h4><p><a href="https://huggingface.co/">Hugging Face</a> and AWS have announced a strategic partnership:</p><blockquote><p>&#8220;to make it easier for companies to leverage State of the Art Machine Learning models, and ship cutting-edge NLP features faster.</p><p>Through this partnership, Hugging Face 
is leveraging Amazon Web Services as its Preferred Cloud Provider to deliver services to its customers.&#8221;</p></blockquote><p>This is a big move from Hugging Face, less than a month after <a href="https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library/">raising a $40M Series B round.</a></p><h4><strong>What does this involve?</strong></h4><p>First, Hugging Face and <a href="https://aws.amazon.com/sagemaker/">Amazon SageMaker</a> will provide Hugging Face <a href="https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-training-containers">Deep Learning Containers</a> optimized for PyTorch and TensorFlow training that will work well with different EC2 instances. This also means that training or fine-tuning Hugging Face models with SageMaker will be billed only for the seconds of compute used.</p><p>Second, there will be a Hugging Face extension to the SageMaker Python SDK, which will simplify creating and managing training jobs in AWS. For a quick demo of what this looks like, check out <a href="https://www.youtube.com/watch?v=leyrCgLAGjM">this YouTube video</a>.</p><p>Third, the Hugging Face extension to SageMaker will work seamlessly with existing SageMaker functionality for data parallelism, model parallelism and hyperparameter tuning when training models. This extension will also simplify sending metrics into SageMaker&#8217;s own metrics store or CloudWatch.&nbsp;</p><h4><strong>What&#8217;s next?</strong></h4><p>They already have an integrated solution for training, so what could be next?</p><blockquote><p>&#8220;We are working on offering an integrated solution for Amazon SageMaker with Hugging Face Inference DLCs in the future - stay tuned!&#8221;</p></blockquote><h4><strong>Our take</strong></h4><p>This is pretty exciting news. 
Transformer models in NLP show tremendous promise, and Hugging Face and Amazon SageMaker are making it very simple to train such models. We recommend reading through <a href="https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face">the article</a> and skimming some of the resources if you&#8217;re interested in learning more.&nbsp;</p><p>In our <a href="https://mlopsroundup.substack.com/p/issue-14-ai-index-report-2021-multimodal">last issue</a>, we highlighted the <a href="https://ai-infrastructure.org/">AI Infrastructure Alliance</a>, in which many startups are coming together to build the canonical stack for ML (as an alternative to the offerings from the Big Cloud providers). This news from Hugging Face and AWS shows that the world of MLOps is in a bit of a free-for-all, which can only be a good thing for ML practitioners.</p><h2><a href="https://nlathia.github.io/2021/03/Embeddings.html">Neal Lathia | Embedding Stores</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!muQS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!muQS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png 424w, https://substackcdn.com/image/fetch/$s_!muQS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png 848w, 
https://substackcdn.com/image/fetch/$s_!muQS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png 1272w, https://substackcdn.com/image/fetch/$s_!muQS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!muQS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png" width="1404" height="588" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/aa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:588,&quot;width&quot;:1404,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:332586,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!muQS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png 424w, https://substackcdn.com/image/fetch/$s_!muQS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png 848w, 
https://substackcdn.com/image/fetch/$s_!muQS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png 1272w, https://substackcdn.com/image/fetch/$s_!muQS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faa96a09c-b49f-418f-ba92-70b72e9b51f2_1404x588.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">What is an embedding? 
Image courtesy of the MLOps Roundup</figcaption></figure></div><p><a href="https://nlathia.github.io/">Neal Lathia,</a> the Director of Machine Learning at <a href="https://monzo.com/">Monzo</a>, recently wrote this article on using &#8220;embedding stores&#8221; at Monzo. We share our takeaways and thoughts on the article here.&nbsp;</p><h4><strong>Highlights</strong></h4><ul><li><p>When dealing with unstructured data (e.g. raw text, images, videos), training models from scratch is often not feasible in real-world scenarios for two reasons. First, privacy considerations mean that we might not have access to the raw data. Second, unstructured data is incredibly high-dimensional, and training high-quality models on it takes more compute resources, larger datasets and more time. In such cases, embeddings (dense, low-dimensional vector representations of the raw input, produced by a model) can be a great tool. </p></li><li><p>Pretrained models for text and images are available out of the box (e.g.&nbsp;<a href="https://pytorch.org/vision/stable/models.html">torchvision.models</a>, or <a href="https://huggingface.co/models">Hugging Face</a>). By exposing these models as an API endpoint, the team built a system to generate and log embeddings.&nbsp;</p></li><li><p>The team then trained the ML models for their end use case by using these embeddings as feature vectors. 
This ensured that the data was used in a privacy-compliant way and that model training and iteration were faster than training end-to-end models from scratch.&nbsp;</p></li><li><p>The tradeoff here, of course, is the inability to backpropagate into the embedding model to fine-tune the embeddings for their specific downstream tasks.</p></li></ul><h4><strong>Thoughts &amp; Open Questions</strong></h4><ul><li><p>The general approach highlighted in Neal&#8217;s article is a great summary of the best practices emerging in industry regarding content understanding: a generic &#8220;trunk&#8221; model that learns good general-purpose embeddings of raw content, combined with a multitude of task-specific downstream models that are built on top of the embedding representation and fine-tuned for the specific task (e.g. <a href="https://twitter.com/PaulYacoubian/status/1316773387268653056?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1316773387268653056%7Ctwgr%5E%7Ctwcon%5Es1_c10&amp;ref_url=https%3A%2F%2Fcdn.embedly.com%2Fwidgets%2Fmedia.html%3Ftype%3Dtext2Fhtmlkey%3Da19fcc184b9711e1b4764040d3dc5c07schema%3Dtwitterurl%3Dhttps3A%2F%2Ftwitter.com%2Fpaulyacoubian%2Fstatus%2F1316773387268653056image%3D">Copy.AI</a> is powered by GPT-3, and <a href="https://monzo.com/">Monzo</a> uses Hugging Face).&nbsp;</p></li><li><p>As alluded to in the article, embeddings don&#8217;t entirely solve the privacy concerns as they aren&#8217;t strictly one-way hashes. 
As has been <a href="https://arxiv.org/pdf/2004.00053.pdf">shown in prior research</a> (and something we covered <a href="https://mlopsroundup.substack.com/p/issue-13-feature-stores-information">here</a>), it is possible to recover the raw input partially from embeddings in some cases so it is important to evaluate this explicitly for your individual use case.</p></li><li><p>One limitation that we believe is yet to be solved (at least we&#8217;re not aware of a good solution) is a clean way to support versioning and experimentation on underlying embeddings, especially if multiple downstream models depend on it as input features.&nbsp;</p></li></ul><h2><a href="https://mytherin.github.io/papers/2018-machinelearningudfs.pdf">Paper | Deep Integration of Machine Learning Into Column Stores</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://mytherin.github.io/papers/2018-machinelearningudfs.pdf" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VwXy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fab9ab404-2439-4db6-99d0-42b726c3c2e3_1600x940.png 424w, https://substackcdn.com/image/fetch/$s_!VwXy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fab9ab404-2439-4db6-99d0-42b726c3c2e3_1600x940.png 848w, https://substackcdn.com/image/fetch/$s_!VwXy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fab9ab404-2439-4db6-99d0-42b726c3c2e3_1600x940.png 1272w, 
https://substackcdn.com/image/fetch/$s_!VwXy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fab9ab404-2439-4db6-99d0-42b726c3c2e3_1600x940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VwXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fab9ab404-2439-4db6-99d0-42b726c3c2e3_1600x940.png" width="1456" height="855" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/ab9ab404-2439-4db6-99d0-42b726c3c2e3_1600x940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:855,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://mytherin.github.io/papers/2018-machinelearningudfs.pdf&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VwXy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fab9ab404-2439-4db6-99d0-42b726c3c2e3_1600x940.png 424w, https://substackcdn.com/image/fetch/$s_!VwXy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fab9ab404-2439-4db6-99d0-42b726c3c2e3_1600x940.png 848w, 
https://substackcdn.com/image/fetch/$s_!VwXy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fab9ab404-2439-4db6-99d0-42b726c3c2e3_1600x940.png 1272w, https://substackcdn.com/image/fetch/$s_!VwXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fab9ab404-2439-4db6-99d0-42b726c3c2e3_1600x940.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is a paper from the team that built DuckDB and MonetDB, two of the most 
exciting new databases of the last couple of years. If you&#8217;re interested in learning about DuckDB, check out this YouTube video from one of the creators. Now, let&#8217;s discuss the paper itself. The authors say that they:</p><blockquote><p>&#8220;integrate unchanged machine learning pipelines into an analytical data management system. The entire pipelines including data, models, parameters and evaluation outcomes are stored and executed inside the database system. Experiments using our MonetDB/Python UDFs show greatly improved performance due to reduced data movement and parallel processing opportunities.&#8221;</p></blockquote><h4><strong>What is the problem they are trying to solve? </strong></h4><p>The authors say that current ML workflows have the following problems when it comes to data management:</p><ul><li><p>Managing large datasets as flat files is error-prone, and having multiple people work on the same datasets leads to further issues</p></li><li><p>Loading data from structured data formats (such as CSV and XML) is inefficient, and often data needs to be loaded multiple times</p></li></ul><p>These problems can be solved by existing relational database management systems (RDBMS), but integrating analytical tools with databases has proven tricky. 
This is because:</p><ul><li><p>The standard approach of storing data in a separate database and communicating over a socket connection becomes a bottleneck with large amounts of data</p></li><li><p>On the other hand, in-database processing techniques are cumbersome, and rewriting analytical pipelines into SQL remains a research problem.&nbsp;</p></li></ul><h4><strong>Contributions of the Paper</strong></h4><p>In this paper, they show:</p><ul><li><p>Classification models (such as the ones provided by scikit-learn) that can be trained within a column-store RDBMS using a combination of a Python User-Defined Function (UDF) and SQL (see the image above for an example)</p></li><li><p>Storage of models within the database, which can then be used for testing and future predictions</p></li><li><p>Performance benefits of running the end-to-end analytical workflow within the database. This is compared against reading raw data from files or a separate database, followed by pre-processing, model training and inference in a Python environment.&nbsp;</p></li></ul><h4><strong>Our Take</strong></h4><p>Given how much data is being stored in analytical databases (Snowflake, Amazon Redshift, Google BigQuery), there seems to be growing interest in bringing non-SQL workloads to these databases. 
<a href="https://cloud.google.com/bigquery-ml/docs/introduction">BigQuery</a> appears to be the furthest ahead today - check out the following example of <a href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-dnn-models">creating a Deep Learning model using TensorFlow directly inside BigQuery</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-dnn-models" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EuY6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F89100a58-15a0-41a2-9129-af49ab373154_1600x805.png 424w, https://substackcdn.com/image/fetch/$s_!EuY6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F89100a58-15a0-41a2-9129-af49ab373154_1600x805.png 848w, https://substackcdn.com/image/fetch/$s_!EuY6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F89100a58-15a0-41a2-9129-af49ab373154_1600x805.png 1272w, https://substackcdn.com/image/fetch/$s_!EuY6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F89100a58-15a0-41a2-9129-af49ab373154_1600x805.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EuY6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F89100a58-15a0-41a2-9129-af49ab373154_1600x805.png" width="1456" height="733" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/89100a58-15a0-41a2-9129-af49ab373154_1600x805.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:733,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-dnn-models&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EuY6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F89100a58-15a0-41a2-9129-af49ab373154_1600x805.png 424w, https://substackcdn.com/image/fetch/$s_!EuY6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F89100a58-15a0-41a2-9129-af49ab373154_1600x805.png 848w, https://substackcdn.com/image/fetch/$s_!EuY6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F89100a58-15a0-41a2-9129-af49ab373154_1600x805.png 1272w, https://substackcdn.com/image/fetch/$s_!EuY6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F89100a58-15a0-41a2-9129-af49ab373154_1600x805.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>However, whether such approaches will take off remains an open question. If you&#8217;re working on problems involving tabular data, this might be worth a try. We remain intrigued by these ideas and will follow developments closely. 
</p><h2><a href="https://twitter.com/EricTopol/status/1371496436764975104">Covid-19 + AI</a></h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/EricTopol/status/1371496436764975104&quot;,&quot;full_text&quot;:&quot;There've been &amp;gt; 300 <span class=\&quot;tweet-fake-link\&quot;>#AI</span> models, &amp;gt;2,000 studies for covid medical imaging (chest X-ray, CT) diagnosis.Systematically reviewed here:\n*\&quot;None of the models are of potential clinical use due to methodological flaws and/or underlying biases\&quot;*\n<a class=\&quot;tweet-url\&quot; href=\&quot;https://www.nature.com/articles/s42256-021-00307-0\&quot;>nature.com/articles/s4225&#8230;</a>\n<span class=\&quot;tweet-fake-link\&quot;>@NatMachIntell</span> &quot;,&quot;username&quot;:&quot;EricTopol&quot;,&quot;name&quot;:&quot;Eric Topol&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Mon Mar 15 16:20:06 +0000 2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/EwiH7FvVIAMni97.png&quot;,&quot;link_url&quot;:&quot;https://t.co/08YksnbxF3&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:520,&quot;like_count&quot;:1181,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>This is a slightly depressing Twitter thread from <a href="https://twitter.com/EricTopol">Eric Topol</a>. 
He discusses an <a href="https://www.nature.com/articles/s42256-021-00307-0">article from Nature Machine Intelligence</a> which shows that even with more than 2000 studies involving machine learning models on detection and prognostication of Covid-19 from Chest X-Rays and CT images, not a single one is of &#8220;potential clinical use due to methodological flaws and/or underlying biases.&#8221;&nbsp;</p><p>While the enthusiasm to do interesting work in Machine Learning remains high, producing valuable research remains difficult. We hope that clear documentation and well-defined processes become the norm both in industry and academia!</p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. This is only <a href="https://s2.q4cdn.com/299287126/files/doc_financials/annual/Shareholderletter97.pdf">Day 1</a> for MLOps and this newsletter and we would love to hear your thoughts and feedback. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> (open to DMs as well) or email us at mlmonitoringnews@gmail.com</p>]]></content:encoded></item><item><title><![CDATA[Issue #14: AI Index Report 2021. Multimodal Neurons. AI Infra Alliance. 
Similarity Search]]></title><description><![CDATA[Welcome to the 14th issue of the MLOps newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-14-ai-index-report-2021-multimodal</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-14-ai-index-report-2021-multimodal</guid><dc:creator><![CDATA[Rishabh Bhargava]]></dc:creator><pubDate>Mon, 22 Mar 2021 17:14:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xfer!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 14th issue of the MLOps newsletter. We&#8217;ve been writing this newsletter for six months now and have honestly been surprised by how many wonderful folks (by that, we mean you!) choose to read this. We remain equally excited for the next six months.&nbsp;</p><p><strong>In this issue, we look at charts and numbers from the AI Index Report 2021, cover some fascinating research from OpenAI, news of the AI Infra Alliance, Vector Databases</strong> and much more.&nbsp;Thank you for subscribing. 
If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><a href="https://aiindex.stanford.edu/report/">&nbsp;Artificial Intelligence Index Report 2021</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xfer!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xfer!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png 424w, https://substackcdn.com/image/fetch/$s_!xfer!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png 848w, https://substackcdn.com/image/fetch/$s_!xfer!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png 1272w, https://substackcdn.com/image/fetch/$s_!xfer!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xfer!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png" width="1456" height="729" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:729,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xfer!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png 424w, https://substackcdn.com/image/fetch/$s_!xfer!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png 848w, https://substackcdn.com/image/fetch/$s_!xfer!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png 1272w, https://substackcdn.com/image/fetch/$s_!xfer!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F288fa8f4-a8ce-4f28-a95e-0b557097d04c_1600x801.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Bigger seems to be better with NLP models - performance as size of GPT-3 model increases</figcaption></figure></div><p>This is an annual report from Stanford&#8217;s Institute for Human-Centered AI. Clocking in at 222 pages, it spans everything from R&amp;D and the technical performance of AI to trends in AI education and ethics and the impact of Covid-19 on AI development. 
Here we&#8217;ll highlight a subset of topics, but we&#8217;d recommend skimming through the report if you have the time.&nbsp;</p><h4><strong>Research and Technical Performance</strong></h4><ul><li><p>The number of AI publications grew by 34.5% from 2019 to 2020, compared to 19.6% from 2018 to 2019.</p></li><li><p>Performance of generative AI systems (text, audio, images) has improved to a sufficiently high degree that humans have a hard time telling the difference between synthetic and non-synthetic outputs for certain applications. That being said, there has been a significant improvement in the results for the <a href="https://www.kaggle.com/c/deepfake-detection-challenge/">Deepfake Detection Challenge</a> created by Facebook.</p></li><li><p>Performance on image benchmarks seems to be flattening out, which suggests that harder benchmarks are needed (as an example, we saw 98.8% Top-5 accuracy on <a href="http://image-net.org/">ImageNet</a> when using additional training data)</p></li><li><p>NLP continues to make massive gains against the state of the art with advances like GPT-3, but also with systems achieving near-human performance on language understanding tasks such as <a href="https://super.gluebenchmark.com/">SuperGLUE</a>.&nbsp;</p></li><li><p><a href="https://deepmind.com/blog/article/AlphaFold-Using-AI-for-scientific-discovery">AlphaFold</a> from DeepMind made a significant breakthrough in the challenge of protein folding. More generally, AI has had a major impact on biology and healthcare.&nbsp;</p></li></ul><h4><strong>AI and the Economy</strong></h4><ul><li><p>&#8220;Drugs, Cancer, Molecular, Drug Discovery&#8221; received the greatest amount of private AI investment in 2020, with more than USD 13.8 billion, 4.5 times higher than 2019.</p></li><li><p>More private investment in AI is being funneled into fewer startups. 
2020 saw a 9.3% increase in the amount of private AI investment from 2019 (compared to 5.7% in 2019 from 2018), though the number of newly funded companies decreased for the third year in a row.</p></li><li><p>Brazil, India, Canada, Singapore, and South Africa are the countries with the highest growth in AI hiring from 2016 to 2020.&nbsp;</p></li><li><p>Despite the economic downturn caused by the pandemic, half the respondents in a McKinsey survey said that the coronavirus had no effect on their investment in AI, while 27% actually reported increasing their investment.</p></li><li><p>Check the chart below for the growth in AI job postings in different industries:</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OC_2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OC_2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png 424w, https://substackcdn.com/image/fetch/$s_!OC_2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png 848w, https://substackcdn.com/image/fetch/$s_!OC_2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png 1272w, 
https://substackcdn.com/image/fetch/$s_!OC_2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OC_2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png" width="1456" height="1113" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1113,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OC_2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png 424w, https://substackcdn.com/image/fetch/$s_!OC_2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png 848w, https://substackcdn.com/image/fetch/$s_!OC_2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png 1272w, 
https://substackcdn.com/image/fetch/$s_!OC_2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f3b3552-5b69-4a44-bcfe-203439584476_1600x1223.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>AI Policy and National Strategies</strong></h4><ul><li><p>More than 30 countries have published documents discussing national AI strategy since Canada was the first to do so in 2017.&nbsp;</p></li><li><p>US Federal civilian (non-defense) agencies allocated ~1.1B USD for AI research and development, while some reports suggest 
that the Defense R&amp;D spend on AI might be closer to 5B USD.</p></li><li><p>In the US, the most recently ended 116th Congress (January 3, 2019 &#8211; January 3, 2021) was the most AI-focused congressional session in history, with the number of AI mentions in legislation and congressional reports being more than triple that of the 115th Congress.</p></li></ul><h4><strong>Final Thoughts</strong></h4><p>At a very high level, the chart in the world of AI seems to be going &#8220;up and to the right&#8221; along almost all meaningful dimensions. The next decade will remain an exciting time to contribute to AI research, applications and strategy.</p><h2><a href="https://openai.com/blog/multimodal-neurons/">Multimodal Neurons in Artificial Neural Networks</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qbms!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qbms!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png 424w, https://substackcdn.com/image/fetch/$s_!qbms!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png 848w, https://substackcdn.com/image/fetch/$s_!qbms!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qbms!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qbms!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png" width="1456" height="387" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:387,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qbms!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png 424w, https://substackcdn.com/image/fetch/$s_!qbms!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png 848w, https://substackcdn.com/image/fetch/$s_!qbms!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qbms!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5b36f1-f4fe-4200-a963-2e052ba8c8b0_1600x425.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><p>In the last issue of our newsletter, we briefly touched upon the multimodal neurons research by OpenAI. We cover this topic in more detail here. 
Recently, OpenAI released <a href="https://openai.com/blog/clip/">CLIP</a>, a new state-of-the-art visual understanding model that outperforms existing vision systems on datasets like ImageNet and ObjectNet. As a follow-up, OpenAI shared their observations around the existence of &#8220;multimodal neurons&#8221; in CLIP:</p><blockquote><p><em>We&#8217;ve discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually.</em></p></blockquote><p>Typically, a single neuron in an artificial neural network fires for a visual cluster of ideas (e.g. &#8220;edge detectors&#8221; or &#8220;face detectors&#8221;). This finding is novel in that a multimodal neuron responds to a semantic cluster of ideas represented across a variety of forms (sketch, picture, text, etc.) by forming abstractions. However, as noted in the paper, the degree of abstraction in multimodal neurons can present new attack vectors and introduce sources of bias that haven&#8217;t manifested in previous systems.</p><h4><strong>Typographic Attacks</strong></h4><p>The authors observed that the excitation of multimodal neurons in CLIP can be controlled through their response to images of text. This introduces a simple attack vector they call &#8220;typographic attacks&#8221;: fooling the model into misclassifying an image by overlaying adversarial text unrelated to the underlying image. For example, in the image shared above, overlaying &#8220;$$$&#8221; on an image of a poodle fools the model into thinking the image is a piggy bank:</p><blockquote><p><em>&#8220;The finance neuron [<a href="https://microscope.openai.com/models/contrastive_4x/image_block_4_5_Add_6_0/1330">1330</a>], for example, responds to images of piggy banks, but also responds to the string &#8220;$$$&#8221;. 
By forcing the finance neuron to fire, we can fool our model into classifying a dog as a piggy bank.&#8221;</em></p></blockquote><h4><strong>Bias and Overgeneralization</strong></h4><p>Another unintended consequence of abstraction is a new source of bias stemming from overgeneralization: the model learns associations between concepts because those associations are present in the underlying training data. The article notes several such associations uncovered during CLIP&#8217;s evaluation:</p><ul><li><p>The &#8220;Middle East&#8221; neuron <a href="https://microscope.openai.com/models/contrastive_v2/image_block_4_2_Add_6_0/1895">[1895]</a> has an association with terrorism.</p></li><li><p>The &#8220;immigration&#8221; neuron <a href="https://microscope.openai.com/models/contrastive_v2/image_block_4_2_Add_6_0/395">[395]</a> fires on inputs relating to Latin America.</p></li><li><p>Neuron [<a href="https://microscope.openai.com/models/contrastive_4x/image_block_4_5_Add_6_0/1257">1257</a>] fires for both dark-skinned people and gorillas.</p></li></ul><h4><strong>Our take&nbsp;</strong></h4><p>Both adversarial attacks, like the typographic attack mentioned above, and bias from overgeneralization present challenges to real-world adoption. While typographic attacks can at least be formulated as an adversarial learning problem, problems arising from bias and overgeneralization are even more challenging: the examples shared above give us anecdotal evidence of bias, but it is hard to measure or quantify because the exhaustive set of possible biases is impossible to anticipate in advance. 
We thank the authors for sharing these insights, and agree with their view on the importance of building a robust toolkit to study interpretability:</p><blockquote><p><em>We believe that these tools of interpretability may aid practitioners the ability to preempt potential problems, by discovering some of these associations and ambiguities ahead of time.</em></p></blockquote><h2><a href="https://ai-infrastructure.org/">AI Infrastructure Alliance: Building the canonical stack for ML</a></h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h50o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h50o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png 424w, https://substackcdn.com/image/fetch/$s_!h50o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png 848w, https://substackcdn.com/image/fetch/$s_!h50o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png 1272w, https://substackcdn.com/image/fetch/$s_!h50o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!h50o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png" width="1456" height="331" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:331,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h50o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png 424w, https://substackcdn.com/image/fetch/$s_!h50o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png 848w, https://substackcdn.com/image/fetch/$s_!h50o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png 1272w, https://substackcdn.com/image/fetch/$s_!h50o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a9a7d-9770-4b7c-aaf9-1a7445b4598e_1600x364.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a></figure></div><h4><strong>What is it?&nbsp;</strong></h4><p>We&#8217;ve written in the past about the wide variety of tools and products that comprise the enterprise AI landscape today (check out the <a href="https://github.com/visenger/awesome-mlops">Awesome MLOps repo</a> and the <a href="https://huyenchip.com/2020/12/30/mlops-v2.html">MLOps Tooling landscape</a>). As a first-order approximation, we can categorize these solutions along two axes:</p><ol><li><p>Part of the ML lifecycle (data labeling, model training, inference, monitoring)&nbsp;</p></li><li><p>Breadth of support (tied to a single cloud, support across cloud providers, VPC, etc.)</p></li></ol><p>With this Cambrian explosion of tools, teams often struggle to stitch together the right set to create an end-to-end ML training &amp; deployment workflow. To tackle this issue, more than 20 startups are coming together to form the <a href="https://ai-infrastructure.org/">AI Infrastructure Alliance</a>. Their goal is to define the &#8220;canonical stack&#8221; for enterprise machine learning: a set of common practices and standards for cross-platform support that these companies can build towards.&nbsp;</p><p>Dan Jeffries (who works at Pachyderm) will serve as director of the alliance and has previously written about the motivating problem in this post on a <a href="https://towardsdatascience.com/rise-of-the-canonical-stack-in-machine-learning-724e7d2faa75">Canonical Stack (CS) for machine learning</a>. 
This <a href="https://venturebeat.com/2021/02/24/band-of-ai-startups-launch-rebel-alliance-for-interoperability/">article</a> reporting on the alliance quotes Dan Jeffries:</p><blockquote><p><em>&#8220;In a conversation with VentureBeat, Jeffries referred to the endeavor for small to medium-size businesses in AI as a &#8220;rebel alliance against the empire&#8221; that will serve as an alternative to offerings from Big Tech cloud providers, which he characterized as &#8220;building an infrastructure just to lock you in.&#8221;&#8221;</em></p></blockquote><h4><strong>Our take</strong></h4><p>We think the problem of defining common standards &amp; practices that allow startups to build towards interoperability is a real one. We believe that if this gains adoption, then over time larger cloud providers like AWS (with SageMaker) will want to bake these standards into their platforms&#8217; ML offerings, thus leveling the playing field and reducing complexity for the end customer. All in all, a big accelerant for ML adoption.&nbsp;</p><p>The alliance so far has laid out its goals in broad strokes. We look forward to more concrete technical details about the &#8220;canonical stack&#8221; and the interoperability standards proposed around it.</p><p>We do wonder whether the Big Tech cloud providers (think SageMaker and GCP Cloud ML) are really the &#8220;empire&#8221;. While they have MLOps offerings, there are startups both large and small that are trying to attack different parts of the pie. Will the &#8220;rebel alliance&#8221; be strong enough given the competitive dynamics among its members? Will this alliance continue to be accepting of newer rebels? 
We have so many Star Wars-themed questions.</p><p>Having said this, we continue to believe that AI &amp; ML are going to create tremendous value over the next two decades and the journey is likely to be positive-sum for most participants.&nbsp;</p><h2><a href="https://medium.com/mlops-community/domain-specific-machine-learning-monitoring-88bc0dd8a212">Domain-Specific MLOps: Tying Monitoring to Business Outcomes</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vo-a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vo-a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Vo-a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Vo-a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Vo-a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Vo-a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg" width="1456" height="970" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:970,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Vo-a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Vo-a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Vo-a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Vo-a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4e6e9a73-0c6b-46fe-afb7-b07684a7533f_1600x1066.jpeg 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p><a href="https://medium.com/@lina.weichbrodt?source=follow_footer-------------------------------------">Lina Weichbrodt</a>, a Machine Learning engineer based in Germany, recently wrote an insightful article on the importance of domain-specific metrics when monitoring ML models. We recommend reading it in its entirety but share our key takeaways and thoughts.</p><h4><strong>The case for domain-specific metrics&nbsp;</strong></h4><p>Monitoring ML models, much like monitoring any online software service, is important for a real-time &#8220;pulse check&#8221;. 
One question any ML monitoring solution needs to answer is &#8220;what should we measure?&#8221; In addition to the already well-defined and well-understood metrics around data and concept drift, Lina makes the case that it is important to track metrics that are as closely related to business/product outcomes as possible. For instance, at Spotify, the homepage personalization team tracked &#8220;Rank of a user&#8217;s most used carousel&#8221;: if a new model ranks the user&#8217;s favorite carousel low, or the rank suddenly drops, it indicates a problem.</p><h4><strong>Insights lie at the extremes&nbsp;</strong></h4><p>When designing domain-specific metrics for monitoring ML models, it helps to think of possible failure cases and design metrics that capture the occurrence of failures even when they are infrequent. As an analogy from DevOps, we track not just the average or median latency but also the P95 and P99 latency, as these can help detect problems earlier.</p><h4><strong>Metric Maturity Cycle</strong></h4><p>This insight holds not just for ML monitoring metrics but for any metric you want to build and operationalize. It helps to think of the metric&#8217;s maturity cycle:</p><ul><li><p><strong>Step 1: Research &amp; Analysis</strong> - first &#8220;implementation&#8221; of a metric. The goal is to verify that the metric is correct and that tracking it would be valuable if productionized.</p></li><li><p><strong>Step 2: Offline/batch compute</strong> - typically how most metrics first get productionized. A data pipeline runs at regular intervals to fetch the upstream data, compute the metric and store it for downstream consumption (in dashboards, email reports, etc.).</p></li><li><p><strong>Step 3: Realtime</strong> - the ideal stage for any metric. Computation is real-time, often using a streaming data abstraction. 
Realtime metrics are often accompanied by triggers and alerts that can proactively notify relevant stakeholders in case of any outliers.&nbsp;</p></li></ul><h2><a href="https://www.pinecone.io/">New product: Vector database from Pinecone</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CJ6X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CJ6X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png 424w, https://substackcdn.com/image/fetch/$s_!CJ6X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png 848w, https://substackcdn.com/image/fetch/$s_!CJ6X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png 1272w, https://substackcdn.com/image/fetch/$s_!CJ6X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CJ6X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png" 
width="1456" height="849" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:849,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CJ6X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png 424w, https://substackcdn.com/image/fetch/$s_!CJ6X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png 848w, https://substackcdn.com/image/fetch/$s_!CJ6X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png 1272w, https://substackcdn.com/image/fetch/$s_!CJ6X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8567bb49-f9a2-4424-9fa4-08621d76eb32_1540x898.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We recently came across <a href="https://www.pinecone.io/">Pinecone</a>, a company building a <a href="https://www.pinecone.io/learn/">vector database</a>. This definitely feels like a missing piece of MLOps infrastructure. Data scientists often work with vectors, and while vectors can be saved and retrieved relatively easily in relational and NoSQL databases, a &#8220;vector database&#8221; goes further by providing similarity search as a service. 
A useful analogy: Elasticsearch for vectors (rather than raw text).&nbsp;</p><p>An ML team can build something similar in-house by deploying a library like <a href="https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/">Faiss</a> or <a href="https://github.com/spotify/annoy">Annoy</a> (both are pretty good for nearest-neighbour search), but a hosted version that takes care of the sometimes painful tuning these libraries require could be valuable.&nbsp;</p><p>If you want to learn more, you can read a few of the examples (<a href="https://www.pinecone.io/learn/movie-recommender-system/">Movie recommendations</a> or <a href="https://www.pinecone.io/learn/image-similarity-search/">Image search</a>) that the Pinecone team provides.</p><h2>ML in Covid-19 Research: A sobering Twitter thread</h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/EricTopol/status/1371496436764975104&quot;,&quot;full_text&quot;:&quot;There've been &amp;gt; 300 <span class=\&quot;tweet-fake-link\&quot;>#AI</span> models, &amp;gt;2,000 studies for covid medical imaging (chest X-ray, CT) diagnosis.Systematically reviewed here:\n*\&quot;None of the models are of potential clinical use due to methodological flaws and/or underlying biases\&quot;*\n<a class=\&quot;tweet-url\&quot; href=\&quot;https://www.nature.com/articles/s42256-021-00307-0\&quot;>nature.com/articles/s4225&#8230;</a>\n<span class=\&quot;tweet-fake-link\&quot;>@NatMachIntell</span> &quot;,&quot;username&quot;:&quot;EricTopol&quot;,&quot;name&quot;:&quot;Eric Topol&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Mon Mar 15 16:20:06 +0000 
2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/EwiH7FvVIAMni97.png&quot;,&quot;link_url&quot;:&quot;https://t.co/08YksnbxF3&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:384,&quot;like_count&quot;:842,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. This is only <a href="https://s2.q4cdn.com/299287126/files/doc_financials/annual/Shareholderletter97.pdf">Day 1</a> for MLOps and this newsletter and we would love to hear your thoughts and feedback. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> (open to DMs as well) or email us at mlmonitoringnews@gmail.com</p>]]></content:encoded></item><item><title><![CDATA[Issue #13: Feature Stores. Information Leakage. AWS Well-Architected Framework. Covid & ML. ]]></title><description><![CDATA[Welcome to the 13th issue of the ML Ops newsletter.]]></description><link>https://mlopsroundup.substack.com/p/issue-13-feature-stores-information</link><guid isPermaLink="false">https://mlopsroundup.substack.com/p/issue-13-feature-stores-information</guid><dc:creator><![CDATA[Nihit Desai]]></dc:creator><pubDate>Mon, 08 Mar 2021 18:03:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Do1Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the 13th issue of the ML Ops newsletter. 
We hope this issue finds you well &#8212; as vaccinations continue to ramp up across the world, we hope that you receive one soon (if you haven&#8217;t already). &#128137;</p><p>In this issue, we cover a wonderful article on feature stores, explore information leaks in embedding models, continue our coverage of the AWS Well-architected framework whitepaper, discuss the impact of Covid on banks and more.</p><p>As always, thank you for subscribing. If you find this newsletter interesting, tell a few friends and support this project &#10084;&#65039;</p><h2><strong><a href="https://eugeneyan.com/writing/feature-stores/">Eugene Yan | Feature Stores - A Hierarchy of Needs</a></strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://eugeneyan.com/writing/feature-stores/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Do1Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!Do1Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!Do1Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!Do1Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Do1Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/fde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://eugeneyan.com/writing/feature-stores/&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Do1Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!Do1Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!Do1Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Do1Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde31d06-91d5-4d25-8e64-f9bb442047be_1200x630.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is a really insightful read from <a href="https://twitter.com/eugeneyan?lang=en">Eugene Yan</a>. We covered Feature Stores <a href="https://mlopsroundup.substack.com/p/issue-6-mlops-resources-feature-stores">last year</a>, and since then, the chatter around Feature Stores has only grown. 
Back then, based on <a href="https://www.tecton.ai/blog/what-is-a-feature-store/">this blog post</a> from Tecton, we mentioned:</p><blockquote><p>A feature store is an ML-specific data system that:</p><ul><li><p>Runs data pipelines that transform raw data into feature values</p></li><li><p>Stores and manages the feature data itself, and</p></li><li><p>Serves feature data consistently for training and inference purposes</p></li></ul></blockquote><p>Eugene breaks down feature stores by viewing them through a &#8220;<a href="https://en.wikipedia.org/wiki/Maslow's_hierarchy_of_needs">hierarchy of needs</a>&#8221; lens. This means that feature stores fulfil several needs and that &#8220;some needs are more pressing than others&#8221; and thus have to be addressed first. Let&#8217;s look into these:</p><ul><li><p><strong>Access needs</strong>: At the most basic level, a feature store needs to provide users access to feature data and other information about features. This makes it easier for data scientists to discover work done in the past, understand how/when to use features and experiment easily.&nbsp;</p></li><li><p><strong>Serving needs</strong>: Next, the models need to be served in production and features need to be made available to these models with high throughput and low latency. This might involve the movement of feature data from analytical data stores (Snowflake) to faster, &#8220;more operational&#8221; data stores (Redis). There might be data transformation needs addressed in this layer as well.&nbsp;</p></li><li><p><strong>Integrity needs</strong>: The feature data being served to models in production needs to be fresh, accurate and available at all times. 
More advanced capabilities include point-in-time correctness (a snapshot of how the data looked at an earlier time) and visibility into train-serve skew.&nbsp;</p></li><li><p><strong>Convenience needs</strong>: Feature stores should be simple for data scientists and software engineers to work with. It should be intuitive to use features during development and serve them in production. Features should be easy to discover, and users should be notified when distributions change.&nbsp;</p></li><li><p><strong>Autopilot needs</strong>: Tedious feature engineering work, such as backfilling data, monitoring and alerting, should be simplified.&nbsp;</p></li></ul><h3>Discussion</h3><p>We encourage you to read the <a href="https://eugeneyan.com/writing/feature-stores/">original article</a> if you want to dive deeper. It is superbly written and provides enlightening examples from major feature store deployments (<a href="https://blog.gojekengineering.com/feast-bridging-ml-models-and-data-efd06b7d1644?gi=efad546c4fbb">Gojek</a>, <a href="https://doordash.engineering/2020/11/19/building-a-gigascale-ml-feature-store-with-redis/">DoorDash</a>, <a href="https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform">Airbnb</a>, <a href="https://nlathia.github.io/2020/12/Building-a-feature-store.html">Monzo</a>, <a href="https://www.infoq.com/presentations/michelangelo-palette-uber/">Uber</a>, <a href="https://databricks.com/session/fact-store-scale-for-netflix-recommendations">Netflix</a> and others). Here are a few things we found particularly interesting.&nbsp;</p><p><strong>Feature Sharing</strong></p><p>Describing Gojek&#8217;s feature store, Eugene says:</p><blockquote><p>Data engineers and scientists create features and contribute them to the feature store. 
Then, ML practitioners consume these ready-made features, saving time by not having to create their own features.&nbsp;</p></blockquote><p>But he goes on:&nbsp;</p><blockquote><p>Not sure how I feel about this productivity gain, given my preference for data scientists to be more <a href="https://eugeneyan.com/writing/end-to-end-data-science/">end-to-end</a>.</p></blockquote><p>This resonates with us. There is a balance to strike between data scientists relying fully on available features as &#8220;raw data&#8221; and experimenting, transforming data based on their intuition and really owning the models they build. We wonder whether factors such as company size, the number of unique models a team owns or the cost of recreating features affect how prevalent &#8220;feature sharing&#8221; is.&nbsp;</p><p><strong>Development vs Production data</strong></p><p>A common pain point for data scientists working in dev environments is that they don&#8217;t have access to production data. This leads to issues such as train-serve skew going unnoticed or, worse, data leaks in features. Having a feature store that can bridge these gaps is a huge win.&nbsp;</p><p><strong>What is Day 1 vs Day 100 for Feature Stores?&nbsp;</strong></p><p>It is an open question when teams should start thinking about &#8220;feature stores&#8221;. Some aspects of a feature store (access to features, serving them to models) are necessary when deploying any model, but should a team building its first model really think about buying or building a feature store?&nbsp;</p><p>Companies with relatively well-developed ML teams have likely thought about several of these needs already and might have their own &#8220;unique&#8221; infrastructure that supports them. Is it reasonable, or even worthwhile, for them to integrate an external solution? 
If so, what is the path for them?&nbsp;</p><p>We would love to hear from you if you have thoughts on this!</p><h2><a href="https://arxiv.org/pdf/2004.00053.pdf">Paper | Information Leakage in Embedding Models</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://arxiv.org/pdf/2004.00053.pdf" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dkUF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73ecf7-be3a-449f-8d9a-8ae0e0df27db_1144x354.png 424w, https://substackcdn.com/image/fetch/$s_!dkUF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73ecf7-be3a-449f-8d9a-8ae0e0df27db_1144x354.png 848w, https://substackcdn.com/image/fetch/$s_!dkUF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73ecf7-be3a-449f-8d9a-8ae0e0df27db_1144x354.png 1272w, https://substackcdn.com/image/fetch/$s_!dkUF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73ecf7-be3a-449f-8d9a-8ae0e0df27db_1144x354.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dkUF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73ecf7-be3a-449f-8d9a-8ae0e0df27db_1144x354.png" width="1144" height="354" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/1e73ecf7-be3a-449f-8d9a-8ae0e0df27db_1144x354.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:1144,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://arxiv.org/pdf/2004.00053.pdf&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dkUF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73ecf7-be3a-449f-8d9a-8ae0e0df27db_1144x354.png 424w, https://substackcdn.com/image/fetch/$s_!dkUF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73ecf7-be3a-449f-8d9a-8ae0e0df27db_1144x354.png 848w, https://substackcdn.com/image/fetch/$s_!dkUF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73ecf7-be3a-449f-8d9a-8ae0e0df27db_1144x354.png 1272w, https://substackcdn.com/image/fetch/$s_!dkUF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73ecf7-be3a-449f-8d9a-8ae0e0df27db_1144x354.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>Background</strong></h4><p>Embedding models learn mappings from the space of raw input data to an n-dimensional space (where n is typically much lower than the dimensionality of the input) while preserving important semantic information about the inputs. Pre-training embedding models on large amounts of unlabeled data (the results are often called &#8220;trunk models&#8221;) and then fine-tuning them for downstream tasks has become common practice for achieving state-of-the-art results on a variety of machine learning tasks.&nbsp;</p><h4><strong>The problem</strong></h4><p>This paper demonstrates that embedding models can often learn vector representations that leak sensitive information about the raw data. 
It introduces three classes of adversarial attacks to study the kind of information leaked by embedding models:</p><ol><li><p><strong>Inversion Attack: </strong>Think of an embedding model as a function F: input data &#8594; n-dimensional output space. Given an embedding vector, an attacker can invert it to partially recover the input data. Two types of attacks are studied under this scenario: a white-box attack (where the embedding model&#8217;s parameters are known) and a black-box attack (where only raw inputs and their output embeddings are available). The paper shows that popular sentence embedding models like BERT allow recovery of between 50% and 70% of the input words!&nbsp;</p></li><li><p><strong>Attribute Inference Attack</strong>: Oftentimes, for privacy reasons, we may not want to reveal/log/store the raw input to a model, but only an n-dimensional embedding representation. However, the paper shows that the embeddings alone can reveal sensitive attributes (metadata) of the raw input. For example: </p><blockquote><p>attributes such as authorship of text can be easily extracted by training an inference model on just a handful (10-50 examples per author) of labeled embedding vectors.&nbsp;</p></blockquote></li><li><p><strong>Membership Inference Attack</strong>: In this kind of attack, the goal is to figure out whether a given data point was part of the training set of an embedding model. The paper shows that this can be very easy, especially for infrequent or outlier training inputs.</p></li></ol><h4><strong>Proposed Mitigation</strong></h4><p>The paper proposes and evaluates adversarial training techniques to minimize information leakage in embedding models. 
In this training scheme, the embedding model is trained with an additional adversarial loss: a simulated adversary A, whose goal is to infer sensitive information, is trained jointly with the embedding model M, while M is trained to maximize the adversary&#8217;s loss and minimize its own learning objective. If you&#8217;re interested in a deep dive into the technical details, we highly recommend reading the paper.</p><p>Embedding models are gaining widespread use in real-world AI applications, and for good reason: to the extent that we can train models to capture content and context about the real world and map it to a low-dimensional representation, downstream task-specific modeling becomes simpler and more efficient. However, papers like the one above highlight potential dangers that we think will be important to address and mitigate before these models can be adopted in privacy-sensitive domains like healthcare.</p><h2><a href="https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/wellarchitected-machine-learning-lens.pdf#welcome">AWS Whitepaper | AWS Well-Architected Framework - Machine Learning Lens</a> [Part 2] </h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/wellarchitected-machine-learning-lens.pdf#welcome" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9VVe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7ef8d7-bc25-44e6-bd94-6f9da6a64ad8_1522x932.png 424w, 
https://substackcdn.com/image/fetch/$s_!9VVe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7ef8d7-bc25-44e6-bd94-6f9da6a64ad8_1522x932.png 848w, https://substackcdn.com/image/fetch/$s_!9VVe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7ef8d7-bc25-44e6-bd94-6f9da6a64ad8_1522x932.png 1272w, https://substackcdn.com/image/fetch/$s_!9VVe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7ef8d7-bc25-44e6-bd94-6f9da6a64ad8_1522x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9VVe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7ef8d7-bc25-44e6-bd94-6f9da6a64ad8_1522x932.png" width="1456" height="892" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2a7ef8d7-bc25-44e6-bd94-6f9da6a64ad8_1522x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:892,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/wellarchitected-machine-learning-lens.pdf#welcome&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!9VVe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7ef8d7-bc25-44e6-bd94-6f9da6a64ad8_1522x932.png 424w, https://substackcdn.com/image/fetch/$s_!9VVe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7ef8d7-bc25-44e6-bd94-6f9da6a64ad8_1522x932.png 848w, https://substackcdn.com/image/fetch/$s_!9VVe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7ef8d7-bc25-44e6-bd94-6f9da6a64ad8_1522x932.png 1272w, https://substackcdn.com/image/fetch/$s_!9VVe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7ef8d7-bc25-44e6-bd94-6f9da6a64ad8_1522x932.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg 
xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In our <a href="https://mlopsroundup.substack.com/p/issue-12-data-cascades-aws-well-architected">previous issue</a>, we introduced the Machine Learning Lens on AWS&#8217;s Well-Architected Framework. As they say:</p><blockquote><p>In the Machine Learning Lens, we focus on how to design, deploy, and architect your machine learning workloads in the AWS Cloud.</p></blockquote><p>In this issue, we will go over the five Pillars of the Well-Architected Framework for a machine learning solution. These five pillars are:</p><ul><li><p>Operational Excellence</p></li><li><p>Security</p></li><li><p>Reliability</p></li><li><p>Performance Efficiency</p></li><li><p>Cost Optimization</p></li></ul><p>The original document goes into significant detail, but here we will cover the considerations that were most interesting to us.&nbsp;</p><h4><strong>Operational Excellence</strong></h4><ul><li><p>Establish cross-functional teams (since different personas are involved)</p></li><li><p>Identify the end-to-end architecture and operational model early (this ensures that business and technical objectives are aligned)</p></li><li><p>Establish a model retraining strategy (data drift is a given)</p></li><li><p>Document findings throughout the process and version all input and artifacts (reproducibility!)</p></li></ul><h4><strong>Security</strong></h4><ul><li><p>Restrict access to ML systems and ensure data governance (inference, model and data access in all environments available only to the right users)</p></li><li><p>Enforce data lineage 
(makes it easier to trace source data in case of errors)</p></li><li><p>Enforce regulatory compliance (privacy requirements and regulations such as HIPAA and GDPR need to be respected)</p></li></ul><h4><strong>Reliability</strong></h4><ul><li><p>Manage changes to model inputs through automation (since data and code can both change, automation helps ensure reproducibility)</p></li><li><p>Train once and deploy across environments (models shouldn&#8217;t be retrained when moving them from dev to prod, since minor data changes or randomness can influence behaviour)</p></li><li><p>Ensure that inference services can scale easily</p></li></ul><h4><strong>Performance Efficiency</strong></h4><ul><li><p>Optimize compute for your ML workload (training and inference workloads often require different hardware)</p></li><li><p>Define latency and network bandwidth performance requirements for your models (if real-time inference is needed in applications, latency requirements are critical)</p></li><li><p>Continuously monitor and measure system performance (collecting system, service and business metrics for ML workloads points to areas for improvement)</p></li></ul><h4><strong>Cost Optimization</strong></h4><ul><li><p>Define overall ROI and opportunity cost for ML projects ahead of time (is a team of two data scientists worth it, or would predictions from an existing API suffice?)</p></li><li><p>Experiment with small datasets (fail fast / de-risk projects early)</p></li><li><p>Account for inference architecture based on consumption pattern (batch and real-time predictions need different hardware)</p></li></ul><h4><strong>Final Thoughts</strong></h4><p>While this is a lot to take in, it can be valuable to look at end-to-end projects through the lens of these five pillars. 
We will continue to evaluate different frameworks for ML projects, but would love to hear from you on this as well!</p><h2><a href="https://www.bankofengland.co.uk/bank-overground/2021/how-has-covid-affected-the-performance-of-machine-learning-models-used-by-uk-banks">How has COVID-19 affected the performance of ML models used by UK banks</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.bankofengland.co.uk/bank-overground/2021/how-has-covid-affected-the-performance-of-machine-learning-models-used-by-uk-banks" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kCg5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F942265ae-316c-48e5-80ef-0b0f66fbdf3d_829x354.png 424w, https://substackcdn.com/image/fetch/$s_!kCg5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F942265ae-316c-48e5-80ef-0b0f66fbdf3d_829x354.png 848w, https://substackcdn.com/image/fetch/$s_!kCg5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F942265ae-316c-48e5-80ef-0b0f66fbdf3d_829x354.png 1272w, https://substackcdn.com/image/fetch/$s_!kCg5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F942265ae-316c-48e5-80ef-0b0f66fbdf3d_829x354.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kCg5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F942265ae-316c-48e5-80ef-0b0f66fbdf3d_829x354.png" 
width="829" height="354" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/942265ae-316c-48e5-80ef-0b0f66fbdf3d_829x354.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:829,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.bankofengland.co.uk/bank-overground/2021/how-has-covid-affected-the-performance-of-machine-learning-models-used-by-uk-banks&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kCg5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F942265ae-316c-48e5-80ef-0b0f66fbdf3d_829x354.png 424w, https://substackcdn.com/image/fetch/$s_!kCg5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F942265ae-316c-48e5-80ef-0b0f66fbdf3d_829x354.png 848w, https://substackcdn.com/image/fetch/$s_!kCg5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F942265ae-316c-48e5-80ef-0b0f66fbdf3d_829x354.png 1272w, https://substackcdn.com/image/fetch/$s_!kCg5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F942265ae-316c-48e5-80ef-0b0f66fbdf3d_829x354.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p></p><p>The full report is available <a href="https://www.bankofengland.co.uk/quarterly-bulletin/2020/2020-q4/the-impact-of-covid-on-machine-learning-and-data-science-in-uk-banking">here</a>, and we recommend clicking through to explore its insightful charts.&nbsp;</p><h4><strong>High-level overview</strong></h4><blockquote><p>We asked banks how Covid-19 (Covid) had affected their use of machine learning and data science. 
Although these technologies will continue to have many benefits, over a third of banks reported a negative impact on model performance as a result of the pandemic.</p></blockquote><h4><strong>Learnings</strong></h4><ul><li><p>While half of the banks reported that Machine Learning (ML) and Data Science (DS) had become more important to them, only a third reported an actual increase in the number of planned or existing projects.&nbsp;</p></li><li><p>About 35% of banks reported a negative impact on ML model performance as a result of Covid. The report notes that this is likely because of &#8220;major movements in macroeconomic variables, such as rising unemployment and mortgage forbearance&#8221;, which required ML models to be &#8220;recalibrated&#8221;.&nbsp;</p></li><li><p>Smaller banks showed a positive impact of Covid on their use of third-party solutions for different parts of their ML stack. While we don&#8217;t know the magnitude of this impact, it reflects both a willingness to invest in ML applications and an openness to using third-party solutions.&nbsp;</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.bankofengland.co.uk/bank-overground/2021/how-has-covid-affected-the-performance-of-machine-learning-models-used-by-uk-banks" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4f7-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd664e838-bd0e-4fae-a23d-179faa9f7f41_1272x702.png 424w, https://substackcdn.com/image/fetch/$s_!4f7-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd664e838-bd0e-4fae-a23d-179faa9f7f41_1272x702.png 848w, 
https://substackcdn.com/image/fetch/$s_!4f7-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd664e838-bd0e-4fae-a23d-179faa9f7f41_1272x702.png 1272w, https://substackcdn.com/image/fetch/$s_!4f7-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd664e838-bd0e-4fae-a23d-179faa9f7f41_1272x702.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4f7-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd664e838-bd0e-4fae-a23d-179faa9f7f41_1272x702.png" width="1272" height="702" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d664e838-bd0e-4fae-a23d-179faa9f7f41_1272x702.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:702,&quot;width&quot;:1272,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.bankofengland.co.uk/bank-overground/2021/how-has-covid-affected-the-performance-of-machine-learning-models-used-by-uk-banks&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4f7-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd664e838-bd0e-4fae-a23d-179faa9f7f41_1272x702.png 424w, 
https://substackcdn.com/image/fetch/$s_!4f7-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd664e838-bd0e-4fae-a23d-179faa9f7f41_1272x702.png 848w, https://substackcdn.com/image/fetch/$s_!4f7-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd664e838-bd0e-4fae-a23d-179faa9f7f41_1272x702.png 1272w, https://substackcdn.com/image/fetch/$s_!4f7-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd664e838-bd0e-4fae-a23d-179faa9f7f41_1272x702.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h4><strong>Our Take</strong></h4><p>While the number of institutions surveyed isn&#8217;t very high (17 UK banks, 9 banks based outside the UK and 6 insurers, although these represent 88% of all UK banks&#8217; assets), we believe the information to be directionally correct. More and more use cases fit ML applications well, but because the financial sector is more heavily regulated, the safe deployment of models will be critical.&nbsp;</p><h2><a href="https://www.sagifyml.com/">New tool alert: Sagify</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.sagifyml.com/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SGkC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F308e4f10-0bd9-4585-9f89-df441ffeb397_1150x490.png 424w, https://substackcdn.com/image/fetch/$s_!SGkC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F308e4f10-0bd9-4585-9f89-df441ffeb397_1150x490.png 848w, https://substackcdn.com/image/fetch/$s_!SGkC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F308e4f10-0bd9-4585-9f89-df441ffeb397_1150x490.png 1272w, https://substackcdn.com/image/fetch/$s_!SGkC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F308e4f10-0bd9-4585-9f89-df441ffeb397_1150x490.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!SGkC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F308e4f10-0bd9-4585-9f89-df441ffeb397_1150x490.png" width="1150" height="490" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/308e4f10-0bd9-4585-9f89-df441ffeb397_1150x490.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:490,&quot;width&quot;:1150,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.sagifyml.com/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SGkC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F308e4f10-0bd9-4585-9f89-df441ffeb397_1150x490.png 424w, https://substackcdn.com/image/fetch/$s_!SGkC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F308e4f10-0bd9-4585-9f89-df441ffeb397_1150x490.png 848w, https://substackcdn.com/image/fetch/$s_!SGkC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F308e4f10-0bd9-4585-9f89-df441ffeb397_1150x490.png 1272w, https://substackcdn.com/image/fetch/$s_!SGkC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F308e4f10-0bd9-4585-9f89-df441ffeb397_1150x490.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We recently came across this neat MLOps tool that we wanted to share. <a href="https://www.sagifyml.com/">Sagify</a> is a command-line utility to train and deploy ML models on AWS SageMaker. Think of it as a higher-level abstraction on top of SageMaker that hides many of the low-level details. 
If you&#8217;re interested in a quick demo of the tool, check out this <a href="https://www.youtube.com/watch?v=cWv8zR2Qu94">video</a> or this intro <a href="https://pavlosmitsoulis.medium.com/how-to-deploy-a-machine-learning-model-as-a-streaming-application-a16e4c2d3959">article</a>.&nbsp;</p><h2>Fun: How to Break ML Models - Exhibit 9456</h2><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://twitter.com/moyix/status/1367575109305794563&quot;,&quot;full_text&quot;:&quot;The latest generation of adversarial image attacks is, uh, somewhat simpler to carry out <a class=\&quot;tweet-url\&quot; href=\&quot;https://openai.com/blog/multimodal-neurons/\&quot;>openai.com/blog/multimoda&#8230;</a> &quot;,&quot;username&quot;:&quot;moyix&quot;,&quot;name&quot;:&quot;Brendan Dolan-Gavitt&quot;,&quot;profile_image_url&quot;:&quot;&quot;,&quot;date&quot;:&quot;Thu Mar 04 20:38:09 +0000 2021&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/EvqaR1QXIAMiTch.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/h4e0bShq9i&quot;,&quot;alt_text&quot;:null}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:0,&quot;retweet_count&quot;:6418,&quot;like_count&quot;:19085,&quot;impression_count&quot;:0,&quot;expanded_url&quot;:{},&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>OpenAI recently released a fantastic post about <a href="https://openai.com/blog/multimodal-neurons/">Multimodal Neurons</a> - parts of the neural network that respond to a &#8220;concept&#8221; in the input content in the same way, across modalities like text, image, video. We&#8217;ll cover this article in depth in the next edition of our newsletter but here we wanted to share this funny (and scary) anecdote highlighting just how easy it is to mount adversarial attacks. 
Just because the model can read a piece of text in the input saying &#8220;iPod&#8221;, it thinks it is actually looking at an iPod!&nbsp;</p><h2>Thanks</h2><p>Thanks for making it to the end of the newsletter! This has been curated by <a href="https://twitter.com/nihit_desai">Nihit Desai</a> and <a href="https://twitter.com/rish_bhargava">Rishabh Bhargava</a>. This is only <a href="https://s2.q4cdn.com/299287126/files/doc_financials/annual/Shareholderletter97.pdf">Day 1</a> for MLOps and for this newsletter, and we would love to hear your thoughts and feedback. If you have suggestions for what we should be covering in this newsletter, tweet us <a href="https://twitter.com/mlopsroundup">@mlopsroundup</a> (open to DMs as well) or email us at mlmonitoringnews@gmail.com.</p>]]></content:encoded></item></channel></rss>