<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Magis]]></title><description><![CDATA[Alex Izydorczyk's thoughts on data, finance, and economics, focusing on DaaS (data-as-a-service) businesses. ]]></description><link>https://magis.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!wYJp!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c41f48-a7aa-4609-966a-ee62fc65f2e4_640x640.png</url><title>Magis</title><link>https://magis.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 11 Apr 2026 03:41:18 GMT</lastBuildDate><atom:link href="https://magis.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Alex Izydorczyk]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[magis@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[magis@substack.com]]></itunes:email><itunes:name><![CDATA[Alex Izydorczyk]]></itunes:name></itunes:owner><itunes:author><![CDATA[Alex Izydorczyk]]></itunes:author><googleplay:owner><![CDATA[magis@substack.com]]></googleplay:owner><googleplay:email><![CDATA[magis@substack.com]]></googleplay:email><googleplay:author><![CDATA[Alex Izydorczyk]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Skepticism, Early Trends & an Early Leader in AI-for-Hedge Funds Race]]></title><description><![CDATA[Ken Griffin, founder of Citadel, is skeptical of AI&#8217;s stock-picking abilities. AlphaSense, meanwhile, says AI is driving revenue. 
This post maps the early product categories forming in the space.]]></description><link>https://magis.substack.com/p/skepticism-early-trends-and-an-early</link><guid isPermaLink="false">https://magis.substack.com/p/skepticism-early-trends-and-an-early</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Sun, 19 Oct 2025 14:02:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Xe0-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ken Griffin, the founder of Citadel, recently poured cold water on AI&#8217;s stock-picking abilities. At a JPMorgan investor conference, he argued that while generative AI can boost productivity, it <em>&#8220;falls short&#8221;</em> when it comes to uncovering investment alpha<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. This measured skepticism can be read as a bearish signal for the wave of AI-for-hedge-funds startups. Yet despite Griffin&#8217;s doubts, the space is buzzing with interest from both buy-side technologists and Silicon Valley founders. I currently track more than <strong><a href="https://alexizydorczyk.com/ai-for-hedge-funds.html">100 startups</a></strong> building in this space. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://alexizydorczyk.com/ai-for-hedge-funds.html&quot;,&quot;text&quot;:&quot;List of AI Startups for Hedge Funds&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://alexizydorczyk.com/ai-for-hedge-funds.html"><span>List of AI Startups for Hedge Funds</span></a></p><p>All of these new companies are private, so there is little concrete data to show whether other managers share Griffin&#8217;s view<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. Few have publicly known customer bases or revenue, and in my judgment <strong>most haven&#8217;t achieved major product&#8211;market fit</strong> to date. A notable exception is <strong>AlphaSense</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>, arguably the leading &#8220;AI for hedge funds&#8221; company<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>, which actually predates the ChatGPT era. AlphaSense began as a document search engine and later integrated expert network content (in part through its 2024 acquisition of Tegus). In October 2025, the company announced it had surpassed <strong>$500 million in ARR</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>, with growth accelerating after it launched a suite of AI research features. AlphaSense&#8217;s AI capabilities now range from search to an AI research copilot and, via a recent acquisition, automated spreadsheet analysis. 
It&#8217;s hard to pin down exactly how much of this growth is driven by the new AI features<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>, but the disclosure below makes it difficult to doubt that the AI tools have contributed to the company&#8217;s revenue success:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AyPR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AyPR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png 424w, https://substackcdn.com/image/fetch/$s_!AyPR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png 848w, https://substackcdn.com/image/fetch/$s_!AyPR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png 1272w, https://substackcdn.com/image/fetch/$s_!AyPR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!AyPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png" width="590" height="585" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:585,&quot;width&quot;:590,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:144693,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://magis.substack.com/i/176539642?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AyPR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png 424w, https://substackcdn.com/image/fetch/$s_!AyPR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png 848w, https://substackcdn.com/image/fetch/$s_!AyPR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png 1272w, https://substackcdn.com/image/fetch/$s_!AyPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c9d32c9-b758-49bf-9945-9a4ff012957d_590x585.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a><figcaption class="image-caption">Recent post from the SVP of Strategic Finance at AlphaSense</figcaption></figure></div><p>If this trend holds, it will be worth speculating on the source of AlphaSense&#8217;s success<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>. For now, to better track product&#8211;market fit in this space, I&#8217;ve begun sorting the startups on my list into a few distinct buckets based on their core focus. 
These categories highlight the common approaches AI startups are taking to serve hedge funds and asset managers:</p><ul><li><p><strong>Research Copilots (AI &#8220;Analyst&#8221; Assistants):</strong> These act like an analyst on demand, answering free-form investment questions and automating parts of due diligence. The goal is to replicate or augment the work an analyst does for a portfolio manager &#8211; digesting financial reports, conducting industry research, and even writing up initial findings in response to a natural-language query.</p></li><li><p><strong>Excel Copilots (Financial Modeling Aides):</strong> Tools that integrate with Excel or spreadsheet workflows to automate financial model building and data updates based on natural-language instructions. For example, you might ask the tool to build a discounted cash flow model or pull the latest financials for a company, and it generates the model or populates the data accordingly. These range from general Excel add-ins to more specialized tools focused on particular types of models or data sources.</p></li><li><p><strong>&#8220;Terminal 2.0&#8221; Platforms (Next-Gen Market Terminals):</strong> Startups reimagining the Bloomberg Terminal experience with AI at the core. They typically offer real-time news summaries, intelligent alerts, and built-in research copilots as part of the interface. The idea is to provide a modernized market terminal that not only streams data and news but also leverages LLMs to surface insights (for example, summarizing why a stock is moving or flagging unusual patterns) in a more user-friendly way.</p></li><li><p><strong>AI Model Providers (Alpha, Quant, &amp; Forecasting Labs):</strong> Companies developing new foundation models, quantitative methods, or machine learning techniques tailored to financial data and time-series forecasting. Rather than building a user interface, these firms often sell predictions directly or offer API access to their proprietary models, optimizers, or signals. 
They frequently target quants.</p></li><li><p><strong>Data Extraction Tools:</strong> Startups focused on pulling structured data from unstructured sources such as SEC filings, earnings call transcripts, websites, or PDF reports. These tools typically use AI to parse complex documents and output clean datasets, often in tabular form. In some cases, they function as general web scrapers optimized for finance (for example, automatically extracting KPIs from a 10-K filing).</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://magis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Magis! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>These categories aren&#8217;t perfect. Some startups do many things (like AlphaSense with its Carousel acquisition). The lines between categories can blur (e.g., new terminals versus research copilots that show real-time data). Some categories may also split in the future. &#8220;Alpha, Quant, &amp; Forecasting Labs,&#8221; for instance, covers a wide range of activities, from building time-series transformers to AI-augmented backtesting tools for quants. This is a first attempt at classifying these startups, and I&#8217;ll certainly revise it. 
Please <a href="http://forms.gle/7xvmKY2EBwYtq3hx8">send</a> any suggestions or corrections.</p><p>Beyond these labels, there are adjacent categories where AI is being applied across asset management and finance more broadly. Some startups target regulatory and compliance automation (e.g., using AI to scan trades or communications for compliance issues). Others build AI tools for financial advisors, investor relations teams, or asset allocators (for instance, helping wealth managers sift through research or aiding IR teams in crafting reports). There are also variants of the above categories aimed at <em>retail</em> investors &#8211; usually simplified interfaces with added social or educational features to make AI-driven insights accessible to individuals.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xe0-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xe0-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Xe0-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Xe0-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Xe0-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xe0-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png" width="484" height="484" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:484,&quot;bytes&quot;:1734357,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://magis.substack.com/i/176539642?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xe0-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Xe0-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!Xe0-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Xe0-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dddcbfa-9fc3-47a3-a4b5-64e28f44554e_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">My AI-generated rendition of the race for the AI Analyst</figcaption></figure></div><div class="footnote" 
data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Originally published by Bloomberg: <a href="https://www.bloomberg.com/news/articles/2025-10-15/ken-griffin-says-genai-fails-to-help-hedge-funds-produce-alpha">https://www.bloomberg.com/news/articles/2025-10-15/ken-griffin-says-genai-fails-to-help-hedge-funds-produce-alpha</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>It may also be that the successful startups will focus on the middle or back office.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><a href="https://www.alpha-sense.com/">https://www.alpha-sense.com/</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>I excluded AlphaSense from my original list of AI startups because it predates the ChatGPT moment by many years. At the time, I felt it would be difficult to draw bright lines around which pre-ChatGPT companies to include (i.e., at what point would I have to list any financial software vendor, like Bloomberg, that launched an AI feature?). Nonetheless, the company has clearly executed with focus on this vertical, so I have added it. 
</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p><a href="https://www.alpha-sense.com/press/alphasense-surpasses-500m-in-arr/">https://www.alpha-sense.com/press/alphasense-surpasses-500m-in-arr/</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>In its press release, the company notes that expansion in Asia has also been a major contributor. The impact of the June 2024 Tegus acquisition on subsequent net new ARR is also unclear.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p><a href="https://x.com/CharlieZvible/status/1979171610604003426">https://x.com/CharlieZvible/status/1979171610604003426</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>I reached out to Charlie to learn more about the AI features above and was dismissed as a potential competitor (disclosure: I am not building one). I am also perpetually annoyed by AlphaSense&#8217;s decision not to offer an API. So, I have a ways to go before I can fully endorse the company or bet on its success. 
</p></div></div>]]></content:encoded></item><item><title><![CDATA[Why do Startups Power Perplexity Finance?]]></title><description><![CDATA[A common &#8220;party trick&#8221; I observed while compiling and maintaining my list of hedge-fund-oriented AI startups is a relatively basic demo.]]></description><link>https://magis.substack.com/p/nextgen-financial-data-providers</link><guid isPermaLink="false">https://magis.substack.com/p/nextgen-financial-data-providers</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Fri, 29 Aug 2025 20:11:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bgFl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A common &#8220;party trick&#8221; I observed while compiling and maintaining my list of hedge-fund-oriented AI startups is a relatively basic demo. Users are presented with a chat bar, type &#8220;What are NVDA&#8217;s revenues the past 5 years?&#8221;, and the interface returns a bar chart or table with the data. This demo is so common that it forms the basis of some financial evaluation benchmarks for LLMs.</p><p>The demo is a red herring with an intriguing wrinkle. Accurately parsing SEC filings to retrieve this data is legitimately difficult. However, consuming structured financial data this way is impractical and unnecessary. In practice, quants use structured financial data feeds, while non-technical investors use interfaces like the Bloomberg Terminal. Pulling NVDA&#8217;s revenues in Bloomberg is far faster than any inference or parsing. So, while this demo showcases LLM capabilities, it is a poor pitch to customers. 
At the same time, the demo shows that offline, batch-based extraction and cleaning of public-domain data should become a commodity.</p><p>Surprisingly, incumbents have served this new AI market poorly. Consider Perplexity &#8211; a little digging suggests that it has a partnership with FactSet, but the data you see on <a href="http://perplexity.com/finance">perplexity.com/finance</a> comes from startups<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. So the question is: why? What are the opportunities to outcompete the incumbents and become the new data standard for the AI era? Here are some ideas: </p><ol><li><p>Avoiding (alleged) Anticompetitive Identifiers</p><ol><li><p>I&#8217;ve written about regulatory capture stemming from government agencies using proprietary identifiers<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. Beyond this, proprietary identifier owners commonly extract fees by inserting clauses into data contracts that users may not realize are unnecessary. 
Lawsuits continue in this area<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.</p></li><li><p>OpenFIGI is a step in the right direction<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p></li></ol></li><li><p>Modern Delivery Mechanisms</p><ol><li><p>Modern data delivery means serving query results via REST APIs and bulk data via S3, Snowflake, Databricks, or a similar bulk-sharing service.</p></li><li><p>Many incumbent data providers have tried (and largely failed) to create their own delivery mechanisms or white-labelled data environments. Others charge a premium to deliver data via modern interfaces (the equivalent, in my opinion, of a SaaS vendor charging extra to access its website with the latest version of Chrome).</p></li></ol></li><li><p>Transparent Contracts</p><ol><li><p>Almost all frontier AI companies combine usage-based billing for API access with seat-based pricing. Usage-based pricing schemes are naturally more complex and more difficult to forecast than fixed licenses. Still, the complexity of AI companies&#8217; pricing pales in comparison to what is common among incumbent data vendors.</p></li><li><p>Anyone familiar with an incumbent data vendor&#8217;s usage-based contract knows it takes a combination of lawyers, accountants, and engineers to scope its exact cost.</p></li><li><p>A good litmus test for reasonable contracts is whether the vendor publishes its contracts and rate cards. Data purchase agreements contain no proprietary secrets a competitor could steal. Even if they did, protecting this information in a tight-knit industry would be impractical.</p></li></ol></li><li><p>Good and Public Documentation</p><ol><li><p>Lack of transparent documentation is a huge problem. 
Browsing proprietary portals for answers is simply annoying.</p></li></ol></li></ol><p>When I speak with hedge fund data executives, I'm often met with head nods followed by shrugs. Few funds are large enough to change the practices of a major data incumbent, and too few competitors exist to exert meaningful competitive pressure. This presents an opportunity for startups to meet these needs &#8211; especially since the barriers to entry in this market are coming down, not up.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bgFl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bgFl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!bgFl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!bgFl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!bgFl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!bgFl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png" width="527" height="527" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:527,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bgFl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!bgFl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!bgFl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!bgFl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4188dc4-bb81-41a7-8b76-5ec72ca6663f_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Those who&#8217;ve followed my blog may sense a bit of regret here. Cybersyn, my now-defunct startup<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>, provided both public-domain and proprietary data, focusing mostly on the latter. On the public-domain side, my biggest open question remains the long-term defensibility of the business. Nonetheless, Cybersyn had blue-chip users (and customers) that could have bought the same data from Bloomberg, FactSet, or S&amp;P. Cybersyn&#8217;s datasets remain among the most popular on Snowflake Marketplace today. Now that I&#8217;m back in a data-buying role, I need the very products I used to build. So, there is some thread to pull here. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://alexizydorczyk.com/ai-financial-data.html&quot;,&quot;text&quot;:&quot;My List of AI Data Startups&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://alexizydorczyk.com/ai-financial-data.html"><span>My List of AI Data Startups</span></a></p><p>Some excellent new data providers follow these principles, including Financial Modeling Prep, Quartr, and <a href="https://databento.com/">Databento</a>. Open-source projects like <a href="https://github.com/john-friedman/datamule-python">Datamule</a> also show promise. It's no coincidence that innovative AI startups like Perplexity use these vendors as data sources. A paradigm shift in technology will create opportunities for startups in adjacent markets. Perhaps some of these startups will also find answers to the defensibility question.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://magis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Magis! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>It's worth noting some exceptions that lie beyond DaaS providers&#8217; control, especially when data comes from a very limited set of sources. For instance, real-time market data is largely controlled centrally by the exchanges, and data vendors are bound by the data owners' controls. 
So, if exchanges mandate a data governance regime incompatible with modern data-sharing practices, data vendors can do little. That said, some startups like Databento have navigated this quagmire to offer more modern data products.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Found in the footnotes of Perplexity Finance results:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Py8p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Py8p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png 424w, https://substackcdn.com/image/fetch/$s_!Py8p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png 848w, https://substackcdn.com/image/fetch/$s_!Py8p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png 1272w, https://substackcdn.com/image/fetch/$s_!Py8p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Py8p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png" width="1456" height="283" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:283,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:133000,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://magis.substack.com/i/172293903?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Py8p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png 424w, https://substackcdn.com/image/fetch/$s_!Py8p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png 848w, https://substackcdn.com/image/fetch/$s_!Py8p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png 1272w, https://substackcdn.com/image/fetch/$s_!Py8p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbea978-32e6-4875-bd0e-9de4d290b3a7_1758x342.png 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;b8f2ea82-6fd0-4b31-9679-749b8dda6a55&quot;,&quot;caption&quot;:&quot;Data businesses can be particularly valuable if they have moats that guard their product from replication. Such moats can come from legitimate technological innovation or business partnerships. The data moats worth criticizing are those built with anticompetitive government relationships: regulatory capture&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Unfair Data Moats &amp; Regulatory Capture&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. Formerly I was the Head of Data Science at Coatue. 
https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-09-22T19:52:51.709Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!SCrH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/unfair-data-moats-and-regulatory&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:149259103,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!wYJp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c41f48-a7aa-4609-966a-ee62fc65f2e4_640x640.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Tim Baker has chronicled these lawsuits and allegations on <a href="https://www.linkedin.com/posts/tim-baker-fintech-venturing_cusip-antitrust-cusip-activity-7296633258413338626-9ObE?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAAxqMAsBM1K8zkW7GRhYpiQqtEPLLdFcTxU">LinkedIn</a> and on <a 
href="https://www.waterstechnology.com/emerging-technologies/7951678/waters-wavelength-podcast-tim-baker-on-cusip-lawsuit-data-copyrights-and-innovation-in-market-data">this podcast</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>And there appears to be <a href="https://www.waterstechnology.com/data-management/7951952/regulators-recommend-figi-over-cusip-isin-for-reporting-in-fdta-proposal">consensus building</a> among regulators and industry standards organizations on this topic. In my opinion, Bloomberg&#8217;s data products generally fail to meet the criteria outlined here, but the group working on FIGI deserves credit. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;7f8aad06-82d5-45ed-bda3-53c2d7412a99&quot;,&quot;caption&quot;:&quot;On September 11, 1973, Augusto Pinochet seized power in Chile, ending Salvador Allende&#8217;s regime and the Project Cybersyn initiative. Pinochet&#8217;s rule is infamous for its human rights abuses and brutal repression, but the economic legacy is a separate, more complex story. Under his regime, the \&quot;Chicago Boys,\&quot; a group of market-oriented economists trained &#8230;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Lessons from Cybersyn&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. Formerly I was the Head of Data Science at Coatue. 
https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-12-23T14:03:33.840Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!7m1t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/lessons-from-cybersyn&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:153425761,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:29,&quot;comment_count&quot;:4,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!wYJp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c41f48-a7aa-4609-966a-ee62fc65f2e4_640x640.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[AI for Hedge Funds Tracker]]></title><description><![CDATA[Completing my post about tracking AI startups for hedge funds]]></description><link>https://magis.substack.com/p/ai-for-hedge-funds-tracker</link><guid isPermaLink="false">https://magis.substack.com/p/ai-for-hedge-funds-tracker</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Sun, 22 Jun 2025 20:22:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oPZj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png" 
length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A couple of weeks ago, I <a href="https://magis.substack.com/p/genai-for-hedge-funds-startups">posted</a> about startups I have been tracking that are building for hedge fund investors. Since then, readers have sent very helpful notes pointing out startups I missed, and I have added many of them. </p><p>I have centralized the list on my personal website<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, so that I can update it without having to republish posts here (and Substack does not have great support for tables). Click below to see the list:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://alexizydorczyk.com/ai-for-hedge-funds.html&quot;,&quot;text&quot;:&quot;Hedge Fund AI Startup List&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://alexizydorczyk.com/ai-for-hedge-funds.html"><span>Hedge Fund AI Startup List</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oPZj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oPZj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!oPZj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!oPZj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!oPZj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oPZj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png" width="632" height="421.47802197802196" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:632,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Generated image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Generated image" title="Generated image" srcset="https://substackcdn.com/image/fetch/$s_!oPZj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!oPZj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!oPZj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!oPZj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c9a8b16-1fc2-48d8-bcf3-17e4aa3fefb6_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>My general criteria for including startups are:</p><ul><li><p><strong>Focus</strong>: I avoid general-purpose 
productivity tools that <em>could</em> be used by hedge funds. The exception is when the founders have a hedge fund background and have spent time on this specific use case. </p></li><li><p><strong>Startups</strong>: I avoid mature businesses that have launched AI features. This is necessarily a judgement call, but the line has to be drawn somewhere (otherwise I would be listing Bloomberg, etc., which seems unhelpful). </p></li><li><p><strong>Adjacencies</strong>: I have occasionally included startups building for private equity, especially if there seems to be a natural leap towards public markets.</p></li><li><p><strong>Institutional</strong>: I avoid businesses that are retail-investor focused, even if they could conceivably be used by someone working at a hedge fund. The exception is when the startup has firm, published plans to target institutions as well. </p></li></ul><p></p><p>In the future, I plan to classify these startups along two dimensions:</p><ul><li><p>Silicon Valley (&#128187;) vs. Wall Street (&#128200;) DNA: do the founders come from top-tier buy-side backgrounds or from the OpenAI mafia? In practice, the best startups will blend both, but in the near term I have noticed distinct differences stemming from the founding teams&#8217; backgrounds. </p></li><li><p>Product Types: a few product modalities keep reappearing and are helpful to classify. For instance, several startups are building modified &#8220;deep research interfaces&#8221; or &#8220;Cursor for Excel&#8221; copilots. There seem to be a few of these recurring patterns as we all search for product-market fit (PMF). </p></li></ul><p>These are necessarily subjective judgement calls. 
As always, please feel free to share your feedback or send me links to startups I have missed.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://magis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Magis! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://alexizydorczyk.com/ai-for-hedge-funds.html">https://alexizydorczyk.com/ai-for-hedge-funds.html</a></p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[GenAI for Hedge Funds Startups]]></title><description><![CDATA[An incomplete survey of startups]]></description><link>https://magis.substack.com/p/genai-for-hedge-funds-startups</link><guid isPermaLink="false">https://magis.substack.com/p/genai-for-hedge-funds-startups</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Tue, 03 Jun 2025 12:15:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dX16!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've been writing less frequently due to a new engagement. 
Lately, I've been focused on how hedge funds are using AI and LLMs, both to boost productivity and to generate alpha. I've met with many of the startups in this space and developed strong views on what&#8217;s working and what isn&#8217;t. The range of founders is striking: from LLM PhDs to ex-hedge fund analysts learning to code on the job. No clear winner has emerged yet. At some point, I hope to share my own take. </p><p>In the meantime, I wanted to share a list of startups I have come across. This is a non-exhaustive list; if I have missed one, please message me:</p><div class="directMessage button" data-attrs="{&quot;userId&quot;:5502194,&quot;userName&quot;:&quot;Alex Izydorczyk&quot;,&quot;canDm&quot;:null,&quot;dmUpgradeOptions&quot;:null,&quot;isEditorNode&quot;:true}" data-component-name="DirectMessageToDOM"></div><p>Indeed, Cunningham&#8217;s Law states: </p><blockquote><p>the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer.</p></blockquote><p>With your help, I hope to compile a complete list and update this post. If you are an engineer looking to work in this space, please also reach out to me<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><p>With that in mind, here is an incomplete list of startups building AI products for hedge funds that, by my admittedly arbitrary judgement, are at least partially focused on public equities. 
I avoid already well-known vertical software providers adopting AI features.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dX16!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dX16!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!dX16!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!dX16!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!dX16!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dX16!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png" width="637" height="424.8125" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:637,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Generated image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Generated image" title="Generated image" srcset="https://substackcdn.com/image/fetch/$s_!dX16!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!dX16!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!dX16!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!dX16!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53689c0d-f483-4c42-be1c-7b089dfe7cf2_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<ul><li><p><a href="https://aiera.com/">Aiera</a> </p></li><li><p><a href="https://alphawatch.ai/">AlphaWatch AI</a></p></li><li><p><a href="https://aq22.ai/">AQ22</a> </p></li><li><p><a href="https://brightwave.io/">Brightwave</a> </p></li><li><p><a href="https://www.usecurrent.ai/">Current</a> </p></li><li><p><a href="https://daloopa.com/">Daloopa</a> </p></li><li><p><a href="https://dili.ai/">Dili AI</a> </p></li><li><p><a href="https://diligentiq.com/">DiligentIQ</a> </p></li><li><p><a href="https://earningsedge.ai/">EarningsEdge.ai</a> </p></li><li><p><a href="https://endex.ai/">Endex</a> </p></li><li><p><a href="https://finchat.io/">FinChat</a> </p></li><li><p><a href="https://finpilot.ai/">Finpilot</a> </p></li><li><p><a href="https://finster.ai/">Finster AI</a> </p></li><li><p><a href="https://fintool.com/">Fintool</a></p></li><li><p><a href="https://firaresearch.com/">Fira</a> </p></li><li><p><a href="https://formulainsight.io/">Formula 
Insight</a> </p></li><li><p><a href="https://hebbia.com/">Hebbia</a> </p></li><li><p><a href="https://hudson-labs.com/">Hudson Labs</a> </p></li><li><p><a href="https://implied.com/">Implied</a> </p></li><li><p><a href="https://invesst.ai/">Invesst</a></p></li><li><p><a href="https://keye.co/">Keye</a></p></li><li><p><a href="https://linqalpha.com/">LinqAlpha</a> </p></li><li><p><a href="https://matterfact.com/">Matterfact</a> </p></li><li><p><a href="https://metal.ai/">Metal</a> </p></li><li><p><a href="https://getmidas.ai/">Midas AI</a></p></li><li><p><a href="https://nosible.ai/">Nosible</a></p></li><li><p><a href="https://octagonai.co/">Octagon AI</a> </p></li><li><p><a href="https://openbb.co/">OpenBB</a></p></li><li><p><a href="https://pascalailabs.com/">Pascal AI Labs</a> </p></li><li><p><a href="https://permutable.ai/">Permutable AI</a> </p></li><li><p><a href="https://plux.ai/">Plux</a> </p></li><li><p><a href="https://www.portraitanalytics.ai/">Portrait Analytics</a></p></li><li><p><a href="https://quantly-ai.com/">Quantly</a> </p></li><li><p><a href="https://quillai.com/">Quill AI</a> </p></li><li><p><a href="https://reflexivity.com/">Reflexivity</a> </p></li><li><p><a href="https://rogo.ai/">Rogo</a></p></li><li><p><a href="https://rowspace.ai/">Rowspace AI</a> </p></li><li><p><a href="https://samaya.ai/">Samaya AI</a> </p></li><li><p><a href="https://secinsights.ai/">SEC Insights</a> </p></li><li><p><a href="https://sixhq.ai/">Six AI (Six HQ)</a> </p></li><li><p><a href="https://www.stockinsights.ai/">StockInsights AI</a> </p></li><li><p><a href="https://structify.ai/">Structify</a> </p></li><li><p><a href="https://tenzingmemo.com/">Tenzing Memo</a> </p></li><li><p><a href="https://uptrends.ai/">Uptrends.ai</a></p></li><li><p><a href="https://valuesense.io/">Value Sense</a></p></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I am particularly looking to hire senior frontend engineers experienced in TypeScript, but I have roles for many technical profiles.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Lessons from Cybersyn]]></title><description><![CDATA[What I learned from running a DaaS startup for two years.]]></description><link>https://magis.substack.com/p/lessons-from-cybersyn</link><guid isPermaLink="false">https://magis.substack.com/p/lessons-from-cybersyn</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Mon, 23 Dec 2024 14:03:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7m1t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>On September 11, 1973, Augusto Pinochet seized power in Chile, ending Salvador Allende&#8217;s regime and the Project Cybersyn initiative. 
Pinochet&#8217;s rule is infamous for its human rights abuses and brutal repression, but the economic legacy is a separate, more complex story. Under his regime, the "Chicago Boys," a group of market-oriented economists trained at the University of Chicago, implemented controversial reforms that, despite their divisiveness, were largely successful and continue to shape Chilean politics today.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7m1t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7m1t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png 424w, https://substackcdn.com/image/fetch/$s_!7m1t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png 848w, https://substackcdn.com/image/fetch/$s_!7m1t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png 1272w, https://substackcdn.com/image/fetch/$s_!7m1t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7m1t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png" width="1456" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image attribution: https://commons.wikimedia.org/wiki/File:CyberSyn-render-107.png&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image attribution: https://commons.wikimedia.org/wiki/File:CyberSyn-render-107.png" title="Image attribution: https://commons.wikimedia.org/wiki/File:CyberSyn-render-107.png" srcset="https://substackcdn.com/image/fetch/$s_!7m1t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png 424w, https://substackcdn.com/image/fetch/$s_!7m1t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png 848w, https://substackcdn.com/image/fetch/$s_!7m1t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png 1272w, 
https://substackcdn.com/image/fetch/$s_!7m1t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb23df15b-5719-4af6-9967-5abb62ce98fa_2880x1519.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In a less dramatic turn, I recently shut down my data startup, Cybersyn. Named after the Chilean Project Cybersyn, our company shared the vision of using real-time data to measure the economy&#8212;though with a focus on shareholder value rather than Marxist ideals. 
Cybersyn will be a footnote in Snowflake&#8217;s history<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, much like its namesake is in Chile&#8217;s. However, for those interested in data businesses, I&#8217;ve captured my key learnings here for posterity.</p><p>When we started Cybersyn, we had a core thesis:</p><ul><li><p>Use of third-party data would grow, and a &#8220;moneyball-ization&#8221; of more industries would drive demand for novel third-party data, similar to what I&#8217;d seen in discretionary asset management.</p></li><li><p>Underutilized consumer data could be licensed from its owners, providing them with new, high-margin revenue, and turned into useful derived products.</p></li><li><p>Cybersyn could adopt a capital-intensive model for licensing data, then scale by leveraging network effects and operating leverage (no revenue shares). Snowflake&#8217;s financial support, reputation, and distribution would enable this strategy.</p></li><li><p>The &#8220;10x&#8221; innovative data product would combine transaction, point-of-sale, and clickstream data at the consumer level. This product would offer insights into consumer behavior, combining the best of what the industry calls syndicated and panel data.</p></li></ul><p>The core business model proved very difficult to execute. We learned that:</p><ul><li><p>On the customer side, outside of asset management, <em>insights</em> datasets are most valuable for strategic decision-making at the largest companies. It may seem that digital-native DTC brands would be early adopters, but in practice, the time and complexity required to realize a return on data-driven insights do not justify large contract values. I did not appreciate just how much longer actions and decisions take in an operating business than in investing. 
Activation or performance-marketing datasets would likely be easier to sell while still subscale.</p></li><li><p>Consequently, it was difficult to find nimble early adopters for whom better data had a clear ROI and who could also move very fast. While we made genuine product innovations, competing head-on with incumbent insights providers for large CPG manufacturers&#8217; business was difficult without very broad data coverage.</p></li><li><p>On the supply side, large corporations (which hold the most valuable data) were, in principle, interested in monetizing their data, and this trend was clearly accelerating. However, creating mutually beneficial deals proved very difficult. Data from any one company is narrow, so we had to acquire multiple datasets and pay upfront, despite the time needed to generate meaningful insights (and therefore revenue). We had to strike a tricky balance: convincing large organizations that the opportunity to monetize their data was large, while simultaneously negotiating a low initial price.</p></li><li><p>Alternatively, starting with commercially available proprietary data (and innovating on its processing) was far more affordable, but still challenging due to the time needed to build differentiated products, especially if the only distinction lies in the derivative calculations. With already available data, the bar for building something differentiated was far higher.</p></li><li><p>In the end, the challenge with the Cybersyn business model can be summarized as higher-than-expected capital intensity, paired with slow, non-gradual offsetting revenue, due to the R&amp;D timelines needed to build the &#8220;10x&#8221; product.</p></li></ul><p>Beyond these difficulties, there were also exogenous factors:</p><ul><li><p>The public market began to value profits over growth. 
Snowflake was not exempt from this scrutiny, and our status as a consolidated entity added constraints.</p></li><li><p>Marginal venture dollars flowed to pure-play AI companies, while venture funding tightened more broadly.</p></li></ul><p>These realizations led to the following conclusions:</p><ul><li><p>The money we raised for Cybersyn was not enough to accomplish our vision. We could not buy enough additional data types (point-of-sale, clickstream) beyond transaction data while still maintaining a long enough runway to complete the &#8220;10x&#8221; product and bring it to market.</p></li><li><p>We could not raise more money because of the change in financing conditions (we were not squarely an AI company), our unusual capital structure, and, most importantly, our financial profile.</p></li><li><p>Continuing to build best-in-class transaction data products seemed unlikely to lead to an outcome of the scale that excited us.</p></li><li><p>The public domain data products became very popular, to the point where there is likely a company to be built around them alone.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> Demand from AI inference use cases and the improving ability of LLMs to clean and structure data both supported this business. I remain unsure whether any moats will protect first movers in this space, but there is significant demand here. I predict an explosion of new, low-cost data options enabled by LLM-based data structuring.</p></li><li><p>Given our financial position, and by selling the assets of the public domain business to Snowflake, Cybersyn had the opportunity to return significant capital to our investors.</p></li></ul><p>These realizations led us to shut down in order to return as much capital as possible. 
In another version of this story, a more reckless (or courageous) founder might have pushed ahead, spending the remaining capital on acquiring the necessary data, even with uncertain prospects for fast revenue growth. Snowflake, like other data technology companies, faces tough capital allocation decisions in the new AI world. Personally, I still believe proprietary data content can be a worthwhile strategic choice for companies.</p><p>On a personal note, I owe immense gratitude to a large number of people. First and foremost, the Cybersyn team took the risk of going on this journey; I am most proud of the talent density we assembled. Second, this journey would not have been possible without Thomas Laffont, Christian Kleinerman, Mike Scarpelli, Lauren Reeder, and Mike Vernal. It was an immense privilege to work with Coatue, Sequoia, and Snowflake. Finally, a very large number of partners, customers, suppliers, and other investors were instrumental; I will not attempt a list, for brevity and for fear of omission, but you know who you are! Thank you. 
</p><p>I look forward to continuing to write about and work on the topics in this blog in 2025!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vYzB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vYzB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!vYzB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png 848w, https://substackcdn.com/image/fetch/$s_!vYzB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!vYzB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vYzB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png" width="1456" height="1092" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2768902,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vYzB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!vYzB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png 848w, https://substackcdn.com/image/fetch/$s_!vYzB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!vYzB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94485f2c-5a9c-47eb-bc3a-2cc8ee34f85b_1600x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Part of the Cybersyn team, celebrating the holidays, 2024.</figcaption></figure></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>For more on the Chilean Cybersyn story, see my synopsis below, the book <em><a href="https://mitpress.mit.edu/9780262525961/cybernetic-revolutionaries/">Cybernetic Revolutionaries</a></em>, or the new podcast series <em><a href="https://open.spotify.com/show/7xlRxnooUnl48JVo726YXn">Santiago Boys</a></em>.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;b504e1f0-5f41-4c40-9c3d-76936e693e1d&quot;,&quot;caption&quot;:&quot;In November 1970, Salvador Allende became president of Chile, after winning a narrow plurality in the September election and then being chosen by the Congress. 
Allende was running for Popular Unity, a left-wing coalition that included communists, socialists, and smaller radical parties. Its economic platform centered on a plan by Chilean economist Pedro&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Project Cybersyn&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. Formerly I was the Head of Data Science at Coatue. https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2022-05-24T20:41:42.055Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1ff7c915-18f4-4024-a37f-2b62eafd776e_2560x1350.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/project-cybersyn&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:56402961,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c41f48-a7aa-4609-966a-ee62fc65f2e4_640x640.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" 
href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>As most readers will be aware, Snowflake was Cybersyn&#8217;s largest backer, and we had a unique partnership with the company. See <a href="https://magis.substack.com/p/about-cybersyn-the-company">here</a> for the background. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>I have published extensively on public data (see below). I think the extent to which LLMs can automate and assist with the core value propositions is underappreciated. </p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0d2c5895-671a-4324-a436-689d06ff64b4&quot;,&quot;caption&quot;:&quot;Governments publish a vast variety of economic data at an impressive level of depth. In the United States, the Bureau of Labor Statistics alone publishes more than eight hundred thousand monthly time series, covering hundreds of geographic regions. Individual time series can track data as specific as the wages of workers in a specific-sized restaurant b&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Free as in freedom, not as in beer.&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. Formerly I was the Head of Data Science at Coatue. 
https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-02-14T23:23:10.767Z&quot;,&quot;cover_image&quot;:null,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/free-as-in-freedom-not-as-in-beer&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:102945027,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c41f48-a7aa-4609-966a-ee62fc65f2e4_640x640.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;50738251-d086-4575-b138-e60185518318&quot;,&quot;caption&quot;:&quot;Governments and other organizations often publish &#8220;open&#8221; data. This data is often freely available, but the cost of effectively using it is often significant. In a previous essay, I described some of the initial challenges and contemplated solutions in operationalizing and distributing open data. The below is an extended list of additional specific chal&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Free as in freedom, not as in beer, Pt. 2&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. 
Formerly I was the Head of Data Science at Coatue. https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-10-01T21:04:32.771Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b030a0e-3a3d-499b-a830-00f56093d6c0_1024x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/free-as-in-freedom-not-as-in-beer-ef2&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:137575592,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:4,&quot;comment_count&quot;:1,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c41f48-a7aa-4609-966a-ee62fc65f2e4_640x640.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></div></div>]]></content:encoded></item><item><title><![CDATA[Cybersyn's Product Release]]></title><description><![CDATA[My startup's beta product for Consumer Spending data]]></description><link>https://magis.substack.com/p/cybersyns-product-release</link><guid isPermaLink="false">https://magis.substack.com/p/cybersyns-product-release</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Wed, 02 Oct 2024 15:12:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This blog is a 
personal commentary on the intersection of data, finance, and technology, and not a Cybersyn (the startup I run) corporate blog. Occasionally, I cross-post announcements from Cybersyn</em> <em>when I think the content is relevant both today and for posterity. A copy is reproduced below. You can try the product <a href="https://trycurrent.cybersyn.com/">here</a>.</em></p><div><hr></div><p>Today we are releasing our first consumer spending application and updating our brand and content. We started Cybersyn two years ago to reinvent how the world measures the economy. Today&#8217;s release is the product of that effort.</p><p>Consumer Current<sup>TM</sup> Beta is a market intelligence tool, available as a data feed and/or web app, that estimates property-level consumer spending for over 8,000 U.S. merchants (and over 25,000 at the zip code level). Powered by anonymized and aggregated card transactions, Consumer Current delivers representative dollar and growth estimates. It also provides analytics such as retention, consumer demographics, and sales breakdowns by channel and marketplace.</p><p>Consumer Current enables operators of consumer-facing companies to:</p><ul><li><p><strong>Discover Growing Brands</strong></p></li></ul><p><em>Our wide merchant coverage makes it possible to discover even brand-new companies. 
This ranges from up-and-coming DTC brands driven by social media to disruptive AI startups.&nbsp;</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MlOE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MlOE!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif 424w, https://substackcdn.com/image/fetch/$s_!MlOE!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif 848w, https://substackcdn.com/image/fetch/$s_!MlOE!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif 1272w, https://substackcdn.com/image/fetch/$s_!MlOE!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MlOE!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif" width="717" height="443" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:443,&quot;width&quot;:717,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MlOE!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif 424w, https://substackcdn.com/image/fetch/$s_!MlOE!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif 848w, https://substackcdn.com/image/fetch/$s_!MlOE!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif 1272w, https://substackcdn.com/image/fetch/$s_!MlOE!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7dd1e9-67b6-46d5-b33f-1e4253decd3c_717x443.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p><strong>Benchmark against competitors at hyperlocal levels</strong></p><ul><li><p><em>It is one thing to know your national market share, but quite another to decompose that into trade area dynamics to understand whether performance is driven by competitors or macro consumer conditions&nbsp;</em></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gh3L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gh3L!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif 424w, 
https://substackcdn.com/image/fetch/$s_!Gh3L!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif 848w, https://substackcdn.com/image/fetch/$s_!Gh3L!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif 1272w, https://substackcdn.com/image/fetch/$s_!Gh3L!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gh3L!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif" width="781" height="483" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:483,&quot;width&quot;:781,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gh3L!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif 424w, https://substackcdn.com/image/fetch/$s_!Gh3L!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif 848w, 
https://substackcdn.com/image/fetch/$s_!Gh3L!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif 1272w, https://substackcdn.com/image/fetch/$s_!Gh3L!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2fd8d5-abca-4aa3-b0e9-b968f5454b7c_781x483.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><em>You can dig into store-level benchmarks to understand micro-level performance trends</em></p></li></ul><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uwoX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uwoX!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif 424w, https://substackcdn.com/image/fetch/$s_!uwoX!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif 848w, https://substackcdn.com/image/fetch/$s_!uwoX!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif 1272w, https://substackcdn.com/image/fetch/$s_!uwoX!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uwoX!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif" width="760" height="470" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:470,&quot;width&quot;:760,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uwoX!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif 424w, https://substackcdn.com/image/fetch/$s_!uwoX!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif 848w, https://substackcdn.com/image/fetch/$s_!uwoX!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif 1272w, https://substackcdn.com/image/fetch/$s_!uwoX!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd394d012-9016-48a5-af01-743aa1a1a6ab_760x470.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<ul><li><p><strong>Understand who your customers are behaviorally&nbsp;</strong></p></li></ul><p><em>Beyond traditional demographics, understanding where your customers shop outside your four walls &#8211; both at competitors and in adjacent markets &#8211; is key to setting strategy</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qpsm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qpsm!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif 424w, 
https://substackcdn.com/image/fetch/$s_!Qpsm!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif 848w, https://substackcdn.com/image/fetch/$s_!Qpsm!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif 1272w, https://substackcdn.com/image/fetch/$s_!Qpsm!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qpsm!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif" width="759" height="469" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:469,&quot;width&quot;:759,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qpsm!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif 424w, https://substackcdn.com/image/fetch/$s_!Qpsm!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif 848w, 
https://substackcdn.com/image/fetch/$s_!Qpsm!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif 1272w, https://substackcdn.com/image/fetch/$s_!Qpsm!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16b3a7c7-a4bc-4d4e-9722-7760c2aa72ae_759x469.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You can try <a href="https://trycurrent.cybersyn.com/?utm_source=news&amp;utm_medium=website&amp;utm_campaign=current-launch">Consumer Current 
here</a>.</p><p><strong>What&#8217;s Different?</strong></p><p>Panel-based credit and debit card data has been commercially available for a while, and hedge funds were among the first to use it. We build on that approach and on my personal experience pioneering such estimates at Coatue. Consumer Current stands out through:&nbsp;</p><ul><li><p><strong>Merchant Coverage: </strong>15k+ merchants in the web application and 25k+ merchants in the data feed.</p></li><li><p><strong>Granular &amp; Hyperlocal Estimates: </strong>Sales estimates and related analytics down to the property level.&nbsp;</p></li><li><p><strong>Fast Interface: </strong>Consumer Current lets users interact with the data in real time, with the speed and responsiveness you would expect from a consumer app.</p></li></ul><p><strong>Accuracy</strong></p><p>Accuracy is central to what we do. While Consumer Current is not perfect (hence its <em>Beta</em> status), we're committed to building a product that can be rigorously benchmarked against US Census economic data, company earnings, and private sales data. We believe in translating panel data into population-level estimates, despite the added complexity. Our focus is on fast iteration and continuous improvement, driven by feedback and data from early customers.</p><p><strong>Licensing and Delivery</strong></p><p>We try to live up to what we consider the best standards in data licensing. These include transparent pricing, instant data delivery via Snowflake, and enterprise-wide licenses. 
Consumer Current requires no specific technical expertise to get started, but users can make the most of the data by accessing the underlying feed via Snowflake Marketplace.</p><p><strong>What else?</strong></p><p>As we release Consumer Current, we are also updating our branding to better represent all that we do.&nbsp;</p><p>We believe a necessary prerequisite to building the best proprietary market intelligence product is first taking advantage of all the public domain data that is freely available but not easily accessible. That&#8217;s why we are rebranding our public data products as Cybersyn Foundations, centralizing all of this public data in one place, with both free and paid tiers, to democratize access to this information and reduce its cost via Snowflake Marketplace.&nbsp;</p><p>We have released an updated <a href="https://foundations.cybersyn.com/?utm_source=news&amp;utm_medium=website&amp;utm_campaign=current-launch">Data Catalog</a> that allows users to search all of our public domain data products by source and variable. Alongside refreshed <a href="https://docs.cybersyn.com/?utm_source=news&amp;utm_medium=website&amp;utm_campaign=current-launch">documentation</a>, this is a step forward in data discoverability.&nbsp;</p><p><strong>Where next?</strong></p><p>We&#8217;ve only begun to explore the potential of consumer payments data. Our roadmap includes expanding merchant coverage and enabling self-service custom analytics through Snowflake Native Applications. We also see the future in combining payments, point-of-sale, and clickstream data into a true omnichannel view, something the market lacks today.</p><p>As for AI and LLMs (what blog post today doesn&#8217;t mention them?), their most transformative role in market research will be automating insight discovery. Today, our platform answers your questions. 
<em>Tomorrow, it will tell you what to ask.</em></p>]]></content:encoded></item>
<item><title><![CDATA[Unfair Data Moats & Regulatory Capture]]></title><description><![CDATA[How data licensing and government data can go wrong]]></description><link>https://magis.substack.com/p/unfair-data-moats-and-regulatory</link><guid isPermaLink="false">https://magis.substack.com/p/unfair-data-moats-and-regulatory</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Sun, 22 Sep 2024 19:52:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!SCrH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Data businesses can be particularly valuable if they have moats that guard their product from replication<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. Such moats can come from legitimate technological innovation or business partnerships. The data moats worth criticizing are those built with anticompetitive government relationships: regulatory capture<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. 
There are two common types of unfair moat: those based on improperly proprietary identifiers and those based on outright unfair data access.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SCrH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SCrH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SCrH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SCrH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SCrH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SCrH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg" width="382" height="382" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:382,&quot;bytes&quot;:443192,&quot;alt&quot;:&quot;Early 20th century cartoon style uncle sam with his leg caught in a closed bear trap to represent regulatory capture. There should be a library, bookshelf, scrolls, or something representing datasets and knowledge in the background&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Early 20th century cartoon style uncle sam with his leg caught in a closed bear trap to represent regulatory capture. There should be a library, bookshelf, scrolls, or something representing datasets and knowledge in the background" title="Early 20th century cartoon style uncle sam with his leg caught in a closed bear trap to represent regulatory capture. 
There should be a library, bookshelf, scrolls, or something representing datasets and knowledge in the background" srcset="https://substackcdn.com/image/fetch/$s_!SCrH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SCrH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SCrH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SCrH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa9512cb-0767-4ac3-8b2d-5ece2e853502_1024x1024.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>Proprietary identifiers, also known as data join keys<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>, often replace what should be open industry standards. Regulatory, tax, and statistical agencies often require the private sector to submit data, and they publish aggregate statistics for commercial, academic, and personal use. Various industries rely on agreed-upon standard identifiers for companies, investable securities, and other entities, which make such data easy to use. This presents an opportunity for regulatory capture: when a proprietary standard is mandated, the copyright holder can extract high fees from everyone who needs to use the data.&nbsp;</p><p>One such example is CUSIP, an identifier used by financial market participants to identify companies and investable securities. The identifier is copyrighted and owned by the American Bankers Association, and the operating company was recently bought by FactSet<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. Both use and distribution of CUSIP, even indirectly, require a license. This is problematic because regulatory agencies such as FINRA, the SEC, and the CFPB use the identifier in ostensibly public domain data releases. 
In effect, CUSIP&#8217;s owners benefit from a government mandate to do business with them<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>.&nbsp;</p><p>Another example is the DUNS number, which was used as a primary entity identifier in several government systems, including the General Services Administration&#8217;s SAM database of government contractors. This regime required users and distributors of the data to license products from Dun &amp; Bradstreet. While obtaining a DUNS number was ostensibly free, the arrangement gave Dun &amp; Bradstreet a monopoly on the entity validation services it provided to the government, as well as an unfair (effectively mandated) advantage in collecting data on business entities that other data providers did not have. Several states and international governments still require the DUNS number, a fact advertised on the DUNS website<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>.</p><p>A second regulatory capture model occurs when companies gain privileged access to government data that they then monetize. Bill of lading data is an example of such unevenly available data. Bills of lading are documents collected by the government when goods are imported into the United States by sea; they are roughly equivalent to shipping labels. The data is commercially valuable because it can be used to research supply chains. While the data comes from the government (U.S. Customs and Border Protection, specifically), how and where to access it is not clearly documented. Instead, several commercial entities obtain this data and sell it, and the majority of their customers are likely unaware of its exact source. 
Freedom of Information Act requests for this data are apparently denied, yet certain companies manage to find the right contact and obtain the data for a fee<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>. A similar situation exists with United Kingdom gilt price data. UK gilt prices were previously calculated by the Debt Management Office (DMO). In 2016, the agency ran an RFP and accepted a proposal from FTSE/Tradeweb to take over calculating daily closing prices<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>.&nbsp;</p><p>The mere fact that the government charges for certain data is not problematic. Government agencies incur real costs in procuring and analyzing data, and it makes sense to charge for it if the primary beneficiaries are only a subset of the private sector. Further, independently reviewed RFPs that grant a private company the right to process and publish such data, as happened in the UK gilt case, are preferable to an entirely opaque process. However, I am skeptical that technocrats should ever select a single provider. Even if stakeholders claim today that they prefer a single authoritative source, such &#8216;benevolent&#8217; monopolies fail to anticipate changing circumstances and new stakeholders; for instance, the advent of AI/LLM users may well change the optimum<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a>.</p><p>There are reasons to be optimistic that at least some instances of both kinds of regulatory moat can be eroded. 
Numerous financial regulatory agencies recently proposed moving away from CUSIP<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a> to FIGI<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-11" href="#footnote-11" target="_self">11</a> and are soliciting public comment<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-12" href="#footnote-12" target="_self">12</a>. Two years ago, SAM.gov announced that it would move away from the DUNS number to its own open standard<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-13" href="#footnote-13" target="_self">13</a>. Further still, recent litigation around CUSIP has raised questions about whether individual identifiers (as opposed to the collection as a whole) can be copyrighted at all<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-14" href="#footnote-14" target="_self">14</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-15" href="#footnote-15" target="_self">15</a>. There are also encouraging examples where the government provides costly data to the private sector. For instance, the USPS began charging for its change-of-address database, and Fannie Mae and Freddie Mac charged for commercial use and redistribution of their data. In each of those cases, there are multiple competing vendors<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-16" href="#footnote-16" target="_self">16</a>, a transparent license agreement for accessing the data, and transparent pricing. 
Any new data vendor can agree to the license and compete on data distribution and value-add.</p><p>I also want to point out a few distinct, potentially anti-competitive data licensing cases outside the scope of the above comments. There are data products where governments fall short in data integration, so commercial entities step in. This situation is only problematic when other businesses are not allowed the same raw data access. Competition is different from convenience: the mere requirement for high upfront capital expenditure does not make a market anti-competitive. For instance, CoreLogic, Black Knight Financial, and Attom sell mortgage deed data they gather from county governments. They bear the cost of standardizing the data and integrating with each county government. In theory, this market should be competitive. What would be problematic, however, is if certain counties released data only to certain vendors, or if counties lacked transparency about how competing vendors might participate (as in the Bills of Lading case). A second case, not to be overlooked, is that data vendors may engage in traditional anti-competitive practices, such as price collusion, that are not, strictly speaking, regulatory capture. 
I do not cover such cases in this essay.</p><p>Cleaning, integrating, and distributing public-domain data is a valuable commercial service that private sector data companies should be paid for, but there will always be a temptation to build anti-competitive moats. That&#8217;s lazy. Data companies should compete on value-add on top of public data rather than attempting to be a tax on users. This serves the best interest of the private sector customer, the government, and, most importantly, the taxpayer.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Many of the companies on my list of data companies have enduring moats. Counterpoint Global Research (run by <a href="https://www.michaelmauboussin.com/about">Michael Mauboussin</a>) has a <a href="https://x.com/punchcardinvest/status/1579039530325139456?s=20">great list of wide-moat businesses</a>, a surprising number of which are data businesses. </p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;bb181789-007a-404b-bfa6-69ccfa88b7a8&quot;,&quot;caption&quot;:&quot;The following does not represent and is not intended to be investment advice.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Some data on data companies&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. Formerly I was the Head of Data Science at Coatue. 
https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2022-10-03T00:41:42.499Z&quot;,&quot;cover_image&quot;:null,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/some-data-on-data-companies&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:76172861,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:11,&quot;comment_count&quot;:2,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c41f48-a7aa-4609-966a-ee62fc65f2e4_640x640.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Credit goes to <a href="https://www.linkedin.com/in/tim-baker-fintech-venturing/">Tim Baker</a> and his LinkedIn posts that made me dive into this subject (in addition to just operating a company in this space). 
I highly recommend following him and reading his <a href="https://www.linkedin.com/pulse/cusip-anti-trust-case-2-updates-tim-baker-cfa/">posts</a> and <a href="https://www.linkedin.com/posts/tim-baker-fintech-venturing_lei-fdta-figi-activity-7226649207066030081-ryzA?utm_source=share&amp;utm_medium=member_desktop">comments</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Auren Hoffman has a great explanation of exactly how valuable data join keys are <a href="https://www.safegraph.com/blog/data-standards-and-the-join-key">here</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>A quick summary of <a href="https://en.wikipedia.org/wiki/CUSIP">CUSIP from Wikipedia</a>. CUSIP Global Services was <a href="https://investor.factset.com/news-releases/news-release-details/factset-completes-acquisition-cusip-global-services">recently bought</a> by Factset.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>An <a href="https://www.risk.net/risk-management/7959517/as-legal-letters-fly-cusip-licensing-debate-rolls-on">example</a> of what happens if you <em>indirectly</em> receive their data. 
That said, <a href="https://www.waterstechnology.com/regulation/7936086/class-action-lawsuit-takes-aim-at-cusip-sp-factset-aba">legal fights</a> are emerging.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>It is worth reading the <a href="https://www.dnb.com/duns/duns-number-and-government.html">DUNS website</a> on the GSA change.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>See a FOIA denial <a href="https://www.data-liberation-project.org/requests/cbp-bills-of-lading/">here</a>, for example, from the Data Liberation Project.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>See the <a href="https://www.dmo.gov.uk/data/gilt-market/historical-prices-and-yields/">notice</a> that this was transitioned to FTSE/Tradeweb.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p>Readers can read the <a href="https://www.dmo.gov.uk/media/p1wfss54/prfinalreport.pdf">full RFP review</a> and decide for themselves whether the outcome is desirable. 
</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p>A good summary of that proposal was <a href="https://www.fdic.gov/news/financial-institution-letters/2024/proposed-joint-rule-establishing-data-standards-under">issued by the FDIC</a>. The full text of the proposed joint rule under the Financial Data Transparency Act, including how to submit public comments, can be read <a href="https://www.fdic.gov/system/files/2024-07/fr-npr-on-financial-data-transparency-act.pdf">here</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-11" href="#footnote-anchor-11" class="footnote-number" contenteditable="false" target="_self">11</a><div class="footnote-content"><p><a href="https://www.openfigi.com/about/figi">FIGI</a> was originally developed by Bloomberg, but it has transitioned into an independent open standard with permissive licensing. While any open project carries risk when it is primarily developed by a single, well-resourced commercial developer, this is still the <a href="https://www.linkedin.com/posts/tim-baker-fintech-venturing_regulators-recommend-figi-over-cusip-isin-activity-7227061374256910336-QRhx?utm_source=share&amp;utm_medium=member_desktop">best open standard</a> that exists, to my knowledge. 
Other standards, such as <a href="https://www.gleif.org/en/about/this-is-gleif">LEI</a> or <a href="https://permid.org/">PermID</a> (operated by a Bloomberg competitor), are also viable.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-12" href="#footnote-anchor-12" class="footnote-number" contenteditable="false" target="_self">12</a><div class="footnote-content"><p>I will leave it to the reader to decide whether the <a href="https://www.linkedin.com/posts/tim-baker-fintech-venturing_aba-response-to-fdta-proposal-activity-7238621251311738880-sYw0?utm_source=share&amp;utm_medium=member_desktop">public comment submitted in response by the ABA and CUSIP</a> sounds like it comes from someone who is <em>definitely not benefiting from an unfair monopoly.</em></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-13" href="#footnote-anchor-13" class="footnote-number" contenteditable="false" target="_self">13</a><div class="footnote-content"><p>SAM.gov&#8217;s <a href="https://www.fsd.gov/gsafsd_sp?id=kb_article_view&amp;sysparm_article=KB0045975">notice</a> of the move away from DUNS, and a good summary of the logic <a href="https://www.gsa.gov/about-us/organization/federal-acquisition-service/integrated-award-environment-iae/iae-systems-information-kit/unique-entity-identifier-update#:~:text=On%20April%204%2C%202022%2C%20the,website%20to%20obtain%20their%20identifier.">here</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-14" href="#footnote-anchor-14" class="footnote-number" contenteditable="false" target="_self">14</a><div class="footnote-content"><p><a href="https://www.linkedin.com/in/tim-baker-fintech-venturing/">Tim Baker</a> summarizes this well in his <a href="https://www.linkedin.com/pulse/cusip-anti-trust-case-2-updates-tim-baker-cfa/">LinkedIn post</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-15" href="#footnote-anchor-15" 
class="footnote-number" contenteditable="false" target="_self">15</a><div class="footnote-content"><p>It is worth noting that the EU previously <a href="https://a-teaminsight.com/blog/european-commission-finally-releases-sp-isin-fees-judgement-considers-vendor-to-be-abusing-its-dominant-position/?brand=rti">took antitrust action</a> against S&amp;P, the previous owner of CUSIP Global Services, although that case concerned a related identifier, ISIN, rather than CUSIP itself. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-16" href="#footnote-anchor-16" class="footnote-number" contenteditable="false" target="_self">16</a><div class="footnote-content"><p>For instance, <a href="https://postalpro.usps.com/ncoalink/Full_Service_Provider_Licensees">here is every vendor</a> with full access to USPS COA data, and <a href="https://capitalmarkets.fanniemae.com/authorized-redistributors-fannie-mae-research-data">here is the same list</a> from Fannie Mae, along with the standard data redistribution agreement.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Data Marketplaces in a LLM World]]></title><description><![CDATA[And Some Draft Ideas]]></description><link>https://magis.substack.com/p/data-marketplaces-in-a-llm-world</link><guid isPermaLink="false">https://magis.substack.com/p/data-marketplaces-in-a-llm-world</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Sun, 25 Aug 2024 14:02:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rQXY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A perennial idea among external data consumers is a data marketplace. 
There are many large specialized data vendors and thousands of small ones, each often offering hundreds of datasets, so why is there no central marketplace to discover and buy them?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rQXY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rQXY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rQXY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rQXY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rQXY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rQXY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg" width="384" height="384" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:384,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;minimalist line art of a data market filled with robots&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="minimalist line art of a data market filled with robots" title="minimalist line art of a data market filled with robots" srcset="https://substackcdn.com/image/fetch/$s_!rQXY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rQXY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rQXY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rQXY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25c59dfe-267f-4936-b051-7b800974dca6_1024x1024.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Have any attempts at data marketplaces succeeded at large scale? I think the answer is <em>sort of</em>. There have been successful pseudo-marketplaces whereby a single buyer takes control of the data and integrates it into a single product. The most successful of these has undoubtedly been the Bloomberg Terminal. Further, some marketplaces have been successful for narrow use cases, especially where there is a single data format or join key being sold. 
For example, advertising activation marketplaces for audiences have done well.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> </p><p>There has also been relative success at distributing free data in marketplace form. For example, <a href="https://fred.stlouisfed.org/">FRED</a> and <a href="https://data.gov/">Data.gov</a>, government efforts to catalog public data, have been successful as judged by the downloads these repositories drive relative to the individual sources. Non-government open data repositories, such as Kaggle, have built large user bases (and website traffic estimates for <a href="http://kaggle.com/datasets">kaggle.com/datasets</a> indicate that it is the most popular part of the site). In both cases, these repositories serve an aggregation function while avoiding the traditional data marketplace challenges, because the data is in the public domain.</p><p>A new wave of data marketplaces, funded by database infrastructure providers such as Snowflake, AWS, Databricks, and Google, has also been released recently. 
Success is still an open question<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, although they clearly have a core advantage: they tie data buying and transfer to the same place where analysis occurs.</p><p>A few recurring issues I have observed across many attempts include:</p><ul><li><p><strong>Lack of neutrality</strong>: Large data sellers have attempted to build their own data marketplaces. These DaaS companies can offer distribution to their already large customer bases. There could also be an efficiency or delivery benefit to the customer, given that the customer already has an MSA with the DaaS incumbent. In practice, the distribution advantage only works if the barrier for existing customers to adopt a new dataset is very low. For example, Bloomberg&#8217;s Terminal constantly makes new datasets available to end users without much action on their part, so data sellers get distribution without additional legal work. This model struggles where the DaaS incumbent has a large data business that conflicts with marketplace participants: naturally, Factset is never going to list its data on S&amp;P&#8217;s marketplace, and most asset managers use some Factset and some S&amp;P content, so neither party can build a complete product.</p></li><li><p><strong>Discoverability</strong>: Finding the right dataset to answer a business question is difficult. Good metadata search is essential. 
There may also be questions an analyst does not know they <em>could</em> be asking because they do not know such data might exist. Conviction, a venture firm, has a <a href="https://www.conviction.com/startups#manageable-metadata">good write-up</a> about why this problem may well be solved with LLMs.</p></li><li><p><strong>Licensing and Monetization</strong>: The process of buying data varies by industry, but it generally involves a contracting process similar to that of enterprise software plus a data-specific compliance process. Certain industries have developed standards for this, but the majority of marketplaces have not meaningfully solved the pain point of transacting. As with other failed marketplace categories, the data provider and consumer often find it easier (and cheaper) to take the transaction offline.</p></li></ul><p>A fair question to ask is whether the advent of AI creates opportunities either to solve some of the above challenges or to address new challenges with data marketplaces. A few ideas I have recently come across include:</p><ul><li><p><strong>AI Inference</strong>: I am skeptical that selling data without strong marginal-temporal value for training AIs is a good business. Selling data for AI use at inference time, however, may be a good business. AI use cases may present new opportunities for data sales that are better served by usage- or consumption-based pricing than by traditional bulk licensing. Eric Schmidt recently made a similar prediction: that data usage for AIs will look like music royalties<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. 
Furthermore, selling embeddings of datasets, rather than the raw datasets themselves, may well be a value-add for many LLM software providers.</p></li><li><p><strong>Legal Indemnification</strong>: A key friction in data acquisition and sales is the regulatory and compliance burden placed on the buyer to determine whether they can actually buy and use the data on offer. The growing body of regulation (CCPA in California; look-alike laws in Virginia, Maine, and elsewhere; GDPR in Europe) and the difficulty of understanding these laws make it particularly costly to stay compliant.</p><ul><li><p>Traditional marketplaces absolve themselves of liability for their sellers&#8217; products for the obvious reason that policing and verifying each seller's products would add tremendous cost and potential liability to the marketplace. This works fine in areas where liabilities are not a concern for buyers or where the marketplace platform can enforce very heavy technical standards on providers. In data sales,&nbsp;</p></li><li><p>Note that this is very different from simply providing &#8216;standard&#8217; terms or suggestions about what disclosures data providers need to make; such suggestions add relatively little value.</p></li><li><p>If data buyers could be assured that a given dataset was legal and compliant for their use case, the third party providing that assurance would add a lot of value to the selling process, easily enough to justify a significant take rate. 
The arrangement would work well for data sellers too, as the added confidence should increase the liquidity of the market.</p></li><li><p>Potential data sellers may also be reluctant to enter the business because they lack the expertise to judge whether their exhaust data could be compliantly monetized.</p></li></ul></li><li><p><strong>Bundling</strong>: I predict that the market for data bundles will grow, as more companies gain the capacity to process unstructured information. The marginal value of any one <em>specific</em> dataset may not increase, but the need for economical consumption of a very large variety of datasets will. Many of these use cases will initially be experimental, and much of the value-add may well be marginal. As such, demand will grow for bundles that provide a wide range of data.</p><ul><li><p>Shishir Mehrotra <a href="https://coda.io/@shishir/four-myths-of-bundling">lays out</a> a great economic explanation for this: bundles are a good deal for both consumers and suppliers when the bundle is constructed such that the number of &#8216;<em>casual fans</em>&#8217; increases<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. I think this is exactly the dynamic that will play out with data used for AI inference.</p></li><li><p>For example, today there exists a well-defined market for earnings transcripts among asset managers. Firms like S&amp;P and Factset charge for earnings call transcripts. Asset managers rely on this information for analysis and trading signals, and demand for this dataset among this customer base is relatively inelastic: a price of $50K vs. $250K does not really matter, so long as the data is perfectly accurate. 
The cost to an asset manager of not having this data would be much higher.</p></li><li><p>Earnings transcripts could also be used by non-financial firms to identify new selling opportunities, key supplier risks, and key competitor plans. Startups might also have new ideas that rely on this type of unstructured data. Traditionally, the market for this data outside finance has been very small because (a) the cost of processing and structuring transcripts did not justify the effort and (b) transcript data was expensive. With LLMs, the effort to process and synthesize the data is low, so if the price could be made acceptable, there could be a market. The marginal value of this data may initially be low, because the use cases are new and the data is almost substitutable with news stories, SEC filings, press releases, and so on. But there now exists a clear need for <em>some</em> of this data, and if a data provider or marketplace benefits from economies of scale in collecting it, it makes sense to offer such data at a much lower cost in a bundle. 
In Shishir&#8217;s terminology, the number of <em>casual fans</em> for this data has massively increased.</p></li></ul></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>LiveRamp&#8217;s and TradeDesk&#8217;s marketplaces, in particular, are examples of successful execution.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>As far as I know, none of these companies have released any kind of comprehensive metrics on these marketplaces.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Unfortunately, the <a href="https://www.youtube.com/watch?app=desktop&amp;v=T_JKIkSf93Y">video</a> appears to have been taken down. 
I will post a new link if it ever becomes available.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p><a href="https://coda.io/@shishir/four-myths-of-bundling">Four Myths of Bundling by Shishir Mehrotra</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[How Data Markets Fail]]></title><description><![CDATA[Or, why you can't buy certain datasets]]></description><link>https://magis.substack.com/p/how-data-markets-fail</link><guid isPermaLink="false">https://magis.substack.com/p/how-data-markets-fail</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Sun, 07 Jul 2024 20:56:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!G1we!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this essay, I reflect on some of the practical challenges of data monetization and relate them to theoretical concepts in <em>information economics<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></em>. I then show that these theoretical failures are real and result in obviously useful data (MLS listings or B2B transactions, for example) not being widely available. I explain two particular market failure conditions, one resulting from the non-excludable nature of data and one from the cold-start problem data aggregators face. 
By sharing these observations, I hope to solicit solutions to these challenges.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G1we!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G1we!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!G1we!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!G1we!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!G1we!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G1we!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg" width="472" height="472" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:472,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A scene of an eerily empty summer farmer's market in a soft, watercolor style. The stalls should appear almost empty and the market abandoned. The crates that should be empty of produce. There should be some broken bottles on the ground. There should be a subtle, nuanced change in colors using the impasto technique with gouache paint. There should be a light wash to the watercolor painting with minor texture from the watercolor paper and a soft, pastel lavender background. &quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A scene of an eerily empty summer farmer's market in a soft, watercolor style. The stalls should appear almost empty and the market abandoned. The crates that should be empty of produce. There should be some broken bottles on the ground. There should be a subtle, nuanced change in colors using the impasto technique with gouache paint. There should be a light wash to the watercolor painting with minor texture from the watercolor paper and a soft, pastel lavender background. " title="A scene of an eerily empty summer farmer's market in a soft, watercolor style. The stalls should appear almost empty and the market abandoned. The crates that should be empty of produce. There should be some broken bottles on the ground. There should be a subtle, nuanced change in colors using the impasto technique with gouache paint. 
There should be a light wash to the watercolor painting with minor texture from the watercolor paper and a soft, pastel lavender background. " srcset="https://substackcdn.com/image/fetch/$s_!G1we!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!G1we!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!G1we!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!G1we!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e458cb1-07e0-4615-bbd5-bdacb614fccc_1024x1024.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Economists describe data as non-excludable: once it is sold, unlimited copies can be made, which deprives the seller of the ability to charge for additional copies. In practice, sellers take measures to enforce exclusion, and data markets fail when those measures end up either not effective enough or too restrictive. These measures can be contractual or technical. Certain attributes of data, such as its marginal temporal value (how quickly its value decays), can also cause data to expire and so help create exclusion. Contractual measures can be as simple as requiring data consumers to sign a licensing agreement. These generally work well when the data consumers are vetted, reputable, and unlikely to want to engage in legal disputes. Technical measures, such as data clean rooms, aggregation constraints, and digital watermarks, can often be effective, but they always involve a trade-off between flexibility (and thus value to the consumer) and protection (and thus insurance for the seller). Taken too far, both contractual and technical measures can prevent the entire market, or part of it, from using a given dataset.&nbsp;&nbsp;</p><p>For example, data from multiple listing services (MLS)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> is notoriously expensive and difficult to license. MLSs serve as cooperatives among realtors in specific geographic areas. 
They are highly fragmented, and licensing their data requires meeting extremely specific compliance rules and circumstances<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. This model makes sense for industry participants: the data is highly localized, and there is a real risk that if it were readily available, unsanctioned competitors could build competing listing or realty services without participating in the MLS. MLS data is therefore extremely tightly controlled and perhaps intentionally fragmented. These measures (mostly) protect the dominance of the National Association of Realtors and the realtor licensing system<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. On the other hand, they create a market failure: noncompetitive consumers lower on the demand curve are left unserved. These consumers cannot pay hundreds of thousands of dollars for access and do not fit the prescribed acceptable use cases, yet their use cases have nothing directly to do with realtors or with running competing listing websites<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>. Because there is no effective technology for price discriminating by use case, the data remain practically unavailable to parts of the market (economists would call this deadweight loss). 
The severity of this problem depends on how you size the noncompetitive market for real estate data.&nbsp;&nbsp;</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:89883914,&quot;url&quot;:&quot;https://blog.aaronkardell.com/p/your-guide-to-licensing-listing-data&quot;,&quot;publication_id&quot;:1155326,&quot;publication_name&quot;:&quot;Aaron Kardell's Blog&quot;,&quot;publication_logo_url&quot;:null,&quot;title&quot;:&quot;Your Guide to Licensing Listing Data for Your Residential Real Estate Tech Startup&quot;,&quot;truncated_body_text&quot;:&quot;Hello! This Sunday newsletter explores startups, short-term rentals, or whatever random thing has entered my mind this last week. I pick one topic weekly to go deep on and have some disparate quick hits at the end.&quot;,&quot;date&quot;:&quot;2022-12-11T22:14:32.872Z&quot;,&quot;like_count&quot;:2,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:60879195,&quot;name&quot;:&quot;Aaron Kardell&quot;,&quot;handle&quot;:&quot;givethanks100&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/b3b91c32-ff19-48cc-bcae-92326f2797dd_400x400.jpeg&quot;,&quot;bio&quot;:&quot;Husband. Father. Founder &amp; CEO @HomeSpotter; now working to simplify real estate w/ our acquirer @GetLWolf. 
Striving to act justly, love mercy, and walk humbly.&quot;,&quot;profile_set_up_at&quot;:&quot;2022-12-17T02:00:34.695Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:523958,&quot;user_id&quot;:60879195,&quot;publication_id&quot;:592114,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:592114,&quot;name&quot;:&quot;HomeSpotter #givethanks100&quot;,&quot;subdomain&quot;:&quot;homespottergivethanks100&quot;,&quot;custom_domain&quot;:&quot;homespotter.givethanks100.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Giving thanks to 100+ individuals that positively impacted HomeSpotter&quot;,&quot;logo_url&quot;:null,&quot;author_id&quot;:60879195,&quot;theme_var_background_pop&quot;:&quot;#FF6B00&quot;,&quot;created_at&quot;:&quot;2021-12-03T04:23:14.105Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Aaron Kardell&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;is_personal_mode&quot;:false}},{&quot;id&quot;:1107608,&quot;user_id&quot;:60879195,&quot;publication_id&quot;:1155326,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:1155326,&quot;name&quot;:&quot;Aaron Kardell's Blog&quot;,&quot;subdomain&quot;:&quot;aaronkardell&quot;,&quot;custom_domain&quot;:&quot;blog.aaronkardell.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;All my random thoughts in one place, served up weekly. 
Sometimes I write about startups, parenting, or short term rentals.&quot;,&quot;logo_url&quot;:null,&quot;author_id&quot;:60879195,&quot;theme_var_background_pop&quot;:&quot;#9A6600&quot;,&quot;created_at&quot;:&quot;2022-10-23T18:24:11.082Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Aaron Kardell&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;is_personal_mode&quot;:false}}],&quot;twitter_screen_name&quot;:&quot;akardell&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:false,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://blog.aaronkardell.com/p/your-guide-to-licensing-listing-data?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><span></span><span class="embedded-post-publication-name">Aaron Kardell's Blog</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Your Guide to Licensing Listing Data for Your Residential Real Estate Tech Startup</div></div><div class="embedded-post-body">Hello! This Sunday newsletter explores startups, short-term rentals, or whatever random thing has entered my mind this last week. I pick one topic weekly to go deep on and have some disparate quick hits at the end&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">3 years ago &#183; 2 likes &#183; Aaron Kardell</div></a></div><p>A second market failure occurs when owners of data have insufficient incentive to share it in any one transaction, and no aggregator or coordinator exists. 
This failure is akin to a lack of market makers in financial markets. It occurs when an owner&#8217;s data is valuable only when combined with many other owners&#8217; data, and not very valuable alone. The result is a &#8220;cold start&#8221; coordination problem: each owner asks a price that is too high relative to what any individual data consumer can bid, even though the collective demand of a group of consumers for a collection of owners&#8217; data would have a clearing price. To solve this, a data aggregator must incur the cost of acquiring data from each source. Doing so is expensive and logistically difficult because it must happen nearly simultaneously across owners, to minimize the period in which the aggregator has licensed some data but not yet enough to bring a viable product to market. An analogous problem sometimes arises at the same time when the owners&#8217; data requires a large amount of cleaning, modeling, or transformation to be useful. In such cases, the R&amp;D cost may exceed what any one data consumer can bear, even though the combined buying power of all consumers would suffice.&nbsp;</p><p>Data aggregators, therefore, have two useful roles to play: consolidating raw data and developing data products. This distinguishes them from data marketplaces, data catalogs, and data brokers. Beyond coordination, data aggregators also take on <em>duration risk</em>, by which I mean they front capital to data owners today in exchange for data product revenue in the future.&nbsp;</p><p>Revenue share agreements are often proposed as an equitable way to eliminate <em>duration risk</em>. Under such agreements, aggregators pay owners a percentage of data product revenues. In practice, revenue shares are far from perfect for either party. For aggregators, revenue shares permanently impair margins and so limit how much data can be combined. They are particularly detrimental to a data product&#8217;s full potential when each share is structured as a fixed percentage of sales: the shares stack with every dataset added, so only so many datasets can be combined before the endeavor becomes unprofitable. The result is a less compelling product (which is bad for everyone). For data owners, a revenue share may not produce any profit in the short term if a significant amount of R&amp;D work (and therefore time) is required to make a useful product. In theory, a perfectly rational, profit-maximizing company would accept any incremental profit from monetizing its data. In practice, companies must clear dollar profit hurdles well above zero. These hurdles stem either from institutional inertia or from the real but difficult-to-quantify expected costs of data monetization (legal risk, press risk, employee distraction, etc.). Revenue shares are therefore not a panacea for these challenges.</p><p>A combination of these unhappy circumstances leaves obviously marketable data unavailable. 
For example, B2B transaction data is <em>particularly</em> fragmented, its owners assign it a <em>particularly</em> high hurdle value, and it requires a <em>particularly</em> heavy amount of modeling to be useful. It is perhaps not surprising, then, that it is largely unavailable as a product today.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://en.wikipedia.org/wiki/Information_economics">https://en.wikipedia.org/wiki/Information_economics</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://en.wikipedia.org/wiki/Multiple_listing_service">What is an MLS?</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>API consolidators based on <a href="https://www.reso.org/">RESO</a> or <a href="https://www.bridgeinteractive.com/developers/zillow-group-data/">Zillow&#8217;s Bridge</a> are not what they appear at first glance: you still need a license from each MLS.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>You may ask whether this is anticompetitive. There have been legal cases about it (for example, <a href="https://www.justice.gov/atr/case/us-v-national-association-realtors">here</a>), but these cases have largely focused on sharing MLS listings between different types of realtors or brokerages.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>You could imagine a range of use cases for the data, such as investment or economic analysis or property tech startups, that would not really undermine the goal of protecting participating realtors and brokerages (regardless of your view of whether that protection is desirable).</p></div></div>]]></content:encoded></item><item><title><![CDATA[Data Dividends]]></title><description><![CDATA[Is it different this time? A perennially challenging idea with a glimmer of hope.]]></description><link>https://magis.substack.com/p/data-dividends</link><guid isPermaLink="false">https://magis.substack.com/p/data-dividends</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Wed, 29 May 2024 17:49:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kXQw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Market research and targeted advertising require consumer data, specifically purchase and digital behavior records. Data can be collected actively or passively. Active collection requires consumers to take surveys, submit records, or make some other effort to contribute data. Consumers participate because the law requires it, because they volunteer, or because they are paid. 
Actively collected data can be high quality, but obvious downsides include noncompliance, incomplete data, the time participants must spend, and the effort required to recruit them<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. These disadvantages lead to relatively small samples and infrequent collection. Passive data collection requires no effort from the consumer; instead, data is collected continuously in the background by an app, bank, or service the consumer is using. Its advantage over active collection is that the data is more complete and less biased, and more consumers can be reached. Privacy laws and consumer opinion dictate that passive data collection requires consumer opt-in. Opt-in can be obtained in exchange for free products and services (&#8220;in-kind&#8221;) or for explicit payment (&#8220;data dividends&#8221;). The former is well understood: companies like Meta offer their products for free in exchange for collecting data from their user base, which can be used by or sold to data users. The latter is less common but has frequently been the subject of speculation. 
Proponents of data dividends note that they combine the best attributes of active data collection (very explicit opt-in) with the best attributes of passive data collection (low effort for the consumer).&nbsp;&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kXQw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kXQw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!kXQw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!kXQw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!kXQw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kXQw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png" width="280" height="280" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:280,&quot;bytes&quot;:986390,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kXQw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!kXQw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!kXQw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!kXQw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F747ea86d-9de7-41ed-a3df-550364bfbab4_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>To my knowledge, no dataset funded by data dividends exists at a scale comparable to in-kind funded datasets. Despite limits on conventional collection introduced by privacy laws and by corporate changes made in response to media attention<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, and despite multiple startup efforts and emerging interest from the crypto community, data dividend datasets have not become common.&nbsp;</p><p>My hypothesis is that the unit economics of data dividends are structurally challenging and may preclude large-scale adoption. The cost of data dividends scales linearly with the number of participants. In contrast, the cost of in-kind payments scales sub-linearly because the marginal cost of provisioning an application or software service is low and can decrease with scale. 
For instance, Meta&#8217;s cost of hosting a marginal user is very low, and that marginal cost decreases with each order of magnitude of scale.&nbsp;</p><p>The unfavorable marginal costs of data dividends could be overcome, of course, if revenue scaled faster, but prevailing data prices per consumer are too low. Meta again illustrates this best. Meta&#8217;s average revenue per user is approximately $40 per year, or about $3.30 per month<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. Meta is just one example of an in-kind data buyer, but its monetization per user likely represents a best-case scenario<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. </p><p>In exchange for the data that generates this revenue, Meta users can use the family of Meta products (Facebook, Instagram, WhatsApp, Messenger) for free. The in-kind value of the Meta family of apps likely far exceeds $40 per year<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>. This puts data dividend programs in the unenviable position of competing against a difficult unit-economic equation. &nbsp;&nbsp;</p><p>A corollary challenge is adverse selection and redundant opt-in among participants. Consumers&#8217; willingness to accept a low price for their data is correlated with particular demographic, socioeconomic, and other characteristics, which biases the resulting sample. For many (but not all) market research and advertising purposes, this set of users is less valuable. 
Also, consumers with this preference are likely to sign up for many data dividend programs to maximize their earnings, so the user bases of different data dividend programs often overlap heavily.</p><p>Finally, policy stakeholders lack consensus that data dividends are a good solution. For instance, the Electronic Frontier Foundation (&#8220;EFF&#8221;), a digital rights and privacy advocacy group that often favors stricter data privacy laws, actually opposes data dividend programs<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>. This is surprising at first glance, since the media-driven argument against passive data collection, that consumers are deceived, clearly does not apply to data dividends. Instead, the EFF argues that data dividends are unfair because the consumer is deceived about the relative value of their data.&nbsp;</p><p><strong>So, is the data dividend model hopeless?</strong> </p><p>Some promising ideas I have come across include:</p><p><strong>Hybrid Value</strong>: It is conceivable that certain data dividend products <em>also</em> provide value in-kind, such that the dividend and the app&#8217;s utility together overcome the unit-economic hurdles. Consumers would <em>explicitly</em> know they are selling their data (thereby accomplishing the stated goal of most dividend programs), but the app would also provide enough utility that the direct payments could be kept to an economical size. For instance, a dividend program may offer a gamified experience that is fun to participate in, in addition to paying for data. Where the line lies between in-kind offers and true dividend programs, I will leave to the reader.</p><p><strong>Variable Upside</strong>: Novel data dividend models could offer variable dividends that depend on some third-party process or user action. For example, a quant fund might offer a stake in its results in exchange for contributed data. Similar profit-sharing programs were popularized during the crypto wave as Decentralized Autonomous Organizations (&#8220;DAOs&#8221;). Or a dividend program might offer discounts on purchases consumers were going to make anyway in exchange for their data.&nbsp;</p><p><strong>What am I missing?</strong></p><p>I could be wrong. This time it could be different. 
It is possible that consumer preferences for privacy change so drastically that unit economics are no longer what drives participation (rather, participation becomes an ethical statement). It is possible that privacy-oriented changes, such as the elimination of internet cookies, drive up the price of identifiable data. And it is possible that someone invents uses for consumer data so lucrative that very high dividends are justified.&nbsp;If you have counterexamples, or if you know of a very large dataset built on data dividends, please reach out.</p><p>There has been much discussion about how to compensate authors, journalists, artists, and other content creators when their work is used in AI. I have heard variations of the data dividend pitch proposed for this. Proponents of such a solution would do well to consider the consumer data precedent. </p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I have previously written a fuller account of the problems with government surveys:  </p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;f5d40550-38ff-4403-8d62-68329372d255&quot;,&quot;caption&quot;:&quot;Pick up a macroeconomic textbook, and you&#8217;ll read chapters of theory, mathematical equations, and models of rational behavior. Heavy on mathematical and social sciences, such books are light on one essential thing: strong evidence that the theories, equations, and models correspond to the real world.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Datanomics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. 
Formerly I was the Head of Data Science at Coatue. https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2022-02-13T23:23:44.021Z&quot;,&quot;cover_image&quot;:null,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/datanomics&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:48708109,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:2,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c41f48-a7aa-4609-966a-ee62fc65f2e4_640x640.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>For instance, Apple&#8217;s <a href="https://www.forbes.com/sites/johnkoetsier/2020/06/29/apple-killed-the-idfa-what-else-dies/?sh=2b288cd5262f">decision</a> to &#8220;kill&#8221; IDFA.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Based on the annualized Q4 ARPP in Meta&#8217;s 2023 <a href="https://d18rn0p25nwr6d.cloudfront.net/CIK-0001326801/c7318154-f6ae-4866-89fa-f0c589f2ee3d.pdf">10-K</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" 
href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>This is just a matter of opinion, but I would make the case that Meta is among the most lucrative data monetization businesses. Meta effectively owns an exclusive dataset with highly specialized ad-targeting ability and is deeply penetrated among advertisers (data buyers). Therefore, I think Meta&#8217;s revenue per user represents, more or less, the highest-value scenario for data sellers.  </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Again, a matter of opinion and difficult to estimate exactly. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>The EFF&#8217;s full argument is <a href="https://www.eff.org/it/deeplinks/2020/10/why-getting-paid-your-data-bad-deal">here</a>.</p></div></div>]]></content:encoded></item><item><title><![CDATA[AI Data Licensing Deals ]]></title><description><![CDATA[Ongoing List of Generative AI Content Licensing Deals]]></description><link>https://magis.substack.com/p/ai-data-licensing-deals</link><guid isPermaLink="false">https://magis.substack.com/p/ai-data-licensing-deals</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Sat, 20 Apr 2024 18:43:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!S5J1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have compiled a non-exhaustive list of content licensing deals that have been publicly reported, with a particular 
focus on interesting details disclosed in SEC filings and earnings transcripts. If I have missed any deals, particularly from public companies that mention such deals in earnings transcripts, analyst days, or SEC filings, please email me and I will add them. </p><ul><li><p><strong>Reddit</strong> </p><ul><li><p>Widely reported to have a ~$60M-per-year deal with Google for access to ongoing and historical data<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. The S-1 reports<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> that Reddit expects to recognize $66.4M by the end of 2024, from arrangements with an aggregate contract value of $203M and terms ranging from 2 to 3 years. </p></li></ul></li><li><p><strong>Shutterstock</strong></p><ul><li><p>Existing deals with Meta and a 6-year<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> deal with OpenAI.</p></li><li><p>$25-50M deals with Amazon and Apple<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>, based on a statement by the CFO.</p></li><li><p>The relevant segment, <em>Data Distribution and Services</em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>, grew from $15.9M in &#8216;21 to $137M in &#8216;23. That is an increase of roughly $121M, which suggests very roughly $100M of already recognized revenue from generative AI licensing.</p></li><li><p>The Shutterstock CEO referenced existing deals with OpenAI and Meta and expressed a desire to expand licensing to a broader audience through data marketplaces like Snowflake, AWS, etc.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" 
target="_self">6</a></p></li></ul></li><li><p><strong>Yelp</strong></p><ul><li><p>Perplexity reportedly licensed data from Yelp<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>, though it remains unclear whether this deal is materially different from the pre-generative AI deals Yelp enters into to distribute reviews, restaurant listings, etc.</p></li><li><p>Yelp reports data licensing within its <em>Other</em> category<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a> (~$47M), but this category also includes other types of revenue. It jumped from ~$21M in &#8216;20 to ~$47M in &#8216;23<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a>, which suggests a generative AI bump of ~$25M. </p></li></ul></li><li><p><strong>Reuters</strong></p><ul><li><p>Added $22M in the Reuters News Segment<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a>, the majority of which was apparently driven by &#8220;transactional&#8221; content licensing for artificial intelligence. This increased the News Segment margin by 6.5%. Because licensing existing content carries little incremental cost, we can surmise this incremental revenue was largely content licensing.</p></li></ul></li><li><p><strong><a href="https://www.wiley.com/en-us">Wiley</a></strong></p><ul><li><p>One-time fee of $23M for previously published academic articles and books<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-11" href="#footnote-11" target="_self">11</a>. The CEO expressed interest in finding more such deals. 
Wiley owns both academic journals and a large book publishing business.</p></li></ul></li><li><p><strong><a href="https://www.freepik.com/">Freepik</a></strong></p><ul><li><p>Licensed ~200 million images at 2 to 4 cents per image to at least two AI firms<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-12" href="#footnote-12" target="_self">12</a>. That implies roughly $4-8M (~$6M at the midpoint) per license.</p></li></ul></li><li><p><strong>Axel Springer, Associated Press, <a href="https://www.lemonde.fr/en/">Groupe Le Monde</a>, <a href="https://www.prisa.com/en">Prisa</a> </strong></p><ul><li><p>Associated Press was among the first organizations to announce a data licensing deal<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-13" href="#footnote-13" target="_self">13</a></p></li><li><p>A multi-year contract with OpenAI for ongoing and historical access to the Le Monde corpus<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-14" href="#footnote-14" target="_self">14</a> was announced at the same time as the Prisa deal<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-15" href="#footnote-15" target="_self">15</a> </p></li><li><p>I have not been able to find specific financial details for any of these. However, <em>The Information</em> reports<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-16" href="#footnote-16" target="_self">16</a> that OpenAI offers $1-5M per corpus, whereas Apple offers $50M over a multi-year period </p></li><li><p>Previously, for display (not generative AI) purposes, the <em>WSJ</em> reported that Meta was offering publishers $3M<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-17" href="#footnote-17" target="_self">17</a></p></li></ul></li><li><p><strong>X (formerly Twitter)</strong></p><ul><li><p>It has been widely covered that Firehose access will cost 42K/month (roughly $0.5M per year), with higher-volume tiers reported at around 2.5M per year. It is not quite 
clear what rights or use cases this level of access comes with. The pricing was widely reported as too high, yet even a 2.5M price would seem reasonable, or even cheap, relative to the deals above.</p></li></ul></li><li><p><strong>Stack Overflow</strong></p><ul><li><p>Deal with Google Gemini that also includes workflow integration with the Google Cloud console<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-18" href="#footnote-18" target="_self">18</a> </p></li></ul></li><li><p><strong>Photobucket</strong></p><ul><li><p>Negotiating contracts at between &#8220;5 cents and $1 per photo and more than $1 per video&#8221;. With 13 billion photos (an order of magnitude more than Shutterstock), this could be a significant revenue source. Even at 5 cents per photo, licensing the full library would imply roughly $650M. However, we should assume that the rights to, and usefulness of, the entire content library are not as robust.</p></li></ul></li><li><p><strong>Automattic (Tumblr &amp; WordPress)</strong></p><ul><li><p>OpenAI and Midjourney apparently licensed, or at least evaluated, data from both Tumblr and WordPress<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-19" href="#footnote-19" target="_self">19</a>, though I was unable to find any specific financials.</p></li></ul></li><li><p><strong>News Corp</strong></p><ul><li><p>The owner of the Wall Street Journal and the NY Post, among others, is reported to be near data deals<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-20" href="#footnote-20" target="_self">20</a>. Based on its last earnings call<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-21" href="#footnote-21" target="_self">21</a>, it expects to be a &#8220;core content provider.&#8221;  
</p></li><li><p>Based on earnings call comments, the deal was still in negotiation as of February &#8216;24. Given the CEO&#8217;s praise for Sam Altman&#8217;s approach, it is reasonable to guess OpenAI is involved.</p><p></p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S5J1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S5J1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!S5J1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!S5J1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!S5J1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S5J1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png" width="406" height="406" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:406,&quot;bytes&quot;:2027401,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!S5J1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!S5J1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!S5J1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!S5J1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20d74619-08fd-4a42-8ef9-b3f176bfa827_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p></p><p><strong>Who is missing?</strong></p><p>I was unable to find any specific data deal references from TripAdvisor, TikTok, or SoundCloud, all of which seem like obvious candidates for data deals. Quora has also been involved in launching AI products<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-22" href="#footnote-22" target="_self">22</a>, but I have found no obvious disclosure of a data deal.</p><p>The NY Times<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-23" href="#footnote-23" target="_self">23</a> and IAC<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-24" href="#footnote-24" target="_self">24</a> have taken a more combative stance, using the courts to protect IP rather than negotiating data deals. 
Others, like Thomson Reuters, engage in data licensing but have taken selective court action<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-25" href="#footnote-25" target="_self">25</a>.</p><p><a href="https://www.relx.com/">RELX Group</a> (owner of LexisNexis and Elsevier) has announced AI products<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-26" href="#footnote-26" target="_self">26</a> and workflow integrations with Microsoft<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-27" href="#footnote-27" target="_self">27</a>, but I have not found an outright licensing deal for training content. I have anecdotally noticed similar situations with DaaS providers like FactSet, Bloomberg, and S&amp;P Global. It is difficult to discern whether any of their licensing deals are explicitly for AI training, since they are in the business of content licensing in the first place.</p>
<div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Reuters <a href="https://www.reuters.com/technology/reddit-ai-content-licensing-deal-with-google-sources-say-2024-02-22/">article</a> that first detailed the Google deal. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Reddit S-1 <a href="https://www.bamsec.com/filing/162828024011789/1?cik=1713445&amp;hl=440678:440690&amp;hl_id=nynd1e6xxl">disclosure</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><a href="https://www.prnewswire.com/news-releases/shutterstock-expands-partnership-with-openai-signs-new-six-year-agreement-to-provide-high-quality-training-data-301873298.html">Press Release</a> from Shutterstock</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p><em>Reuters</em> <a href="https://www.reuters.com/technology/inside-big-techs-underground-race-buy-ai-training-data-2024-04-05/">article</a></p></div></div><div class="footnote" 
data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Segment definition in the latest <a href="https://www.bamsec.com/filing/154934624000007/1?cik=1549346&amp;hl=15531:15866&amp;hl_id=4ks8gk6gel">Shutterstock 10-K</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Earnings <a href="https://app.tegus.co/app/company/2034/shutterstock-inc/content/event/672f2539-8d97-4520-b129-d32b993f6905/event_transcript/1703b4357776446eaf437e15d2eda4fb?prev=EARNINGS_AND_EVENTS">call transcript</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p><em>The Verge</em> <a href="https://www.theverge.com/2024/3/12/24098728/perplexity-chatbot-yelp-suggestions-data-ai">article</a> that appears to announce the Perplexity-Yelp partnership</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p><a href="https://www.bamsec.com/filing/134501624000009/1?cik=1345016&amp;hl=33190:33391&amp;hl_id=eky3d42lxx">Definition</a> of the <em>Other</em> segment, which includes the <em>Other Partnerships</em> sub-segment containing the relevant revenues.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p><a href="https://www.bamsec.com/filing/134501624000009/1?cik=1345016&amp;table=70">Yelp 10-K</a></p></div></div><div class="footnote" 
data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p>Thomson Reuters <a href="https://app.tegus.co/app/company/2204/thomson-reuters-corp/document/event_transcript/17e8a9aa79ea44628e2450e61b5cafb9?prev=EARNINGS_AND_EVENTS&amp;activeOutline=9">Earnings Call</a> transcript</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-11" href="#footnote-anchor-11" class="footnote-number" contenteditable="false" target="_self">11</a><div class="footnote-content"><p>Disclosed by Wiley&#8217;s interim CEO in prepared remarks on the <a href="https://app.tegus.co/app/company/5439/john-wiley-sons-inc/content/event/ed3fc019-f397-4f5a-853b-0f0ad8e01260/event_transcript/2fcdd488db6145f49190d3c1d93d5cac?prev=EARNINGS_AND_EVENTS">earnings call</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-12" href="#footnote-anchor-12" class="footnote-number" contenteditable="false" target="_self">12</a><div class="footnote-content"><p><em>Reuters</em> <a href="https://www.reuters.com/technology/inside-big-techs-underground-race-buy-ai-training-data-2024-04-05/">article</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-13" href="#footnote-anchor-13" class="footnote-number" contenteditable="false" target="_self">13</a><div class="footnote-content"><p><a href="https://apnews.com/article/openai-chatgpt-associated-press-ap-f86f84c5bcc2f3b98074b38521f5f75a">Associated Press&#8217;s own coverage</a> of the deal</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-14" href="#footnote-anchor-14" class="footnote-number" contenteditable="false" target="_self">14</a><div class="footnote-content"><p><a 
href="https://www.lemonde.fr/en/about-us/article/2024/03/13/le-monde-signs-artificial-intelligence-partnership-agreement-with-open-ai_6615418_115.html">Announcement</a> on Le Monde website</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-15" href="#footnote-anchor-15" class="footnote-number" contenteditable="false" target="_self">15</a><div class="footnote-content"><p>OpenAI blog post mentioning Le Monde and Prisa</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-16" href="#footnote-anchor-16" class="footnote-number" contenteditable="false" target="_self">16</a><div class="footnote-content"><p><em>The Information</em> <a href="https://www.theinformation.com/articles/openai-offers-publishers-as-little-as-1-million-a-year">article</a> </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-17" href="#footnote-anchor-17" class="footnote-number" contenteditable="false" target="_self">17</a><div class="footnote-content"><p><em>Wall Street Journal</em> <a href="https://www.wsj.com/articles/facebook-offers-news-outlets-millions-of-dollars-a-year-to-license-content-11565294575?mod=breakingnews">article</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-18" href="#footnote-anchor-18" class="footnote-number" contenteditable="false" target="_self">18</a><div class="footnote-content"><p><em>Wired</em> <a href="https://www.wired.com/story/google-deal-stackoverflow-ai-giants-pay-for-data/">article</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-19" href="#footnote-anchor-19" class="footnote-number" contenteditable="false" target="_self">19</a><div class="footnote-content"><p><em>404 Media</em> <a href="https://www.404media.co/tumblr-and-wordpress-to-sell-users-data-to-train-ai-tools/">article</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a 
id="footnote-20" href="#footnote-anchor-20" class="footnote-number" contenteditable="false" target="_self">20</a><div class="footnote-content"><p><em>NY Post</em> (owned by News Corp) <a href="https://nypost.com/2024/02/08/business/news-corp-in-advanced-talks-with-ai-firms-on-deals-to-license-content-ceo-says/">reporting</a> on the deal</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-21" href="#footnote-anchor-21" class="footnote-number" contenteditable="false" target="_self">21</a><div class="footnote-content"><p>News Corp <a href="https://app.tegus.co/app/company/4159/news-corp/content/event/59c2513a-ca24-4898-b1e7-e25987a8a7d2/event_transcript/039e923256f24805adde405b53d6d1df?prev=EARNINGS_AND_EVENTS">earnings transcript</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-22" href="#footnote-anchor-22" class="footnote-number" contenteditable="false" target="_self">22</a><div class="footnote-content"><p>Quora <a href="https://quorablog.quora.com/Poe-1">Poe</a> launch</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-23" href="#footnote-anchor-23" class="footnote-number" contenteditable="false" target="_self">23</a><div class="footnote-content"><p><em>NY Times</em> <a href="https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html">coverage of its own lawsuit</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-24" href="#footnote-anchor-24" class="footnote-number" contenteditable="false" target="_self">24</a><div class="footnote-content"><p>Barry Diller&#8217;s <a href="https://www.cnbc.com/2023/09/26/barry-diller-says-fair-use-needs-to-be-redefined-to-address-ai.html">comments</a> on CNBC</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-25" href="#footnote-anchor-25" class="footnote-number" contenteditable="false" 
target="_self">25</a><div class="footnote-content"><p><a href="https://today.westlaw.com/Document/I720ff0cab79911ed8636e1a02dc72ff6/View/FullText.html?transitionType=Default&amp;contextData=(sc.Default)&amp;firstPage=true">Reuters v. Ross Intelligence</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-26" href="#footnote-anchor-26" class="footnote-number" contenteditable="false" target="_self">26</a><div class="footnote-content"><p>LexisNexis product announcements <a href="https://www.lawnext.com/2023/05/lexisnexis-enters-the-generative-ai-fray-with-limited-release-of-new-lexis-ai-using-gpt-and-other-llms.html">summary</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-27" href="#footnote-anchor-27" class="footnote-number" contenteditable="false" target="_self">27</a><div class="footnote-content"><p>Microsoft partnership <a href="https://www.lawnext.com/2023/08/lexisnexis-lays-out-more-details-on-its-collaboration-with-microsoft-to-roll-out-generative-ai-products.html">details</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[Timeseries, Transformers, and Two Cultures]]></title><description><![CDATA[How Transformer-Based Neural Networks Could Predict Chaotic Timeseries]]></description><link>https://magis.substack.com/p/timeseries-transformers-and-two-cultures</link><guid isPermaLink="false">https://magis.substack.com/p/timeseries-transformers-and-two-cultures</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Sat, 06 Apr 2024 15:01:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!K2p6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the academic literature on statistical learning (&#8220;AI,&#8221; if you prefer), there is a <a 
href="https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling--The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.full">famous paper</a> by Leo Breiman<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> about two cultures in statistics<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. One culture assumes that observed data are generated by some stochastic process that can be modeled. The other culture assumes the data-generating process is unknown but that algorithms can still make predictions about it, even if they cannot describe the causal process behind those predictions (i.e., a &#8220;black box&#8221;). At the time the paper was written, the first culture was in vogue: statisticians, economists, and social scientists focused on proposing models that could explain the world and then fitting the data to the model. These models offer interpretable statistical representations and causal theories of the world, but they often predict poorly, and empirical data frequently deviate from the model. The black box approach to statistics was, at the time, less common. However, this second culture caught up as large datasets (i.e., &#8220;big data&#8221;) and cheap compute made practical methods that had previously only been hypothesized to work. Competition from computer science departments also drove this culture forward. Today, the state of the art in most prediction and forecasting applications relies on this second culture. Neural networks<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>, in particular, ended up outperforming most other methods in many applications. 
As of this writing, a relatively simple neural network architecture, called the transformer, has yielded astounding prediction results with language, images, and video (referred to as &#8220;unstructured&#8221; data). The applications of these predictions, in areas such as chatbots, self-driving cars, and video generation, have made the second culture dominant.&nbsp;</p><p>However, these impressive advances in deep learning have not yielded similar advances in forecasting <em>chaotic systems</em>, such as geopolitics, financial markets, or the weather. This category of prediction problem shares several characteristics: the data are structured and tabular, most often timeseries; the best-known models fit the data extremely weakly; the processes are non-stationary<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>; and human ability at this class of problems is poor, and limited by more than just speed, information, or computation. Neural networks, including transformers, have not yet transformed this field, and attempts at applying transformer-based models to timeseries have had mixed results. Amazon and Salesforce, for instance, recently released Chronos<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> and Moirai<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>, respectively, both transformer-based foundation models. Interestingly, basic benchmarks show these complex models underperforming ensembles of standard econometric forecasting models<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>. 
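</p><p>For readers less familiar with forecast ensembles, here is a minimal sketch of the idea: produce forecasts from several simple models and average them step by step. The three components (naive, drift, and simple exponential smoothing with an assumed fixed weight <code>alpha=0.3</code>) and all function names are illustrative choices, not necessarily the models used in the benchmarks cited here.</p>

```python
# Illustrative sketch: an equal-weight ensemble of three simple forecasters.
# Model choices, names, and alpha=0.3 are assumptions for illustration only.


def naive(y: list[float], h: int) -> list[float]:
    """Repeat the last observation h times."""
    return [y[-1]] * h


def drift(y: list[float], h: int) -> list[float]:
    """Extend the straight line through the first and last observations."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + slope * (step + 1) for step in range(h)]


def ses(y: list[float], h: int, alpha: float = 0.3) -> list[float]:
    """Simple exponential smoothing with a fixed smoothing weight (flat forecast)."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * h


def ensemble(y: list[float], h: int) -> list[float]:
    """Average the component forecasts at each forecast step."""
    forecasts = [model(y, h) for model in (naive, drift, ses)]
    return [sum(step) / len(forecasts) for step in zip(*forecasts)]


series = [10.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0]
print(ensemble(series, h=3))
```

<p>Averaging hedges against any single model&#8217;s misspecification, which is part of why even simple ensembles can make a hard-to-beat baseline for noisy, weakly predictable series.</p><p>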
Systematic evidence from the M-Competitions shows that linear models continue to perform well and that tree-based models generally win the competitions<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K2p6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K2p6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!K2p6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!K2p6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!K2p6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!K2p6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png" width="478" height="478" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:478,&quot;bytes&quot;:1478832,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K2p6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!K2p6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!K2p6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!K2p6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003d0d6-3201-474d-8b67-1cb44f991702_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>I have heard three competing hypotheses about whether and how transformer-based neural networks will eventually overcome their relative underperformance in chaotic system forecasting: first, that they will eventually solve it, given the rate of improvement; second, that not enough data exists for them to do so; and third, that they will solve chaotic forecasting indirectly, by helping practitioners source more specialized statistical methods.&nbsp;</p><p>The first hypothesis is the most straightforward. It proposes that, with future improvements, transformer-based models will begin to perform well on chaotic systems prediction. I do not have a deep enough intuition to really evaluate the merit of this hypothesis. 
All statistical learning approaches &#8211; deep learning, tree-based methods, hierarchical Bayes, and others &#8211; share a basic strategy: estimate a huge number of parameters, then regularize them to prevent overfitting<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a>. It is certainly conceivable that sufficiently large transformer-based models, with the right balance of parameter count and regularization penalty, could replicate other statistical approaches.&nbsp;</p><p>The second hypothesis is that, because of the timescales involved, we do not have enough data &#8211; or at least enough observations of events &#8211; to forecast non-stationary systems. This may sound counterintuitive, as financial markets, for instance, generate terabytes of data a day from high-frequency events, and other chaotic systems, like astronomy or weather, generate some of the largest structured datasets. This argument hinges on the possibility that certain stochastic processes shift across regimes, and that those regimes take an extremely long time to observe. For example, if financial markets behave differently during recessions, it is surely problematic that only around a dozen US recessions have occurred since 1945<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a>. The same can be said of major geopolitical events such as wars, climate patterns like global warming, or astronomical phenomena. The timescale at which events of interest occur makes it very difficult to capture sufficient observations. Related to this reasoning is the problem of reflexivity: the very act of successfully forecasting certain chaotic systems at scale will change those systems. For instance, if OpenAI released a model that could predict stock prices, market participants would immediately employ it, causing it to lose its predictive power. 
More generally, if participants in a chaotic system react to forecasts and adjust accordingly, top-level models fit to aggregate timeseries will not produce useful forecasts &#8211; the <em>Lucas Critique</em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-11" href="#footnote-11" target="_self">11</a> of economics. This hypothesis does imply, however, that micro-level models of agents in a chaotic system (agent-based modeling<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-12" href="#footnote-12" target="_self">12</a>) might be a viable approach and, interestingly, might be possible with transformer models.&nbsp;</p><p>A final proposition is that transformer-based models will not push chaotic system forecasting forward directly, but will advance the state of the art by &#8220;sourcing&#8221; appropriate methods. The number of effective statistical algorithms available and well suited to such problems far outpaces the private sector&#8217;s ability to apply them. Many of these algorithms belong to the first culture of statistics, but they only produce empirically good predictions in very specific circumstances. As a result, these algorithms are often esoteric, difficult to parameterize, and dependent on domain knowledge, and there is a shortage of people who know about them and can apply them. Large language models may well serve as &#8220;co-pilots&#8221; or &#8220;auto-pilots&#8221; that identify opportunities to apply highly specialized, non-transformer models to make forecasts. </p><p>The somewhat amusing implication of this last hypothesis is that black-box neural networks &#8211; the epitome of second-culture statistics &#8211; may propose, develop, and test first-culture models. 
The second culture will lead us back to the first.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>On a side note, <a href="https://en.wikipedia.org/wiki/Leo_Breiman">Breiman</a> is probably under-appreciated, especially outside academic circles, for his contributions to machine learning. 
Many of his contributions underpin key concepts in AI and machine learning, and some of his statistical learning algorithms, such as random forests, remain mainstream.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling--The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.full">Two Cultures</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Somewhat loosely, I use the terms Neural Networks, Deep Learning, and Transformer-based models, as well as their common application in Large Language Models (LLMs), interchangeably. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>A <a href="https://en.wikipedia.org/wiki/Stationary_process">formal definition</a> of non-stationarity. 
</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p><a href="https://www.amazon.science/blog/adapting-language-model-architectures-for-time-series-forecasting">Amazon Blog Post on Chronos</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p><a href="https://blog.salesforceairesearch.com/moirai/">Salesforce Blog Post on Moirai</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>Benchmarks on <a href="https://www.linkedin.com/posts/mergenthaler_happyforecasting-timeseries-forecasting-activity-7176679432214982656-maET/?utm_source=share&amp;utm_medium=member_desktop">Chronos</a> and <a href="https://www.linkedin.com/posts/mergenthaler_happyforecasting-timeseries-forecasting-activity-7179237340177952768-cMs5/?utm_source=share&amp;utm_medium=member_desktop">Moirai</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>See the M-Competitions <a href="https://forecasters.org/resources/time-series-data/">here</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p><a href="http://www.stat.columbia.edu/~gelman/">Andrew Gelman</a> has an interesting discussion on this topic in the context of Breiman&#8217;s paper <a 
href="http://www.stat.columbia.edu/~gelman/research/published/gelman_breiman.pdf">here</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p><a href="https://en.wikipedia.org/wiki/List_of_recessions_in_the_United_States#Great_Depression_onward_(1929%E2%80%93present)">List of recessions</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-11" href="#footnote-anchor-11" class="footnote-number" contenteditable="false" target="_self">11</a><div class="footnote-content"><p><a href="https://en.wikipedia.org/wiki/Lucas_critique">Lucas Critique</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-12" href="#footnote-anchor-12" class="footnote-number" contenteditable="false" target="_self">12</a><div class="footnote-content"><p>I have written about LLM-based agent based models here: </p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0bf807a6-f3d2-4df2-aced-dd574372ea86&quot;,&quot;caption&quot;:&quot;I have recently been thinking about old ideas that could be revived based on the recent progress in AI and LLMs. Recently, I wrote about the Semantic Web. Another similar area is agent-based modeling. Analogous to the semantic web, this technique has underdelivered and&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;AI Towns and Agent Based Modeling&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. Formerly I was the Head of Data Science at Coatue. 
https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-08-26T20:44:36.417Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b8b04f-8ce8-42d9-862b-2e9041d7219f_881x756.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/ai-towns-and-agent-based-modeling&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:136441115,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></div></div>]]></content:encoded></item><item><title><![CDATA[Churn Paradox]]></title><description><![CDATA[And, why you might be calculating Net Dollar Retention wrong]]></description><link>https://magis.substack.com/p/churn-paradox</link><guid isPermaLink="false">https://magis.substack.com/p/churn-paradox</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Tue, 20 Feb 2024 23:00:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UGfM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p><em>There are three kinds of lies: lies, damned lies, and statistics.</em></p><p><em>&#8212; Mark Twain<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" 
id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></em></p></blockquote><p>Simpson&#8217;s paradox is a famous statistical paradox taught in Statistics 101 classes<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. Its best-known example is the case of alleged sex bias in graduate admissions at UC Berkeley. In 1973, 44% of male applicants were admitted to the graduate program they applied to, compared with only 35% of female applicants. In aggregate, this looks like a statistically significant bias against female applicants. However, when applications are disaggregated by department, most of the 85 departments showed no significant bias against either sex, 4 were biased against women, and 6 were biased against men. The apparent contradiction is resolved by noting that women applied at higher rates to departments with lower overall acceptance rates. The aggregate acceptance rate for women appears significantly lower than for men only because the aggregate is weighted by the number of applicants to each department &#8211; so, relative to men, the aggregate rate for women overweights departments that are harder to get into. Similar paradoxes occur in medical research &#8211; for instance, a treatment may appear more effective at curing cancer overall, but the effect disappears when the patients in the study are split by severity of the cancer.&nbsp;</p><p>The lesson from Simpson&#8217;s paradox is that aggregate metrics can lead to wrong conclusions when one does not consider the composition of the aggregate. Statisticians describe such studies as missing a confounding variable &#8211; something other than the treatment being studied accounts for the differences between groups. 
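</p><p>The reversal is easy to reproduce with made-up numbers. The sketch below uses two hypothetical departments (illustrative figures, not the actual 1973 Berkeley data) in which women are admitted at a higher rate than men in each department but mostly apply to the harder one.</p>

```python
# Hypothetical admissions data for two made-up departments
# (illustrative only; not the actual 1973 Berkeley figures).
# department -> sex -> (applicants, admitted)
data = {
    "easy_dept": {"men": (800, 480), "women": (100, 65)},
    "hard_dept": {"men": (200, 40), "women": (800, 200)},
}


def rate(applicants: int, admitted: int) -> float:
    """Admission rate for one group in one department."""
    return admitted / applicants


def aggregate_rate(sex: str) -> float:
    """Pooled admission rate across departments.

    Pooling implicitly weights each department by how many applicants
    of this sex it received, which is where the reversal comes from.
    """
    applicants = sum(dept[sex][0] for dept in data.values())
    admitted = sum(dept[sex][1] for dept in data.values())
    return admitted / applicants


for name, groups in data.items():
    print(name, {sex: round(rate(*counts), 2) for sex, counts in groups.items()})
print("overall", {sex: round(aggregate_rate(sex), 2) for sex in ("men", "women")})
```

<p>Within each department women are admitted at the higher rate (65% vs. 60% and 25% vs. 20%), yet the pooled rates favor men (52% vs. roughly 29%), because pooling weights each department by its number of applicants.</p><p>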
This phenomenon can be hard to recognize, as the UC Berkeley case shows, even though it is often obvious in hindsight: if researchers tested a new cancer drug only on benign forms of cancer, their mortality results would clearly look better than those of existing drugs.&nbsp;</p><p><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4722115">Dan McCarthy et al.</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> recently published a fascinating study demonstrating an aggregation paradox in the construction of churn metrics that investors frequently use to study a company&#8217;s performance over time. The paradox is statistically interesting because the aggregation bias is extremely subtle, and practically important because it can lead to puzzling conclusions that contradict common investor assumptions. The bias implies that:</p><ol><li><p>A decreasing aggregate churn rate (or, conversely, an increasing retention rate) may not indicate that the long-term quality of a company&#8217;s revenue base is improving.&nbsp;</p></li><li><p>A sudden increase in aggregate churn rate (or, conversely, a sudden decrease in retention rate) may simply reflect an influx of new customers and say <em>nothing</em> about the quality of a company&#8217;s revenue base.&nbsp;</p></li></ol><p>For clarity, aggregate churn rate (or, conversely, retention rate) refers to a common but problematic construction of a widely used business metric meant to measure the stickiness or longevity of customers of a subscription service or product. Churn rates are expressed as percentages (i.e., the percentage of customers who left a service). Aggregate churn rates compute this metric over an entire customer base. This is one of the most commonly reported formulations, especially in quarterly reports, financial statements, and investor updates. 
For instance, Netflix has defined churn as follows: &#8220;Churn is a monthly measure defined as customer cancellations in the quarter divided by the sum of beginning subscribers and gross subscriber additions, then divided by three months.&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> Freshworks, a software company, constructs its retention rate the same way, explaining that &#8220;To calculate net dollar retention rate as of a particular date, we first determine &#8220;<em>Entering ARR [Annual Recurring Revenue],&#8221; which is ARR from the population of our customers as of 12 months prior to the end of the reporting period. We then calculate the &#8220;Ending ARR&#8221; from the same set of customers as of the end of the reporting period. We then divide the Ending ARR by the Entering ARR to arrive at our net dollar retention rate. Ending ARR includes upsells, cross-sells, and renewals during the measurement period and is net of any contraction or attrition over this period.</em>&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> Indeed, authoritative sources such as Salesforce<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>, HubSpot<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>, and popular blogs<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a> instruct investors and operators to construct churn or retention metrics in more or less this manner: across the entire comparable customer base. 
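</p><p>Both quoted definitions reduce to a single ratio over the whole customer base. A minimal sketch follows; the function names and example numbers are hypothetical, and the code simply transcribes the quoted wording rather than either company&#8217;s internal calculation.</p>

```python
# Hypothetical helpers that transcribe the two quoted definitions.
# Names and example numbers are illustrative assumptions, not company data.


def netflix_style_monthly_churn(cancellations: float, beginning_subs: float,
                                gross_adds: float) -> float:
    """Quarterly cancellations / (beginning subscribers + gross additions), per month."""
    return cancellations / (beginning_subs + gross_adds) / 3


def net_dollar_retention(arr_then: dict[str, float], arr_now: dict[str, float]) -> float:
    """Ending ARR of the customers present 12 months ago, divided by their ARR then.

    Customers who left contribute 0; customers acquired since are excluded.
    """
    entering = sum(arr_then.values())
    ending = sum(arr_now.get(customer, 0.0) for customer in arr_then)
    return ending / entering


print(netflix_style_monthly_churn(cancellations=6, beginning_subs=90, gross_adds=10))
print(net_dollar_retention(
    arr_then={"a": 100.0, "b": 50.0, "c": 50.0},
    arr_now={"a": 130.0, "b": 50.0, "d": 80.0},  # "c" churned, "d" is new
))
```

<p>Neither ratio looks at when customers joined: every customer in the base is pooled together, which is exactly the aggregation at issue here.</p><p>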
Investors typically assume that decreasing churn rates over time are positive because they indicate increased customer stickiness and, therefore, increased lifetime value of the existing customer base.</p><p>Dan McCarthy et al. show that the common, aggregate construction of churn is problematic when compared over time, because customer acquisition cohorts vary in size. The authors illustrate the paradox by showing how a company, Hubble Contacts, can report consistently decreasing aggregate churn (i.e., each period, a smaller percentage of its customer base leaves) even though each subsequent customer cohort has a materially <em>worse</em> retention curve (Figure below). The paradox becomes apparent when revenue projections fall (as the lifetime value of the customer base progressively declines) even as the company reports ever-better churn metrics. </p><p>The paradox occurs when a fixed-period (e.g., 1-month or 12-month) aggregate churn rate is reported over time with no regard to relative customer cohort sizes. The problem is that the average age of a customer base changes at a non-constant rate, because cohort sizes (or, equivalently, the customer acquisition rate) change over time. This matters because retention curves themselves decline at non-constant rates: churn is typically steepest early in a customer&#8217;s life and flattens with tenure. Where each cohort sits on its curve therefore matters significantly to that cohort&#8217;s expected churn next period, and the aggregate churn metric &#8211; which averages across the cohorts&#8217; retention curves, weighted by cohort size &#8211; will partly track the average customer age, independent of any change in customer quality. 
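</p><p>A small simulation illustrates the mechanism. It assumes, purely for illustration (these are not the paper&#8217;s numbers or model), that churn is steepest in a customer&#8217;s first period, that each new cohort churns slightly more than the previous one at every age, and that acquisition slows so each cohort is half the size of the one before.</p>

```python
# Hypothetical cohort simulation; every number below is an illustrative assumption.


def hazard(cohort: int, age: int) -> float:
    """Per-period churn probability for a customer of `cohort` who is `age` periods old.

    Churn is steepest in the first period (age 0), and each later cohort
    churns a little more than the previous one at every age.
    """
    if age == 0:
        return 0.50 + 0.05 * cohort
    return 0.10 + 0.01 * cohort


def aggregate_churn(cohort_sizes: list[float], periods: int) -> list[float]:
    """Customers lost in each period divided by customers active at its start."""
    survivors: list[float] = []  # survivors[c]: active customers from cohort c
    rates = []
    for t in range(periods):
        survivors.append(cohort_sizes[t])  # cohort t is acquired at the start of period t
        active = sum(survivors)
        churned = 0.0
        for c, alive in enumerate(survivors):
            lost = alive * hazard(c, t - c)
            churned += lost
            survivors[c] = alive - lost
        rates.append(churned / active)
    return rates


def cohort_survival(cohort: int, horizon: int) -> float:
    """Share of a cohort still active after `horizon` periods."""
    alive = 1.0
    for age in range(horizon):
        alive *= 1.0 - hazard(cohort, age)
    return alive


# Acquisition slows: each cohort is half the size of the one before.
sizes = [1000 / 2**c for c in range(6)]
rates = aggregate_churn(sizes, periods=6)
print("aggregate churn by period:", [round(r, 3) for r in rates])
print("6-period survival by cohort:", [round(cohort_survival(c, 6), 3) for c in range(6)])
```

<p>Under these assumptions, aggregate churn falls every period, from 50% to roughly 14% over six periods, while the share of each cohort still active after six periods shrinks from one cohort to the next: the headline metric improves as customer quality deteriorates.</p><p>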
In other words, the confounding variable (the missing grouping) is the number of customers in each cohort.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UGfM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UGfM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png 424w, https://substackcdn.com/image/fetch/$s_!UGfM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png 848w, https://substackcdn.com/image/fetch/$s_!UGfM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png 1272w, https://substackcdn.com/image/fetch/$s_!UGfM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UGfM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png" width="677" height="541" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:541,&quot;width&quot;:677,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UGfM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png 424w, https://substackcdn.com/image/fetch/$s_!UGfM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png 848w, https://substackcdn.com/image/fetch/$s_!UGfM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png 1272w, https://substackcdn.com/image/fetch/$s_!UGfM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd9250a9-9843-4341-9a58-d057ca86bc6c_677x541.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>The corollary is that a sudden increase in churn rate may occur simply because of a sudden influx of customers (who immediately start at the <em>steepest</em> drop-off of their retention curve). The aggregate churn rate is weighted towards these new customers, who are <em>expected</em> to churn at the highest rate.&nbsp;</p><p>The paper focuses on companies offering monthly consumer subscriptions; however, its findings are not specific to monthly subscriptions or to consumer companies. The bias is simply convenient to demonstrate in that setting; it generalizes to any aggregate churn metric measured over a fixed period and compared over time. Most software-as-a-service (SaaS) companies, for instance, track 12-month net dollar retention (NDR). They are not exempt from this effect &#8211; annual changes in their NDR will still be driven by the mix of customer ages in their customer base; the resulting changes just show up a year later. 
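</p><p>The influx corollary can be sketched the same way. In this hypothetical setup every cohort shares one retention curve, so customer quality never changes; acquisition is a steady 100 customers per period except for a one-off influx of 1,000 in period 6. All numbers are illustrative assumptions.</p>

```python
# Hypothetical illustration: one shared retention curve for every cohort,
# steady acquisition, and a single one-off influx of new customers.


def hazard(age: int) -> float:
    """Per-period churn probability; steepest in a customer's first period."""
    return 0.50 if age == 0 else 0.10


def aggregate_churn(cohort_sizes: list[float]) -> list[float]:
    """Customers lost in each period divided by customers active at its start."""
    survivors: list[float] = []  # survivors[c]: active customers from cohort c
    rates = []
    for t, new_customers in enumerate(cohort_sizes):
        survivors.append(new_customers)  # cohort t joins at the start of period t
        active = sum(survivors)
        churned = 0.0
        for c, alive in enumerate(survivors):
            lost = alive * hazard(t - c)
            churned += lost
            survivors[c] = alive - lost
        rates.append(churned / active)
    return rates


sizes = [100.0] * 6 + [1000.0] + [100.0] * 3  # steady 100/period, influx in period 6
rates = aggregate_churn(sizes)
for period, rate in enumerate(rates):
    print(f"period {period}: aggregate churn {rate:.3f}")
```

<p>Aggregate churn jumps from about 23% in period 5 to about 42% in period 6, then drops to about 15% in period 7, purely because of the change in customer mix.</p><p>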
A SaaS company can therefore report steadily improving net dollar retention over time even as each successive customer cohort has a worse lifetime value.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Mark Twain apparently <a href="https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics">attributed</a> this quote to Benjamin Disraeli, but there does not appear to be any source for the attribution.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://en.wikipedia.org/wiki/Simpson%27s_paradox">Simpson&#8217;s Paradox</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><a 
href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4722115">https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4722115</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p><a href="https://www.sec.gov/Archives/edgar/data/1065280/000119312510235785/d10q.htm">Netflix </a>Quarterly Filing</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p><a href="https://www.sec.gov/Archives/edgar/data/1544522/000162828021017717/freshworkss-1.htm">Freshworks S-1</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p><a href="https://www.sec.gov/Archives/edgar/data/1065280/000119312510235785/d10q.htm">Salesforce</a> - How to Calculate Customer Churn Rate and Revenue Churn Rate</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p><a href="https://cdn2.hubspot.net/hub/171901/file-18462400-pdf/ebooks/simple_guide_to_churn_analysis.pdf">Hubspot</a> &#8212; A Simple Guide to Churn Analysis</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p><a href="https://www.thesaascfo.com/how-to-calculate-net-dollar-retention/">https://www.thesaascfo.com/how-to-calculate-net-dollar-retention/</a></p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[LLM 
Data Sales: A Market for Lemons?]]></title><description><![CDATA[Why selling data to AI model trainers is a difficult proposition]]></description><link>https://magis.substack.com/p/llm-data-sales-a-market-for-lemons</link><guid isPermaLink="false">https://magis.substack.com/p/llm-data-sales-a-market-for-lemons</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Wed, 14 Feb 2024 14:02:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FQYI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Many technology investors and pundits predict that proprietary data owners will capture the majority of economic profits from the recent advances in artificial intelligence, particularly in large language models (LLMs) and generative foundation models<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. A natural conjecture is that a market for training datasets will emerge. Model training companies have reportedly already been seeking datasets to buy. The concept of selling data as a core business model is not new &#8211; existing data vendors make billions of dollars per year selling data<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. However, selling data specifically to train generative models is a new business model, and a uniquely difficult one. 
If the associated challenges are not solved, value may well still accrue to proprietary data owners, but only to those able to train and operationalize their own models, rather than more broadly<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FQYI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FQYI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png 424w, https://substackcdn.com/image/fetch/$s_!FQYI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png 848w, https://substackcdn.com/image/fetch/$s_!FQYI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png 1272w, https://substackcdn.com/image/fetch/$s_!FQYI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FQYI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png" width="396" height="398.5938864628821" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:922,&quot;width&quot;:916,&quot;resizeWidth&quot;:396,&quot;bytes&quot;:1388076,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FQYI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png 424w, https://substackcdn.com/image/fetch/$s_!FQYI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png 848w, https://substackcdn.com/image/fetch/$s_!FQYI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png 1272w, https://substackcdn.com/image/fetch/$s_!FQYI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa487df71-9d96-44f1-9d6c-cf77535a1f8a_916x922.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Traditional data sales largely depend on the value of marginal data points. Marginal data points are necessary because data becomes outdated. The underlying data could be a time series where dates have explicit meaning (e.g. stock prices) or observations that degrade in value over time (e.g. corporate emails of sales leads). In both cases, the data buyer must continually renew its license, which gives the data vendor a sustainable business model. In this traditional scenario, the historical data may still be valuable &#8211; in the case of stock prices it is, in the case of sales leads it is not &#8211; but it is not sufficient. I will refer to this property of data becoming outdated as <em>marginal temporal value</em>.</p><p>The types of data most in demand for training generative models, on the other hand, lack <em>marginal temporal value</em>. For training, the value is in a dataset&#8217;s volume and history. 
For example, to a model trainer, almost all of the relevant value of a database of user-generated questions and answers, such as Quora, lies in its historical data<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>. Quora will have an advantage in expanding this dataset &#8211; the marginal cost of creating new question-and-answer observations is lower for Quora than for a model trainer &#8211; but the bulk of Quora&#8217;s value proposition is that the dataset already exists. And while there may be some value in up-to-date information for time-sensitive applications<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>, for generative foundational models aimed at truly general reasoning, freshness seems far less important than amassing enough historical data for the model to learn &#8220;to reason&#8221;.&nbsp;</p><p>A related, corollary property is <em>irrevocability</em>. Once a data vendor shares data with a model trainer, the value it exposes is difficult to take back. A license might be revoked, but once a model has been trained, the data lives on in the derivative model weights. There is relatively little precedent for forcing data licensees to destroy <em>derivatives</em> of formerly licensed data. Beyond precedent, it would be hard to force model trainers to destroy derivative works given the high cost of creating them (i.e. the computational cost of training). This conundrum makes pricing and evaluation difficult &#8211; what model trainer would agree to a significant price without being certain of the improvement to their model? 
The vendor and model trainer will therefore have difficulty agreeing on an <em>upfront</em> price, given this lack of information and the high cost of revocation<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.&nbsp;</p><p>The final difficulty of selling data to model trainers is that the generative nature of the models (i.e. the ability to perpetually create new datasets from them) makes governing downstream use cases difficult. Most model trainers offer their models as a service, through an API, to downstream clients, who in turn build a variety of applications and services<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>. In practice, it will be hard to control downstream usage in a way that ensures clients derive no benefit after the license terminates.&nbsp;</p><p>These last two properties &#8211; irrevocability and governance &#8211; are best illustrated by contrast with a traditional example. Consider a bank that licenses Bloomberg data for its equity research reports and then decides not to renew its license. The bank could no longer issue new equity research reports based on that data, but neither would it be compelled to retract past, already published, reports. The bank&#8217;s clients relied on those past reports and may continue to benefit for a relatively short period after the license lapses &#8211; but the economic benefit is short-lived, and soon the clients need new reports from the bank to keep benefiting (so the bank needs a new license). The bank&#8217;s clients may also have trained models on these historical reports &#8211; Bloomberg almost certainly has no recourse to pull these, but such models are purpose-built (narrow) and likely decay in value over time. 
The bank&#8217;s clients keep benefiting from Bloomberg&#8217;s data, but only in a limited, time-constrained way.</p><p>In contrast, the generative nature of foundational models makes this far more complicated, especially as sophisticated downstream clients start using models to build and evaluate other models<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a> or to generate synthetic data<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a>. Suppose a model trainer licenses Quora&#8217;s data. A downstream client of that trainer then uses the trainer&#8217;s API to train or fine-tune a more specialized, but still generative, model, using the trainer&#8217;s model as an evaluator or as a generator of synthetic data. If Quora then revokes the license, the model trainer may need to revoke access to its own model &#8211; but what happens to the client&#8217;s model? The downstream model may well be able to perfectly reproduce Quora&#8217;s dataset. On the other hand, what client would sign up for a model whose upstream suppliers could influence all future uses, including derived ones?</p><p>Taken together, these three properties of the foundational model training data use case &#8211; a lack of marginal temporal value, irrevocability, and downstream governance &#8211; make data-as-a-service for generative models very difficult.&nbsp;</p><p>Selling data to model trainers is not hopeless; there are some potential partial solutions. For instance, a royalty or revenue-sharing model based on consumption might help with fair pricing (although downstream governance remains an open question). The irrevocability property could perhaps be circumvented if the marginal value of a dataset could be estimated at low computational cost. 
It may also be possible to mitigate these concerns by keeping all model usage inside neutral, third-party-controlled computational sandboxes<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-11" href="#footnote-11" target="_self">11</a>. Nonetheless, both the business-model and the technical problems associated with this type of sale will likely remain difficult to solve in the near term.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>From investors: <a href="https://a16z.com/cloud-lessons-for-the-ai-era/">https://a16z.com/cloud-lessons-for-the-ai-era/</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>From pundits: <a 
href="https://www.forbes.com/sites/forbestechcouncil/2023/10/17/why-the-future-of-generative-ai-lies-in-a-companys-own-data/?sh=23dd445842f6">https://www.forbes.com/sites/forbestechcouncil/2023/10/17/why-the-future-of-generative-ai-lies-in-a-companys-own-data/?sh=23dd445842f6</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;85598472-4c29-4135-a928-2883e3c8cbc8&quot;,&quot;caption&quot;:&quot;The following does not represent and is not intended to be investment advice. The last several years have seen increasing interest and funding for data analytics, big data, and machine learning. A large number of both machine learning and data infrastructure companies (&#8220;modern data stack&#8221;) have led to large financial outcomes for employees, founders, and&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Some data on data companies&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. Formerly I was the Head of Data Science at Coatue. 
https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2022-10-03T00:41:42.499Z&quot;,&quot;cover_image&quot;:null,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/some-data-on-data-companies&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:76172861,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:2,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Amusingly, Sequoia, my own investor, claims in their <a href="https://www.sequoiacap.com/article/generative-ai-act-two/">latest AI article</a> that value <em>did not</em> accrue to proprietary data owners. I disagree; I do not think it is a settled question. The best of AI may well still come from proprietary data. I do agree that easy-to-replicate data will become commoditized, and I predict that many copyright challenges will fail, further commoditizing internet data. Of course, predictions are hard, especially about the future. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Quora is used as an example for convenience. 
In reality, many publishers and social networks have begun <a href="https://www.nytimes.com/2023/04/18/technology/reddit-ai-openai-google.html">charging for data</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>As far as I know, even in time series applications, this is usually inference-oriented rather than training-oriented. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>The <a href="https://www.wired.com/story/twitter-data-api-prices-out-nearly-everyone/">acrimony</a> over the data sales of X (formerly Twitter) demonstrates this issue. An upfront price that might be appropriate for a generative model is incompatible with the traditional way data is valued. 
</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>OpenAI, Anthropic, Mistral, etc.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p>LLMs as a Judge: <a href="https://arxiv.org/pdf/2306.05685.pdf">https://arxiv.org/pdf/2306.05685.pdf</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p>Synthetic data from LLMs: <a href="https://www.amazon.science/blog/using-large-language-models-llms-to-synthesize-training-data">https://www.amazon.science/blog/using-large-language-models-llms-to-synthesize-training-data</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-11" href="#footnote-anchor-11" class="footnote-number" contenteditable="false" target="_self">11</a><div class="footnote-content"><p>Snowflake&#8217;s Snowpark Container Services or Native Apps, for example.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Why Credit Card Data still makes money]]></title><description><![CDATA[And, why commoditization of alternative data is a lazy argument]]></description><link>https://magis.substack.com/p/why-credit-card-data-still-makes</link><guid isPermaLink="false">https://magis.substack.com/p/why-credit-card-data-still-makes</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Sat, 03 Feb 2024 22:06:55 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!T4sN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I recently came back from Battlefin Miami<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, a data conference focused on hedge funds. In effect, it is a trade show where data vendors pitch their offerings to data buyers, predominantly hedge funds. One theme that came up was the perceived commoditization of consumer spending data<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. I frequently hear data vendors, buyers, portfolio managers, and hedge fund LPs ask: if alternative data, and in particular consumer spending data, has become commoditized and widely available, why does anyone expect to generate alpha from it?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T4sN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T4sN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png 424w, https://substackcdn.com/image/fetch/$s_!T4sN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png 848w, 
https://substackcdn.com/image/fetch/$s_!T4sN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png 1272w, https://substackcdn.com/image/fetch/$s_!T4sN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T4sN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png" width="321" height="321" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:515,&quot;width&quot;:515,&quot;resizeWidth&quot;:321,&quot;bytes&quot;:446048,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T4sN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png 424w, https://substackcdn.com/image/fetch/$s_!T4sN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png 848w, 
https://substackcdn.com/image/fetch/$s_!T4sN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png 1272w, https://substackcdn.com/image/fetch/$s_!T4sN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7daec70c-d6c4-407e-9fad-94de5a9c3cea_515x515.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>My tongue-in-cheek answer is that Compustat has been selling its database for decades, and alpha is still generated from it. 
For the non-quants, Compustat is the standard dataset in quantitative financial research<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. When it first became available, mere access to the data via computers was an advantage. Today, the dataset is offered commercially at relatively accessible prices. It is the definition of commoditized, and yet quantitative funds buy it, use it, and some generate consistent alpha from it. This apparent paradox resolves once you consider the infrastructure and talent needed to execute a strategy, in addition to mere access<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. This observation is not particularly insightful, but it is worth writing down because it seems to be forgotten so frequently.&nbsp;</p><p>The infrastructure needed to execute alpha-generating strategies on top of consumer spending data remains a major barrier to entry. It is computationally expensive to run multiple experiments and backtests that incrementally improve forecasting ability. It is similarly expensive to update forecasts frequently. So, teams with sufficient compute resources and the institutional patience to invest in infrastructure benefit from a barrier to entry that smaller firms relying on syndicated research do not. The incremental cost for syndicated research providers to do this marginal work likely scales more slowly than the return on investment for large buy-side firms that build this capability<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>. Further, for at least some types of strategies, computational effort does not increase meaningfully with assets under management &#8211; so large firms can spend a smaller percentage of management fees on more compute. 
This is a very attractive property &#8211; if an actionable insight for a large-cap, liquid stock costs one million dollars to compute, it is far better to be regularly betting hundreds of millions on such insights than tens of millions.&nbsp;This <em>amortization effect</em> also works favorably when considering the costs of building internal tooling for processing alternative data. </p><p>Talent that can generate alpha with consumer spending data also remains extremely rare. The bleeding edge of consumer spending research requires both a deep technical understanding of the structure of the data and a deep domain understanding of what matters to markets. For short-term strategies, properly accounting for domain-specific issues, such as the impact of sales taxes or revenue recognition, while building fast systems is the crux of outracing the competition. For longer-term strategies, the challenge is understanding which questions from a long list of potential investment questions, such as retention or hyperlocal competition, matter for which company, while keeping these calculations computationally feasible. I have written about this talent gap at length before<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>.  </p><p>Without pretense of a major insight, I predict that infrastructure and talent will remain the key differentiators for executing on information advantages, and that those advantages will accrue to investment firms that build more sophisticated data science teams. Beyond consumer spending data at hedge funds, similar trends are playing out in adjacent industries like venture capital. A mere five years ago, I was often met with blank stares when I told venture capital investors that I was focused on using data to find early-stage deals. Today, virtually every venture capital firm has at least one person sourcing deals using data. 
Clearly, mere LinkedIn data feeds are no longer enough &#8211; it is about the people and infrastructure you build.&nbsp;Execution is hard, and scale begets scale.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://www.battlefin.com/events/miami-2024">https://www.battlefin.com/events/miami-2024</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I will refer to it as consumer spending data, but it is often called <em>credit card data</em>, despite representing all types of consumer spending (checks, debit cards, gift cards, etc.) and not necessarily being limited to literal credit cards. 
</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The prevalence of <a href="https://en.wikipedia.org/wiki/Compustat">Compustat</a>, even in academia, is easy to confirm with a quick <a href="https://scholar.google.com/scholar?hl=en&amp;as_sdt=0%2C33&amp;q=compustat&amp;btnG=">Google Scholar</a> search.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>There is no denying that the proliferation of consumer spending data has affected investor information and expectations for consumer stocks, but it appears that it has merely shifted those expectations. I recommend <a href="https://www.expectationsinvesting.com/">Expectations Investing</a> on this topic.  </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>For example, how quickly a syndicate data provider could recoup, through price increases, the cost of delivering data one day faster likely compares unfavorably with how quickly a large institution could recoup the same cost through the immediate trading advantage.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;e28c65c2-c592-4157-aeee-03af1e40b72e&quot;,&quot;caption&quot;:&quot;Interest in alternative data has seen significant growth in the past several years, particularly in the asset management industry. 
McKinsey, BCG, and other global consulting firms have pointed out the opportunity that exists for businesses of all kind to monetize internal data and incorporate data from outside the enterprise to improve decision making. &#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;What Makes Alternative Data Scientists Alternative? &quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. Formerly I was the Head of Data Science at Coatue. https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2022-09-06T19:36:15.874Z&quot;,&quot;cover_image&quot;:null,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/what-makes-alternative-data-scientists&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:72135618,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:5,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Measuring Web Traffic is Very Hard ]]></title><description><![CDATA[An obvious metric that is anything but.]]></description><link>https://magis.substack.com/p/measuring-web-traffic-is-very-hard</link><guid isPermaLink="false">https://magis.substack.com/p/measuring-web-traffic-is-very-hard</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Tue, 26 Dec 
2023 20:20:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!LjUr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Measuring web traffic (the volume and characteristics of visitors to web properties) is critical to the digital economy. Website visits are the top of the funnel for all internet-driven eCommerce. Each visitor represents an opportunity to make a sale: a &#8220;person that walks into the shop.&#8221; Web visitors also make up the audience for the advertising served by that website. Understanding how many people are visiting and who these people are (or at least what characteristics describe them) helps a website sell and advertise better, which are the primary ways digital properties are monetized. Measuring web traffic also allows for competitive insights. What other websites are customers browsing? Is slow growth a result of macroeconomic conditions or specific to a website? Web traffic is the ultimate normalized comparison between digital properties. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LjUr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LjUr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!LjUr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!LjUr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!LjUr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LjUr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp" width="380" height="380" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:380,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LjUr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!LjUr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!LjUr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!LjUr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e4ebfa0-a193-4a95-a18c-1f55d2eba08f_1024x1024.webp 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Despite its importance, broad internet measurement is extraordinarily difficult without access to every server that hosts content. Measuring web traffic amounts to counting server responses to clients. However, the decentralized nature of the internet makes it difficult to sit in the middle of every such request<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. In contrast, brick-and-mortar sales in the United States are far more centralized: Visa &amp; Mastercard make up the vast majority of payment processing<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, the IRS collects revenue figures from every business, and the US Census can compel any business to respond to surveys<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. 
It is far from perfect, but it is fairly straightforward to get accurate estimates<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. It seems unlikely that any entity will ever be similarly positioned with regard to web traffic, and even less likely that such an entity would share the data with third parties.&nbsp;</p><p>Of course, this kind of visibility is possible within some large, localized parts of the network: Amazon has access to records for all websites relying on AWS servers, an ISP may have a monopoly in a country, and infrastructure providers like Cisco or Cloudflare<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>, as well as some very large internet agencies, have similar vantage points. One special case is walled gardens, where the measuring entity also controls publication on the platform<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>. In these economically happy situations, the entity can price advertising very efficiently because it knows exactly who is seeing content.&nbsp;</p><p>A second core difficulty with web traffic measurement is that metric definitions are difficult to standardize in practice. Even if every server owner offered to contribute statistics to a centralized data co-op, everyone would also need to measure traffic using exactly the same software. Basic metrics such as users, sessions, or page views are immensely complicated to count consistently because of the intricate and varied ways people access web properties. For instance:&nbsp;</p><ul><li><p>Users: When counting users, do you include bot traffic (and can you even determine with certainty who is a bot)? Can you keep track of the same user if their IP or MAC address has changed? 
Can you track the same user across devices?&nbsp;</p></li><li><p>Page views: A page view count is simply the number of web pages loaded for users, but it poses at least a couple of challenges. Many websites use AJAX to load partial pages and content &#8211; does an AJAX call count as a page view or not?&nbsp; In some cases, such as in a SaaS application, probably not. In other cases, such as someone scrolling through a list that would otherwise be paginated with unique URLs, it seems like it should, if you want to make like-for-like comparisons. Does viewing a subdomain also count as a page view for the main site? What about subdomains that are in iFrames or accessed primarily through an unrelated domain (e.g., a checkout page hosted by Shopify)?</p></li><li><p>Sessions: Sessions are similarly ambiguous. How long does the person need to be away or inactive for it to be a new session? The answer arguably should depend on the website&#8217;s content. Do we count sessions that are initiated simply by someone re-opening their browser and the browser reloading their previous tabs? Can users have parallel sessions (i.e., multiple tabs), or is each tab switch a new session? If users can have parallel sessions, how long can a tab stay inactive before we count it as a new session? If the user has to be physically away from their browser, how do we deal with mobile phones, on which users rarely &#8220;shut down&#8221; the web browser?&nbsp;</p></li></ul><p>The point is that defining these metrics does not simply come down to agreeing on a standard; rather, the definitions are <em>necessarily ambiguous</em> because of the variety of web properties and methods of interacting with them. Web traffic measurement is materially different from television measurement, which has less ambiguous definitions of reach built on discrete, non-continuous events and actions. </p><p>So, given these two difficulties, imperfect data access and necessarily ambiguous metric definitions, how well is web traffic estimated? 
The best public summary of commercial data providers I have found is Rand Fishkin&#8217;s November 2022 article, <em><a href="https://sparktoro.com/blog/which-3rd-party-traffic-estimate-best-matches-google-analytics">Which 3rd-Party Traffic Estimate Best Matches Google Analytics?</a> </em>The short answer is that none of the major web traffic providers is particularly good at estimating magnitude, although most providers&#8217; data are at least moderately correlated with the ground truth as measured by the websites themselves. The core metric Fishkin reports is the percentage of the time that a provider&#8217;s monthly user estimate was within +/-30% of the number reported by Google Analytics (the best commercial data provider lands within that range two thirds of the time for large websites). All the providers end up with correlations of roughly 0.6 to 0.7, suggesting it is comparatively easier to get sequential trends (the first derivative) correct than absolute values<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.&nbsp; Fishkin doesn&#8217;t report results on rank correlation (which would answer whether these data providers get <em>relative</em> size correct). </p><p>What can be done to improve? We have been working on this problem for the past few months, and I hope to say more soon. 
Nonetheless, a principled measurement approach begins with the framework I outlined a year ago:</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:91523899,&quot;url&quot;:&quot;https://magis.substack.com/p/creating-impossible-data&quot;,&quot;publication_id&quot;:697049,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:null,&quot;title&quot;:&quot;Creating Impossible Data&quot;,&quot;truncated_body_text&quot;:&quot;Every industry relies on datasets needed for competitive intelligence: the sale of every product, the web traffic to any website, or the usage of every mobile application. These types of datasets are difficult to procure either because no single organization has all the data (for example, internet infrastructure is physically distributed and decentraliz&#8230;&quot;,&quot;date&quot;:&quot;2022-12-19T02:26:43.007Z&quot;,&quot;like_count&quot;:8,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;handle&quot;:&quot;magis&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. Formerly I was the Head of Data Science at Coatue. 
https://alexizydorczyk.com/&quot;,&quot;profile_set_up_at&quot;:&quot;2022-01-18T16:48:56.621Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:631169,&quot;user_id&quot;:5502194,&quot;publication_id&quot;:697049,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:697049,&quot;name&quot;:&quot;Magis&quot;,&quot;subdomain&quot;:&quot;magis&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Alex Izydorczyk's thoughts on data, finance, and economics, focusing on DaaS (data-as-a-service) businesses. &quot;,&quot;logo_url&quot;:null,&quot;author_id&quot;:5502194,&quot;theme_var_background_pop&quot;:&quot;#2EE240&quot;,&quot;created_at&quot;:&quot;2022-01-18T16:48:03.716Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:&quot;Magis by Alex from Cybersyn&quot;,&quot;copyright&quot;:&quot;Alex Izydorczyk&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false}}],&quot;twitter_screen_name&quot;:&quot;aleksizy&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://magis.substack.com/p/creating-impossible-data?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><span></span><span class="embedded-post-publication-name">Magis</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Creating Impossible Data</div></div><div class="embedded-post-body">Every industry relies on datasets needed for competitive intelligence: the sale of every product, the web traffic to 
any website, or the usage of every mobile application. These types of datasets are difficult to procure either because no single organization has all the data (for example, internet infrastructure is physically distributed and decentraliz&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">3 years ago &#183; 8 likes &#183; Alex Izydorczyk</div></a></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Wikipedia&#8217;s <a href="https://en.wikipedia.org/wiki/Internet_governance">article</a> on internet governance is quite good. There is a persistent debate on <em><a href="https://www.thepublicdiscourse.com/2021/08/77139/">how</a></em><a href="https://www.thepublicdiscourse.com/2021/08/77139/"> centralized</a> the internet might be, but this seems to be more of a governance point than a practical reality. 
</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>To be more accurate, Visa, Mastercard, American Express, and Discover dominate the market: <a href="https://wallethub.com/edu/cc/market-share-by-credit-card-network/25531">https://wallethub.com/edu/cc/market-share-by-credit-card-network/25531</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>More details on US Census survey methodology are available <a href="https://www.census.gov/retail/marts/how_surveys_are_collected.html">here</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p><a href="https://www.census.gov/retail/sales.html">Advance Monthly Retail Sales</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Cloudflare publishes the useful <a href="https://radar.cloudflare.com/">Cloudflare Radar</a>. Cisco publishes a ranked list of websites based on <a href="https://s3-us-west-1.amazonaws.com/umbrella-static/index.html">DNS lookups</a> across its Umbrella network. 
</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Obviously, Meta would be the canonical example.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>This might well be because calibrating magnitude for web traffic is particularly difficult, since there is no easily definable target population of internet users (unlike, say, spending households or consumers). </p></div></div>]]></content:encoded></item><item><title><![CDATA[How to do Alt Data Research]]></title><description><![CDATA[A practical guide to succeeding in alternative data]]></description><link>https://magis.substack.com/p/how-to-do-alt-data-research</link><guid isPermaLink="false">https://magis.substack.com/p/how-to-do-alt-data-research</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Thu, 23 Nov 2023 18:54:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iFoF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I shared a list of strongly opinionated research principles with my data science team and received positive feedback, so I figured I would share them more broadly.&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iFoF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iFoF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png 424w, https://substackcdn.com/image/fetch/$s_!iFoF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png 848w, https://substackcdn.com/image/fetch/$s_!iFoF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png 1272w, https://substackcdn.com/image/fetch/$s_!iFoF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iFoF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png" width="480" height="401.6666666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:964,&quot;width&quot;:1152,&quot;resizeWidth&quot;:480,&quot;bytes&quot;:2164492,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!iFoF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png 424w, https://substackcdn.com/image/fetch/$s_!iFoF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png 848w, https://substackcdn.com/image/fetch/$s_!iFoF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png 1272w, https://substackcdn.com/image/fetch/$s_!iFoF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373500ff-e42c-496d-b687-a5403817f2f7_1152x964.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Below is advice for working with alternative data, especially at a buy-side firm or a market research firm like <a href="https://cybersyn.com/">Cybersyn</a>. </p><p>I have found that data scientists who come from the tech industry (where you usually work with internal data) or academia (where you rarely see, or at least care about, <em>real</em> data) often take time to learn these principles.&nbsp;When you see hedge funds or venture firms churning through accomplished data science leaders, I suspect it is because they do not follow these principles. </p><p>When working with alternative data&#8230;&nbsp;</p><ul><li><p><strong>Always measure correlation and mean absolute error (on year-over-year values) on all the timeseries you are trying to predict or nowcast.</strong></p><ul><li><p>This is the toughest standard; if you have high correlation and low MAE on YoY % values, you will have good metrics on any other derived metric as well and won&#8217;t be &#8220;fooled&#8221; by seasonality, tricked by leverage points, etc.&nbsp;&nbsp;</p></li><li><p>You need to look at both metrics, and always over the maximum length of time available for any idea you test. 
</p></li></ul></li><li><p><strong>Test ideas by pushing levers to extreme values</strong></p><ul><li><p>Many ideas will simply &#8220;do nothing&#8221;, so it&#8217;s fastest and most efficient to try extreme values to see whether the idea has any &#8220;leverage&#8221; on the real result. You can always fine-tune later.</p></li><li><p>You will also save compute: if you make a very small tweak and it does nothing, it is hard to know whether the idea was bad or you simply moved the parameter too little. Move the lever a lot to start.</p></li></ul></li><li><p><strong>Always visualize the data as a timeseries</strong></p><ul><li><p>Regardless of the summary stats, always plot the data. You will often immediately spot things that are unrealistic. You have to look at the plots.</p></li><li><p>Try to plot timeseries by strata: if your alternative dataset has several variables, make one plot per variable with a line for the aggregate of each of its values. When you see big changes, you get ideas about where problems are coming from.</p></li></ul></li><li><p><strong>Always have a &#8220;Strawman&#8221; experiment that is the dumbest thing possible</strong></p><ul><li><p>If you do &#8220;the dumbest thing possible&#8221;, how does your solution compare?</p></li><li><p>This can be the &#8220;null&#8221; experiment (i.e., how does it compare to the existing model?)</p></li><li><p>But it can also be a deliberately &#8220;dumb&#8221; baseline (e.g., 
if we do no re-weighting and just take the weights as they empirically appear in the data, or an AR(1)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> model)</p></li><li><p>You will be surprised at how often the &#8220;dumbest&#8221; solution is incredibly hard to beat.</p></li><li><p>It is very, very hard to beat linear regression in out-of-sample timeseries forecasting of economic variables over the long term. This upsets everyone who comes to the space from tech, where modern machine learning dominates linear models in almost everything.</p></li></ul></li><li><p><strong>Use linear relationships where possible</strong></p><ul><li><p>Most alternative data should be linearly related to the response variable. It is usually unclear why a relationship would be nonlinear, so if you do anything fancy, you should know why it was necessary.</p></li><li><p>Again, this upsets anyone working in tech, but it is a fact of life in finance. If you don&#8217;t believe me, try speaking to some quants.</p></li><li><p>At the very least, use a regression as a &#8220;Strawman&#8221; against any fancier model. Also test AR(1) models where applicable<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p></li></ul></li><li><p><strong>Sanity checks everywhere</strong></p><ul><li><p>Sadly, with alternative data, a complex, fancy, and otherwise useful model can be rendered useless by an accidental &#8220;where&#8221; clause early in the pipeline. It&#8217;s critical to output sanity checks (e.g., 
how many users am I left with at each step, how many did I start out with, etc.)</p></li><li><p>I have found it helpful to plot intermediate steps as well.</p></li></ul></li><li><p><strong>Use the smallest sample possible or find a way to iterate faster</strong></p><ul><li><p>It is extremely difficult to do research if you are waiting more than a few seconds between cycles. You will never get anywhere if you have to wait hours between experiments.</p></li></ul></li><li><p><strong>Change at most one thing at a time</strong></p><ul><li><p>You can stack improvements, but you will want to be able to untangle how much each idea contributed.</p></li></ul></li><li><p><strong>Don&#8217;t be afraid to &#8220;look at the micro&#8221;</strong></p><ul><li><p>It is often incredibly helpful to isolate a single stratum of the data (a single user, website, store, etc.) and look longitudinally at what actually happens in the data. It is a good way to get ideas and understand why trends might be appearing in aggregate.</p></li></ul></li><li><p><strong>80/20 rule: always do the easy, low-hanging stuff first, even if it means leaving good ideas on the table</strong></p><ul><li><p>Read <a href="https://patrickcollison.com/fast">this</a> and recognize you need to produce commercially useful results in days. Do whatever it takes to do so; ignore <em>interesting</em> projects that do not move the needle.</p></li><li><p>If you can make something commercially useful, it is better to do that than to incrementally improve something that is already useful.</p></li><li><p>Sort all ideas into a two-dimensional grid with two axes: Easy vs. Hard and Low Impact vs. High Impact.</p></li></ul></li><li><p><strong>Get familiar with the data&#8217;s summary statistics and memorize them.</strong></p><ul><li><p>Is 500 samples a lot in this dataset? How many distinct values can these variables take? 
What is the average magnitude of an individual observation?</p></li><li><p>If you do not memorize these, it is harder to develop the intuition, or gut feeling, for whether what you see in the data is a problem.</p></li></ul></li><li><p><strong>Related to the above, when you see broken data, quantify how big the issue is as a percentage of the total data before doing any work.</strong></p><ul><li><p>It is easy to notice an error (say, duplicates or mislabelled data) and spend enormous effort trying to fix it, only to discover that it affects 0.1% of your sample and fixing it has no effect on the final result.</p></li><li><p>These types of issues fall into the &#8220;Low Impact, but Hard&#8221; category and aren&#8217;t worth doing. The only way to know is to measure ideas against the scale of the data, and you need a natural intuition for this to move fast.</p></li></ul></li><li><p><strong>Never take a data salesperson&#8217;s certainty about the integrity or meaning of data at face value</strong></p><ul><li><p>You can never be certain that the person providing the data has all the answers. Often fields are not <em>exactly</em> what the data dictionary says or what you would assume.</p></li><li><p>It is almost impossible to extract certain answers; you are better off testing assumptions empirically (and you should always do so). Often, asking salespeople questions about data results in a long &#8220;game of telephone&#8221; in which a lot is lost in translation.</p></li></ul></li><li><p><strong>Read the news and get familiar with &#8220;what&#8217;s reasonable&#8221;</strong></p><ul><li><p>If I told you that spending at Walmart increased 30% year-over-year in a non-Covid year, what would you say?</p></li><li><p>If the answer is not an immediate &#8220;that can&#8217;t be right, that&#8217;s way too high!&#8221;, then you do not have any intuition about reasonableness. 
</p></li><li><p>Become familiar with the US population, GDP growth, consumer spending growth, inflation, etc. You do not need to memorize exact numbers, but you do need enough intuition to look at a timeseries plot and react immediately if the data &#8220;looks unreasonable&#8221;.</p></li><li><p>Most investment professionals develop this intuition without thinking about it, but, surprisingly, data scientists do not develop it in the course of most ordinary data science jobs in tech.</p></li></ul></li><li><p><strong>Ask real-world questions about what the data represents</strong></p><ul><li><p>If you are working with credit card data, for example, remember that different cities and states have different sales taxes, that credit card authorizations differ from processed transactions, that certain types of purchases require pre-authorization and others do not, etc.</p></li><li><p>Every alternative dataset represents something in the real world, but the real world is messy, and the data you have likely captures only part of it. Being acutely aware of what is represented helps spawn ideas.</p></li></ul></li><li><p><strong>Alternative data nowcasting / forecasting has (almost) nothing to do with stock returns.</strong></p><ul><li><p>Alternative data is about forecasting economic realities. Those economic realities are reflected in the market in non-obvious ways, depending on others&#8217; expectations.</p></li><li><p>Michael Lewis<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> details the case of Jane Street traders correctly predicting that Donald Trump would win in 2016&#8230; but then failing to make money on the trade anyway. If you are using alternative data to invest in the stock market, remember that converting economic predictions into stock market predictions is a separate step. 
It is possible to be excellent at the former but bad at the latter.</p></li></ul></li></ul><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://magis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Magis! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>This is a special case of <a href="https://en.wikipedia.org/wiki/Autoregressive_model">ARIMA</a>, but even simpler: in its most naive form (a coefficient of 1 and no constant), your guess for t+1 is just the value at t. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I do not want to overstate the point; clearly machine learning models can be useful for timeseries. I am underscoring it because I have found the bias among data scientists on this point to be so extreme. <a href="https://www.nixtla.io/">Nixtla</a>, a startup, is probably the most interesting innovator in using machine learning for timeseries. 
</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>In <a href="https://www.amazon.com/Going-Infinite-Rise-Fall-Tycoon/dp/1324074337">Going Infinite</a>, Lewis&#8217;s book about SBF.</p></div></div>]]></content:encoded></item><item><title><![CDATA[A Market Research Colossus]]></title><description><![CDATA[Nielsen in A.C. Nielsen's own words]]></description><link>https://magis.substack.com/p/a-market-research-colossus</link><guid isPermaLink="false">https://magis.substack.com/p/a-market-research-colossus</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Fri, 03 Nov 2023 20:29:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OFqO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I am a fan of podcasts with entrepreneurs as a source of inspiration and business history. Patrick O&#8217;Shaughnessy&#8217;s Invest Like the Best<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> and Auren Hoffman&#8217;s World of DaaS<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> are favorites. I have mused on which historical founders would make interesting interviewees. One such founder would be Arthur Charles Nielsen Sr., the founder of the company(ies)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> that still bear his name today. </p><p>Fortunately, in the case of A.C. 
Nielsen, the next best thing is available: a transcript of a speech given in 1969 in which Nielsen recounts the founding, growth, and development of his company and his field over the prior forty years. The speech reads much like a podcast would today. The full text, including additional graphics, is <a href="https://www.worldradiohistory.com/Archive-Ratings-Documents/Greater-Prosperity-Through-Marketing-Research-A-C-Nielsen-64.pdf">available online</a>. I recommend reading the entire speech, but I have pulled out quotes that I personally found interesting or amusing.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://magis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Magis! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OFqO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OFqO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png 424w, 
https://substackcdn.com/image/fetch/$s_!OFqO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png 848w, https://substackcdn.com/image/fetch/$s_!OFqO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png 1272w, https://substackcdn.com/image/fetch/$s_!OFqO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OFqO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png" width="544" height="529.968253968254" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:982,&quot;width&quot;:1008,&quot;resizeWidth&quot;:544,&quot;bytes&quot;:978357,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OFqO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png 424w, 
https://substackcdn.com/image/fetch/$s_!OFqO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png 848w, https://substackcdn.com/image/fetch/$s_!OFqO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png 1272w, https://substackcdn.com/image/fetch/$s_!OFqO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99740b8-2c21-4277-bf06-dc9e2ca489a6_1008x982.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p></p><h5>Nielsen was built initially on measuring sales, not media:</h5><div class="pullquote"><p>Then, at serious risk of bankruptcy, we launched a continuous marketing research service known as the NIELSEN DRUG INDEX. Seven months later, a parallel service (called NIELSEN FOOD Index) was established for the food industry.</p></div><p>The business actually started in 1923 by surveying industrial products, but it went through a sort of near-death experience from which it was reborn as a consumer-oriented market research firm: a crucible moment<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. </p><p>Initially, the post-crucible firm focused on measuring product sales in stores, not media. The name &#8220;Nielsen&#8221; is typically associated with media ratings, although those in the industry will know that Nielsen measures both<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> media and product sales.</p><p></p><h5>On performance:</h5><div class="pullquote"><p>The original investors suffered a long period of doubt and danger, those who hung on and were not protected, by a paternalistic government, against risk-taking have enjoyed 700-fold increase in their original investment.</p><p>Equally significant is the amazingly low turnover among clients. 
On a dollar basis it has averaged, for the past 30 years, only about three percent per year.</p><p>[&#8230;] the record shows an absolutely unbroken series of sales increases for 30 successive years.</p></div><p>Nielsen was not a hypergrowth company<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a> (at least by today&#8217;s standards), but it was an enduring one<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>. By 1969 it had delivered a thirty-year streak of revenue growth and had compounded its valuation at 18.2% annually between 1923 and 1969. In comparison, the stock market compounded at 5.45% over the same period<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>. Nielsen&#8217;s results, ownership structure, and financial returns have varied since the 1990s, but the brand and reputation endure today. Only a very small percentage of companies public in 1969 endure today in any form, and even fewer are doing the same thing they started doing in 1933.</p><p></p><h5>Market research and DaaS have never been sexy:</h5><p>Recounting a conversation his son had:</p><div class="pullquote"><p>"What's your name, son?" </p><p>"Philip Nielsen."</p><p>"Where does your father work?" </p><p>"On Howard Street, in Chicago."</p><p>"What does he do there?"</p><p>"Market research."</p><p>"Market research? Oh, I see, he's a butcher."</p></div><p>While market research might be better known today than in the 1930s, I have little doubt that information and data-oriented businesses are still not exactly well known. I am still regularly asked, by investors or candidates, <a href="https://magis.substack.com/p/some-data-on-data-companies">if there are examples of large companies that sell data</a>. 
</p><p></p><h5>Nielsen had a purpose-driven mission:</h5><div class="pullquote"><p>It is important to recognize that increases in marketing efficiency, which create reductions in the cost of distribution, not only produce larger sales and profits for manufacturers and retailers, but also result, in a competitive economy, in lower prices to consumers, enabling them to expand their buying power and thereby enjoy a higher standard of living.</p><p>[referencing a previous building dedication] I dedicate this building to the task of furthering the science of marketing research a form of human endeavor having for its ultimate objective the increasing of the standard of living in the free countries of the world.</p></div><p>Perhaps the mission is telegraphed in the title of the speech (&#8220;Greater Prosperity Through Marketing Research&#8221;), but Nielsen explicitly frames the advancement of market research and data technologies as increasing market efficiency, which leads to higher standards of living<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a>. It is a humanitarian and accurate<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a> interpretation of his field, and an amusing reply to those who bemoan today&#8217;s generation of programmers, data scientists, and others working on &#8220;just optimizing clicks&#8221;.
</p><h5>The basic data acquisition and research process has remained the same:</h5><div class="pullquote"><p>An accurate national sample of retail stores [&#8230;]</p><p>The owner of each store is asked in return for cash payments and/or other forms of compensation to cooperate in the research project [&#8230;]</p><p>All figures obtained from the sample of retail stores are expanded, by a complex mathematical procedure</p></div><p>Another forty-plus years have passed since A.C. Nielsen&#8217;s speech. </p><p>However, much has also stayed the same, for better or worse. Nielsen&#8217;s description of his business model is remarkably similar to how market research (and specifically sales intelligence) products operate today, although data now moves faster and digital payments are paramount. Further, Nielsen&#8217;s name now remains on two companies, and careful sampling and survey design are still critical to market research.</p><p>Certain things are extremely different. The world has digitized, and the scale of data collection possible is orders of magnitude larger. Nielsen refers to <em>continuous</em> sales monitoring, but by that he means monthly or quarterly data points. While many institutions still survey at these intervals<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-11" href="#footnote-11" target="_self">11</a>, it is now possible to have almost real-time data. Also illustrative is what is <em>not</em> in the speech. Despite employing many of the same data collection techniques used today, Nielsen makes no mention of privacy, a topic that would be front and center today.</p><p>I am left with a curiosity, and perhaps an ambition, to figure out who will speak about the next 40 years of market research at the centennial of the original speech. 
Maybe podcasts won&#8217;t miss their chance after all.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://www.joincolossus.com/episodes?prod-episode-release-desc%5BrefinementList%5D%5BpodcastName%5D%5B0%5D=Invest%20Like%20the%20Best">Invest Like the Best</a>; the episode in which Patrick interviews Auren is a particular favorite.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://www.safegraph.com/podcasts">World of DaaS</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The corporate structure of Nielsen since 1969 is a complex topic in itself, outside the scope of this post. Most recently, Nielsen was split into a media data business (&#8220;Nielsen&#8221;) and a sales data business (&#8220;NielsenIQ&#8221;). 
This is the second time the company was split in two, with both segments bearing the <em>Nielsen</em> name.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>&#8220;Crucible moment&#8221; is a phrase borrowed from <a href="https://www.sequoiacap.com/series/crucible-moments/">Sequoia and their podcast</a>, which I have enjoyed.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Leaving the corporate split between Nielsen and NielsenIQ aside. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Nielsen&#8217;s amusing jab at a &#8220;paternalistic&#8221; government may betray his politics, but he is also referencing a joke earlier in the speech in which he noted that the Securities and Exchange Commission (SEC) would not be amused by the false precision in his original investment prospectus. In fact, the founding of Nielsen, the company, predates the establishment of the SEC. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>Also, I direct my investor readers to the line where Nielsen cites dollar retention figures way back in 1969 (albeit gross, not net, figures). 
Then again, did you expect otherwise from a data company founder?</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>The S&amp;P 500. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p>This has been shown fairly conclusively, starting with a <a href="https://files.givewell.org/files/DWDA%202009/Interventions/jensen2007.pdf">famous paper in the Quarterly Journal of Economics</a> about the Kerala fish market.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p>Ibid.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-11" href="#footnote-anchor-11" class="footnote-number" contenteditable="false" target="_self">11</a><div class="footnote-content"><p>Unfortunately&#8230; </p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;04c97221-4c4e-433f-81e5-68a91dda1fcd&quot;,&quot;caption&quot;:&quot;Pick up a macroeconomic textbook, and you&#8217;ll read chapters of theory, mathematical equations, and models of rational behavior. Heavy on mathematical and social sciences, such books are light on one essential thing: strong evidence that the theories, equations, and models correspond to the real world.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Datanomics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. 
Formerly I was the Head of Data Science at Coatue. https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2022-02-13T23:23:44.021Z&quot;,&quot;cover_image&quot;:null,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/datanomics&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:48708109,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:10,&quot;comment_count&quot;:2,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></div></div>]]></content:encoded></item><item><title><![CDATA[My Story with Snowflake ]]></title><description><![CDATA[A podcast about my background with Snowflake]]></description><link>https://magis.substack.com/p/my-story-with-snowflake</link><guid isPermaLink="false">https://magis.substack.com/p/my-story-with-snowflake</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Wed, 04 Oct 2023 14:55:49 GMT</pubDate><enclosure url="https://i.scdn.co/image/ab6765630000ba8a3d396a8d9c91fa997887b03a" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I had the opportunity to appear on Snowflake&#8217;s Data Cloud podcast. 
We discuss Snowflake, Coatue, and Cybersyn, as well as the requisite AI topic:</p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a3d396a8d9c91fa997887b03a&quot;,&quot;title&quot;:&quot;Powering Better Decision and Policy Making Through Integrated Data with Alex Izydorczyk, Founder and CEO at Cybersyn&quot;,&quot;subtitle&quot;:&quot;Snowflake&quot;,&quot;description&quot;:&quot;Episode&quot;,&quot;url&quot;:&quot;https://open.spotify.com/episode/7hrtBh4J2iDsaATA1kv49I&quot;,&quot;belowTheFold&quot;:false,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/episode/7hrtBh4J2iDsaATA1kv49I" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" data-component-name="Spotify2ToDOM"></iframe>]]></content:encoded></item><item><title><![CDATA[Free as in freedom, not as in beer, Pt. 
2]]></title><description><![CDATA[Open Data Challenges at Cybersyn: Point-in-time data, schema migrations, not-actually-free licenses, among others.]]></description><link>https://magis.substack.com/p/free-as-in-freedom-not-as-in-beer-ef2</link><guid isPermaLink="false">https://magis.substack.com/p/free-as-in-freedom-not-as-in-beer-ef2</guid><dc:creator><![CDATA[Alex Izydorczyk]]></dc:creator><pubDate>Sun, 01 Oct 2023 21:04:32 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4b030a0e-3a3d-499b-a830-00f56093d6c0_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Governments and other organizations frequently publish &#8220;open&#8221; data. The data itself is usually free to access, but the cost of using it effectively can be significant. In a previous essay<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, I described some of the initial challenges of operationalizing and distributing open data, along with possible solutions. 
Below is an extended list of additional, more specific challenges that <a href="http://cybersyn.com/">Cybersyn</a> is grappling with as we continue to distribute this data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZWcV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZWcV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!ZWcV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!ZWcV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!ZWcV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZWcV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png" width="260" height="260" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/caf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:260,&quot;bytes&quot;:1710860,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZWcV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!ZWcV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!ZWcV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!ZWcV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaf92533-44c2-4475-b557-15a5ad0ae06b_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p><strong>Point-in-time Data</strong></p><ul><li><p>It is often useful to understand how data has changed over time: both the historical values of a property and the record of when those values changed are valuable. Depending on the community, this concept goes by point-in-time (PIT) data, bi-temporality<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, or vendor-specific names such as Snowflake&#8217;s Time Travel<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.</p><ul><li><p>Capturing revisions published by the source</p><ul><li><p>Many public data sources issue restatements or revisions. For example, the Bureau of Labor Statistics periodically revises historical unemployment and inflation figures as more accurate data becomes available. 
The challenge here is simply organizing the revisions.</p></li></ul></li><li><p>Capturing revisions not published by the source</p><ul><li><p>Other public data sources revise data but permanently overwrite or delete the historical values. As a result, a consumer of the data today cannot view the history of changes or reproduce what they would have seen had they looked at the data at some past point in time. The only solution for a data distributor like Cybersyn is to snapshot the data and maintain the revision history internally.</p></li></ul></li><li><p>Capturing revisions and errors incurred by us</p><ul><li><p>As a data distributor, we see value in keeping a record of the actions we took or failed to take, even when they have nothing to do with the underlying data source. This record lets customers understand how relying on Cybersyn for data delivery would have affected them, and it spares them from having to snapshot our data themselves, recursively incurring the costs above.</p></li></ul></li></ul></li></ul><p><strong>Schema Migrations</strong></p><ul><li><p>Schema migrations introduced by the source</p><ul><li><p>Publishers often change the underlying schema of a data source. This can happen because some data is no longer available, new values become available, or a methodology has changed.</p></li><li><p>A particularly problematic version of this issue occurs when a source stops publishing a statistic altogether. For instance, the USPS has recently indicated that it will stop publishing population migration data<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p></li></ul></li><li><p>Schema migrations introduced by us</p><ul><li><p>As a data distributor, structuring data into a consumable, joinable format requires developing a schema that anticipates future improvements. 
It has proven extremely difficult to anticipate every future change or improvement, so the schema migrates over time to reflect improvements or better ergonomics.</p></li></ul></li></ul><p><strong>Licenses</strong></p><ul><li><p>Copyleft agreements</p><ul><li><p>Certain public datasets are free but require attribution or adherence to other terms. Often, these terms must be passed along to any derived works, imposing restrictions on both data intermediaries and end users. The philosophy of copyleft aside, a data product that combines several such datasets often inherits all of their terms at once.</p></li></ul></li><li><p>Not-actually-free licenses</p><ul><li><p>In other cases, public datasets contain elements that may not actually be freely licensed. While the publishing agency may avoid scrutiny because of its status as a government agency or non-profit, using those data elements in a commercial product could create serious liabilities. For example, CUSIP codes, which identify securities, belong to a proprietary identification system that requires a license to redistribute. Yet CUSIP codes appear in many ostensibly public-domain datasets published by government agencies. It is not clear on what terms those agencies provide the CUSIP data, and the licensing question appears to be a &#8220;gray&#8221; area. 
Certain organizations, such as the Small Business Administration, have started moving away from such proprietary identifiers (in the SBA&#8217;s case, Dun &amp; Bradstreet&#8217;s DUNS number), but historical data still relies on them.</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z1Hj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z1Hj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png 424w, https://substackcdn.com/image/fetch/$s_!z1Hj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png 848w, https://substackcdn.com/image/fetch/$s_!z1Hj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png 1272w, https://substackcdn.com/image/fetch/$s_!z1Hj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z1Hj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png" width="570" height="427.65339074273413" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:697,&quot;width&quot;:929,&quot;resizeWidth&quot;:570,&quot;bytes&quot;:137029,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z1Hj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png 424w, https://substackcdn.com/image/fetch/$s_!z1Hj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png 848w, https://substackcdn.com/image/fetch/$s_!z1Hj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png 1272w, https://substackcdn.com/image/fetch/$s_!z1Hj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d7b8ba-55b1-4af3-b6f0-05b1ee981f22_929x697.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">The right direction!</figcaption></figure></div>
<p><strong>Machine Readability</strong></p><ul><li><p>Website blockers</p><ul><li><p>Many ostensibly public datasets sit behind logins or rate-limited API endpoints, or lack bulk download options. It remains a mystery to me, for instance, why the Delaware Division of Corporations forbids automated web scraping while offering no bulk download option. Why shouldn&#8217;t public records be public?</p></li></ul></li><li><p>Data formats</p><ul><li><p>Data formats are incredibly varied, and yesterday&#8217;s standards often become today&#8217;s inconveniences. For instance, XML, a standard championed in the mid-2000s, has largely fallen out of favor relative to JSON, yet many open data sources still rely on various versions of the XML standard. 
Partially adopted standards pose another challenge: only some data may be available in XBRL<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> or via RDF/SPARQL<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>, so building a complete dataset involves stitching together multiple formats, even from a single source. For reasons outside the scope of this article, Cybersyn provides all data as relational tables in a SQL data warehouse, which requires consolidating all of these formats. Whatever target format one chooses, some data sources will inevitably be challenging to consume.</p></li></ul></li><li><p>Upstream aggregators</p><ul><li><p>Finally, it is often convenient to retrieve data from sources that themselves aggregate data from underlying sources, such as the <a href="https://www.datacommons.org/">Data Commons Project</a> or the St. Louis Fed&#8217;s <a href="https://fred.stlouisfed.org/">FRED</a> system. Retrieving data from these sources, while convenient, introduces complexity: all of the above issues multiply, since each can be incurred by the aggregator in addition to Cybersyn.</p></li></ul></li></ul><p><strong>Documentation</strong></p><ul><li><p>Standardizing documentation across sources is a significant effort. Documentation for public data sources is scattered across formal documents, websites, and files included alongside the data. Documentation for previous versions of the data is often missing or formatted entirely differently.</p></li><li><p>The most important goal of Cybersyn&#8217;s <a href="https://docs.cybersyn.com/">documentation</a> is to make data discoverable. We have failed if someone looking for data we publish cannot find it. 
The design challenges here have surprised me<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.</p></li></ul><p>If these problems appeal to you, we are hiring. If you do not see a role that specifically applies to you but these problems sound interesting, please reach out.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://jobs.ashbyhq.com/Cybersyn&quot;,&quot;text&quot;:&quot;Cybersyn Careers&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://jobs.ashbyhq.com/Cybersyn"><span>Cybersyn Careers</span></a></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;f2d0e40a-f7b4-4484-a60e-aabf889e46ed&quot;,&quot;caption&quot;:&quot;Governments publish a vast variety of economic data at an impressive level of depth. 
In the United States, the Bureau of Labor Statistics alone publishes more than eight hundred thousand monthly time series, covering hundreds of geographic regions. Individual time series can track data as specific as the wages of workers in a specific-sized restaurant b&#8230;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Free as in freedom, not as in beer.&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:5502194,&quot;name&quot;:&quot;Alex Izydorczyk&quot;,&quot;bio&quot;:&quot;I&#8217;m working on something new in Data/Data Science. Formerly I was the Head of Data Science at Coatue. https://alexizydorczyk.com/&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad90856-1fb7-4609-b52a-46269d6a6fc2_3801x2534.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2023-02-14T23:23:10.767Z&quot;,&quot;cover_image&quot;:null,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://magis.substack.com/p/free-as-in-freedom-not-as-in-beer&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:102945027,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Magis&quot;,&quot;publication_logo_url&quot;:&quot;&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Martin Fowler&#8217;s <a href="https://martinfowler.com/articles/bitemporal-history.html">blog post</a> on this subject is excellent.</p></div></div><div class="footnote" 
data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><a href="https://docs.snowflake.com/en/user-guide/data-time-travel">https://docs.snowflake.com/en/user-guide/data-time-travel</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p><a href="https://about.usps.com/who/legal/foia/library.htm">https://about.usps.com/who/legal/foia/library.htm</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p><a href="https://www.xbrl.org/">XBRL</a> is meant to be a structured format for financial reporting; in the United States, it is used primarily for SEC filings.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Details on RDF/SPARQL: <a href="https://www.w3.org/TR/sparql11-query/">https://www.w3.org/TR/sparql11-query/</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>If you happen to have feedback on our docs, please email me.</p></div></div>]]></content:encoded></item></channel></rss>