<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>watsy0007</title>
<link>https://watsy0007.com/</link>
<atom:link href="https://watsy0007.com/index.xml" rel="self" type="application/rss+xml"/>
<description>data, statistics, and programming</description>
<generator>quarto-1.5.56</generator>
<lastBuildDate>Mon, 26 Aug 2024 00:00:00 GMT</lastBuildDate>
<item>
  <title>I Build My Own Terminal Configuration</title>
  <link>https://watsy0007.com/blog/i_build_my_own_terminal_configuration/</link>
  <description><![CDATA[ 





<section id="why" class="level1">
<h1>Why</h1>
<p>Throughout my career as a developer, I’ve used numerous tools, including various IDEs, database management software, and a wide range of system utilities. I’ve even experimented with more than 20 keyboards. However, as I’ve grown older, I’ve come to realize that time is precious, and I’ve been wasting too much of it on these trivial distractions. I found myself lost in the constant juggling of new tools and gadgets.</p>
<p>Now, it’s time to take responsibility for my life and focus on long-term goals. I’m striving to do less but think more deeply about how to improve my quality of life.</p>
<p>This realization led me to create my own terminal configuration. Having used MacOS for nearly a decade and worked as a backend engineer for eight years, I can do most of my work in the terminal.</p>
<p>Another catalyst for this change was reading about DHH’s decision to leave Apple and switch to Linux. It inspired me and made me feel it was time for a change in my own workflow.</p>
<p>Lastly, I want to reduce my dependence on constantly evolving enterprise tools. These applications frequently change their UI, introduce flashy new features, and deprecate others. While I understand the need for companies to innovate and make money, it often comes at the cost of user experience and knowledge. I’ve had my expertise rendered obsolete several times when tools I relied on changed dramatically or were discontinued. I’m tired of this cycle and want to minimize the risk. My goal is to use <strong>simple</strong>, <strong>effective tools</strong> that <strong>help me do my work</strong> without unnecessary complexity.</p>
</section>
<section id="goal" class="level1">
<h1>Goal</h1>
<p>Given these concerns, my objectives are clear: I want to use tools that can genuinely improve my coding productivity. Here are my criteria:</p>
<ol type="1">
<li><strong>Open Source</strong>: If the creator abandons the project, I can still use and potentially maintain my own version.</li>
<li><strong>Customizable</strong>: I should be able to disable features I don’t need.</li>
<li><strong>Familiar</strong>: I should still be able to use my preferred tools to reduce the stress of migration.</li>
<li><strong>Keyboard-centric</strong>: I want to use the keyboard as much as possible for efficiency.</li>
</ol>
<p>Following these criteria, I chose NeoVim to replace other IDEs. When selecting a terminal emulator among options like Kitty, WezTerm, iTerm2, and the system default terminal, I opted for WezTerm. This choice was influenced by the fact that I can use Lua to configure both NeoVim and WezTerm.</p>
</section>
<section id="how" class="level1">
<h1>How</h1>
<p>Now, let me walk you through my terminal configuration step by step. My toolkit consists of WezTerm, NeoVim, Sketchybar, yabai and skhd. Let’s explore each of these components in detail.</p>
<section id="wezterm-1" class="level2">
<h2 class="anchored" data-anchor-id="wezterm-1">WezTerm</h2>
<p>For my WezTerm configuration, I drew inspiration from the article <a href="https://alexplescan.com/posts/2024/08/10/wezterm/">Okay, I really like WezTerm</a>. This comprehensive guide introduced me to WezTerm’s basic features and taught me how to configure them using Lua. Most of my WezTerm configuration is adapted from this excellent article.</p>
</section>
<section id="neovim-1" class="level2">
<h2 class="anchored" data-anchor-id="neovim-1">NeoVim</h2>
<p>When it came to configuring NeoVim, I initially considered several out-of-the-box solutions like DoomVim, LunarVim, AstroVim, etc. My first instinct was to build my configuration from scratch, so I began by reading the official <a href="https://neovim.io/doc/user/index.html">NeoVim documentation</a>. However, after spending an entire afternoon poring over the docs, I realized that NeoVim’s complexity made it impractical for me to start from zero at this stage.</p>
<p>I changed my approach, deciding to use third-party packages to achieve my goals while keeping things as minimal as necessary. Curious about DHH’s NeoVim setup, I investigated the <a href="https://github.com/basecamp/omakub">omakub</a> repository and was impressed by his ingenious solution. DHH uses the <a href="https://github.com/LazyVim/starter">starter</a> template for LazyVim, a configuration framework I was already familiar with. This discovery made it convenient for me to begin with the starter template. As a result, my NeoVim configuration is almost fully based on omakub’s setup.</p>
</section>
<section id="sketchybar-1" class="level2">
<h2 class="anchored" data-anchor-id="sketchybar-1">Sketchybar</h2>
<p>Sketchybar is a highly customizable replacement for the MacOS menu bar. Before discovering Sketchybar, I used <a href="https://www.macbartender.com/">bartender</a> and <a href="https://bjango.com/mac/istatmenus/">iStat Menus</a> to manage and simplify my status bar. Sketchybar offered a popular, open-source alternative that aligned with my goals. Here is my <a href="https://github.com/watsy0007/sketchybar">sketchybar configuration link</a>.</p>
</section>
<section id="yabai-skhd-1" class="level2">
<h2 class="anchored" data-anchor-id="yabai-skhd-1">Yabai + skhd</h2>
<p>The final piece of my setup, which I initially considered least important but now use frequently, is a solution for controlling windows with the keyboard. Following common practice in the MacOS power-user community, I chose yabai for window management and skhd for custom keyboard shortcuts.</p>
<p>Yabai allows for advanced window tiling and management, while skhd enables me to create custom keyboard shortcuts for various window and application controls. Here is my <a href="https://gist.github.com/watsy0007/b6cb19b2fdd38655c2545d3a4a36957b">yabai configuration link</a>, and here is my <a href="https://gist.github.com/watsy0007/e08bfebc74acefe518fc2af504edef07">skhd configuration link</a>.</p>
<p>This combination has significantly improved my workflow, making window management effortless even though I was already comfortable with trackpad gestures.</p>
</section>
</section>
<section id="conclusion" class="level1">
<h1>Conclusion</h1>
<p>In this post, I’ve explained my motivations for building a custom terminal configuration and provided an overview of the tools I’ve chosen. While I haven’t delved into the specifics of each configuration, I’ve shared links to my setup files for those interested in replicating or adapting my approach.</p>
<p>This journey has not only improved my productivity but also deepened my understanding of the tools I use daily. As I continue to use and refine this setup, I look forward to further optimizations and discoveries. If you have any questions or suggestions about my configuration, please let me know.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a></div></div></section></div> ]]></description>
  <category>productivity</category>
  <guid>https://watsy0007.com/blog/i_build_my_own_terminal_configuration/</guid>
  <pubDate>Mon, 26 Aug 2024 00:00:00 GMT</pubDate>
  <media:content url="https://watsy0007.com/blog/i_build_my_own_terminal_configuration/i-build-my-own-terminal-configuration-thumpnail.webp" medium="image" type="image/webp"/>
</item>
<item>
  <title>Analyzing Blockchain Data with DuckDB: Data Preparation</title>
  <link>https://watsy0007.com/blog/analyzing_blockchain_data_with_duckdb_1/</link>
  <description><![CDATA[ 





<p>I’ve been using <code>DuckDB</code> as a replacement for <code>pandas</code> and <code>Python</code> for data processing tasks. It’s proven to be incredibly convenient.</p>
<p>A friend recently asked about my use of <code>DuckDB</code> in daily work, which inspired me to write a series of articles. This is the first in that series, focusing on how to use <code>DuckDB</code> for initial data processing.</p>
<section id="use-case-analyzing-eth-address-transactions" class="level1">
<h1>Use Case: Analyzing ETH Address Transactions</h1>
<p>Let’s consider a common scenario: analyzing the transaction information of a specific Ethereum (<code>ETH</code>) address.</p>
</section>
<section id="the-traditional-process-vs-duckdb" class="level1">
<h1>The Traditional Process vs DuckDB</h1>
<p>Previously, this process involved several steps:</p>
<ol type="1">
<li>Fetching data using <code>requests</code></li>
<li>Using <code>pandas</code> to preview and clean the data</li>
<li>Finally, analyzing the data using <code>DuckDB</code></li>
</ol>
<p>The data preparation stage typically involves three essential steps:</p>
<ol type="1">
<li>Handling pagination for third-party APIs</li>
<li>Extracting valid fields from the returned data</li>
<li>Cleaning the data, including null values and data types</li>
</ol>
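<p>As a rough sketch of these three steps in plain Python, the example below pages through a stubbed API, keeps only the wanted fields, and casts types while preserving nulls. The names <code>FAKE_PAGES</code>, <code>fake_fetch</code>, <code>fetch_all</code>, <code>WANTED</code>, and <code>clean</code> are hypothetical and exist only for illustration; a real API call would take the place of the stub.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"># Hypothetical stand-in for a paginated third-party API (no network access).
FAKE_PAGES = {
    1: [{"hash": "0xaa", "blockNumber": "10", "extra": "x"},
        {"hash": "0xbb", "blockNumber": None, "extra": "y"}],
    2: [{"hash": "0xcc", "blockNumber": "12", "extra": "z"}],
}


def fake_fetch(page, page_size):
    """Return one page of raw records; unknown pages are empty."""
    return FAKE_PAGES.get(page, [])


def fetch_all(fetch, page_size):
    """Step 1: request pages until a short (or empty) page signals the end."""
    rows, page = [], 1
    while True:
        items = fetch(page, page_size)
        rows.extend(items)
        if len(items) != page_size:
            break
        page += 1
    return rows


# Steps 2 and 3: the fields to keep and the type each one should be cast to.
WANTED = {"hash": str, "blockNumber": int}


def clean(row):
    """Keep only the wanted fields, cast their types, and keep nulls as None."""
    out = {}
    for key, cast in WANTED.items():
        raw = row.get(key)
        out[key] = None if raw is None else cast(raw)
    return out


records = [clean(r) for r in fetch_all(fake_fetch, 2)]
print(records)</code></pre></div>
<p>When the total number of records is an exact multiple of the page size, this loop makes one extra request that returns an empty page before it stops.</p>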
<p>In this article, we’ll demonstrate how to streamline this process using the <code>blockscout</code> API<sup>1</sup>, <code>DuckDB UDF</code><sup>2</sup>, and <code>DuckDB Macro</code><sup>3</sup>.</p>
</section>
<section id="demonstration" class="level1">
<h1>Demonstration</h1>
<p>We’ll explore two solutions for processing data using <code>DuckDB</code>. First, let’s import the required dependencies.</p>
<div id="7ef32ee6" class="cell" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> duckdb</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> requests</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> duckdb.typing <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> VARCHAR, INTEGER, DuckDBPyType</span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> json</span></code></pre></div>
</details>
</div>
<section id="solution-1-explorer-implementation" class="level2">
<h2 class="anchored" data-anchor-id="solution-1-explorer-implementation">Solution 1: Explorer Implementation</h2>
<p>This approach focuses on quickly retrieving and processing transaction data. It’s ideal for rapid analysis and verification during the development stage.</p>
<p>The following code fetches the transaction information of an <code>ETH</code> address through the <code>blockscout</code> API<sup>4</sup>:</p>
<div id="d55a89e8" class="cell" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> blockscout_api(module: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>, action: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>, address: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>, start_block: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>, end_block: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>, page: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>, offset: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>]:</span>
<span id="cb2-2">    url_prefix <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'https://eth.blockscout.com/api?module=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>module<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;action=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>action<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span></span>
<span id="cb2-3">    </span>
<span id="cb2-4">    result <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb2-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">while</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>:</span>
<span id="cb2-6">        url <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>url_prefix<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;address=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>address<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;startblock=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>start_block<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;endblock=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>end_block<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;page=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>page<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;offset=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>offset<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;sort=asc'</span></span>
<span id="cb2-7">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'query page </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>page<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb2-8">        data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> requests.get(url).json()</span>
<span id="cb2-9">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'message'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'OK'</span>:</span>
<span id="cb2-10">            items <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> data[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'result'</span>]</span>
<span id="cb2-11">            result.extend(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">map</span>(json.dumps,items))</span>
<span id="cb2-12">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb2-13">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">break</span></span>
<span id="cb2-14">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(items) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> offset:</span>
<span id="cb2-15">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">break</span></span>
<span id="cb2-16">        page <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb2-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> result</span></code></pre></div>
</details>
</div>
<p>Register it as a custom function in <code>DuckDB</code>:</p>
<div id="928250b7" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">conn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> duckdb.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>()</span>
<span id="cb3-2">conn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> conn.create_function(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blockscout_api'</span>, blockscout_api)</span></code></pre></div>
</div>
<p>Define the <code>DuckDB</code> macro. For demonstration purposes, the page and offset are fixed at 1 and 2 here; adjust them to suit your actual workload. Note the output <code>query page 1</code> and <code>query page 2</code> below.</p>
<div id="1c1cce87" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">conn.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb4-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">CREATE OR REPLACE MACRO blockscout_trxs(address, start_block, end_block) as table </span></span>
<span id="cb4-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    select blockscout_api('account', 'txlist', address, start_block, end_block, 1, 2) as data</span></span>
<span id="cb4-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span>)</span></code></pre></div>
</div>
<p>Query the transaction information of the <code>ETH</code> address</p>
<div id="63a3ff16" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">conn.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb5-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">with raw_transactions as (</span></span>
<span id="cb5-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    select unnest(data) as trx from blockscout_trxs('0x603602E9A2ac7f1E26717C2b2193Fd68f5fafFf6', 20485198, 20490674)</span></span>
<span id="cb5-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">), decode_transactions as (</span></span>
<span id="cb5-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">select </span></span>
<span id="cb5-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    trx-&gt;'$.blockHash' as block_hash,</span></span>
<span id="cb5-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    (trx-&gt;'$.blockNumber')::integer as block_number,</span></span>
<span id="cb5-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    (trx-&gt;'$.timeStamp')::integer as timestamp,</span></span>
<span id="cb5-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    to_timestamp(timestamp) as datetime,</span></span>
<span id="cb5-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    trx-&gt;'$.hash' as hash,</span></span>
<span id="cb5-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    (trx-&gt;'$.transactionIndex')::integer as transaction_index,</span></span>
<span id="cb5-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    trx-&gt;'$.from' as 'from',</span></span>
<span id="cb5-13"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    trx-&gt;'$.to' as 'to',</span></span>
<span id="cb5-14"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    trx-&gt;'$.value' as value,</span></span>
<span id="cb5-15"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    trx-&gt;'$.contractAddress' as contract_address,</span></span>
<span id="cb5-16"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    (trx-&gt;'$.gas')::integer as gas,</span></span>
<span id="cb5-17"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    (trx-&gt;'$.gasPrice')::bigint as gas_price,</span></span>
<span id="cb5-18"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    (trx-&gt;'$.gasUsed')::integer as gas_used,</span></span>
<span id="cb5-19"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    trx-&gt;'$.isError' as is_error,</span></span>
<span id="cb5-20"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    trx-&gt;'$.txreceipt_status' as txreceipt_status,</span></span>
<span id="cb5-21"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    trx-&gt;'input' as 'input'</span></span>
<span id="cb5-22"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">from raw_transactions</span></span>
<span id="cb5-23"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span></span>
<span id="cb5-24"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">select </span></span>
<span id="cb5-25"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  block_number,</span></span>
<span id="cb5-26"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  datetime,</span></span>
<span id="cb5-27"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  hash,</span></span>
<span id="cb5-28"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'from',</span></span>
<span id="cb5-29"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'to',</span></span>
<span id="cb5-30"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  value,</span></span>
<span id="cb5-31"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">from decode_transactions</span></span>
<span id="cb5-32"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span>).df()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>query page 1
query page 2</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="5">
<div>


<table class="dataframe caption-top table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">block_number</th>
<th data-quarto-table-cell-role="th">datetime</th>
<th data-quarto-table-cell-role="th">hash</th>
<th data-quarto-table-cell-role="th">'from'</th>
<th data-quarto-table-cell-role="th">'to'</th>
<th data-quarto-table-cell-role="th">value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>20485198</td>
<td>2024-08-08 16:55:23+00:00</td>
<td>"0x16e9d0643ce6bf9bc59d5e6c756a196af2941cefc46...</td>
<td>from</td>
<td>to</td>
<td>"500000000000000000"</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>20488106</td>
<td>2024-08-09 02:38:47+00:00</td>
<td>"0x3f29ab5ba5779df75aee038cb9d529ab7d7e94ff727...</td>
<td>from</td>
<td>to</td>
<td>"500000000000000000"</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>20490674</td>
<td>2024-08-09 11:14:23+00:00</td>
<td>"0xcba85af304112c712c978968ff19fb150cdfd18e1f4...</td>
<td>from</td>
<td>to</td>
<td>"200000000000000000"</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
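<p>The <code>value</code> column above is still a JSON-quoted string, and transaction values on Ethereum are denominated in wei (1 ETH = 10<sup>18</sup> wei). As a minimal sketch, assuming the values are post-processed in Python, the hypothetical helper <code>wei_to_eth</code> below strips the quotes and converts the amounts shown in the table to ETH using exact decimal arithmetic.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">from decimal import Decimal

# 1 ETH equals 10**18 wei.
WEI_PER_ETH = Decimal(10) ** 18


def wei_to_eth(raw):
    """Convert a wei amount (optionally still JSON-quoted) to ETH."""
    return Decimal(raw.strip('"')) / WEI_PER_ETH


# The three values shown in the result table, quotes included.
values = ['"500000000000000000"', '"500000000000000000"', '"200000000000000000"']
eth = [wei_to_eth(v) for v in values]
print(eth)</code></pre></div>
<p><code>Decimal</code> is used instead of <code>float</code> because wei amounts can carry more significant digits than a float represents exactly.</p>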
</section>
<section id="solution-2-advanced-implementation-with-field-constraints" class="level2">
<h2 class="anchored" data-anchor-id="solution-2-advanced-implementation-with-field-constraints">Solution 2: Advanced Implementation with Field Constraints</h2>
<p>This solution is more robust, suitable for production environments. It addresses potential issues like API field changes and null values in the returned data.</p>
<p>Declare the required fields and types</p>
<div id="92ab5aa7" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">fields <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb7-2">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blockHash'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>,</span>
<span id="cb7-3">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blockNumber'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>,</span>
<span id="cb7-4">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'timeStamp'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>,</span>
<span id="cb7-5">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'hash'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>,</span>
<span id="cb7-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'transactionIndex'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>,</span>
<span id="cb7-7">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>,</span>
<span id="cb7-8">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'to'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>,</span>
<span id="cb7-9">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'value'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>,</span>
<span id="cb7-10">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'contractAddress'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>,</span>
<span id="cb7-11">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'gas'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>,</span>
<span id="cb7-12">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'gasPrice'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>,</span>
<span id="cb7-13">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'gasUsed'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>,</span>
<span id="cb7-14">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'isError'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>,</span>
<span id="cb7-15">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'txreceipt_status'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>,</span>
<span id="cb7-16">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'input'</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>,</span>
<span id="cb7-17">}</span></code></pre></div>
</div>
<p>Request the <code>blockscout</code> API<sup>5</sup> and extract valid fields</p>
<div id="34a4c931" class="cell" data-execution_count="7">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">field_keys <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fields.keys()</span>
<span id="cb8-2"></span>
<span id="cb8-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> blockscout_api_with_fields(module: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>, action: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>, address: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>, start_block: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>, end_block: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>, page: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>, offset: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>):</span>
<span id="cb8-4">    url_prefix <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'https://eth.blockscout.com/api?module=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>module<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;action=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>action<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span></span>
<span id="cb8-5">    result <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb8-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">while</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>:</span>
<span id="cb8-7">        url <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>url_prefix<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;address=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>address<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;startblock=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>start_block<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;endblock=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>end_block<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;page=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>page<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;offset=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>offset<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">&amp;sort=asc'</span></span>
<span id="cb8-8">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'query page </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>page<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> -&gt; </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>url<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb8-9">        resp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> requests.get(url).json()</span>
<span id="cb8-10">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> resp[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'message'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'OK'</span>:</span>
<span id="cb8-11">            items <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> resp[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'result'</span>]</span>
<span id="cb8-12">            result.extend([{f: i[f] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> f <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> field_keys} <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> items])</span>
<span id="cb8-13">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(items) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> offset:</span>
<span id="cb8-14">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">break</span></span>
<span id="cb8-15">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb8-16">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">break</span></span>
<span id="cb8-17">        page <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb8-18">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> result</span></code></pre></div>
</details>
</div>
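<p>The paging loop above stops on a short page or on a non-OK response. A minimal offline sketch of the same control flow follows; <code>collect_pages</code>, <code>fake_fetch</code>, and the shortened <code>FIELD_KEYS</code> list are hypothetical names used only for illustration, and the fake fetcher (including its non-OK message) stands in for the real HTTP call.</p>

```python
from typing import Callable

# Hypothetical subset of the field names; the real list comes from `fields`.
FIELD_KEYS = ["hash", "from", "to", "value"]


def collect_pages(fetch_page: Callable[[int, int], dict], page: int, offset: int) -> list:
    """Collect rows page by page until a short page or a non-OK response."""
    rows = []
    while True:
        resp = fetch_page(page, offset)
        if resp.get("message") != "OK":
            break
        items = resp["result"]
        # Keep only the declared fields and drop everything else.
        rows.extend({k: item[k] for k in FIELD_KEYS} for item in items)
        if offset > len(items):  # short page: nothing left to fetch
            break
        page += 1
    return rows


def fake_fetch(page: int, offset: int) -> dict:
    """Stand-in for the HTTP request: serves five made-up transactions."""
    data = [
        {"hash": f"0x{i}", "from": "a", "to": "b", "value": str(i), "extra": "dropped"}
        for i in range(5)
    ]
    start = (page - 1) * offset
    chunk = data[start:start + offset]
    if not chunk:
        # Made-up message; only the fact that it is not "OK" matters here.
        return {"message": "no rows", "result": []}
    return {"message": "OK", "result": chunk}


rows = collect_pages(fake_fetch, page=1, offset=2)
print(len(rows))  # 5
```

<p>Passing the fetcher in as an argument keeps the loop testable without network access; in the real function it would wrap the <code>requests.get</code> call.</p>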
<p>Register the custom function with <code>DuckDB</code>. Note the <code>page</code> and <code>offset</code> arguments: they are fixed at 1 and 5 here, so only one page of data is fetched and pagination is not demonstrated.</p>
<div id="57aa9690" class="cell" data-execution_count="8">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1">conn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> duckdb.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>()</span>
<span id="cb9-2">conn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> conn.create_function(blockscout_api_with_fields.<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">__name__</span>, blockscout_api_with_fields, [VARCHAR, VARCHAR, VARCHAR, INTEGER, INTEGER, INTEGER, INTEGER], DuckDBPyType(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>[fields]))</span>
<span id="cb9-3">conn.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb9-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">CREATE OR REPLACE MACRO blockscout_trxs_with_fields(address, start_block, end_block) as table </span></span>
<span id="cb9-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    select blockscout_api_with_fields('account', 'txlist', address, start_block, end_block, 1, 5) as data</span></span>
<span id="cb9-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span>)</span></code></pre></div>
</details>
</div>
<p>Query the transactions of an <code>ETH</code> address.</p>
<div id="8b90d921" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">conn.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb10-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">with raw_transactions as (</span></span>
<span id="cb10-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    select unnest(data) as trx from blockscout_trxs_with_fields('0x603602E9A2ac7f1E26717C2b2193Fd68f5fafFf6', 20485198, 20490674)</span></span>
<span id="cb10-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">), flatten_transactions as (</span></span>
<span id="cb10-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  select unnest(trx) from raw_transactions</span></span>
<span id="cb10-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span></span>
<span id="cb10-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">select </span></span>
<span id="cb10-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  blockNumber as block_number,</span></span>
<span id="cb10-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  to_timestamp(timeStamp) as datetime,</span></span>
<span id="cb10-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  hash,</span></span>
<span id="cb10-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'from',</span></span>
<span id="cb10-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'to',</span></span>
<span id="cb10-13"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  value</span></span>
<span id="cb10-14"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">from flatten_transactions</span></span>
<span id="cb10-15"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span>).df()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>query page 1 -&gt; https://eth.blockscout.com/api?module=account&amp;action=txlist&amp;address=0x603602E9A2ac7f1E26717C2b2193Fd68f5fafFf6&amp;startblock=20485198&amp;endblock=20490674&amp;page=1&amp;offset=5&amp;sort=asc</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="9">
<div>


<table class="dataframe caption-top table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">block_number</th>
<th data-quarto-table-cell-role="th">datetime</th>
<th data-quarto-table-cell-role="th">hash</th>
<th data-quarto-table-cell-role="th">'from'</th>
<th data-quarto-table-cell-role="th">'to'</th>
<th data-quarto-table-cell-role="th">value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>20485198</td>
<td>2024-08-08 16:55:23+00:00</td>
<td>0x16e9d0643ce6bf9bc59d5e6c756a196af2941cefc467...</td>
<td>from</td>
<td>to</td>
<td>500000000000000000</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>20488106</td>
<td>2024-08-09 02:38:47+00:00</td>
<td>0x3f29ab5ba5779df75aee038cb9d529ab7d7e94ff7277...</td>
<td>from</td>
<td>to</td>
<td>500000000000000000</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>20490674</td>
<td>2024-08-09 11:14:23+00:00</td>
<td>0xcba85af304112c712c978968ff19fb150cdfd18e1f48...</td>
<td>from</td>
<td>to</td>
<td>200000000000000000</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
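<p>In the result above, the <code>'from'</code> and <code>'to'</code> columns contain the literal words rather than addresses, because single quotes create string literals in SQL. A column named after a reserved word has to be written in double quotes instead (<code>"from"</code>, <code>"to"</code>). The sketch below shows the difference using the built-in <code>sqlite3</code> module of Python, swapped in here only so that it runs without extra packages; the table <code>trx</code> and its values are made up, and the assumption is that <code>DuckDB</code> applies the same quoting rule.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Double quotes mark identifiers, so reserved words can be used as column names.
conn.execute('CREATE TABLE trx ("from" TEXT, "to" TEXT)')
conn.execute("INSERT INTO trx VALUES ('0xaaa', '0xbbb')")

# Single quotes: string literals, repeated for every row.
literal_row = conn.execute("SELECT 'from', 'to' FROM trx").fetchone()
# Double quotes: the actual column values.
column_row = conn.execute('SELECT "from", "to" FROM trx').fetchone()

print(literal_row)  # ('from', 'to')
print(column_row)   # ('0xaaa', '0xbbb')
```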
</section>
</section>
<section id="summary" class="level1">
<h1>Summary</h1>
<p>By leveraging <code>UDF</code><sup>6</sup> and <code>Macro</code><sup>7</sup> of <code>DuckDB</code>, we can significantly simplify the data processing workflow. This approach makes data analysis more efficient and results in cleaner, more maintainable code.</p>
<p>For day-to-day use, Solution 1 is recommended for quick analysis and verification. However, for production environments, Solution 2 is preferred due to its stricter field constraints, which help prevent issues during data processing.</p>
<p>You can also use <code>with recursive</code><sup>8</sup> to implement paginated queries, but that approach is more complicated and requires considerably more <code>SQL</code>, so I do not recommend it. If you are interested, I will write a separate article about it later.</p>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p><a href="https://docs.blockscout.com/for-users/api/rpc-endpoints/account#get-transactions-by-address">Get Transactions By Address - Blockscout</a>↩︎</p></li>
<li id="fn2"><p><a href="https://duckdb.org/docs/api/python/function">Python Function API - DuckDB</a>↩︎</p></li>
<li id="fn3"><p><a href="https://duckdb.org/docs/sql/statements/create_macro">CREATE MACRO Statement - DuckDB</a>↩︎</p></li>
<li id="fn4"><p><a href="https://docs.blockscout.com/for-users/api/rpc-endpoints/account#get-transactions-by-address">Get Transactions By Address - Blockscout</a>↩︎</p></li>
<li id="fn5"><p><a href="https://docs.blockscout.com/for-users/api/rpc-endpoints/account#get-transactions-by-address">Get Transactions By Address - Blockscout</a>↩︎</p></li>
<li id="fn6"><p><a href="https://duckdb.org/docs/api/python/function">Python Function API - DuckDB</a>↩︎</p></li>
<li id="fn7"><p><a href="https://duckdb.org/docs/sql/statements/create_macro">CREATE MACRO Statement - DuckDB</a>↩︎</p></li>
<li id="fn8"><p><a href="https://duckdb.org/docs/sql/query_syntax/with#recursive-ctes">With Clause - DuckDB</a>↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a></div></div></section></div> ]]></description>
  <category>duckdb</category>
  <category>blockchain</category>
  <category>analysis</category>
  <guid>https://watsy0007.com/blog/analyzing_blockchain_data_with_duckdb_1/</guid>
  <pubDate>Sat, 10 Aug 2024 00:00:00 GMT</pubDate>
  <media:content url="https://watsy0007.com/blog/analyzing_blockchain_data_with_duckdb_1/analyse-blockchain-data-thumbnail.webp" medium="image" type="image/webp"/>
</item>
<item>
  <title>Find Missing Dates with DuckDB</title>
  <link>https://watsy0007.com/blog/find_missing_dates_with_duckdb/</link>
  <description><![CDATA[ 





<section id="background" class="level1">
<h1>Background</h1>
<p>Recently, the business team reported that some data was missing, so I needed to locate the missing dates in order to backfill the historical data.</p>
<p>After some research, I decided to use a Gantt chart to display the daily execution of all tasks.</p>
</section>
<section id="duckdb" class="level1">
<h1>DuckDB</h1>
<p>The task data already lives in <code>Postgres</code> and needs to be visualized. <code>DuckDB</code> has a <code>Postgres</code> extension that can read the data directly, so I chose <code>DuckDB</code> with <code>Jupyter Notebook</code> to quickly verify the logic.</p>
</section>
<section id="actual-operation" class="level1">
<h1>Actual operation</h1>
<p>For how <code>DuckDB</code> works with <code>Postgres</code>, see the <a href="https://duckdb.org/docs/extensions/postgres">PostgreSQL Extension</a> documentation.</p>
<p>Leaving out the irrelevant code, the logic is divided into the following steps.</p>
<section id="get-data" class="level2">
<h2 class="anchored" data-anchor-id="get-data">1. Get data</h2>
<p>Mock data is used here.</p>
<div id="83bb9932" class="cell" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> duckdb</span>
<span id="cb1-2"></span>
<span id="cb1-3">conn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> duckdb.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>()</span>
<span id="cb1-4"></span>
<span id="cb1-5">mock_data_sql <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb1-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">SELECT * FROM (</span></span>
<span id="cb1-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    VALUES </span></span>
<span id="cb1-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (1, DATE '2024-07-01'),</span></span>
<span id="cb1-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (1, DATE '2024-07-02'),</span></span>
<span id="cb1-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (1, DATE '2024-07-03'),</span></span>
<span id="cb1-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (1, DATE '2024-07-05'),</span></span>
<span id="cb1-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (1, DATE '2024-07-06'),</span></span>
<span id="cb1-13"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (2, DATE '2024-07-01'),</span></span>
<span id="cb1-14"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (2, DATE '2024-07-02'),</span></span>
<span id="cb1-15"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (2, DATE '2024-07-03'),</span></span>
<span id="cb1-16"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-01'),</span></span>
<span id="cb1-17"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-02'),</span></span>
<span id="cb1-18"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-03'),</span></span>
<span id="cb1-19"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-04'),</span></span>
<span id="cb1-20"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-05'),</span></span>
<span id="cb1-21"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-06'),</span></span>
<span id="cb1-22"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-07'),</span></span>
<span id="cb1-23"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-08'),</span></span>
<span id="cb1-24"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-09'),</span></span>
<span id="cb1-25"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (4, DATE '2024-07-05'),</span></span>
<span id="cb1-26"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (4, DATE '2024-07-06'),</span></span>
<span id="cb1-27"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (4, DATE '2024-07-07'),</span></span>
<span id="cb1-28"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (4, DATE '2024-07-08'),</span></span>
<span id="cb1-29"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (4, DATE '2024-07-09'),</span></span>
<span id="cb1-30"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  ) AS t(source_id, end_date)</span></span>
<span id="cb1-31"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb1-32">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> conn.execute(mock_data_sql).df()</span>
<span id="cb1-33">df</span></code></pre></div>
</details>
<div class="cell-output cell-output-display" data-execution_count="2">
<div>


<table class="dataframe caption-top table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">source_id</th>
<th data-quarto-table-cell-role="th">end_date</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>1</td>
<td>2024-07-01</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>1</td>
<td>2024-07-02</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>1</td>
<td>2024-07-03</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>1</td>
<td>2024-07-05</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>1</td>
<td>2024-07-06</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">5</td>
<td>2</td>
<td>2024-07-01</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">6</td>
<td>2</td>
<td>2024-07-02</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">7</td>
<td>2</td>
<td>2024-07-03</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">8</td>
<td>3</td>
<td>2024-07-01</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">9</td>
<td>3</td>
<td>2024-07-02</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">10</td>
<td>3</td>
<td>2024-07-03</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">11</td>
<td>3</td>
<td>2024-07-04</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">12</td>
<td>3</td>
<td>2024-07-05</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">13</td>
<td>3</td>
<td>2024-07-06</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">14</td>
<td>3</td>
<td>2024-07-07</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">15</td>
<td>3</td>
<td>2024-07-08</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">16</td>
<td>3</td>
<td>2024-07-09</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">17</td>
<td>4</td>
<td>2024-07-05</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">18</td>
<td>4</td>
<td>2024-07-06</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">19</td>
<td>4</td>
<td>2024-07-07</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">20</td>
<td>4</td>
<td>2024-07-08</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">21</td>
<td>4</td>
<td>2024-07-09</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
</section>
<section id="group-by-date" class="level2">
<h2 class="anchored" data-anchor-id="group-by-date">2. Group by date</h2>
<p>The main difficulty in finding missing dates is grouping rows by continuity: consecutive dates belong to the same group, so if a <code>source_id</code> ends up with more than one group, there is a gap in its dates.</p>
<p>A window function numbers the rows of each <code>source_id</code> in date order. Subtracting that row number minus one, in days, from <code>end_date</code> yields the same <code>group_date</code> for every row in a consecutive run. The code is as follows:</p>
<div id="7b5a461b" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1">group_date_sql <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb2-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> SELECT </span></span>
<span id="cb2-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    source_id,</span></span>
<span id="cb2-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    end_date,</span></span>
<span id="cb2-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    end_date - INTERVAL (ROW_NUMBER() OVER (PARTITION BY source_id ORDER BY end_date) - 1) DAY AS group_date</span></span>
<span id="cb2-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  FROM df</span></span>
<span id="cb2-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  order by source_id, end_date</span></span>
<span id="cb2-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb2-9">grouped_date_df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> conn.execute(group_date_sql).df()</span>
<span id="cb2-10">grouped_date_df</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<div>


<table class="dataframe caption-top table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">source_id</th>
<th data-quarto-table-cell-role="th">end_date</th>
<th data-quarto-table-cell-role="th">group_date</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>1</td>
<td>2024-07-01</td>
<td>2024-07-01</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>1</td>
<td>2024-07-02</td>
<td>2024-07-01</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>1</td>
<td>2024-07-03</td>
<td>2024-07-01</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>1</td>
<td>2024-07-05</td>
<td>2024-07-02</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>1</td>
<td>2024-07-06</td>
<td>2024-07-02</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">5</td>
<td>2</td>
<td>2024-07-01</td>
<td>2024-07-01</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">6</td>
<td>2</td>
<td>2024-07-02</td>
<td>2024-07-01</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">7</td>
<td>2</td>
<td>2024-07-03</td>
<td>2024-07-01</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">8</td>
<td>3</td>
<td>2024-07-01</td>
<td>2024-07-01</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">9</td>
<td>3</td>
<td>2024-07-02</td>
<td>2024-07-01</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">10</td>
<td>3</td>
<td>2024-07-03</td>
<td>2024-07-01</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">11</td>
<td>3</td>
<td>2024-07-04</td>
<td>2024-07-01</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">12</td>
<td>3</td>
<td>2024-07-05</td>
<td>2024-07-01</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">13</td>
<td>3</td>
<td>2024-07-06</td>
<td>2024-07-01</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">14</td>
<td>3</td>
<td>2024-07-07</td>
<td>2024-07-01</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">15</td>
<td>3</td>
<td>2024-07-08</td>
<td>2024-07-01</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">16</td>
<td>3</td>
<td>2024-07-09</td>
<td>2024-07-01</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">17</td>
<td>4</td>
<td>2024-07-05</td>
<td>2024-07-05</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">18</td>
<td>4</td>
<td>2024-07-06</td>
<td>2024-07-05</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">19</td>
<td>4</td>
<td>2024-07-07</td>
<td>2024-07-05</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">20</td>
<td>4</td>
<td>2024-07-08</td>
<td>2024-07-05</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">21</td>
<td>4</td>
<td>2024-07-09</td>
<td>2024-07-05</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
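<p>The grouping trick can also be checked outside SQL. Here is a minimal Python sketch, assuming the dates of one <code>source_id</code> are unique; <code>island_ranges</code> is a hypothetical helper and not part of the original notebook.</p>

```python
from datetime import date, timedelta
from itertools import groupby


def island_ranges(dates):
    """Return (start, end) for each run of consecutive dates."""
    ordered = sorted(dates)
    # Same idea as the SQL: a date minus its 0-based position is constant within a run.
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered)]
    ranges = []
    for _, run in groupby(keyed, key=lambda pair: pair[0]):
        run_dates = [d for _, d in run]
        ranges.append((run_dates[0], run_dates[-1]))
    return ranges


# Dates of source_id 1 from the mock data: 2024-07-04 is missing.
source_1 = [date(2024, 7, day) for day in (1, 2, 3, 5, 6)]
ranges = island_ranges(source_1)
for start, end in ranges:
    print(start, "to", end)
# 2024-07-01 to 2024-07-03
# 2024-07-05 to 2024-07-06
```

<p>More than one range for a <code>source_id</code> means there is at least one gap between them.</p>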
</section>
<section id="group-by-source_id-and-group-date" class="level2">
<h2 class="anchored" data-anchor-id="group-by-source_id-and-group-date">3. Group by source_id and group date</h2>
<div id="34c4632d" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">group_sql <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb3-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">SELECT </span></span>
<span id="cb3-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    source_id,</span></span>
<span id="cb3-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    MIN(end_date) AS start_date,</span></span>
<span id="cb3-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    MAX(end_date) AS end_date,</span></span>
<span id="cb3-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  FROM grouped_date_df</span></span>
<span id="cb3-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  GROUP BY source_id, group_date</span></span>
<span id="cb3-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  ORDER BY source_id, group_date</span></span>
<span id="cb3-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb3-10">grouped_df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> conn.execute(group_sql).df()</span>
<span id="cb3-11">grouped_df</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<div>


<table class="dataframe caption-top table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">source_id</th>
<th data-quarto-table-cell-role="th">start_date</th>
<th data-quarto-table-cell-role="th">end_date</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>1</td>
<td>2024-07-01</td>
<td>2024-07-03</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>1</td>
<td>2024-07-05</td>
<td>2024-07-06</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>2</td>
<td>2024-07-01</td>
<td>2024-07-03</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>3</td>
<td>2024-07-01</td>
<td>2024-07-09</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>4</td>
<td>2024-07-05</td>
<td>2024-07-09</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
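<p>To make the grouping trick easier to see, here is a minimal plain-Python sketch of the same idea, without DuckDB. It assumes the rows contain no duplicates; the function name <code>find_islands</code> and the shortened sample (sources 1 and 2 only) are illustrative and not part of the original notebook.</p>

```python
from datetime import date, timedelta
from itertools import groupby

# Illustrative sample: the rows for source_id 1 and 2 from the raw data.
rows = [
    (1, date(2024, 7, 1)),
    (1, date(2024, 7, 2)),
    (1, date(2024, 7, 3)),
    (1, date(2024, 7, 5)),
    (1, date(2024, 7, 6)),
    (2, date(2024, 7, 1)),
    (2, date(2024, 7, 2)),
    (2, date(2024, 7, 3)),
]


def find_islands(rows):
    """Collapse (source_id, end_date) rows into consecutive date ranges."""
    islands = []
    ordered = sorted(rows)
    for source_id, source_rows in groupby(ordered, key=lambda r: r[0]):
        # Mirror of ROW_NUMBER() - 1: subtract each date's position within
        # its source, which yields the same anchor for consecutive dates.
        keyed = [(d - timedelta(days=i), d) for i, (_, d) in enumerate(source_rows)]
        # Group on the anchor (the SQL's group_date) and keep first/last date.
        for _, run in groupby(keyed, key=lambda k: k[0]):
            dates = [d for _, d in run]
            islands.append((source_id, dates[0], dates[-1]))
    return islands


islands = find_islands(rows)
for source_id, start, end in islands:
    print(source_id, start.isoformat(), end.isoformat())
```

<p>For this sample the result matches the first three rows of the table above: two ranges for source 1 and one range for source 2.</p>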
</section>
<section id="visualization" class="level2">
<h2 class="anchored" data-anchor-id="visualization">4. Visualization</h2>
<div id="884715b7" class="cell" data-execution_count="5">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> plotly.express <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> px</span>
<span id="cb4-2">fig <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> px.timeline(grouped_df, x_start<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_date'</span>, x_end<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'end_date'</span>, y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'source_id'</span>)</span>
<span id="cb4-3">fig.update_yaxes(autorange<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"reversed"</span>)</span>
<span id="cb4-4">fig.show()</span></code></pre></div>
</details>
<div class="cell-output cell-output-display">
<div>                            <div id="c63ae8c4-f88d-4844-86d2-1c3250cf558d" class="plotly-graph-div" style="height:525px; width:100%;"></div>            <script type="text/javascript">                require(["plotly"], function(Plotly) {                    window.PLOTLYENV=window.PLOTLYENV || {};                                    if (document.getElementById("c63ae8c4-f88d-4844-86d2-1c3250cf558d")) {                    Plotly.newPlot(                        "c63ae8c4-f88d-4844-86d2-1c3250cf558d",                        [{"alignmentgroup":"True","base":["2024-07-01T00:00:00","2024-07-05T00:00:00","2024-07-01T00:00:00","2024-07-01T00:00:00","2024-07-05T00:00:00"],"hovertemplate":"start_date=%{base}\u003cbr\u003eend_date=%{x}\u003cbr\u003esource_id=%{y}\u003cextra\u003e\u003c\u002fextra\u003e","legendgroup":"","marker":{"color":"#636efa","pattern":{"shape":""}},"name":"","offsetgroup":"","orientation":"h","showlegend":false,"textposition":"auto","x":[172800000.0,86400000.0,172800000.0,691200000.0,345600000.0],"xaxis":"x","y":[1,1,2,3,4],"yaxis":"y","type":"bar"}],                        
{"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"
],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinec
olor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":tr
ue,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"},"margin":{"b":0,"l":0,"r":0,"t":30}}},"xaxis":{"anchor":"y","domain":[0.0,1.0],"type":"date"},"yaxis":{"anchor":"x","domain":[0.0,1.0],"title":{"text":"source_id"},"autorange":"reversed"},"legend":{"tracegroupgap":0},"barmode":"overlay"},                        {"responsive": true}                    ).then(function(){
                            
var gd = document.getElementById('c63ae8c4-f88d-4844-86d2-1c3250cf558d');
var x = new MutationObserver(function (mutations, observer) {{
        var display = window.getComputedStyle(gd).display;
        if (!display || display === 'none') {{
            console.log([gd, 'removed!']);
            Plotly.purge(gd);
            observer.disconnect();
        }}
}});

// Listen for the removal of the full notebook cells
var notebookContainer = gd.closest('#notebook-container');
if (notebookContainer) {{
    x.observe(notebookContainer, {childList: true});
}}

// Listen for the clearing of the current output cell
var outputEl = gd.closest('.output');
if (outputEl) {{
    x.observe(outputEl, {childList: true});
}}

                        })                };                });            </script>        </div>
</div>
</div>
</section>
<section id="full-code" class="level2">
<h2 class="anchored" data-anchor-id="full-code">5. Full code</h2>
<div id="1b9c8ca4" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">sql <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb5-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">with raw_data as (</span></span>
<span id="cb5-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  SELECT * FROM (</span></span>
<span id="cb5-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    VALUES </span></span>
<span id="cb5-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (1, DATE '2024-07-01'),</span></span>
<span id="cb5-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (1, DATE '2024-07-02'),</span></span>
<span id="cb5-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (1, DATE '2024-07-03'),</span></span>
<span id="cb5-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (1, DATE '2024-07-05'),</span></span>
<span id="cb5-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (1, DATE '2024-07-06'),</span></span>
<span id="cb5-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (2, DATE '2024-07-01'),</span></span>
<span id="cb5-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (2, DATE '2024-07-02'),</span></span>
<span id="cb5-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (2, DATE '2024-07-03'),</span></span>
<span id="cb5-13"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-01'),</span></span>
<span id="cb5-14"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-02'),</span></span>
<span id="cb5-15"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-03'),</span></span>
<span id="cb5-16"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-04'),</span></span>
<span id="cb5-17"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-05'),</span></span>
<span id="cb5-18"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-06'),</span></span>
<span id="cb5-19"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-07'),</span></span>
<span id="cb5-20"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-08'),</span></span>
<span id="cb5-21"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (3, DATE '2024-07-09'),</span></span>
<span id="cb5-22"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (4, DATE '2024-07-05'),</span></span>
<span id="cb5-23"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (4, DATE '2024-07-06'),</span></span>
<span id="cb5-24"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (4, DATE '2024-07-07'),</span></span>
<span id="cb5-25"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (4, DATE '2024-07-08'),</span></span>
<span id="cb5-26"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">      (4, DATE '2024-07-09'),</span></span>
<span id="cb5-27"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  ) AS t(source_id, end_date)</span></span>
<span id="cb5-28"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">), group_date as (</span></span>
<span id="cb5-29"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  SELECT </span></span>
<span id="cb5-30"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    source_id,</span></span>
<span id="cb5-31"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    end_date,</span></span>
<span id="cb5-32"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    end_date - INTERVAL (ROW_NUMBER() OVER (PARTITION BY source_id ORDER BY end_date) - 1) DAY AS group_date</span></span>
<span id="cb5-33"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  FROM raw_data</span></span>
<span id="cb5-34"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  order by source_id, end_date</span></span>
<span id="cb5-35"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">), final as (</span></span>
<span id="cb5-36"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  SELECT </span></span>
<span id="cb5-37"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    source_id,</span></span>
<span id="cb5-38"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    MIN(end_date) AS start_date,</span></span>
<span id="cb5-39"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    MAX(end_date) AS end_date,</span></span>
<span id="cb5-40"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  FROM group_date</span></span>
<span id="cb5-41"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  GROUP BY source_id, group_date</span></span>
<span id="cb5-42"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  ORDER BY source_id, group_date</span></span>
<span id="cb5-43"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">)</span></span>
<span id="cb5-44"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">from final</span></span>
<span id="cb5-45"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb5-46">date_df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> conn.execute(sql).df()</span>
<span id="cb5-47">gap_fig <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> px.timeline(date_df, x_start<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'start_date'</span>, x_end<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'end_date'</span>, y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'source_id'</span>)</span>
<span id="cb5-48">gap_fig.update_yaxes(autorange<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"reversed"</span>)</span>
<span id="cb5-49">gap_fig.show()</span></code></pre></div>
</div>
</section>
</section>
<section id="summary" class="level1">
<h1>Summary</h1>
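<p>This article showed how to use DuckDB to find missing dates. Subtracting each row's <code>ROW_NUMBER()</code> offset from its date gives every run of consecutive dates the same <code>group_date</code>, so a window function plus a <code>GROUP BY</code> collapses the data into continuous ranges, and the gaps between those ranges are the missing dates.</p>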
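<p>The grouped ranges can also be turned into an explicit list of missing dates. The following is a small plain-Python sketch under stated assumptions: the ranges are copied by hand from the grouped result table, and <code>missing_dates</code> is a hypothetical helper name, not something from the original notebook.</p>

```python
from datetime import date, timedelta

# Ranges copied from the grouped result: (source_id, start_date, end_date).
ranges = [
    (1, date(2024, 7, 1), date(2024, 7, 3)),
    (1, date(2024, 7, 5), date(2024, 7, 6)),
    (2, date(2024, 7, 1), date(2024, 7, 3)),
    (3, date(2024, 7, 1), date(2024, 7, 9)),
    (4, date(2024, 7, 5), date(2024, 7, 9)),
]


def missing_dates(ranges):
    """Return {source_id: [dates]} for days that fall between two ranges."""
    missing = {}
    ordered = sorted(ranges)
    # Compare each range with the next one; only same-source pairs matter.
    for (sid, _, prev_end), (next_sid, next_start, _) in zip(ordered, ordered[1:]):
        if sid != next_sid:
            continue
        gap_days = (next_start - prev_end).days - 1
        for offset in range(1, gap_days + 1):
            missing.setdefault(sid, []).append(prev_end + timedelta(days=offset))
    return missing


gaps = missing_dates(ranges)
print(gaps)
```

<p>With the data from this article, only source 1 has a gap (2024-07-04). Note that this sketch only reports dates between two observed ranges of the same source; it does not know about dates before a source's first range or after its last one.</p>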
<p>PS: I have been using DuckDB at work more and more often lately. It is lightweight and fast, and it is surprisingly productive when combined with Jupyter Notebook.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a></div></div></section></div> ]]></description>
  <category>code</category>
  <category>duckdb</category>
  <guid>https://watsy0007.com/blog/find_missing_dates_with_duckdb/</guid>
  <pubDate>Fri, 26 Jul 2024 00:00:00 GMT</pubDate>
  <media:content url="https://watsy0007.com/blog/find_missing_dates_with_duckdb/missing-dates-thumbnail.webp" medium="image" type="image/webp"/>
</item>
<item>
  <title>DuckDB Example</title>
  <link>https://watsy0007.com/blog/duckdb-example/</link>
  <description><![CDATA[ 





<p>Testing <code>DuckDB</code> features in a <code>quarto</code> document.</p>
<p>All the code below can be copied and run in a Jupyter notebook.</p>
<section id="install-dependencies" class="level1">
<h1>Install dependencies</h1>
<div id="3417ebb3" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>pip install duckdb jupysql <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>quiet</span></code></pre></div>
</div>
</section>
<section id="basic-configuration" class="level1">
<h1>Basic configuration</h1>
<div id="1101b857" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> duckdb</span>
<span id="cb2-2"></span>
<span id="cb2-3">conn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> duckdb.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>()</span></code></pre></div>
</div>
<p>JupySQL configuration</p>
<div id="82d0b1a7" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>load_ext sql</span>
<span id="cb3-2"></span>
<span id="cb3-3"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>config SqlMagic.autopandas <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb3-4"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>config SqlMagic.feedback <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb3-5"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>config SqlMagic.displaycon <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb3-6"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>config SqlMagic.displaylimit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span>
<span id="cb3-7"></span>
<span id="cb3-8"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql conn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>alias duckdb<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>native</span></code></pre></div>
</div>
</section>
<section id="demo-data" class="level1">
<h1>Demo data</h1>
<p>See the DuckDB guide on <a href="https://duckdb.org/docs/guides/python/jupyter">Jupyter Notebooks</a>.</p>
<div id="4837fcbd" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%%</span>sql <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>save short_trips <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>no<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>execute</span>
<span id="cb4-2">SELECT <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span></span>
<span id="cb4-3">FROM <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet'</span></span>
<span id="cb4-4">WHERE trip_distance <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">6.3</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<span style="None">Skipping execution...</span>
</div>
</div>
<div id="a7960c12" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sqlplot histogram <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>table short_trips <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>column trip_distance <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span>bins <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--</span><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> short_trips</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://watsy0007.com/blog/duckdb-example/index_files/figure-html/cell-6-output-1.png" width="618" height="449" class="figure-img"></p>
</figure>
</div>
</div>
</div>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a></div></div></section></div> ]]></description>
  <category>code</category>
  <category>duckdb</category>
  <guid>https://watsy0007.com/blog/duckdb-example/</guid>
  <pubDate>Sat, 17 Feb 2024 00:00:00 GMT</pubDate>
  <media:content url="https://watsy0007.com/blog/duckdb-example/duckdb-sql-analysis-thumbnail.webp" medium="image" type="image/webp"/>
</item>
</channel>
</rss>
