<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="http://caleb.software/feed.xml" rel="self" type="application/atom+xml" /><link href="http://caleb.software/" rel="alternate" type="text/html" /><updated>2026-01-26T21:07:59+00:00</updated><id>http://caleb.software/feed.xml</id><title type="html">caleb.software</title><entry><title type="html">Expecting Claude Code Usage</title><link href="http://caleb.software/posts/claude-code-usage.html" rel="alternate" type="text/html" title="Expecting Claude Code Usage" /><published>2026-01-13T00:00:00+00:00</published><updated>2026-01-13T00:00:00+00:00</updated><id>http://caleb.software/posts/claude-code-usage</id><content type="html" xml:base="http://caleb.software/posts/claude-code-usage.html"><![CDATA[<p>I’ve been using Claude Code pretty often since it started being included in the <a href="https://support.claude.com/en/articles/8325606-what-is-the-pro-plan">Claude Pro plan</a> back in June 2025. With this plan, you pay a fixed amount (currently $20/month) and get a limited amount of usage with various Claude products. With Claude Code, the way usage limits work is twofold: you get a 5-hour “session” window with a smaller limit, and a 7-day “weekly” window with a larger limit<sup id="fnref:usage-limits"><a href="#fn:usage-limits" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>.</p>

<p>To check your usage limits, you can run the <code class="language-plaintext highlighter-rouge">/usage</code> slash command from within the Claude Code CLI tool, which brings up a menu that looks like this:</p>

<p><img src="/assets/img/claude-code-usage/claude-usage-dialog.png" alt="" /></p>

<p>One major drawback is that this slash command can’t be used while the Claude agent loop is running. This has been a big frustration for me: it interrupts my flow to have to remember to check usage only in the gaps when Claude isn’t actively working.</p>

<p>This is especially true when a task has been running for a while: I want to check to see how much usage this task is burning to determine if I should <code class="language-plaintext highlighter-rouge">/compact</code> or <code class="language-plaintext highlighter-rouge">/clear</code> the context and start over. This situation happens semi-frequently, like when Claude goes down a rabbit hole investigating something tricky but gets over-focused on a dead end.</p>

<p>I often find myself opening a new terminal just to start a separate Claude Code session and run <code class="language-plaintext highlighter-rouge">/usage</code> to see where I’m at, but this is really annoying to have to do all the time.</p>

<h2 id="quickly-checking-quota-usage">Quickly Checking Quota Usage</h2>

<p>Initially, I looked into some popular tools such as <a href="https://github.com/ryoppippi/ccusage">ccusage</a>, which describes itself as:</p>

<blockquote>
  <p>Analyze your Claude Code token usage and costs from local JSONL files — incredibly fast and informative!</p>
</blockquote>

<p>While token usage is nice to know, I’m on a subscription plan rather than the pay-as-you-go (PAYG) API model, so token counts and costs don’t help me that much. The tool does have a “5-Hour Blocks Report” feature, but it seems like the estimated limits are just based on the local log files and some heuristics on total tokens &amp; API usage cost.</p>

<p>There are <a href="https://github.com/ryoppippi/ccusage/issues/145">a couple of relevant issues on GitHub</a> about inaccuracies with these projections. Since Anthropic has never officially released the magic formula they use for subscription quota usage, the best we can do with this JSONL file approach is guess how much usage is remaining.</p>

<p>That wasn’t really good enough for me, and I knew there had to be a better way. After all, the Claude Code tool itself can tell you exactly how much quota you have left, so we should be able to reuse that ourselves. I decided to try automating the built-in <code class="language-plaintext highlighter-rouge">/usage</code> slash command directly to get my usage data.</p>

<h2 id="using-expect">Using Expect</h2>

<blockquote>
  <p><strong>DISCLAIMER:</strong> After I implemented this, I noticed that running this script would occasionally increase my reported quota usage by 1%, even without any other Claude usage since the last run. I’m not totally sure if this is just some delayed counting or if it’s really using quota, but <a href="https://www.reddit.com/r/ClaudeCode/comments/1qazqq6/confirmed_claude_code_cli_burns_13_of_your_quota/">some others on Reddit have reported</a> that starting up a new Claude Code session burns some tokens for some sort of “warm start” feature.</p>
</blockquote>

<p>The Claude Code CLI<sup id="fnref:tui"><a href="#fn:tui" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> is built with a JavaScript framework called Ink, which uses React to handle the UI. Unfortunately, this tool doesn’t have any sort of <code class="language-plaintext highlighter-rouge">claude --usage</code> flag support; the only way to get your usage is to launch Claude Code and then run the <code class="language-plaintext highlighter-rouge">/usage</code> slash command.</p>

<p>There’s this <a href="https://man7.org/linux/man-pages/man1/expect.1.html">handy Unix utility</a> called <code class="language-plaintext highlighter-rouge">expect</code> that’s designed to help script/automate interactive applications like this. It allows you to easily wait for certain output, send input to the program, and so on. Automating the Claude Code tool this way is relatively simple.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/usr/bin/env -S expect -f</span>
<span class="nb">set timeout </span>10

<span class="c"># Show the output to the user</span>
log_user 1

<span class="c"># Start Claude Code in interactive mode</span>
spawn claude

<span class="c"># Wait for the prompt ready signal</span>
expect <span class="o">{</span>
    <span class="nb">timeout</span> <span class="o">{</span> puts <span class="s2">"</span><span class="se">\n</span><span class="s2">Timeout waiting for Claude Code to start"</span><span class="p">;</span> <span class="nb">exit </span>1 <span class="o">}</span>
    <span class="nt">-re</span> <span class="s2">"</span><span class="se">\\</span><span class="s2">?1004h"</span> <span class="o">{</span> <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>This doesn’t do much other than configure the timeout and output settings and then start a new Claude Code instance. The neat part is the <code class="language-plaintext highlighter-rouge">expect</code> block, which uses a regular expression (<code class="language-plaintext highlighter-rouge">-re</code>) to wait for the <code class="language-plaintext highlighter-rouge">1004h</code> escape code (sent as <code class="language-plaintext highlighter-rouge">ESC[?1004h</code>, the terminal’s “enable focus events” command) that Claude emits when it’s ready.</p>
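
<p>To see what that pattern actually matches, here’s a minimal sketch in Python (it isn’t part of the script). The startup bytes in <code class="language-plaintext highlighter-rouge">sample</code> are made up for illustration, and a real session emits far more than this. Python’s <code class="language-plaintext highlighter-rouge">re</code> module and Tcl’s regex engine differ in places, but they should treat a pattern this simple the same way.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Tcl turns the quoted string "\\?1004h" into the regex \?1004h,
# i.e. a literal question mark followed by 1004h.
READY = re.compile(r"\?1004h")

# Hypothetical startup output: hide cursor, enable focus events,
# enable bracketed paste. Only the middle sequence matters here.
sample = "\x1b[?25l\x1b[?1004h\x1b[?2004h"

match = READY.search(sample)
print(match.group(0))  # prints ?1004h
</code></pre></div></div>

<p>Without the backslash, the <code class="language-plaintext highlighter-rouge">?</code> would be read as a regex quantifier instead of a literal character, which is why the script escapes it.</p>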

<p>Next is to send our <code class="language-plaintext highlighter-rouge">/usage</code> slash command.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Give it a moment to be fully ready</span>
<span class="nb">sleep </span>0.3

<span class="c"># Type /usage</span>
send <span class="nt">--</span> <span class="s2">"/usage"</span>
</code></pre></div></div>

<p>Initially I tried to just send <code class="language-plaintext highlighter-rouge">/usage \r</code>, but this just appeared in the Claude Code CLI input field as the text of the slash command with a newline after it, as if I had hit Shift+Enter instead of actually submitting the text.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--------------------------
| /usage                 |
|                        |
--------------------------
</code></pre></div></div>

<p>After some debugging, it seems like this is because typing <code class="language-plaintext highlighter-rouge">/usage</code> causes this autocomplete window to appear, and hitting enter was possibly just selecting the entry from the autocomplete menu instead of actually submitting the command.</p>

<p><img src="/assets/img/claude-code-usage/claude-autocomplete-menu.png" alt="" /></p>

<p>Instead, I found hitting escape closes this autocomplete menu, after which we can actually submit the slash command and get it to run.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Wait for the autocomplete menu to appear</span>
expect <span class="o">{</span>
    <span class="nb">timeout</span> <span class="o">{</span> puts <span class="s2">"</span><span class="se">\n</span><span class="s2">No autocomplete menu appeared"</span><span class="p">;</span> <span class="o">}</span>
    <span class="nt">-re</span> <span class="s2">"/usage.*Show plan usage limits"</span> <span class="o">{</span> <span class="o">}</span>
<span class="o">}</span>

<span class="c"># Dismiss the autocomplete by pressing Escape</span>
send <span class="nt">--</span> <span class="s2">"</span><span class="se">\0</span><span class="s2">33"</span>

<span class="c"># Small delay</span>
<span class="nb">sleep </span>0.1

<span class="c"># Now send enter</span>
send <span class="nt">--</span> <span class="s2">"</span><span class="se">\r</span><span class="s2">"</span>
</code></pre></div></div>

<p>Now our <code class="language-plaintext highlighter-rouge">/usage</code> slash command is sent! We just have to wait for everything to load, and then we can exit.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Wait for the actual usage data to be displayed (not just "Loading")</span>
expect <span class="o">{</span>
    <span class="nb">timeout</span> <span class="o">{</span> puts <span class="s2">"</span><span class="se">\n</span><span class="s2">Timeout waiting for session usage data"</span><span class="p">;</span> <span class="nb">exit </span>1 <span class="o">}</span>
    <span class="nt">-re</span> <span class="s2">"Current session.*used.*Resets"</span> <span class="o">{</span> <span class="o">}</span>
<span class="o">}</span>

<span class="c"># Then wait for week data AND reset time to fully load</span>
expect <span class="o">{</span>
    <span class="nb">timeout</span> <span class="o">{</span> puts <span class="s2">"</span><span class="se">\n</span><span class="s2">Timeout waiting for week usage data"</span><span class="p">;</span> <span class="nb">exit </span>1 <span class="o">}</span>
    <span class="nt">-re</span> <span class="s2">"Current week.*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"</span> <span class="o">{</span> <span class="o">}</span>
<span class="o">}</span>

<span class="c"># The script will exit and kill the claude process automatically</span>
<span class="nb">exit </span>0
</code></pre></div></div>
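
<p>As a rough illustration of why those two patterns wait for the loaded dialog rather than the “Loading” state, here’s a small Python sketch. Both screen captures are hypothetical stand-ins, and the real dialog text may well differ. It also checks both patterns against a single capture, whereas the script matches them one after the other.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Same patterns as the Expect script. In Tcl, "." also matches newlines,
# so re.DOTALL is used here to mimic that behaviour.
SESSION = re.compile(r"Current session.*used.*Resets", re.DOTALL)
WEEK = re.compile(
    r"Current week.*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
    re.DOTALL,
)

# Hypothetical screen contents, before and after the data arrives.
loading = "Current session\nLoading...\nCurrent week\nLoading..."
loaded = (
    "Current session\n12% used\nResets 3pm\n"
    "Current week\n40% used\nResets Jan 20"
)


def is_ready(screen):
    """True once both usage sections appear fully rendered."""
    return bool(SESSION.search(screen)) and bool(WEEK.search(screen))


print(is_ready(loading), is_ready(loaded))  # prints False True
</code></pre></div></div>

<p>Requiring a month name in the weekly section is what makes the second pattern hold off until the reset date has been drawn.</p>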

<p>Running it, you get to see the usage report within a few seconds, without manually starting Claude Code, waiting for it to load, and sending the slash command yourself.</p>

<p><img src="/assets/img/claude-code-usage/claude-usage-script.gif" alt="" /></p>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:usage-limits">
      <p><a href="https://support.claude.com/en/articles/11145838-using-claude-code-with-your-pro-or-max-plan#h_50f6dec29d">Official Docs</a> <a href="#fnref:usage-limits" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
    <li id="fn:tui">
      <p>Well, TUI really, but the main script is called <code class="language-plaintext highlighter-rouge">cli.js</code> <a href="#fnref:tui" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="claude" /><category term="bash" /><summary type="html"><![CDATA[I’ve been using Claude Code pretty often since it started being included in the Claude Pro plan back in June 2025. With this plan, you pay a fixed amount (currently $20/month) and get a limited amount of usage with various Claude products. With Claude Code, the way usage limits work is twofold: you get a 5-hour “session” window with a smaller limit, and a 7-day “weekly” window with a larger limit1. To check your usage limits, you can run the /usage slash command from within the Claude Code CLI tool, which brings up a menu that looks like this: One major drawback is that this slash command can’t be used while the Claude agent loop is running and doing things. This has been a big frustration for me, as it really interrupts my flow of work to have to remember to check this only in the intervals where Claude is not actively doing something. This is especially true when a task has been running for a while: I want to check to see how much usage this task is burning to determine if I should /compact or /clear the context and start over. This situation happens semi-frequently, like when Claude goes down a rabbit hole investigating something tricky but gets over-focused on a dead end. I often find myself opening a new terminal just to start a separate Claude Code session and run /usage to see where I’m at, but this is really annoying to have to do all the time. Quickly Checking Quota Usage Initially, I looked into some popular tools such as ccusage, which describes itself as: Analyze your Claude Code token usage and costs from local JSONL files — incredibly fast and informative! While token usage is nice to know, I’m using a subscription plan and not the PAYG API model, so that doesn’t help me that much. 
While the tool does have a “5-Hour Blocks Report” feature, it seems like the estimated limits are just based on the local log files and some heuristics on total tokens &amp; API usage cost. There are a couple of relevant issues on Github about inaccuracies with these projections. Since Anthropic has never officially released the magic formula they use for subscription quota usage, the best we can do with this JSONL file approach is guess how much usage is remaining. That wasn’t really good enough for me, and plus I knew there must be a better way. After all, the Claude Code tool itself can tell you your exact quota remaining, so we should be able to just reuse that for ourselves. I decided to try automating the built-in /usage slash command directly to get my usage data. Using Expect DISCLAIMER: After I implemented this, I noticed that using this script would occasionally increase my reportedly used quota by 1% even without any other Claude usage since last run. I’m not totally sure if this is just some delayed counting or if it’s really using quota, but some others on Reddit have reported that starting up a new Claude Code session burns some tokens for some sort of “warm start” feature. The Claude Code CLI2 is built with a JavaScript framework called Ink, which uses React to handle the UI. Unfortunately, this tool doesn’t have any sort of claude --usage flag support; the only way to get your usage is to launch Claude Code and then run the /usage slash command. There’s this handy unix utility called expect that’s designed to help script/automate interactive applications like this. It allows you to easily wait for certain output, send an input to the program, and so on. Automating the Claude Code tool this way is relatively simple. 
#!/usr/bin/env expect -f set timeout 10 # Show the output to the user log_user 1 # Start Claude Code in interactive mode spawn claude # Wait for the prompt ready signal expect { timeout { puts "\nTimeout waiting for Claude Code to start"; exit 1 } -re "\\?1004h" { } } This doesn’t do much other than configure the timeout and output settings, and then starting a new Claude Code instance. The neat thing is this expect block, which uses a regular expression -re to wait for the ANSI 1004h escape code (which is an “enable focus events” command) that Claude emits when it’s ready. Next is to send our /usage slash command. # Give it a moment to be fully ready sleep 0.3 # Type /usage send -- "/usage" Initially I tried to just send /usage \r, but this just appeared in the Claude Code CLI input field as the text of the slash command with a newline after it, as if I had hit Shift+Enter instead of actually submitting the text. -------------------------- | /usage | | | -------------------------- After some debugging, it seems like this is because typing /usage causes this autocomplete window to appear, and hitting enter was possibly just selecting the entry from the autocomplete menu instead of actually submitting the command. Instead, I found hitting escape closes this autocomplete menu, after which we can actually submit the slash command and get it to run. # Wait for the autocomplete menu to appear expect { timeout { puts "\nNo autocomplete menu appeared"; } -re "/usage.*Show plan usage limits" { } } # Dismiss the autocomplete by pressing Escape send -- "\033" # Small delay sleep 0.1 # Now send enter send -- "\r" Now our /usage slash command is sent! We just have to wait for everything to load, and then we can exit. 
# Wait for the actual usage data to be displayed (not just "Loading") expect { timeout { puts "\nTimeout waiting for session usage data"; exit 1 } -re "Current session.*used.*Resets" { } } # Then wait for week data AND reset time to fully load expect { timeout { puts "\nTimeout waiting for week usage data"; exit 1 } -re "Current week.*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)" { } } # The script will exit and kill the claude process automatically exit 0 Running it, you get to see the usage report immediately - rather than manually starting Claude Code, waiting for it to load, and sending the slash command yourself. Official Docs (return) Well, TUI really, but the main script is called cli.js (return)]]></summary></entry><entry><title type="html">Min-maxing My Recent Posts</title><link href="http://caleb.software/posts/minmax-index.html" rel="alternate" type="text/html" title="Min-maxing My Recent Posts" /><published>2025-04-19T00:00:00+00:00</published><updated>2025-04-19T00:00:00+00:00</updated><id>http://caleb.software/posts/minmax-index</id><content type="html" xml:base="http://caleb.software/posts/minmax-index.html"><![CDATA[<p>I designed my personal site from scratch many years ago, and since then it’s undergone plenty of incremental changes. Considering my general lack of artistic ability… the results are probably what you would expect. I recently decided to tweak the CSS styles of my blog’s homepage, since I felt the “recent posts” list was starting to get too dense and hard to read as I added more posts.</p>

<p><img src="/assets/img/minmax-index/old-index.png" alt="" /></p>

<p>This is what it looked like before my redesign. Functional and not terrible, but to me it looked like a boring wall of text, with the posts stacked on top of each other like this.</p>

<h2 id="the-grid">The Grid</h2>

<p>Beyond just moving the non-title text (date and tags) to a line below the title, I wanted to give it a little flair. I had the idea of using a <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_grid_layout">CSS grid layout</a> to put the date in a column to the left, then the title and tags subtext in the right column. On its own, this was fairly straightforward.</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"post-links"</span><span class="nt">&gt;</span>
    {% for post in site.posts %}
    <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"post-date"</span><span class="nt">&gt;</span>
        <span class="nt">&lt;span</span> <span class="na">class=</span><span class="s">"post-subtext"</span><span class="nt">&gt;</span>{{ post.date | date_to_string }}<span class="nt">&lt;/span&gt;</span>
    <span class="nt">&lt;/div&gt;</span>

    <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"post"</span><span class="nt">&gt;</span>
        <span class="nt">&lt;a</span> <span class="na">class=</span><span class="s">"post-link"</span> <span class="na">href=</span><span class="s">"{{ post.url }}"</span><span class="nt">&gt;</span>{{ post.title }}<span class="nt">&lt;/a&gt;</span>
        <span class="nt">&lt;div&gt;</span><span class="c">&lt;!-- tags go here --&gt;</span><span class="nt">&lt;/div&gt;</span>
    <span class="nt">&lt;/div&gt;</span>
    {% endfor %}
<span class="nt">&lt;/div&gt;</span>
</code></pre></div></div>

<p>Beyond some styles for margins, fonts, and color, these are the relevant CSS styles to make this grid work.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">.post-links</span> <span class="p">{</span>
    <span class="nl">display</span><span class="p">:</span> <span class="nb">grid</span><span class="p">;</span>
    <span class="nl">grid-template-columns</span><span class="p">:</span> <span class="n">max-content</span> <span class="m">1fr</span><span class="p">;</span>
    <span class="nl">gap</span><span class="p">:</span> <span class="m">1rem</span> <span class="m">1rem</span><span class="p">;</span>
<span class="p">}</span>

<span class="nc">.post-date</span> <span class="p">{</span>
    <span class="nl">text-align</span><span class="p">:</span> <span class="nb">right</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This <code class="language-plaintext highlighter-rouge">grid-template-columns</code> style <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/grid-template-columns">sets the size of the two columns</a> in the grid. <code class="language-plaintext highlighter-rouge">max-content</code> essentially <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/max-content">sets the width</a> for the first column equal to the width of the longest content of any row. The second value of <code class="language-plaintext highlighter-rouge">1fr</code> sets the “growth” factor (similar to <code class="language-plaintext highlighter-rouge">flex-grow</code>) to make that column take up the remaining space<sup id="fnref:grow"><a href="#fn:grow" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>.</p>
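
<p>As a back-of-the-envelope model of that sizing, here’s a short Python sketch. The date widths and the 16px root font size are assumptions for illustration, and it ignores overflow and everything else the real grid algorithm handles.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>GAP = 16  # the 1rem column gap, assuming a 16px root font size


def grid_columns(container, date_widths):
    """Resolve `max-content 1fr` for a two-column grid (widths in px)."""
    left = max(date_widths)         # max-content: the widest date cell
    right = container - GAP - left  # 1fr: whatever space is left over
    return left, right


# Hypothetical rendered widths of three date cells, in px.
dates = [92, 88, 95]

print(grid_columns(900, dates))  # prints (95, 789)
print(grid_columns(320, dates))  # prints (95, 209)
</code></pre></div></div>

<p>Note that the left column stays at 95px no matter how narrow the container gets, so on the hypothetical 320px screen it takes up close to a third of the width.</p>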

<p><img src="/assets/img/minmax-index/new-index.png" alt="" /></p>

<p>It looks great!</p>

<h2 id="mobile-compatibility">Mobile Compatibility</h2>

<p>…on desktop. Unfortunately, because our left column is always as wide as its longest date and never shrinks, it can take up ~1/3 of the screen width on certain mobile devices, which squishes the title column and makes things look off balance.</p>

<p><img src="/assets/img/minmax-index/bad-mobile.png" alt="" /></p>

<p>Instead, I’d rather have the left column start wrapping at some point to give the title more space. And I didn’t want to use <code class="language-plaintext highlighter-rouge">@media</code> queries, since that felt like a cop-out. Grid only!</p>

<h2 id="simple-minmax">Simple minmax()</h2>

<p>To make our left column able to shrink and break, we can use <code class="language-plaintext highlighter-rouge">minmax()</code> to <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/minmax">set a range of widths</a> allowed in the column.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">.post-links</span> <span class="p">{</span>
    <span class="nl">display</span><span class="p">:</span> <span class="nb">grid</span><span class="p">;</span>
    <span class="nl">grid-template-columns</span><span class="p">:</span> <span class="nf">minmax</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="n">max-content</span><span class="p">)</span> <span class="m">1fr</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This works, but with a <code class="language-plaintext highlighter-rouge">0</code> minimum width, the left column could get really squished. Instead, we can use <code class="language-plaintext highlighter-rouge">min-content</code> to make the minimum width equal to the longest word<sup id="fnref:word"><a href="#fn:word" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> in that column, so it’s always at least a reasonable width.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">.post-links</span> <span class="p">{</span>
    <span class="nl">display</span><span class="p">:</span> <span class="nb">grid</span><span class="p">;</span>
    <span class="nl">grid-template-columns</span><span class="p">:</span> <span class="nf">minmax</span><span class="p">(</span><span class="n">min-content</span><span class="p">,</span> <span class="n">max-content</span><span class="p">)</span> <span class="m">1fr</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>We can simplify this further by using <code class="language-plaintext highlighter-rouge">minmax(auto, auto)</code> or even just <code class="language-plaintext highlighter-rouge">auto</code>, as these are generally equivalent.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">.post-links</span> <span class="p">{</span>
    <span class="nl">display</span><span class="p">:</span> <span class="nb">grid</span><span class="p">;</span>
    <span class="nl">grid-template-columns</span><span class="p">:</span> <span class="nb">auto</span> <span class="m">1fr</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p><a href="https://developer.mozilla.org/en-US/docs/Web/CSS/grid-template-columns">According to the docs:</a></p>

<blockquote>
  <p>If used outside of minmax() notation, auto represents the range between the minimum and maximum described above. This behaves similarly to minmax(min-content,max-content) in most cases.</p>
</blockquote>

<h2 id="better-minmax">Better minmax()</h2>

<p>While this does cause the date to wrap on very small displays, I noticed that on many larger phones the browser’s grid layout algorithm prioritizes shrinking the right column first, so the left column is still too wide. Ideally we could cap the left column to 20% of the screen (or parent) width. I tried using <code class="language-plaintext highlighter-rouge">min()</code> both on its own and within <code class="language-plaintext highlighter-rouge">minmax()</code>, but sadly <code class="language-plaintext highlighter-rouge">min()</code> only accepts lengths and percentages, not sizing keywords like <code class="language-plaintext highlighter-rouge">max-content</code>, so this isn’t valid inside <code class="language-plaintext highlighter-rouge">grid-template-columns</code>.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">.post-links</span> <span class="p">{</span>
    <span class="nl">display</span><span class="p">:</span> <span class="nb">grid</span><span class="p">;</span>
    <span class="nl">grid-template-columns</span><span class="p">:</span> <span class="nf">min</span><span class="p">(</span><span class="n">max-content</span><span class="p">,</span> <span class="m">20%</span><span class="p">)</span> <span class="m">1fr</span><span class="p">;</span> <span class="c">/* doesn't work! */</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Instead, we can flip this around. A 20% maximum on the left column is roughly<sup id="fnref:gap"><a href="#fn:gap" class="footnote" rel="footnote" role="doc-noteref">3</a></sup> the same as an 80% minimum on the right column. So we could use <code class="language-plaintext highlighter-rouge">minmax()</code> on the right to let the right column shrink and grow as it pleases, but not smaller than 80%.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">.post-links</span> <span class="p">{</span>
    <span class="nl">display</span><span class="p">:</span> <span class="nb">grid</span><span class="p">;</span>
    <span class="nl">grid-template-columns</span><span class="p">:</span> <span class="nb">auto</span> <span class="nf">minmax</span><span class="p">(</span><span class="m">80%</span><span class="p">,</span> <span class="m">1fr</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As far as our right column is concerned, this <code class="language-plaintext highlighter-rouge">minmax()</code> makes it at least 80% of the grid’s width on smaller displays, forcing the left column to wrap. Beyond that minimum, <code class="language-plaintext highlighter-rouge">1fr</code> allows it to grow to fill the remaining space on large displays after the left column reaches its max width of <code class="language-plaintext highlighter-rouge">max-content</code>. Overall this works pretty much how we want!</p>

<p><img src="/assets/img/minmax-index/good-mobile.png" alt="" /></p>
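
<p>To put rough numbers on the “80% minimum on the right” idea, here’s a small Python sketch of the space left over for the date column. The container widths and the 16px root font size are assumptions for illustration, and this is simplified arithmetic rather than the browser’s actual track sizing algorithm.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>GAP = 16  # the 1rem column gap, assuming a 16px root font size


def left_column_budget(container):
    """Space left for the `auto` column once the right one takes its 80% minimum."""
    right_min = container * 80 // 100
    return container - right_min - GAP


# Hypothetical container widths in px.
for width in (320, 400, 800):
    budget = left_column_budget(width)
    print(width, budget, round(100 * budget / width, 1))
# prints:
# 320 48 15.0
# 400 64 16.0
# 800 144 18.0
</code></pre></div></div>

<p>The cap works out to a bit less than 20% because the gap also comes out of the left column’s share, and the difference shrinks as the container gets wider.</p>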

<h2 id="alternatively-fit-content">Alternatively, fit-content()</h2>

<p>After spending <em>waaay</em> too long trying to get <code class="language-plaintext highlighter-rouge">minmax()</code> to work exactly how I wanted, I read a little more of the CSS grid docs and found an even simpler solution: we could just use <code class="language-plaintext highlighter-rouge">fit-content()</code> on the left column. <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/fit-content_function">According to MDN</a>, <code class="language-plaintext highlighter-rouge">fit-content()</code> essentially does <code class="language-plaintext highlighter-rouge">min(max-content, max(min-content, argument))</code>, which is almost exactly what we were looking for earlier with <code class="language-plaintext highlighter-rouge">min()</code>.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">.post-links</span> <span class="p">{</span>
    <span class="nl">display</span><span class="p">:</span> <span class="nb">grid</span><span class="p">;</span>
    <span class="nl">grid-template-columns</span><span class="p">:</span> <span class="nf">fit-content</span><span class="p">(</span><span class="m">20%</span><span class="p">)</span> <span class="m">1fr</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
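
<p>Here’s that formula as a tiny Python sketch, just to show the three regimes it produces. The 40px and 95px content widths are hypothetical, and the percentage is resolved against the container width by hand.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def fit_content(limit, min_content, max_content):
    """Track size per MDN's formula: min(max-content, max(min-content, limit))."""
    return min(max_content, max(min_content, limit))


# Hypothetical date column: the longest word is 40px, the full date is 95px.
MIN_CONTENT, MAX_CONTENT = 40, 95

for container in (150, 320, 800):
    limit = container * 20 // 100  # the 20% argument, resolved to px
    print(container, fit_content(limit, MIN_CONTENT, MAX_CONTENT))
# prints:
# 150 40
# 320 64
# 800 95
</code></pre></div></div>

<p>On tiny containers the column bottoms out at its longest word, in the middle range it tracks the 20% limit, and on wide containers it stops growing once the whole date fits on one line.</p>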

<h2 id="extra-left-padding">Extra Left Padding</h2>

<p>One drawback I’ve noticed is that just after the breakpoint where the left column starts wrapping, that column ends up with some extra padding on the left. I haven’t found a way to fix this without totally overhauling it to use some complex flex solution or <code class="language-plaintext highlighter-rouge">@media</code> queries, so if anyone has an idea let me know!</p>

<p><img src="/assets/img/minmax-index/fit-content-margins.png" alt="" /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:grow">
      <p>Any number greater than zero would work here; the number only matters when you have different columns with different growth factors, which allows them to grow at different rates. <a href="#fnref:grow" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
    <li id="fn:word">
      <p>Technically the “longest unbreakable content” which could be a number, URL, or whatever <a href="#fnref:word" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
    <li id="fn:gap">
      <p>The <code class="language-plaintext highlighter-rouge">1rem</code> column gap makes them not quite equivalent, but it’s close enough for our use case <a href="#fnref:gap" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="web" /><category term="css" /><summary type="html"><![CDATA[I designed my personal site from scratch many years ago, and since then it’s undergone plenty of incremental changes. Considering my general lack of artistic ability… the results are probably what you would expect. I recently decided to tweak the CSS styles of my blog’s homepage, since I felt the “recent posts” list was starting to get too dense and hard to read as I added more posts. This is what it looked like before my redesign. Functional and not terrible, but to me it looked like a boring wall of text with each post stacked on top of each other like this. The Grid Beyond just moving the non-title text (date and tags) to a line below the title, I wanted to give it a little flair. I had the idea of using a CSS grid layout to put the date in a column to the left, then the title and tags subtext in the right column. On its own, this was fairly straightforward. &lt;div class="post-links"&gt; {% for post in site.posts %} &lt;div class="post-date"&gt; &lt;span class="post-subtext"&gt;{{ post.date | date_to_string }}&lt;/span&gt; &lt;/div&gt; &lt;div class="post"&gt; &lt;a class="post-link" href="{{ post.url }}"&gt;{{ post.title }}&lt;/a&gt; &lt;div&gt;&lt;!-- tags go here --&gt;&lt;/div&gt; &lt;/div&gt; {% endfor %} &lt;/div&gt; Beyond some styles for margins, fonts, and color, these are the relevant CSS styles to make this grid work. .post-links { display: grid; grid-template-columns: max-content 1fr; gap: 1rem 1rem; } .post-date { text-align: right; } This grid-template-columns style sets the size of the two columns in the grid. max-content essentially sets the width for the first column equal to the width of the longest content of any row. The second value of 1fr sets the “growth” factor (similar to flex-grow) to make that column take up the remaining space1. It looks great! Mobile Compatibility …on desktop. 
Unfortunately, because our left column is a fixed size, it can take up ~1/3 of the screen width on certain mobile devices, which squishes the title column and makes things look off balance. Instead, I’d rather have the left column start wrapping at some point to give the title more space. And I didn’t want to use @media queries, since that felt like a cop-out. Grid only! Simple minmax() To make our left column able to shrink and break, we can use minmax() to set a range of widths allowed in the column. .post-links { display: grid; grid-template-columns: minmax(0, max-content) 1fr; } This works, but with a 0 minimum width, the left column could get really squished. Instead, we can use min-content to make the minimum width equal to the longest word2 in that column, so it’s always at least a reasonable width. .post-links { display: grid; grid-template-columns: minmax(min-content, max-content) 1fr; } We can simplify this further by using minmax(auto, auto) or even just auto, as these are generally equivalent. .post-links { display: grid; grid-template-columns: auto 1fr; } According to the docs: If used outside of minmax() notation, auto represents the range between the minimum and maximum described above. This behaves similarly to minmax(min-content,max-content) in most cases. Better minmax() While this does cause the date to wrap on very small displays, I noticed that on many larger phones the browser’s grid layout algorithm prioritizes shrinking the right column first, so the left column is still too wide. Ideally we could cap the left column to 20% of the screen (or parent) width. I tried using min() both on its own and within minmax(), but sadly this isn’t supported inside grid-template-columns. .post-links { display: grid; grid-template-columns: min(max-content, 20%) 1fr; /* doesn't work! */ } Instead, we can flip this around. A 20% maximum on the left column is roughly3 the same as an 80% minimum on the right column. 
So we could use minmax() on the right to let the right column shrink and grow as it pleases, but not smaller than 80%. .post-links { display: grid; grid-template-columns: auto minmax(80%, 1fr); } As far as our right column is concerned, this minmax() makes it at least 80% wide on smaller displays, forcing the left column to wrap. Beyond that minimum, 1fr allows it to grow to fill the remaining space on large displays after the left column reaches its max width of max-content. Overall this works pretty much how we want! Alternatively, fit-content() After spending waaay too long trying to get minmax() to work exactly how I wanted, I read a little more of the CSS grid docs and found an even simpler solution: we could just use fit-content() on the left column. According to MDN, fit-content() essentially does min(max-content, max(min-content, argument)), which is almost exactly what we were looking for earlier with min(). .post-links { display: grid; grid-template-columns: fit-content(20%) 1fr; } Extra Left Padding One drawback I’ve noticed is that just after the breakpoint where the left column starts wrapping, that column ends up with some extra padding on the left. I haven’t found a way to fix this without totally overhauling it to use some complex flex solution or @media queries, so if anyone has an idea let me know! Any number greater than zero would work here; the number only matters when you have different columns with different growth factors, which allows them to grow at different rates. 
(return) Technically the “longest unbreakable content” which could be a number, URL, or whatever (return) The 1rem column gap makes them not quite equivalent, but it’s close enough for our use case (return)]]></summary></entry><entry><title type="html">Scraping Zillow For The Little Guys</title><link href="http://caleb.software/posts/zillow-scraping.html" rel="alternate" type="text/html" title="Scraping Zillow For The Little Guys" /><published>2025-02-03T00:00:00+00:00</published><updated>2025-02-03T00:00:00+00:00</updated><id>http://caleb.software/posts/zillow-scraping</id><content type="html" xml:base="http://caleb.software/posts/zillow-scraping.html"><![CDATA[<p>When I was moving to the DC area in early 2025, my apartment search was plagued by luxury highrise buildings with tons of fake reviews, algorithmic pricing, and hidden fees. We toured a couple, but honestly I was so exasperated by their deceptive practices that I decided to say screw it and only look at small, independent landlords.</p>

<h2 id="searching-zillow">Searching Zillow</h2>

<p>For whatever reason, Zillow seemed to be the best platform to look for these types of apartments. Maybe because people who buy and sell rental properties are familiar with the platform, so they also list them for rent there?</p>

<p>I was hopeful I would be able to apply some filter options to show only the smaller buildings, but for some reason this feature is totally missing from Zillow. You can select certain home types, but this wasn’t useful to me. Unfortunately lots of duplexes and smaller apartment buildings are listed as “apartments” alongside the big highrises. I didn’t want to limit myself to <em>only</em> renting houses or townhomes, which tended to be a little out of my price range.</p>

<p><img src="/assets/img/zillow-scraping/home-type.png" alt="A picture of the zillow filter list showing home types of Houses, Apartments, and Townhomes." /></p>

<p>They also have this filter option for apartment communities, which turned out to be the exact opposite of what I was looking for. I wanted to <strong>hide</strong> these “communities” but there’s no way to invert this checkbox. It seemed like I would have to figure this out myself.</p>

<p><img src="/assets/img/zillow-scraping/apt-communities.png" alt="A picture of the zillow filter list showing the apartment community option, described as: Apartment Communities are professionally managed properties that typically have extra amenities and at least 25 units." /></p>

<h2 id="editing-url-params">Editing URL Params</h2>

<p>I wondered if I could do a little light hacking and modify the URL to include the filter I wanted, since it seemed like all the other filters were included as URL params. I pulled up both a regular search and one with the “communities” filter applied and copied the URLs.</p>

<p>After pasting the URLs into an online decoder, I could see a bunch of search params that were mostly meaningless to me. The two objects were almost identical, but with one difference. When the “communities” filter box is checked, it adds a value to <code class="language-plaintext highlighter-rouge">filterState</code> called <code class="language-plaintext highlighter-rouge">"fmfb":{"value":false}</code>. I’m not sure why checking this box adds a <code class="language-plaintext highlighter-rouge">false</code> param, but this was definitely what I was looking for.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">https://www.zillow.com/homes/for_rent/?searchQueryState=</span><span class="p">{</span><span class="nl">"isMapVisible"</span><span class="p">:</span><span class="kc">true</span><span class="p">,</span><span class="nl">"mapBounds"</span><span class="p">:{</span><span class="nl">"west"</span><span class="p">:</span><span class="mf">-77.08456537521572</span><span class="p">,</span><span class="nl">"east"</span><span class="p">:</span><span class="mf">-77.03984758651943</span><span class="p">,</span><span class="nl">"south"</span><span class="p">:</span><span class="mf">38.83129148648382</span><span class="p">,</span><span class="nl">"north"</span><span class="p">:</span><span class="mf">38.86047064887117</span><span class="p">},</span><span class="nl">"mapZoom"</span><span class="p">:</span><span class="mi">15</span><span class="p">,</span><span class="nl">"filterState"</span><span class="p">:{</span><span class="nl">"sort"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span class="s2">"personalizedsort"</span><span class="p">},</span><span class="nl">"fsba"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span class="kc">false</span><span class="p">},</span><span class="nl">"fsbo"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span class="kc">false</span><span class="p">},</span><span class="nl">"nc"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span class="kc">false</span><span class="p">},</span><span class="nl">"cmsn"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span class="kc">false</span><span class="p">},</span><span class="nl">"auc"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span 
class="kc">false</span><span class="p">},</span><span class="nl">"fore"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span class="kc">false</span><span class="p">},</span><span class="nl">"pmf"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span class="kc">false</span><span class="p">},</span><span class="nl">"pf"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span class="kc">false</span><span class="p">},</span><span class="nl">"fr"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span class="kc">true</span><span class="p">},</span><span class="nl">"fmfb"</span><span class="p">:{</span><span class="nl">"value"</span><span class="p">:</span><span class="kc">false</span><span class="p">}},</span><span class="nl">"isListVisible"</span><span class="p">:</span><span class="kc">true</span><span class="p">,</span><span class="nl">"pagination"</span><span class="p">:{},</span><span class="nl">"usersSearchTerm"</span><span class="p">:</span><span class="s2">""</span><span class="p">,</span><span class="nl">"listPriceActive"</span><span class="p">:</span><span class="kc">true</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
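
<p>A minimal sketch of the same comparison done locally instead of with an online decoder. The two URLs are built inside the script from made-up, trimmed-down filter values, and <code class="language-plaintext highlighter-rouge">search_url</code> and <code class="language-plaintext highlighter-rouge">filter_state</code> are hypothetical helper names. The only thing assumed about Zillow is that the search state lives in a URL-encoded <code class="language-plaintext highlighter-rouge">searchQueryState</code> param, as shown above.</p>

```ruby
require "json"
require "uri"

# Hypothetical helper: build a trimmed-down search URL for illustration only.
def search_url(filter_state)
  state = { "isMapVisible" => true, "filterState" => filter_state }
  query = URI.encode_www_form({ "searchQueryState" => JSON.generate(state) })
  "https://www.zillow.com/homes/for_rent/?#{query}"
end

# Hypothetical helper: decode the searchQueryState param and return filterState.
def filter_state(url)
  params = URI.decode_www_form(URI.parse(url).query).to_h
  JSON.parse(params.fetch("searchQueryState")).fetch("filterState")
end

regular = search_url({ "fr" => { "value" => true } })
communities = search_url({ "fr" => { "value" => true }, "fmfb" => { "value" => false } })

before = filter_state(regular)
after = filter_state(communities)

# Keep only the entries that are new or changed once the box is checked.
added = after.reject { |key, value| before[key] == value }
puts JSON.generate(added)
```

<p>With real URLs copied from the address bar, the two <code class="language-plaintext highlighter-rouge">search_url</code> calls would simply be replaced by the pasted strings.</p>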

<p>My hope was I could just flip this value to <code class="language-plaintext highlighter-rouge">true</code> and the search filter would <em>exclude</em> these highrises. Sadly, after manually editing the value and re-encoding it, navigating to the URL just showed me a regular search with both small and large complexes visible. There must be some backend validation happening, since it simply ignored my <code class="language-plaintext highlighter-rouge">true</code> value as if <code class="language-plaintext highlighter-rouge">fmfb</code> wasn’t there.</p>

<h2 id="scraping-zillow">Scraping Zillow</h2>

<p>My next idea was to scrape all of the properties both with and without the communities filter on, then subtract the second set from the first set. This should give us the properties that <em>don’t</em> appear when the communities filter is on.</p>

<p>I found a couple of semi-abandoned open source Zillow scraping projects online, but I didn’t have a lot of faith in them. I really didn’t want to get my account banned, and I figured Zillow probably has a lot of anti-scraping and anti-automation measures on their site considering how valuable real estate data is.</p>

<p>Plus, I was only looking at a relatively small number of places (around 800 total options, after applying some basic filters for price, bedrooms, and location) so I didn’t think full-fledged automation was necessary. It would probably be simpler to just view each search page manually and use my browser’s devtools to capture whatever JSON the Zillow APIs were sending.</p>

<p><img src="/assets/img/zillow-scraping/html.png" alt="A picture of devtools showing the loaded assets and a Save as... option." /></p>

<h3 id="the-first-page">The First Page</h3>

<p>Zillow paginates its search results, so the 800 or so properties would show up across several pages. I thought each page would just be its own <code class="language-plaintext highlighter-rouge">fetch</code> request to get some JSON from the API, but the first page was different. When you initially load a Zillow search page, it loads a large HTML file that contains that first page’s data all within a <code class="language-plaintext highlighter-rouge">&lt;script&gt;</code> tag.</p>

<p><img src="/assets/img/zillow-scraping/first-page.png" alt="A picture of devtools showing a snippet of the first page request." /></p>

<h3 id="later-pages">Later Pages</h3>

<p>When you click on the next page button, it <em>does</em> hit the API and returns a JSON blob. The later page data is returned by a request called <code class="language-plaintext highlighter-rouge">async-create-search-page-state</code>. These two JSON objects contained various extra bits of info that weren’t consistent between the two, but the actual search results data seemed to be stored under the <code class="language-plaintext highlighter-rouge">listResults</code> key in both cases.</p>

<p><img src="/assets/img/zillow-scraping/later-page.png" alt="A picture of devtools showing a snippet of a subsequent page request." /></p>

<p>Since the search data itself was formatted the same between the two, I decided to write a parser that could handle both an HTML and JSON file and spit out the JSON of just the search data.</p>

<h3 id="extracting-json">Extracting JSON</h3>

<p>The data we care about is contained between the opening and closing brackets immediately after the <code class="language-plaintext highlighter-rouge">listResults</code> string in our files. Initially I tried manually counting characters one by one, but unsurprisingly this didn’t perform well enough for even moderately large files.</p>

<p>Instead I resorted to using a <code class="language-plaintext highlighter-rouge">StringScanner</code> to extract our data. First, we skip to where the <code class="language-plaintext highlighter-rouge">"listResults":</code> key appears in the file. The results sometimes had weird whitespace padding between various parts, hence the catch-all <code class="language-plaintext highlighter-rouge">\s*</code> in the regex.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span> <span class="o">=</span> <span class="no">File</span><span class="p">.</span><span class="nf">read</span><span class="p">(</span><span class="n">file</span><span class="p">)</span>
<span class="n">scanner</span> <span class="o">=</span> <span class="no">StringScanner</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">scanner</span><span class="p">.</span><span class="nf">skip_until</span><span class="p">(</span><span class="sr">/"listResults":\s*\[/</span><span class="p">)</span>
</code></pre></div></div>

<p>Next, we count brackets to figure out where the array of search data ends. Every opening bracket adds one to our count, and every closing bracket subtracts one. Our <code class="language-plaintext highlighter-rouge">skip_until</code> regex already consumed the first opening bracket, so we start our count at one. Every opening bracket we encounter <em>should</em> have a corresponding closing bracket, so we continue in our loop until our bracket count returns to zero.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bracket_count</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">while</span> <span class="n">bracket_count</span> <span class="o">&gt;</span> <span class="mi">0</span>
  <span class="k">case</span> <span class="n">scanner</span><span class="p">.</span><span class="nf">scan</span><span class="p">(</span><span class="sr">/[^\[\]]+|\[|\]/</span><span class="p">)</span>
  <span class="k">when</span> <span class="s1">'['</span>
    <span class="n">bracket_count</span> <span class="o">+=</span> <span class="mi">1</span>
  <span class="k">when</span> <span class="s1">']'</span>
    <span class="n">bracket_count</span> <span class="o">-=</span> <span class="mi">1</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>If you aren’t a regex wizard, here’s the simple explanation. Brackets are reserved characters in regex, so when we want a literal bracket we have to escape it like <code class="language-plaintext highlighter-rouge">\[</code>. The first part of the regex <code class="language-plaintext highlighter-rouge">[^\[\]]+</code> says: match one or more <code class="language-plaintext highlighter-rouge">+</code> characters that are not <code class="language-plaintext highlighter-rouge">^</code> an open or close bracket <code class="language-plaintext highlighter-rouge">\[\]</code>. The second and third parts <code class="language-plaintext highlighter-rouge">|\[|\]</code> say: or <code class="language-plaintext highlighter-rouge">|</code> match exactly one open <code class="language-plaintext highlighter-rouge">\[</code> or (again) <code class="language-plaintext highlighter-rouge">|</code> match exactly one closing <code class="language-plaintext highlighter-rouge">\]</code> bracket.</p>

<p>The <code class="language-plaintext highlighter-rouge">scan</code> function returns the string it matched, so when that string is a bracket we adjust our count, and any other text (the actual data in the JSON) we just ignore for now.</p>
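
<p>To make the tokenizing concrete, here is a small sketch that runs the same regex over a made-up string and collects every token <code class="language-plaintext highlighter-rouge">scan</code> returns. The input is invented for illustration. Note that <code class="language-plaintext highlighter-rouge">scan</code> returns <code class="language-plaintext highlighter-rouge">nil</code> once nothing is left to match, which is what ends this loop.</p>

```ruby
require "strscan"

# Made-up input: nested arrays followed by the closing bracket of an outer array.
scanner = StringScanner.new('{"a":[1,[2]],"b":3}]')

tokens = []
# Each call returns either a run of non-bracket text or a single bracket.
while (token = scanner.scan(/[^\[\]]+|\[|\]/))
  tokens.push(token)
end

p tokens
```

<p>Every bracket comes back as its own one-character token, which is why the <code class="language-plaintext highlighter-rouge">case</code> statement can compare against <code class="language-plaintext highlighter-rouge">'['</code> and <code class="language-plaintext highlighter-rouge">']'</code> directly.</p>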

<p>Of course, parsing HTML with regex <a href="https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454">doesn’t come without punishment</a>. For some reason our scanner would sometimes skip a few characters past the end of the last bracket, even though <code class="language-plaintext highlighter-rouge">bracket_count</code> was at zero and the loop ended. I didn’t really feel like spending a lot of time debugging, so I just wrote a small manual backtracking loop<sup id="fnref:backtrack"><a href="#fn:backtrack" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">end_pos</span> <span class="o">=</span> <span class="n">scanner</span><span class="p">.</span><span class="nf">pos</span>
<span class="n">end_pos</span> <span class="o">-=</span> <span class="mi">1</span> <span class="k">while</span> <span class="n">data</span><span class="p">[</span><span class="n">end_pos</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">']'</span>
</code></pre></div></div>
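
<p>One possible explanation for the overshoot, offered as an assumption rather than a confirmed diagnosis of the saved Zillow files: <code class="language-plaintext highlighter-rouge">StringScanner#pos</code> reports a <em>byte</em> offset, while <code class="language-plaintext highlighter-rouge">data[end_pos]</code> indexes by <em>character</em>. If the text before the closing bracket contains multi-byte UTF-8 characters, the byte offset is larger than the character index. <code class="language-plaintext highlighter-rouge">StringScanner#charpos</code> reports the character offset instead. A small sketch with made-up data:</p>

```ruby
require "strscan"

# Made-up data with multi-byte characters ahead of the array we care about.
data = %({"note":"café ☕","listResults": [1, 2]} trailing text)

scanner = StringScanner.new(data)
scanner.skip_until(/"listResults":\s*\[/)
scanner.skip_until(/\]/)

byte_pos = scanner.pos     # byte offset just past the closing bracket
char_pos = scanner.charpos # character offset just past the closing bracket

puts data[char_pos - 1]  # the character right before charpos
puts byte_pos - char_pos # how far pos overshoots when used as a character index
```

<p>Both offsets point just past the bracket, so even with <code class="language-plaintext highlighter-rouge">charpos</code> the bracket itself sits at index <code class="language-plaintext highlighter-rouge">charpos - 1</code>.</p>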

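<p>Now, we can snip out the part of the file that has the JSON we’re interested in. Because the snippet starts at the <code class="language-plaintext highlighter-rouge">"listResults"</code> key rather than at the array itself, we surround it in curly brackets so the bare key/value pair becomes a valid JSON object, then parse it directly.</p>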

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">start_pos</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="nf">index</span><span class="p">(</span><span class="s1">'"listResults"'</span><span class="p">)</span>
<span class="n">json_array</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="n">start_pos</span><span class="o">..</span><span class="n">end_pos</span><span class="p">]</span>
<span class="n">json_array</span><span class="p">.</span><span class="nf">strip!</span>
<span class="n">json_array</span> <span class="o">=</span> <span class="s2">"{</span><span class="si">#{</span><span class="n">json_array</span><span class="si">}</span><span class="s2">}"</span>
<span class="n">parsed_results</span> <span class="o">=</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">json_array</span><span class="p">)</span>
</code></pre></div></div>

<p>In the full script I added some extra logic for error handling and logging to help me debug various parse issues I ran into. The full source code is linked at the bottom.</p>
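
<p>For reference, here is a minimal sketch of how the snippets above could be assembled into the <code class="language-plaintext highlighter-rouge">parse_file</code> function used in the next section. This is not the full script: <code class="language-plaintext highlighter-rouge">extract_list_results</code> is a hypothetical helper name, it swaps the backtracking loop for <code class="language-plaintext highlighter-rouge">charpos</code>, it slices out just the array instead of wrapping the key in curly brackets, and it assumes no string value inside the results contains an unbalanced bracket.</p>

```ruby
require "json"
require "strscan"

# Hypothetical helper: return the listResults array found in a saved page,
# whether the text is raw JSON or HTML with the JSON embedded in a script tag.
def extract_list_results(data)
  scanner = StringScanner.new(data)
  return [] unless scanner.skip_until(/"listResults":\s*\[/)

  start_pos = scanner.charpos - 1 # character index of the opening bracket
  bracket_count = 1
  while bracket_count > 0
    token = scanner.scan(/[^\[\]]+|\[|\]/)
    # scan returns nil at the end of the input; stop instead of looping forever.
    raise "unbalanced brackets after listResults" if token.nil?

    bracket_count += 1 if token == "["
    bracket_count -= 1 if token == "]"
  end

  # charpos is one past the closing bracket, so the exclusive range ends on it.
  JSON.parse(data[start_pos...scanner.charpos])
end

# Assumed shape of parse_file: read one saved page and return its results array.
def parse_file(file)
  extract_list_results(File.read(file))
end
```

<p>Returning an empty array when the key is missing keeps <code class="language-plaintext highlighter-rouge">flat_map</code> happy if a saved file turns out not to contain any results.</p>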

<h3 id="finding-apartments">Finding Apartments</h3>

<p>Since Zillow paginated the results earlier, each of our two searches (with and without the communities filter) consists of multiple saved files, which I placed in their own corresponding directories <code class="language-plaintext highlighter-rouge">all-apts</code> and <code class="language-plaintext highlighter-rouge">community-apts</code>. Now that we have our file parser, we just have to parse each of these files and combine the results. Ruby has a handy <code class="language-plaintext highlighter-rouge">flat_map</code> function that combines the arrays returned by <code class="language-plaintext highlighter-rouge">parse_file()</code> into one flat array, equivalent to doing <code class="language-plaintext highlighter-rouge">.map{}.flatten(1)</code>.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">all_apts</span> <span class="o">=</span> <span class="no">Dir</span><span class="p">.</span><span class="nf">glob</span><span class="p">(</span><span class="s2">"all-apts/*.html"</span><span class="p">).</span><span class="nf">flat_map</span> <span class="p">{</span> <span class="o">|</span><span class="n">file</span><span class="o">|</span> <span class="n">parse_file</span><span class="p">(</span><span class="n">file</span><span class="p">)</span> <span class="p">}</span>
</code></pre></div></div>

<p>We do this for both the <code class="language-plaintext highlighter-rouge">all-apts</code> and <code class="language-plaintext highlighter-rouge">community-apts</code> directories to get two different arrays. Then we filter out the apartments in <code class="language-plaintext highlighter-rouge">all_apts</code> by kicking out any elements that also exist in the <code class="language-plaintext highlighter-rouge">community_apts</code> array, which gives us our list of small buildings.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">small_apts</span> <span class="o">=</span> <span class="n">all_apts</span><span class="p">.</span><span class="nf">reject</span> <span class="p">{</span> <span class="o">|</span><span class="n">apt</span><span class="o">|</span> <span class="n">community_apts</span><span class="p">.</span><span class="nf">any?</span> <span class="p">{</span> <span class="o">|</span><span class="n">community</span><span class="o">|</span> <span class="n">community</span><span class="p">[</span><span class="s2">"id"</span><span class="p">]</span> <span class="o">==</span> <span class="n">apt</span><span class="p">[</span><span class="s2">"id"</span><span class="p">]</span> <span class="p">}</span> <span class="p">}</span>
</code></pre></div></div>
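
<p>The nested <code class="language-plaintext highlighter-rouge">any?</code> compares every apartment against every community listing, which is fine for around 800 results. As an alternative sketch for larger lists, the community IDs can be collected into a <code class="language-plaintext highlighter-rouge">Set</code> first so each lookup is a hash check. The listings below are made up, and only the <code class="language-plaintext highlighter-rouge">"id"</code> key is taken from the code above.</p>

```ruby
require "set"

# Made-up listings; only the "id" key matters for the comparison.
all_apts = [{ "id" => "1" }, { "id" => "2" }, { "id" => "3" }]
community_apts = [{ "id" => "2" }]

# Collect the community IDs once, then reject anything whose ID is in the set.
community_ids = community_apts.map { |apt| apt["id"] }.to_set
small_apts = all_apts.reject { |apt| community_ids.include?(apt["id"]) }

p small_apts
```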

<h3 id="exporting-to-csv">Exporting to CSV</h3>

<p>The last thing is writing our data to a CSV file so I could create a nice color coded spreadsheet with sorting &amp; filtering. Some of the entries were missing some fields, so I just skipped adding those to the CSV.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">CSV</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="s1">'output.csv'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">csv</span><span class="o">|</span>
  <span class="n">csv</span> <span class="o">&lt;&lt;</span> <span class="p">[</span><span class="s1">'detailUrl'</span><span class="p">,</span> <span class="s1">'address'</span><span class="p">,</span> <span class="s1">'unformattedPrice'</span><span class="p">,</span> <span class="s1">'beds'</span><span class="p">,</span> <span class="s1">'baths'</span><span class="p">,</span> <span class="s1">'area'</span><span class="p">,</span> <span class="s1">'latitude'</span><span class="p">,</span> <span class="s1">'longitude'</span><span class="p">]</span>  <span class="c1"># Header row</span>
  <span class="n">small_apts</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">apt</span><span class="o">|</span>
    <span class="k">if</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'detailUrl'</span><span class="p">]</span> <span class="o">&amp;&amp;</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'address'</span><span class="p">]</span> <span class="o">&amp;&amp;</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'unformattedPrice'</span><span class="p">]</span> <span class="o">&amp;&amp;</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'beds'</span><span class="p">]</span> <span class="o">&amp;&amp;</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'baths'</span><span class="p">]</span> <span class="o">&amp;&amp;</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'area'</span><span class="p">]</span> <span class="o">&amp;&amp;</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'latLong'</span><span class="p">]</span> <span class="o">&amp;&amp;</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'latLong'</span><span class="p">][</span><span class="s1">'latitude'</span><span class="p">]</span> <span class="o">&amp;&amp;</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'latLong'</span><span class="p">][</span><span class="s1">'longitude'</span><span class="p">]</span>
      <span class="n">csv</span> <span class="o">&lt;&lt;</span> <span class="p">[</span><span class="n">apt</span><span class="p">[</span><span class="s1">'detailUrl'</span><span class="p">],</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'address'</span><span class="p">],</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'unformattedPrice'</span><span class="p">],</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'beds'</span><span class="p">],</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'baths'</span><span class="p">],</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'area'</span><span class="p">],</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'latLong'</span><span class="p">][</span><span class="s1">'latitude'</span><span class="p">],</span> <span class="n">apt</span><span class="p">[</span><span class="s1">'latLong'</span><span class="p">][</span><span class="s1">'longitude'</span><span class="p">]]</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
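
<p>The long chain of presence checks can also be written more compactly. A hedged sketch, using the same field names as above but made-up sample listings: <code class="language-plaintext highlighter-rouge">csv_row</code> is a hypothetical helper that returns <code class="language-plaintext highlighter-rouge">nil</code> when any field is missing. One difference from the original check is that it only treats <code class="language-plaintext highlighter-rouge">nil</code> as missing, so a literal <code class="language-plaintext highlighter-rouge">false</code> value would still be written.</p>

```ruby
require "csv"

# Hypothetical helper: build one CSV row, or nil when any field is missing.
def csv_row(apt, fields)
  row = apt.values_at(*fields)
  row.push(apt.dig("latLong", "latitude"), apt.dig("latLong", "longitude"))
  row.include?(nil) ? nil : row
end

# Top-level fields copied straight from each listing.
fields = %w[detailUrl address unformattedPrice beds baths area]

# Made-up sample listings; the second one lacks most fields and is skipped.
small_apts = [
  { "detailUrl" => "/a", "address" => "1 Main St", "unformattedPrice" => 1500,
    "beds" => 2, "baths" => 1, "area" => 700,
    "latLong" => { "latitude" => 38.85, "longitude" => -77.05 } },
  { "detailUrl" => "/b", "address" => "2 Main St" }
]

output = CSV.generate do |csv|
  csv.add_row(fields + %w[latitude longitude]) # header row
  small_apts.each do |apt|
    row = csv_row(apt, fields)
    csv.add_row(row) if row
  end
end

puts output
```

<p>Swapping <code class="language-plaintext highlighter-rouge">CSV.generate</code> for <code class="language-plaintext highlighter-rouge">CSV.open('output.csv', 'w')</code> writes the same rows to a file instead of a string.</p>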

<p>Look at that beauty! Out of nearly 800 options in the original search we ended up with 85 in our small apartment spreadsheet. Much better than going through all of that by hand. Ultimately, we used this sheet to find a couple places to tour and finally choose the perfect apartment for us.</p>

<p><img src="/assets/img/zillow-scraping/spreadsheet.png" alt="A picture of my spreadsheet with the apartment data." /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:backtrack">
      <p>I’m hoping this hacky solution is enough to invoke <a href="https://meta.wikimedia.org/wiki/Cunningham%27s_Law">Cunningham’s law</a> and get somebody to tell me what’s actually going on here :^) <a href="#fnref:backtrack" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ruby" /><summary type="html"><![CDATA[When I was moving to the DC area in early 2025, my apartment search was plagued by luxury highrise buildings with tons of fake reviews, algorithmic pricing, and hidden fees. We toured a couple, but honestly I was so exasperated by their deceptive practices that I decided to say screw it and only look at small, independent landlords. Searching Zillow For whatever reason, Zillow seemed to be the best platform to look for these types of apartments. Maybe because people who buy and sell rental properties are familiar with the platform, so they also list them for rent there? I was hopeful I would be able to apply some filter options to show only the smaller buildings, but for some reason this feature is totally missing from Zillow. You can select certain home types, but this wasn’t useful to me. Unfortunately lots of duplexes and smaller apartment buildings are listed as “apartments” alongside the big highrises. I didn’t want to limit myself to only renting houses or townhomes, which tended to be a little out of my price range. They also have this filter option for apartment communities, which turned out to be the exact opposite of what I was looking for. I want to hide these “communities” but there’s no way to inversely check this box. It seemed like I would have to figure this out myself. Editing URL Params I wondered if I could do a little light hacking and modify the URL to include the filter I wanted, since it seemed like all the other filters were included as URL params. I pulled up both a regular search and one with the “communities” filter applied and copied the URLs. After pasting the URLs into an online decoder, I could see a bunch of search params that were mostly meaningless to me. The two objects were almost identical, but with one difference. When the “communities” filter box is checked, it adds a value to filterState called "fmfb":{"value”:false}. 
I’m not sure why checking this box adds a false param, but this was definitely what I was looking for. https://www.zillow.com/homes/for_rent/?searchQueryState={"isMapVisible":true,"mapBounds":{"west":-77.08456537521572,"east":-77.03984758651943,"south":38.83129148648382,"north":38.86047064887117},"mapZoom":15,"filterState":{"sort":{"value":"personalizedsort"},"fsba":{"value":false},"fsbo":{"value":false},"nc":{"value":false},"cmsn":{"value":false},"auc":{"value":false},"fore":{"value":false},"pmf":{"value":false},"pf":{"value":false},"fr":{"value":true},"fmfb":{"value”:false}},”isListVisible":true,"pagination":{},"usersSearchTerm":"","listPriceActive":true} My hope was I could just flip this value to true and the search filter would exclude these highrises. Sadly, after manually editing the value and re-encoding it, navigating to the URL just showed me a regular search with both small and large complexes visible. There must be some backend validation happening, since it simply ignored my true value as if fmfb wasn’t there. Scraping Zillow My next idea was to scrape all of the properties both with and without the communities filter on, then subtract the second set from the first set. This should give us the properties that don’t appear when the communities filter is on. I found a couple of semi-abandoned open source Zillow scraping projects online, but I didn’t have a lot of faith in them. I really didn’t want to get my account banned, and I figured Zillow probably has a lot of anti-scraping and anti-automation measures on their site considering how valuable real estate data is. Plus, I was only looking at a relatively small number of places (around 800 total options, after applying some basic filters for price, bedrooms, and location) so I didn’t think full-fledged automation was necessary. It would probably be simpler to just view each search page manually and use my browser’s devtools to capture whatever JSON the Zillow APIs were sending. 
The First Page Zillow paginates its search results, so the 800 or so properties would show up across several pages. I thought each page would just be its own fetch request to get some JSON from the API, but the first page was different. When you initially load a Zillow search page, it loads a large HTML file that contains that first page’s data all within a &lt;script&gt; tag. Later Pages When you click on the next page button, it does hit the API and returns a JSON blob. The later page data is stored in a request called async-create-search-page-state. These two JSON objects contained some various extra bits of info that weren’t consistent between the two, but the actual search results data seemed to be stored under the listResults key in both cases. Since the search data itself was formatted the same between the two, I decided to write a parser that could handle both an HTML and JSON file and spit out the JSON of just the search data. Extracting JSON The data we care about is contained between the opening and closing brackets immediately after the listResults string in our files. Initially I tried manually counting characters one by one, but unsurprisingly this didn’t perform well enough for even moderately large files. Instead I resorted to using a StringScanner to extract our data. First, we skip to where we find the listResults: key in the file. The results sometimes had weird whitespace padding between various parts, hence the catch-all \s* in the regexp. data = File.read(file) scanner = StringScanner.new(data) scanner.skip_until(/"listResults":\s*\[/) Next, we count brackets to figure out where the array of search data ends. Every opening bracket adds one to our count, and every closing bracket subtracts. Our skip_until regex already captured the first opening bracket, so we start our count at one. Every opening bracket we encounter should have a corresponding closing bracket, so we continue in our loop until our bracket count returns to zero. 
bracket_count = 1 while bracket_count &gt; 0 case scanner.scan(/[^\[\]]+|\[|\]/) when '[' bracket_count += 1 when ']' bracket_count -= 1 end end If you aren’t a regex wizard, here’s the simple explanation. Brackets are a reserved character in regex, so when we want a literal bracket we have to escape it like \[. The first part of the regex [^\[\]]+ says: match one or more + characters that are not ^ an open or close bracket \[\]. The second and third part |\[|\] says: or | match exactly one open \[ or (again) | match exactly one closing \] bracket. The scan function returns what string it matched, so in the case of a bracket we adjust our counts, and for other text (the actual data in the JSON) we just ignore it for now. Of course, parsing HTML with regex doesn’t come without punishment. For some reason our scanner would sometimes skip a few characters past the end of the last bracket, even though bracket_count was at zero and the loop ended. I didn’t really feel like spending a lot of time debugging, so I just wrote a small manual backtracking loop1. end_pos = scanner.pos end_pos -= 1 while data[end_pos] != ']' Now, we can snip out the part of the file that has the JSON we’re interested in, surround it in curly brackets (that’s how all JSON must be), and parse it directly. start_pos = data.index('"listResults"') json_array = data[start_pos..end_pos] json_array.strip! json_array = "{#{json_array}}" parsed_results = JSON.parse(json_array) In the full script I added some extra logic for error handling and logging to help me debug various parse issues I ran into. The full source code is linked at the bottom. Finding Apartments Since Zillow paginated the results earlier, each of our two searches (with and without the communities filter) consists of multiple saved files, which I placed in their own corresponding directories all-apts and community-apts. Now that we have our file parser, we just have to parse each of these files and combine the results. 
Ruby has a handy flat_map function that combines the arrays returned by parse_file() into one flat array, equivalent to doing .map{}.flatten(1). all_apts = Dir.glob("all-apts/*.html").flat_map { |file| parse_file(file) } We do this for both the all-apts and community-apts directories to get two different arrays. Then we filter out the apartments in all_apts by kicking out any elements that also exist in the community_apts array, which gives us our list of small buildings. small_apts = all_apts.reject { |apt| community_apts.any? { |community| community["id"] == apt["id"] } } Exporting to CSV The last thing is writing our data to a CSV file so I could create a nice color coded spreadsheet with sorting &amp; filtering. Some of the entries were missing some fields, so I just skipped adding those to the CSV. CSV.open('output.csv', 'w') do |csv| csv &lt;&lt; ['detailUrl', 'address', 'unformattedPrice', 'beds', 'baths', 'area', 'latitude', 'longitude'] # Header row small_apts.each do |apt| if apt['detailUrl'] &amp;&amp; apt['address'] &amp;&amp; apt['unformattedPrice'] &amp;&amp; apt['beds'] &amp;&amp; apt['baths'] &amp;&amp; apt['area'] &amp;&amp; apt['latLong'] &amp;&amp; apt['latLong']['latitude'] &amp;&amp; apt['latLong']['longitude'] csv &lt;&lt; [apt['detailUrl'], apt['address'], apt['unformattedPrice'], apt['beds'], apt['baths'], apt['area'], apt['latLong']['latitude'], apt['latLong']['longitude']] end end end Look at that beauty! Out of nearly 800 options in the original search we ended up with 85 in our small apartment spreadsheet. Much better than going through all of that by hand. Ultimately, we used this sheet to find a couple places to tour and finally choose the perfect apartment for us. 
I’m hoping this hacky solution is enough to invoke Cunningham’s law and get somebody to tell me what’s actually going on here :^) (return)]]></summary></entry><entry><title type="html">Deploying A Go App On Apache</title><link href="http://caleb.software/posts/deploying-go-apache.html" rel="alternate" type="text/html" title="Deploying A Go App On Apache" /><published>2024-08-14T00:00:00+00:00</published><updated>2024-08-14T00:00:00+00:00</updated><id>http://caleb.software/posts/deploying-go-apache</id><content type="html" xml:base="http://caleb.software/posts/deploying-go-apache.html"><![CDATA[<p>I recently decided to try my hand at creating a <a href="https://go.dev/">Go-based</a> website with the <a href="https://github.com/gin-gonic/gin">Gin framework</a>, and I was surprised to find there isn’t a simple way to host this app on my personal server which is running Apache.</p>

<p>In the past I’ve created a couple of web apps with <a href="https://sinatrarb.com/">Sinatra</a> (in Ruby), and deploying them was relatively simple. It just required installing the <a href="https://github.com/phusion/passenger">Phusion Passenger</a> packages with <code class="language-plaintext highlighter-rouge">apt</code>, enabling the Passenger Apache module, and <a href="https://www.phusionpassenger.com/library/walkthroughs/deploy/ruby/digital_ocean/apache/oss/bionic/deploy_app.html#rails_edit-apache-configuration-file">editing the Apache config</a> to include the app at a certain subdomain.</p>

<p>Unfortunately, nothing like this appears to exist for Go apps. One of the benefits of something like Passenger is that it handles app startup (after, say, rebooting my server) and can restart the app automatically if it crashes for some reason. All nice to have, but for this small personal project I decided it doesn’t matter that much.</p>

<h2 id="running-with-gin">Running With Gin</h2>

<p>Instead, I decided to find a quick and dirty solution to get my website up and running. Gin comes with a simple built-in web server, which we can run from the command line.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// api.go</span>
<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="n">router</span> <span class="o">:=</span> <span class="n">gin</span><span class="o">.</span><span class="n">Default</span><span class="p">()</span>
	<span class="n">router</span><span class="o">.</span><span class="n">StaticFile</span><span class="p">(</span><span class="s">"/"</span><span class="p">,</span> <span class="s">"./index.html"</span><span class="p">)</span>
	<span class="n">router</span><span class="o">.</span><span class="n">GET</span><span class="p">(</span><span class="s">"/stories"</span><span class="p">,</span> <span class="n">getTopStories</span><span class="p">)</span>
	<span class="n">router</span><span class="o">.</span><span class="n">Run</span><span class="p">(</span><span class="s">"localhost:8080"</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>go run <span class="nb">.</span>
</code></pre></div></div>

<p>From here, we can simply hit <code class="language-plaintext highlighter-rouge">Ctrl+Z</code> to suspend the program, then disown it and make it run in the background<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">disown</span> <span class="nt">-h</span>
<span class="nv">$ </span><span class="nb">bg</span>
</code></pre></div></div>

<p>And voila, we have our web server as a background process running forever (at least, until it crashes).</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ps x | <span class="nb">grep </span>go
 217490 pts/1    S+     0:00 <span class="nb">grep</span> <span class="nt">--color</span><span class="o">=</span>auto go
3934484 ?        Sl     0:37 /usr/local/go/bin/go run <span class="nb">.</span>
3934514 ?        Sl     0:24 /tmp/go-build1016660365/b001/exe/api
</code></pre></div></div>
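
<p>If staying down after a crash ever becomes a problem, a tiny watchdog could stand in for the restart behavior Passenger provides. The following is only a rough Ruby sketch, not something this deployment uses; the <code class="language-plaintext highlighter-rouge">supervise</code> helper, its parameters, and the <code class="language-plaintext highlighter-rouge">./api</code> binary path are all assumptions made for illustration:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical watchdog: run a command and re-launch it whenever it exits
# with a failure, giving up after max_restarts re-launches.
# Returns the number of restarts that were performed.
def supervise(command, max_restarts: 5, delay: 1)
  restarts = 0
  loop do
    succeeded = system(*command)   # blocks until the process exits
    return restarts if succeeded   # clean exit: nothing left to do
    return restarts if restarts == max_restarts
    restarts += 1
    sleep(delay)                   # brief pause so a crash loop doesn't spin
  end
end

# Example (assumes a binary built beforehand with `go build -o api .`):
# supervise(['./api'])
</code></pre></div></div>

<p>This only covers crashes. Surviving a reboot would still need a service manager such as systemd, which is a better fit for anything more serious than a personal project.</p>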

<h2 id="configuring-apache">Configuring Apache</h2>

<p>Currently, our web app is only accessible from <code class="language-plaintext highlighter-rouge">localhost:8080</code>, but we want to make it available to the world! To do this, we’ll use the <a href="https://httpd.apache.org/docs/current/mod/mod_proxy.html">Apache proxy module</a> to forward traffic from a certain public subdomain to our local Gin server.</p>

<p>First we enable the two proxy modules:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>a2enmod proxy 
<span class="nv">$ </span>a2enmod proxy_http
</code></pre></div></div>

<p>Next, we’ll create a new Apache config for our subdomain and tell it to proxy all of our traffic to our Gin server<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>:</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;<span class="n">VirtualHost</span> *:<span class="m">80</span>&gt; 
        <span class="n">ProxyPreserveHost</span> <span class="n">On</span>
        <span class="n">ProxyRequests</span> <span class="n">Off</span>
        <span class="n">ServerName</span> <span class="n">hn</span>.<span class="n">caleb</span>.<span class="n">software</span>
        <span class="n">ProxyPass</span> / <span class="n">http</span>://<span class="n">localhost</span>:<span class="m">8080</span>/
        <span class="n">ProxyPassReverse</span> / <span class="n">http</span>://<span class="n">localhost</span>:<span class="m">8080</span>/
&lt;/<span class="n">VirtualHost</span>&gt; 
</code></pre></div></div>

<p>And bam, after restarting Apache, our site is now live at <a href="http://hn.caleb.software/">hn.caleb.software</a>!</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Source: <a href="https://stackoverflow.com/a/954415">stack overflow</a> post <a href="#fnref:1" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
    <li id="fn:2">
      <p>Source: <a href="https://stackoverflow.com/a/13089668">stack overflow</a> post <a href="#fnref:2" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="go" /><category term="apache" /><summary type="html"><![CDATA[I recently decided to try my hand at creating a Go-based website with the Gin framework, and I was surprised to find there isn’t a simple way to host this app on my personal server which is running Apache. In the past I’ve created a couple of web apps with Sinatra (in Ruby), and deploying them was relatively simple. It required just installing the Phusion Passenger packages with apt, enabling the Passenger Apache module, and editing the apache config to include the app at a certain subdomain. Unfortunately, it seems like nothing like this appears to exist for Go apps. One of the benefits of something like Passenger is that it automatically handles app startup (after, say, rebooting my server) and can restart the app automatically if it crashes for some reason. All nice to have, but for this small personal project I decided it doesn’t matter that much. Running With Gin Instead, I decided to find a quick and dirty solution to get my website up and running. Gin comes with a simple built-in web server, which we can run from the command line. // api.go func main() { router := gin.Default() router.StaticFile("/", "./index.html") router.GET("/stories", getTopStories) router.Run("localhost:8080") } $ go run . From here, we can simply hit Ctrl+Z to suspend the program, then disown it and make it run in the background1: $ disown -h $ bg And voila, we have our web server as a background process running forever (at least, until it crashes). $ ps x | grep go 217490 pts/1 S+ 0:00 grep --color=auto go 3934484 ? Sl 0:37 /usr/local/go/bin/go run . 3934514 ? Sl 0:24 /tmp/go-build1016660365/b001/exe/api Configuring Apache Currently, our web app is only accessible from localhost:8080, but we want to make it available to the world! To do this, we’ll use the Apache proxy module to redirect traffic from a certain public subdomain to our local Gin server. 
First we just enable the module: $ a2enmod proxy $ a2enmod proxy_http Next, we’ll create a new apache config for our subdomain and tell it to proxy all of our traffic to our Gin server2: &lt;VirtualHost *:80&gt; ProxyPreserveHost On ProxyRequests Off ServerName hn.caleb.software ProxyPass / http://localhost:8080/ ProxyPassReverse / http://localhost:8080/ &lt;/VirtualHost&gt; And bam, after restarting Apache, our site is now live at hn.caleb.software! Source: stack overflow post (return) Source: stack overflow post (return)]]></summary></entry><entry><title type="html">Categorizing Hacker News Posts</title><link href="http://caleb.software/posts/hn-reader.html" rel="alternate" type="text/html" title="Categorizing Hacker News Posts" /><published>2024-07-26T00:00:00+00:00</published><updated>2024-07-26T00:00:00+00:00</updated><id>http://caleb.software/posts/hn-reader</id><content type="html" xml:base="http://caleb.software/posts/hn-reader.html"><![CDATA[<p><strong>tl;dr:</strong> check it out at <a href="http://hn.caleb.software/">hn.caleb.software</a>.</p>

<h2 id="front-page-news">Front Page News</h2>

<p>I like Hacker News. For all of its warts (looking at you, crypto bros) it’s a great place for finding interesting tech articles and blogs. The site guidelines are pretty straightforward, saying that “<a href="https://news.ycombinator.com/newsguidelines.html">if they’d cover it on TV news, it’s probably off-topic</a>.” While the posts and discussions are usually high quality, this doesn’t stop the occasional ragebait drama-politics from popping up on the front page from time to time.</p>

<h2 id="domain-categorization">Domain Categorization</h2>

<p>I’ve noticed that most of the stories I don’t care about tend to come from the same hundred or so websites. The other day I had an idea: what if I could build a simple frontend for HN that would allow me to sort each post (and its domain) into a few categories? Then, if I’m just in the mood for some indie tech blog articles, I could filter the front page to only show those types of stories.</p>

<p>… and then I can use these domain classifications to train a neural network on keywords found in the article titles, maybe create a latent Dirichlet allocation, and then use that to preemptively predict what category a new post should… hold on, let’s just start with the first idea.</p>

<h2 id="calebs-hn-reader">Caleb’s HN Reader</h2>

<p>So I made a tiny REST API in Go which basically just acts as a proxy for the HN API<sup id="fnref:cors"><a href="#fn:cors" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>, and a simple webpage which displays all of the stories alongside a little select box to categorize each one. It stores the sorted domain list in the browser’s local storage, so anyone can use it without me needing to deal with user accounts or a database. And yes, I spent exactly seven minutes creating the CSS styles for the page. Isn’t it beautiful?</p>

<p><img src="/assets/img/hn-reader/reader.jpeg" alt="" /></p>

<p>Right now there’s just a handful of default categories: <code class="language-plaintext highlighter-rouge">Technology</code>, <code class="language-plaintext highlighter-rouge">Miscellaneous</code>, <code class="language-plaintext highlighter-rouge">Business</code>, <code class="language-plaintext highlighter-rouge">Science</code>, <code class="language-plaintext highlighter-rouge">Projects &amp; Companies</code>, and <code class="language-plaintext highlighter-rouge">Garbage</code>. There’s also a settings page where you can import/export your domain lists for sharing as well as modify what categories are available.</p>

<p><img src="/assets/img/hn-reader/settings.png" alt="" /></p>
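
<p>To give a rough idea of the domain lookup before the full write-up: the real reader does this in the browser against local storage, so the Ruby below is just an illustrative sketch. The domains, the story data, and the helper names <code class="language-plaintext highlighter-rouge">category_for</code> and <code class="language-plaintext highlighter-rouge">stories_in</code> are all made up for the example:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require 'uri'

# Hypothetical saved classification: domain => category.
DOMAIN_CATEGORIES = {
  'blog.example.com' => 'Technology',
  'news.example.com' => 'Garbage'
}.freeze

# Look up a story's category from the host part of its URL.
# Stories without a URL, or from an unsorted domain, come back as nil.
def category_for(story_url)
  return nil if story_url.nil?
  host = URI.parse(story_url).host
  DOMAIN_CATEGORIES[host]
end

# Keep only the stories whose domain falls in the wanted category.
def stories_in(category, stories)
  stories.select { |story| category_for(story[:url]) == category }
end

stories = [
  { title: 'Writing a tiny VM', url: 'https://blog.example.com/tiny-vm' },
  { title: 'Outrage of the day', url: 'https://news.example.com/outrage' }
]
puts stories_in('Technology', stories).map { |story| story[:title] }
</code></pre></div></div>

<p>Because the classification is keyed by domain rather than by post, sorting one story effectively sorts every future story from the same site.</p>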

<p>I’ll release more details on the technical implementation and some things I learned pretty soon. In the meantime, you can check out the source yourself <a href="https://github.com/karagenit/hn-reader">here on GitHub</a>.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:cors">
      <p>I really wanted to make this project frontend-only for ease of hosting, but sadly the CORS settings on the HN API made this impossible. <a href="#fnref:cors" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="projects" /><category term="web" /><category term="go" /><summary type="html"><![CDATA[tl;dr: check it out at hn.caleb.software. Front Page News I like Hacker News. For all of its warts (looking at you, crypto bros) it’s a great place for finding interesting tech articles and blogs. The site guidelines are pretty straightforward, saying that “if they’d cover it on TV news, it’s probably off-topic.” While the posts and discussions are usually high quality, this doesn’t stop the occasional ragebait drama-politics from popping up on the front page from time to time. Domain Categorization I’ve noticed that most of the stories I don’t care about tend to come from the same hundred or so websites. The other day I had an idea: what if I could build a simple frontend for HN that would allow me to sort each post (and its domain) into a few categories? Then, if I’m just in the mood for some indie tech blog articles, I could filter the front page to only show those types of stories. … and then I can use these domain classifications to train a neural network on keywords found in the article titles, maybe create a latent Dirichlet allocation, and then use that to preemptively predict what category a new post should… hold on, let’s just start with the first idea. Caleb’s HN Reader So I made a tiny REST API in Go which basically just acts as a proxy for the HN API1, and a simple webpage which displays all of the stories alongside a little select box to categorize each one. It stores the sorted domain list in the browser’s local storage, so anyone can use it without me needing to deal with user accounts or a database. And yes, I spent exactly seven minutes creating the CSS styles for the page. Isn’t it beautiful? Right now there’s just a handful of default categories: Technology, Miscellaneous, Business, Science, Projects &amp; Companies, and Garbage. 
There’s also a settings page where you can import/export your domain lists for sharing as well as modify what categories are available. I’ll release more details on the technical implementation and some things I learned pretty soon. In the meantime, you can check out the source yourself here on Github. I really wanted to make this project frontend-only for ease of hosting, but sadly the CORS settings on the HN API made this impossible. (return)]]></summary></entry><entry><title type="html">The Tilde Tragedy</title><link href="http://caleb.software/posts/rm-tilde.html" rel="alternate" type="text/html" title="The Tilde Tragedy" /><published>2024-04-27T00:00:00+00:00</published><updated>2024-04-27T00:00:00+00:00</updated><id>http://caleb.software/posts/rm-tilde</id><content type="html" xml:base="http://caleb.software/posts/rm-tilde.html"><![CDATA[<p>Anyone familiar with bash<sup id="fnref:bash"><a href="#fn:bash" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> knows about <a href="https://www.gnu.org/software/bash/manual/html_node/Tilde-Expansion.html">tilde expansion</a>: in shell scripts and from the terminal, the tilde character is automatically replaced with the value of the <code class="language-plaintext highlighter-rouge">$HOME</code> variable, which points to the current user’s home directory. That means that</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cd</span> ~
</code></pre></div></div>

<p>is automatically expanded to</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cd</span> /home/caleb
</code></pre></div></div>

<p>which can be a very convenient shortcut!</p>

<h2 id="in-ruby">In Ruby</h2>

<p>This was sometime in 2015<sup id="fnref:age"><a href="#fn:age" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>: I had switched my computer over to Debian only a few months earlier and was just getting into programming with Ruby. I don’t remember exactly what I was trying to accomplish that day, but I had a script that checked for the existence of a directory under my home folder and created it if it was missing<sup id="fnref:mkdir"><a href="#fn:mkdir" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Dir</span><span class="p">.</span><span class="nf">mkdir</span><span class="p">(</span><span class="s1">'~/foo/bar'</span><span class="p">)</span> <span class="k">unless</span> <span class="no">Dir</span><span class="p">.</span><span class="nf">exist?</span><span class="p">(</span><span class="s1">'~/foo/bar'</span><span class="p">)</span>
</code></pre></div></div>

<p>Of course, the problem here is that <em>Ruby</em>, unlike Bash, doesn’t inherently support tilde expansion. Instead, you have to explicitly call something like <code class="language-plaintext highlighter-rouge">File.expand_path()</code> to handle the expansion. In retrospect this seems obvious, but to a young programmer I’m sure you can see why I expected this to “just work” here.</p>

<p>The first time I ran this script, it correctly detected the directory didn’t exist and went ahead and created it. But instead of creating the directory under my home folder, it created it right inside of the project directory I was currently working in… with the directory being named the literal <code class="language-plaintext highlighter-rouge">~</code>.</p>

<p>And this is the point where I screwed up big time. I fixed my script, and then tried to get rid of that weird tilde folder inside my project directory, like so:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">rm</span> <span class="nt">-rf</span> ~
</code></pre></div></div>

<p>The command took a few seconds to run, which I thought was strange at first… until the realization hit me. I quickly spammed <code class="language-plaintext highlighter-rouge">Ctrl+C</code> to cancel the command, but it had already blown away half of my home directory.</p>

<p>Moral of the story: trying to learn a new programming language after 2 a.m. is a bad idea. Oh, and make backups.</p>
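
<p>For completeness, a version that does what I intended might look something like the sketch below. <code class="language-plaintext highlighter-rouge">ensure_dir</code> is a name made up for this post, while <code class="language-plaintext highlighter-rouge">File.expand_path</code> and <code class="language-plaintext highlighter-rouge">FileUtils.mkdir_p</code> are standard library calls:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require 'fileutils'

# Hypothetical helper: expand a leading "~" ourselves, then create the
# directory (and any missing parents) if it is not already there.
def ensure_dir(path)
  full_path = File.expand_path(path) # "~/foo/bar" becomes "#{Dir.home}/foo/bar"
  FileUtils.mkdir_p(full_path)       # does nothing if the directory exists
  full_path
end

# ensure_dir('~/foo/bar')
</code></pre></div></div>

<p>As for the cleanup, quoting the name (<code class="language-plaintext highlighter-rouge">rm -rf '~'</code>) or spelling it as a relative path (<code class="language-plaintext highlighter-rouge">rm -rf ./~</code>) stops bash from expanding the tilde, so only the stray folder gets removed.</p>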

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:bash">
      <p>I originally had this as “shell scripting” here, but I realized that I rarely use the old-school <code class="language-plaintext highlighter-rouge">sh</code> shell and don’t remember if it supports tilde expansion or not. Exercise for the reader: try finding the answer via Google. All of my results were just SO posts on how tilde expansion works in bash. (I eventually just asked Anthropic’s Claude LLM, which handily told me tilde expansion was one of the many new features bash introduced over sh. Not quite true, as it turns out: the original Bourne shell lacked it, but tilde expansion is part of the POSIX shell specification, so a modern <code class="language-plaintext highlighter-rouge">sh</code> handles it too.) <a href="#fnref:bash" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
    <li id="fn:age">
      <p>For context, I would’ve been in 8th or 9th grade here, so hopefully that excuses what follows :) <a href="#fnref:age" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
    <li id="fn:mkdir">
      <p>Actually, since we’re creating nested directories I was probably using <code class="language-plaintext highlighter-rouge">FileUtils::mkdir_p</code>, which also creates any missing parent directories (unlike the regular <code class="language-plaintext highlighter-rouge">mkdir</code>). Both methods behave the same as far as tilde expansion goes. <a href="#fnref:mkdir" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ruby" /><category term="linux" /><category term="bash" /><summary type="html"><![CDATA[Anyone familiar with bash1 knows about tilde expansion: in shell scripts and from the terminal, the tilde character is automatically replaced with the value of the $HOME variable, which points to the current user’s home directory. That means that $ cd ~ is automatically expanded to $ cd /home/caleb which can be a very convenient shortcut! In Ruby Sometime in 20152, I had switched my computer over to Debian only a few months prior and I was just getting into programming with Ruby. I don’t remember exactly what I was trying to accomplish that day, but I had a script that was checking for the existence of a directory under my home folder and creating it if it was missing3: Dir.mkdir('~/foo/bar') unless Dir.exist?('~/foo/bar') Of course, the problem here is that Ruby, unlike Bash, doesn’t inherently support tilde expansion. Instead, you have to explicitly call something like File.expand_path() to handle the expansion. In retrospect this seems obvious, but to a young programmer I’m sure you can see why I expected this to “just work” here. The first time I ran this script, it correctly detected the directory didn’t exist and went ahead and created it. But instead of creating the directory under my home folder, it created it right inside of the project directory I was currently working in… with the directory being named the literal ~. And this is the point where I screwed up big time. I fixed my script, and then tried to get rid of that weird tilde folder inside my project directory, like so: $ rm -rf ~ The command took a few seconds to run, which I thought was strange at first… until the realization hit me. I quickly spammed Ctrl+C to cancel the command, but it had already blown away half of my home directory. Moral of the story: trying to learn a new programming language after 2 a.m. is a bad idea. Oh, and make backups. 
I originally had this as “shell scripting” here, but I realized that I rarely use the old-school sh shell and don’t remember if it supports tilde expansion or not. Exercise for the reader: try finding the answer via Google. All of my results were just SO posts on how tilde expansion works in bash. (I eventually just asked Anthropic’s Claude LLM, who handily told me tilde expansion was one of the many new features bash introduced over sh. Big, if true!) (return) For context, I would’ve been in 8th or 9th grade here, so hopefully that excuses what follows :) (return) Actually, since we’re creating nested directories I was probably using FileUtils::mkdir_p, which supports creating subdirectories (unlike the regular mkdir). Both methods behave the same as far as tilde expansion goes. (return)]]></summary></entry><entry><title type="html">Publishing A Blog Series With Jekyll</title><link href="http://caleb.software/posts/jekyll-series.html" rel="alternate" type="text/html" title="Publishing A Blog Series With Jekyll" /><published>2024-03-27T00:00:00+00:00</published><updated>2024-03-27T00:00:00+00:00</updated><id>http://caleb.software/posts/jekyll-series</id><content type="html" xml:base="http://caleb.software/posts/jekyll-series.html"><![CDATA[<p>Sometimes a blog post can get too long, and it makes sense to split it into multiple parts. For example, I recently posted about my <a href="/posts/hn-reader.html">HN Reader project</a>, and split it into multiple posts with each one detailing different aspects of the project (frontend, API, deployment, etc.).</p>

<p>Of course, a reader might find a link to a later entry in the series and want to start from the beginning, so it would be nice to link all of the other posts in the series at the top of the page. I could do all of this manually, but as the series gets longer and longer I would have to remember to go back and update every other post in the series to add the new link. I felt that long-term this would be a pain to maintain and could lead to broken or missing links.</p>

<p>Instead, this should be pretty easy to automate with <a href="https://jekyllrb.com/">Jekyll</a>, which is the static site generator I use to publish my blog.</p>

<h2 id="frontmatter-yaml">Frontmatter YAML</h2>

<p>Each blog post on my website starts as a markdown file with a <a href="https://jekyllrb.com/docs/front-matter/#custom-variables">frontmatter header</a> at the beginning, which is just YAML. We can add our own attribute to the post’s frontmatter to keep track of what series it belongs to. We’ll simply call the variable <code class="language-plaintext highlighter-rouge">series</code> and set it to whatever string we want. Then, we can set the same <code class="language-plaintext highlighter-rouge">series</code> value on other posts to link them together, just like the one linked at the top of this page<sup id="fnref:series"><a href="#fn:series" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---
title: "Publishing A Blog Series With Jekyll"
series: jekyll
---

... rest of the post
</code></pre></div></div>
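
<p>Jekyll handles the parsing for us, but to make it concrete, here is a rough Ruby sketch of pulling the <code class="language-plaintext highlighter-rouge">series</code> value out of a post’s source. The <code class="language-plaintext highlighter-rouge">front_matter</code> helper is hypothetical and far simpler than whatever Jekyll does internally:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require 'yaml'

# Hypothetical helper: return the YAML front matter of a post as a Hash,
# or an empty Hash when the text does not start with a "---" block.
def front_matter(source)
  match = source.match(/\A---[ \t]*\n(.*?)\n---[ \t]*$/m)
  return {} unless match
  YAML.safe_load(match[1]) || {}
end

post = [
  '---',
  'title: "Publishing A Blog Series With Jekyll"',
  'series: jekyll',
  '---',
  '',
  '... rest of the post'
].join("\n")

puts front_matter(post)['series'] # prints "jekyll"
</code></pre></div></div>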

<h2 id="liquid-templates">Liquid Templates</h2>

<p>Next we’ll add some template logic to the layout page for our blog posts. Jekyll uses the <a href="https://shopify.dev/docs/api/liquid">Liquid</a> template language, which allows us to use conditional logic and loops right inside our markdown files.</p>

<p>If the page has the <code class="language-plaintext highlighter-rouge">series</code> attribute, we’ll insert the div that contains the links to other posts in the series. After that, we just iterate through each post on the site, and add a link to any post that has a matching <code class="language-plaintext highlighter-rouge">series</code> value. We also skip the post if the title matches the current page’s title, since including a link to the current page would be pointless.</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{% if page.series %}
  {% for post in site.posts %}
    {% if post.series == page.series %}
      {% unless post.title == page.title %}
      <span class="nt">&lt;a</span> <span class="na">href=</span><span class="s">"{{ post.url }}"</span><span class="nt">&gt;</span>{{ post.title }}<span class="nt">&lt;/a&gt;</span>
      {% endunless %}
    {% endif %}
  {% endfor %}
{% endif %}
</code></pre></div></div>
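
<p>For anyone who finds Liquid hard to read, the same selection can be written in plain Ruby. This is only a model of the template logic: the <code class="language-plaintext highlighter-rouge">Post</code> struct and <code class="language-plaintext highlighter-rouge">related_posts</code> are hypothetical stand-ins, not part of Jekyll’s API:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical stand-in for a Jekyll post; only the fields the template reads.
Post = Struct.new(:title, :url, :series, keyword_init: true)

# Mirror of the Liquid template: every other post sharing the page's series.
def related_posts(page, all_posts)
  return [] if page.series.nil?                          # if page.series
  all_posts
    .select { |post| post.series == page.series }        # if post.series == page.series
    .reject { |post| post.title == page.title }          # unless post.title == page.title
end

posts = [
  Post.new(title: 'Part 1', url: '/posts/part-1.html', series: 'jekyll'),
  Post.new(title: 'Part 2', url: '/posts/part-2.html', series: 'jekyll'),
  Post.new(title: 'Unrelated', url: '/posts/other.html', series: nil)
]
puts related_posts(posts[0], posts).map { |post| post.title } # prints "Part 2"
</code></pre></div></div>

<p>Matching on the title works as long as titles are unique; comparing <code class="language-plaintext highlighter-rouge">post.url</code> with <code class="language-plaintext highlighter-rouge">page.url</code> would probably be the more robust check.</p>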

<h2 id="html-details">HTML Details</h2>

<p>This works great so far, but for a long series this could add a lot of content to the top of the page before the actual post starts. Instead, I decided to hide all of the related posts within an HTML <code class="language-plaintext highlighter-rouge">&lt;details&gt;</code> <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/details">section</a> which is collapsed by default.</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;details</span> <span class="na">class=</span><span class="s">"series"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;summary&gt;</span>Related Posts<span class="nt">&lt;/summary&gt;</span>
  <span class="c">&lt;!-- list of posts goes here --&gt;</span>
  <span class="nt">&lt;a</span> <span class="na">href=</span><span class="s">"#"</span><span class="nt">&gt;</span>Another post in this series<span class="nt">&lt;/a&gt;</span>
  <span class="nt">&lt;a</span> <span class="na">href=</span><span class="s">"#"</span><span class="nt">&gt;</span>Another post in this series<span class="nt">&lt;/a&gt;</span>
<span class="nt">&lt;/details&gt;</span>
</code></pre></div></div>

<p>I added some CSS styles to the section to change the font size, weight, and color. I also decided to change what the expand/collapse icon looked like since I thought the default triangle was boring.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">details</span> <span class="o">&gt;</span> <span class="nt">summary</span> <span class="p">{</span>
    <span class="nl">list-style-type</span><span class="p">:</span> <span class="s2">'[+] '</span><span class="p">;</span>
<span class="p">}</span>

<span class="nt">details</span><span class="o">[</span><span class="nt">open</span><span class="o">]</span> <span class="o">&gt;</span> <span class="nt">summary</span> <span class="p">{</span>
    <span class="nl">list-style-type</span><span class="p">:</span> <span class="s2">'[\00AC] '</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That <code class="language-plaintext highlighter-rouge">\00AC</code> code point represents the <a href="https://www.toptal.com/designers/htmlarrows/punctuation/not-sign/">not sign</a>. Besides the fact that I think boolean algebra notation looks cool, why use it? I tried a regular <code class="language-plaintext highlighter-rouge">-</code> as well as an <a href="https://www.merriam-webster.com/grammar/em-dash-en-dash-how-to-use">en- and em-dash</a>, but none of these symbols was the same width as the <code class="language-plaintext highlighter-rouge">+</code> symbol. As a result, the “Related Posts” text would shift left or right every time you clicked on it, and that <em>really</em> annoyed me.</p>

<p>There’s probably a better way to do this that keeps the symbols the same width across different fonts too, but I couldn’t figure it out and gave up after a couple of minutes. The <code class="language-plaintext highlighter-rouge">summary::marker</code> <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/::marker">pseudo-element</a> doesn’t support the <code class="language-plaintext highlighter-rouge">width</code> property (or many properties in general).</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:series">
      <p>These two posts aren’t really in any kind of series together, but I thought it would be silly to make an entire post about creating my “related posts section” and not actually show the dang thing in action. <a href="#fnref:series" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="jekyll" /><category term="liquid" /><summary type="html"><![CDATA[Sometimes a blog post can get too long, and it makes sense to split it into multiple parts. For example, I recently posted about my HN Reader project, and split it into multiple posts with each one detailing different aspects of the project (frontend, API, deployment, etc). Of course, a reader might find a link to a later entry in the series and want to start from the beginning, so it would be nice to link all of the other posts in the series at the top of the page. I could do all of this manually, but as the series gets longer and longer I would have to remember to go back and update every other post in the series to add the new link. I felt that long-term this would be a pain to maintain and could lead to broken or missing links. Instead, this should be pretty easy to automate with Jekyll, which is the static site generator I use to publish my blog. Frontmatter YAML Each blog post on my website starts as a markdown file with a frontmatter header at the beginning, which is just YAML. We can add our own attribute to the post’s frontmatter to keep track of what series it belongs to. We’ll simply call the variable series and set it to whatever string we want. Then, we can set the same series value on other posts to link them together, just like the one linked at the top of this page1. --- title: "Publishing A Blog Series With Jekyll" series: jekyll --- ... rest of the post Liquid Templates Next we’ll add some template logic to the layout page for our blog posts. Jekyll uses the Liquid template language, which allows us to use conditional logic and loops right inside our markdown files. If the page has the series attribute, we’ll insert the div that contains the links to other posts in the series. After that, we just iterate through each post on the site, and add a link to any post that has a matching series value. 
We also skip the post if the title matches the current page’s title, since including a link to the current page would be pointless. {% if page.series %} {% for post in site.posts %} {% if post.series == page.series %} {% unless post.title == page.title %} &lt;a href="{{ post.url }}"&gt;{{ post.title }}&lt;/a&gt; {% endunless %} {% endif %} {% endfor %} {% endif %} HTML Details This works great so far, but for a long series this could add a lot of content to the top of the page before the actual post starts. Instead, I decided to hide all of the related posts within an HTML &lt;details&gt; section which is collapsed by default. &lt;details class="series"&gt; &lt;summary&gt;Related Posts&lt;/summary&gt; &lt;!-- list of posts goes here --&gt; &lt;a href="#"&gt;Another post in this series&lt;/a&gt; &lt;a href="#"&gt;Another post in this series&lt;/a&gt; &lt;/details&gt; I added some CSS styles to the section to change the font size, weight, and color. I also decided to change what the expand/collapse icon looked like since I thought the default triangle was boring. details &gt; summary { list-style-type: '[+] '; } details[open] &gt; summary { list-style-type: '[\00AC] '; } That \00AC code point represents the not sign. But other than because I think boolean algebra notation looks cool, why use this? I tried a regular - as well as an en- and em-dash, but these symbols weren’t the same length as the + symbol. As a result, the “Related Posts” text would shift left/right every time you clicked on it and that really annoyed me. There’s probably a better way to do this to ensure the symbols are the same width across different fonts too, but I couldn’t figure it out and gave up after a couple minutes. The summary::marker pseudo-selector doesn’t support the width property (or many properties in general). 
These two posts aren’t really in any kind of series together, but I thought it would be silly to make an entire post about creating my “related posts section” and not actually show the dang thing in action (return)]]></summary></entry><entry><title type="html">CSS Paragraph Separators</title><link href="http://caleb.software/posts/paragraph-separators.html" rel="alternate" type="text/html" title="CSS Paragraph Separators" /><published>2023-11-10T00:00:00+00:00</published><updated>2023-11-10T00:00:00+00:00</updated><id>http://caleb.software/posts/paragraph-separators</id><content type="html" xml:base="http://caleb.software/posts/paragraph-separators.html"><![CDATA[<p>Turns out it’s actually really easy to have custom paragraph separators in CSS just like you’d see in a print book, like this:</p>

<hr />

<p>All of my blog’s posts are generated from markdown files by <a href="https://jekyllrb.com/">Jekyll</a>. To insert a paragraph separator in markdown, it’s just three dashes on their own line.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---
</code></pre></div></div>

<p>This inserts an <code class="language-plaintext highlighter-rouge">&lt;hr&gt;</code> element into the generated HTML, which by default just looks like a straight horizontal line (boring!). It’s pretty easy to add some custom styling to it via CSS - specifically, we can remove the line by setting the <code class="language-plaintext highlighter-rouge">border</code> property to <code class="language-plaintext highlighter-rouge">none</code>, then use the <code class="language-plaintext highlighter-rouge">content</code> property on a <code class="language-plaintext highlighter-rouge">::before</code> pseudo-element to make whatever characters we want appear in its place<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">hr</span> <span class="p">{</span>
    <span class="nl">border</span><span class="p">:</span> <span class="nb">none</span><span class="p">;</span>
<span class="p">}</span>

<span class="nt">hr</span><span class="nd">::before</span> <span class="p">{</span>
    <span class="nl">content</span><span class="p">:</span> <span class="s2">'* * *'</span><span class="p">;</span>
    <span class="nl">display</span><span class="p">:</span> <span class="nb">block</span><span class="p">;</span>
    <span class="nl">text-align</span><span class="p">:</span> <span class="nb">center</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I didn’t like how close together the asterisks were, but adding extra spaces between the characters like <code class="language-plaintext highlighter-rouge">content: '*     *'</code> did nothing, since runs of spaces are collapsed by default<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>. I tried adding some <code class="language-plaintext highlighter-rouge">&amp;nbsp</code> entities to the content, but HTML entities aren’t interpreted inside CSS strings, so the literal string <code class="language-plaintext highlighter-rouge">&amp;nbsp</code> simply appeared on the page. Finally, I figured out we can use the CSS escape for the non-breaking space, <code class="language-plaintext highlighter-rouge">\a0</code> (U+00A0), to add space between the characters<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>.</p>

<div class="language-css highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">content</span><span class="o">:</span> <span class="s2">'*\a0\a0\a0\a0\a0*\a0\a0\a0\a0\a0*'</span><span class="o">;</span>
</code></pre></div></div>

<p>And voila!</p>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Credit: <a href="https://stackoverflow.com/a/32146824">stack overflow</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
    <li id="fn:2">
      <p>After I wrote this, I realized I might be able to add a <code class="language-plaintext highlighter-rouge">white-space: pre</code> rule to the <code class="language-plaintext highlighter-rouge">hr::before</code> selector - this should allow the multiple spaces in a row to appear correctly. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
    <li id="fn:3">
      <p>Credit: <a href="https://stackoverflow.com/a/190412">stack overflow</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="web" /><category term="css" /><summary type="html"><![CDATA[Turns out it’s actually really easy to have custom paragraph separators in CSS just like you’d see in a print book, like this: All of my blog’s posts are generated from markdown files by Jekyll. To insert a paragraph separator in markdown, it’s just three dashes on their own line. --- This inserts an &lt;hr&gt; element into the generated HTML, which by default just looks like a straight horizontal line (boring!) It’s pretty easy to add some custom styling to it via CSS - specifically, we can remove the line by removing the border attribute, then use the content attribute to make whatever characters we want appear in its place1. hr { border: none; } hr::before { content: '* * *'; display: block; text-align: center; } I didn’t like how close together the asterisks were, but adding additional spaces in between the characters like content: '* *' did nothing2. I tried adding some &amp;nbsp characters to the content, but the special characters weren’t being interpreted and the literal string &amp;nbsp simply appeared on the page. Finally, I figured out we can use the escaped unicode value of an nbsp to add space between the characters3. content: '*\a0\a0\a0\a0\a0*\a0\a0\a0\a0\a0*'; And voila! Credit: stack overflow (return) After I wrote this, I realized I might be able to add a white-space: pre rule to the hr::before selector - this should allow the multiple spaces in a row to appear correctly. 
(return) Credit: stack overflow (return)]]></summary></entry><entry><title type="html">Self-Hosted Web Traffic Analytics</title><link href="http://caleb.software/posts/web-traffic-stats.html" rel="alternate" type="text/html" title="Self-Hosted Web Traffic Analytics" /><published>2022-06-29T00:00:00+00:00</published><updated>2022-06-29T00:00:00+00:00</updated><id>http://caleb.software/posts/web-traffic-stats</id><content type="html" xml:base="http://caleb.software/posts/web-traffic-stats.html"><![CDATA[<p>I like looking at visitor statistics on my personal blog, not because it really matters or affects anything… I just think it’s pretty interesting. I looked at some of the popular options like <a href="https://analytics.google.com/analytics/web/provision/#/provision">Google Analytics</a> and some others, but they didn’t quite fit what I was looking for. I decided I had a few requirements:</p>

<ol>
  <li>Must be free</li>
  <li>Should respect visitor’s privacy (no sharing data with advertisers)</li>
  <li>Minimal performance impacts</li>
</ol>

<p>Items #1 and #2 were usually mutually exclusive, except for some open source projects, and those generally seemed like more hassle to set up than they were worth. A lot of options also required including extra JavaScript files, which I wanted to avoid too.</p>

<p>In the end, I settled on just using the logs generated by the Apache web server and wrote a small Ruby script to parse those log files.</p>

<h2 id="apache-logs">Apache Logs</h2>

<p>The default Apache logs include a good amount of information, but you can also configure Apache to log some additional information that’s useful for site analytics such as the <code class="language-plaintext highlighter-rouge">Referer</code><sup id="fnref:ref"><a href="#fn:ref" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> and <code class="language-plaintext highlighter-rouge">User-Agent</code> header fields. You can find the <a href="https://httpd.apache.org/docs/2.4/mod/mod_log_config.html">Apache log configuration documentation here</a>.</p>

<p>This is a snippet of my personal blog’s Apache config file with a custom log format:</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># /etc/apache2/sites-available/caleb.software.conf
</span>
&lt;<span class="n">VirtualHost</span> *:<span class="m">80</span>&gt;
        <span class="n">ServerName</span> <span class="n">caleb</span>.<span class="n">software</span>

        <span class="n">ErrorLog</span> ${<span class="n">APACHE_LOG_DIR</span>}/<span class="n">error</span>.<span class="n">log</span>

        <span class="n">LogFormat</span> <span class="s2">"%t %h %U %&gt;s %{Referer}i %{User-Agent}i"</span> <span class="n">blog</span>
        <span class="n">CustomLog</span> /<span class="n">var</span>/<span class="n">log</span>/<span class="n">site</span>/<span class="n">custom</span>.<span class="n">log</span> <span class="n">blog</span>
&lt;/<span class="n">VirtualHost</span>&gt;
</code></pre></div></div>

<p>First, we define a custom <code class="language-plaintext highlighter-rouge">LogFormat</code> named “blog” and then set the <code class="language-plaintext highlighter-rouge">CustomLog</code> file path and format. Here’s what each of the custom log flags means:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">%t</code> is the time of the request</li>
  <li><code class="language-plaintext highlighter-rouge">%h</code> is the requester’s IP (or hostname if you turn on reverse hostname lookups)</li>
  <li><code class="language-plaintext highlighter-rouge">%U</code> is the URL path requested (like “/index.html”), without any query string</li>
  <li><code class="language-plaintext highlighter-rouge">%&gt;s</code> is the final response status code (e.g. 200)</li>
  <li>The <code class="language-plaintext highlighter-rouge">%{xxx}i</code> are specific request header fields</li>
</ul>
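
<p>To make the format concrete, here is a minimal Ruby sketch that splits one line written in the “blog” format into its fields. It is not the script from the gist: the <code class="language-plaintext highlighter-rouge">LOG_LINE</code> pattern, the <code class="language-plaintext highlighter-rouge">parse_log_line</code> helper, and the sample line are assumptions for illustration. It also assumes the default <code class="language-plaintext highlighter-rouge">%t</code> timestamp layout and that paths and referrers contain no spaces.</p>

```ruby
require "time"

# Pattern for the "blog" LogFormat: %t %h %U %>s %{Referer}i %{User-Agent}i
# The User-Agent usually contains spaces, so it captures the rest of the line.
LOG_LINE = /\A\[(?<time>[^\]]+)\] (?<host>\S+) (?<path>\S+) (?<status>\d{3}) (?<referer>\S+) (?<agent>.*)\z/

# Returns a hash of fields, or nil when the line does not match the format.
def parse_log_line(line)
  match = LOG_LINE.match(line.chomp)
  return nil unless match

  {
    time: Time.strptime(match[:time], "%d/%b/%Y:%H:%M:%S %z"),
    host: match[:host],
    path: match[:path],
    status: match[:status].to_i,
    # Apache writes "-" when the request carried no Referer header.
    referer: match[:referer] == "-" ? nil : match[:referer],
    agent: match[:agent]
  }
end

# Made-up sample line (the IP comes from a documentation-only range).
sample = "[29/Jun/2022:14:03:11 +0000] 203.0.113.7 /index.html 200 https://example.com/ Mozilla/5.0 (X11; Linux x86_64)"
p parse_log_line(sample)
```

<p>Returning <code class="language-plaintext highlighter-rouge">nil</code> for lines that don’t match lets a caller skip malformed entries instead of crashing halfway through a log file.</p>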

<p>Make sure Apache has permission to write to the custom log file’s destination. I just created an empty file with <code class="language-plaintext highlighter-rouge">touch custom.log</code> and then <code class="language-plaintext highlighter-rouge">chown</code>’d it.</p>

<h2 id="ruby-script">Ruby Script</h2>

<p>I wrote an accompanying Ruby script to process these log files and get some insights from them. Since my blog is a static site built with <a href="https://jekyllrb.com/">Jekyll</a>, the Ruby script just spits out a markdown file which gets built into the final site. The script is only about fifty lines long, and you can find it <a href="https://gist.github.com/karagenit/859f2d048215b2d572267b0546c5ab19">here as a GitHub Gist</a>.</p>

<p>The resulting markdown file is also pretty simple, with just a couple of tables listing the most popular pages (including total views and number of unique viewers) and the most frequent referrers. Unique viewers are determined by combining the <code class="language-plaintext highlighter-rouge">User-Agent</code> string with the requester’s IP address… definitely not a perfect method, but good enough to give us a rough estimate.</p>
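
<p>The counting itself can be sketched in a few lines of Ruby. This is an illustration of the idea, not the code from the gist: the <code class="language-plaintext highlighter-rouge">page_stats</code> helper and the entry hashes (with <code class="language-plaintext highlighter-rouge">:path</code>, <code class="language-plaintext highlighter-rouge">:host</code>, and <code class="language-plaintext highlighter-rouge">:agent</code> keys) are assumed names.</p>

```ruby
require "set"

# Tallies total views and unique viewers per page.
# A "viewer" is approximated by the combination of IP address and User-Agent.
# Returns [path, views, unique_viewers] rows, most viewed page first.
def page_stats(entries)
  stats = Hash.new { |hash, path| hash[path] = { views: 0, viewers: Set.new } }

  entries.each do |entry|
    page = stats[entry[:path]]
    page[:views] += 1
    page[:viewers] << [entry[:host], entry[:agent]]
  end

  stats.map { |path, page| [path, page[:views], page[:viewers].size] }
       .sort_by { |_path, views, _unique| -views }
end

# Made-up entries: two visitors, one of whom loads the index page twice.
entries = [
  { path: "/index.html", host: "203.0.113.7", agent: "Mozilla/5.0" },
  { path: "/index.html", host: "203.0.113.7", agent: "Mozilla/5.0" },
  { path: "/index.html", host: "198.51.100.2", agent: "curl/7.81.0" },
  { path: "/about.html", host: "198.51.100.2", agent: "curl/7.81.0" }
]

page_stats(entries).each do |path, views, unique|
  puts "| #{path} | #{views} | #{unique} |"
end
```

<p>Using a set of IP and User-Agent pairs means repeat visits from the same browser count once, while two people behind the same IP with different browsers still count separately.</p>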

<h2 id="cron-job">Cron Job</h2>

<p>Finally, a simple crontab entry runs the Ruby script and rebuilds the site every hour:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 <span class="k">*</span> <span class="k">*</span> <span class="k">*</span> <span class="k">*</span> <span class="nb">cd</span> /var/www/site <span class="o">&amp;&amp;</span> ruby analysis.rb <span class="o">&amp;&amp;</span> jekyll build
</code></pre></div></div>

<h2 id="roadmap">Roadmap</h2>

<p>In the future, I have a few extra features I want to add to the script. It would be nice to see a chart of the number of viewers of each page over time. It would also be cool to see which posts result in the most email signups and engagement.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:ref">
      <p>The English word is actually spelled “referrer”, but because of a funny typo back in the 90s the HTTP header is named “referer”. <a href="https://en.wikipedia.org/wiki/HTTP_referer#Etymology">You can read more about it on Wikipedia.</a> <a href="#fnref:ref" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="apache" /><category term="ruby" /><summary type="html"><![CDATA[I like looking at visitor statistics on my personal blog, not because it really matters or affects anything… I just think it’s pretty interesting. I looked at some of the popular options like Google Analytics and some others, but they didn’t quite fit what I was looking for. I decided I had a few requirements:]]></summary></entry><entry><title type="html">Delayed Messages on iOS</title><link href="http://caleb.software/posts/ios-delayed-messages.html" rel="alternate" type="text/html" title="Delayed Messages on iOS" /><published>2022-04-10T00:00:00+00:00</published><updated>2022-04-10T00:00:00+00:00</updated><id>http://caleb.software/posts/ios-delayed-messages</id><content type="html" xml:base="http://caleb.software/posts/ios-delayed-messages.html"><![CDATA[<p>When I got my first iPhone in March I was surprised to find that iMessage didn’t support scheduling messages to be sent at a later time. This feature is pretty common on some other messaging platforms, and it’s especially useful when I remember I need to text my mom (who <em>always</em> has her ringer turned on) about something at 3 o’clock in the morning and I don’t want to wake her up.</p>

<p>I played around with trying to implement this with an <a href="https://support.apple.com/guide/shortcuts/welcome/ios">iOS Shortcut</a>, but it turned out to be a little trickier than I expected and I had to get a little creative. I ended up using a combination of a <strong>shortcut</strong> and an <strong>automation</strong> (also via the Shortcuts app<sup id="fnref:name"><a href="#fn:name" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>) plus an iOS Calendar to schedule messages to be sent later.</p>

<h2 id="a-new-calendar">A New Calendar</h2>

<p>First, we create a new calendar within the iOS Calendar app. I titled mine <code class="language-plaintext highlighter-rouge">Delayed Messages</code> here. We will use this calendar to store messages to be sent later, with individual calendar events storing the message text and recipient<sup id="fnref:jar"><a href="#fn:jar" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>.</p>

<p><img src="/assets/img/ios-delayed-messages/calendar.PNG" alt="A new calendar" /></p>

<p>Under the “Calendars” menu you can toggle which calendars are shown, so you can hide this new <code class="language-plaintext highlighter-rouge">Delayed Messages</code> calendar if you don’t like looking at these messages on your agenda.</p>

<h2 id="the-shortcut">The Shortcut</h2>

<p>A <strong>shortcut</strong> is a way to automatically do some tasks on iOS at the click of a button. They can save you a lot of time if you find yourself manually doing the same actions over and over again multiple times a day.</p>

<p><img src="/assets/img/ios-delayed-messages/short-home.PNG" alt="The shortcuts app" /></p>

<p>Below, you can see the structure of the shortcut. It’s pretty straightforward: first, we grab the text content of the message, pick what date we want to send the message, and we ask the user to choose a phone number from their contacts.</p>

<p><img src="/assets/img/ios-delayed-messages/short-input.PNG" alt="Reading user input" /></p>

<p>Then we create a new calendar event in our <code class="language-plaintext highlighter-rouge">Delayed Messages</code> calendar with the information. The title of the calendar event is the recipient’s phone number, and the <code class="language-plaintext highlighter-rouge">Notes</code> field contains the text of the message.</p>

<p><img src="/assets/img/ios-delayed-messages/short-add.PNG" alt="Creating the calendar event" /></p>

<p>If you want to try it out yourself, <a href="https://www.icloud.com/shortcuts/8ed7f998b0d94c1bb252e3ede389e3cc">you can get the shortcut here</a> (just remember you have to create the <code class="language-plaintext highlighter-rouge">Delayed Messages</code> calendar yourself first).</p>

<h2 id="the-automation">The Automation</h2>

<p>An <strong>automation</strong> is very similar to a shortcut, as it allows you to automatically do things - the difference is that while a shortcut runs immediately when we click on it, an automation is scheduled to run at a certain interval (such as once a week).</p>

<p><img src="/assets/img/ios-delayed-messages/auto-home.PNG" alt="" /></p>

<p>Unfortunately, the most often an automation can be run is once a day at <em>one specific time</em>, so I couldn’t make it send my messages at different times throughout the day<sup id="fnref:time"><a href="#fn:time" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>. I settled on just sending all of the delayed messages at 9am, since I figured most people are awake at that point (sorry to my west coast friends!).</p>

<p>First, our automation will grab any events in our <code class="language-plaintext highlighter-rouge">Delayed Messages</code> calendar that need to be sent today.</p>

<p><img src="/assets/img/ios-delayed-messages/auto-find.PNG" alt="" /></p>

<p>Then, we loop through the calendar events and send each message to its listed recipient.</p>

<p><img src="/assets/img/ios-delayed-messages/auto-loop.PNG" alt="" /></p>

<p>Finally, we delete the calendar events, though you could skip this step.</p>

<p><img src="/assets/img/ios-delayed-messages/auto-delete.PNG" alt="" /></p>

<p>I turned off the <em>“ask before running”</em> and <em>“notify when run”</em> options so that the automation runs without any interaction. I’m also not sure how iOS automations handle cases where the automation wasn’t able to run (e.g. your phone is off or dead at 9am on a certain day), so it might be a good idea to modify the automation’s <code class="language-plaintext highlighter-rouge">Find</code> action to also fetch calendar events from earlier dates, which keeps a message from getting skipped (better late than never?).</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:name">
      <p>Yeah, so the “Shortcuts” <em>app</em> contains both “Shortcuts” and “Automations”… don’t blame me, I didn’t pick the names. <a href="#fnref:name" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
    <li id="fn:jar">
      <p>Someone pointed out <a href="https://datajar.app/">this handy app called DataJar</a> which you can use to store shortcut data. In hindsight it 100% would’ve been easier to use than a calendar… <a href="#fnref:jar" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
    <li id="fn:time">
      <p>It <em>is</em> possible, it just didn’t seem worth it to me: in the first shortcut, have the user input a time as well and store that in the calendar event too. Then, create multiple automations scheduled to run at different times of the day (e.g. you could have one at 9am and 5pm, or whatever suits you). <a href="#fnref:time" class="reversefootnote" role="doc-backlink">(return)</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ios" /><summary type="html"><![CDATA[When I got my first iPhone in March I was surprised to find that iMessage didn’t support scheduling messages to be sent at a later time. This feature is pretty common on some other messaging platforms, and it’s especially useful when I remember I need to text my mom (who always has her ringer turned on) about something at 3 o’clock in the morning and I don’t want to wake her up. I played around with trying to implement this with an iOS Shortcut, but it turned out to be a little trickier than I expected and I had to get a little creative. I ended up using a combination of a shortcut and an automation (also via the Shortcuts app1) plus an iOS Calendar to schedule messages to be sent later. A New Calendar First, we create a new calendar within the iOS Calendar app. I titled mine Delayed Messages here. We will use this calendar to store messages to be sent later, with individual calendar events storing the message text and recipient2. Under the “Calendars” menu you can toggle which calendars are shown, so you can hide this new Delayed Messages calendar if you don’t like looking at these messages on your agenda. The Shortcut A shortcut is a way to automatically do some tasks on iOS at the click of a button. They can save you a lot of time if you find yourself manually doing the same actions over and over again multiple times a day. Below, you can see the structure of the shortcut. It’s pretty straightforward: first, we grab the text content of the message, pick what date we want to send the message, and we ask the user to choose a phone number from their contacts. Then we create a new calendar item in our Delayed Messages calendar with the information. The title of the calendar is the recipient’s phone number, and the Notes field contains the text of the message. 
If you want to try it out yourself, you can get the shortcut here (just remember you have to create the Delayed Messages calendar yourself first). The Automation An automation is very similar to a shortcut, as it allows you to automatically do things - the difference is that while a shortcut runs immediately when we click on it, an automation is scheduled to run at a certain interval (such as once a week). Unfortunately, the most often an automation can be run is once a day at one specific time, so I couldn’t make it send my messages at different times throughout the day3. I settled on just sending all of the delayed messages at 9am, since I figured most people are awake at that point (sorry to my west coast friends!). First, our automation will grab any events in our Delayed Messages calendar that need to be sent today. Then, we loop through each of the calendar events, and send the messages to the listed recipient. Finally, we delete the calendar events, though you could skip this step. I turned off the “ask before running” and “notify when run” options just to make it a little easier. I’m also not sure how iOS automations handle cases where the automation wasn’t able to run (e.g. your phone is off/dead at 9AM on a certain day), so it might be a good idea to modify the automation’s Find action to fetch all calendar events at earlier dates as well to avoid a message from getting skipped (better late than never?) Yeah, so the “Shortcuts” app contains both “Shortcuts” and “Automations”… don’t blame me, I didn’t pick the names. (return) Someone pointed out this handy app called DataJar which you can use to store shortcut data. In hindsight it 100% would’ve been easier to use than a calendar… (return) It is possible, it just didn’t seem worth it to me: in the first shortcut, have the user input a time as well and store that in the calendar event too. Then, create multiple automations scheduled to run at different times of the day (e.g. 
you could have one at 9am and 5pm, or whatever suits you). (return)]]></summary></entry></feed>