function61.com https://function61.com/index.xml Recent content on function61.com Hugo -- gohugo.io en-us Fri, 17 Mar 2017 10:55:00 +0000 LabelCloud https://function61.com/products/labelcloud/ Fri, 17 Mar 2017 10:55:00 +0000 <h2 id="use-case">Use case</h2> <p>You need to print labels from your web-based application, whether you are in healthcare, logistics or some other sector. You have two options for implementing this:</p> <ul> <li>a) Build the label rendering and printing logic into your application.</li> <li>b) Call some other software to do the label rendering and printing for you.</li> </ul> <p>LabelCloud is the software that does everything for you: rendering the label, producing the printer-specific control code and pushing that code to the printer.</p> <p>LabelCloud is available both as a cloud service and as an on-premises version, for when you require tighter data privacy controls or want to control availability yourself.</p> <h2 id="demo">Demo</h2> <p>Contact us for a demonstration! 
The screenshot below is from the LabelCloud PDF renderer:</p> <p><img src="../labelcloud_screenshot.jpg" alt="" /></p> <h2 id="supported-printer-manufacturers">Supported printer manufacturers</h2> <ul> <li>Toshiba TEC</li> <li>Zebra ZPL/ZPL2-based printers</li> <li>PCL-based printers (most laser printers and some label printers use PCL)</li> <li>Other printer manufacturers via PDF + GDI-based printing</li> <li>Other control language implementations are considered on a case-by-case basis if you&rsquo;re ordering a license.</li> </ul> <h2 id="architecture-how-does-it-work">Architecture: how does it work?</h2> <p><img src="../labelcloud_flow.png" alt="" /></p> <p>LabelCloud is a web service to which you make an HTTP+JSON call containing the text/barcode contents that you want to print as a label.</p> <p>Basically, you do this:</p> <pre><code>POST /labelcloud HTTP/1.1
Content-Type: application/json

[ { label data to print + into which type of printer } ]
</code></pre> <p>It responds with the correct printer control code, which can then be pushed to the printer. There&rsquo;s a client-side JavaScript library for that. All this is done with HTML5 + JavaScript with no browser plugins required, so it works in every browser.</p> <p>The solution requires you to install <a href="https://function61.com/products/nxprint/">NX Print</a> (a provider of low-level printer access for web applications) on the workstation. NX Print listens on an HTTP port (which cannot be contacted from outside the computer) to provide access to Windows&rsquo; printers.</p> <h2 id="licensing">Licensing</h2> <p>Contact us for a quote!</p> Security https://function61.com/security/ Thu, 16 Mar 2017 22:25:00 +0200 <h2 id="security-policy">Security policy</h2> <p>Whenever our users use our products, there&rsquo;s an implicit trust agreement between our users and us not to screw things up. 
We&rsquo;d like to make this agreement explicit by promising the following:</p> <ul> <li>We do everything in our power to secure access to our systems by keeping systems up-to-date and by using good authentication practices like strong, unique passwords everywhere and hardware security modules where applicable.</li> <li>We use encryption where it&rsquo;s sensible to do so (transport-layer encryption like HTTPS, and data encryption at rest where it&rsquo;s practical).</li> <li>We never store your password without current best-practice hashing (we won&rsquo;t even know your password). We never store your credit card data on our servers, but use partners with <a href="https://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Security_Standard">PCI compliant</a> systems.</li> <li>We design our systems in a paranoid rather than a trusting manner: even datacenter-internal traffic between nodes is encrypted. A LAN is not a magical safe place separated from the bad outside world by a firewall.</li> <li>We keep tabs on security researchers&rsquo; blogs and tweets.</li> <li>Any severe security incidents will be disclosed publicly (in our blog), to our customers via email, and listed on this page.</li> </ul> <h2 id="vulnerability-reporting">Vulnerability reporting</h2> <p>We highly appreciate <a href="https://en.wikipedia.org/wiki/Responsible_disclosure">responsible vulnerability disclosures</a>.</p> <p>If you would like to report a vulnerability, or have any security concerns about our products, please reach out to us by email. Our <a href="https://keybase.io/joonas">PGP key is on Keybase</a>.</p> <p>For non-critical matters, we prefer that you open an issue with the appropriate product:</p> <ul> <li>For open source products, use GitHub issues.</li> <li>For all other products, file a ticket in our support system.</li> </ul> <h2 id="security-incident-history">Security incident history</h2> <p>No security incidents. 
Let&rsquo;s keep it that way!</p> <p><img src="https://function61.com/images/pages/days-without-accident.png" alt="" /></p> Introduction to WAL (write-ahead logging) https://function61.com/blog/2017/introduction-to-wal-write-ahead-logging/ Thu, 23 Feb 2017 12:24:00 +0200 <p><a href="https://en.wikipedia.org/wiki/Write-ahead_logging">WAL</a> is a technique for achieving atomicity &amp; durability, usually found in filesystems and database systems.</p> <p>Atomicity (the &ldquo;A&rdquo; in ACID) and durability (the &ldquo;D&rdquo; in ACID) are both properties of <a href="https://en.wikipedia.org/wiki/ACID">ACID</a>, the set of properties used to describe database systems.</p> <h2 id="what-is-atomicity-durability-why-do-i-want-it">What are atomicity &amp; durability? Why do I want them?</h2> <p>When you insert data into, say, an SQL database and the database system responds with &ldquo;OK - I saved your data&rdquo;, it has to be sure that even in the face of a power loss or an application crash, the updated data can still be found in the file. This is known as durability.</p> <p>When a database saves data, it almost always changes more than one value. If the operation fails for some reason, we don&rsquo;t want to end up with one value changed and the other unchanged. We want either all of the changes to happen or none at all. This is known as atomicity. Imagine a banking system where a money transfer debits your account but the money is never credited to the receiver. Not atomic.</p> <p>Imagine ordering something from an online shop like Amazon, paying for it and never receiving the order.</p> <p>When you ask customer service about it, they can&rsquo;t find your order. It turns out that their computer crashed right after your order was placed, and the saved order was simply lost. 
Not durable.</p> <p>You would be pretty bummed about this, so competent tech companies store data in systems that guarantee atomicity &amp; durability, so that we don&rsquo;t suffer from these problems.</p> <p>The same goes for filesystems: you don&rsquo;t want to lose all your files if your computer crashes.</p> <h2 id="aren-t-file-writes-atomic-durable-by-default">Aren&rsquo;t file writes atomic &amp; durable by default?</h2> <p>Unfortunately, no! Yes, I know I just implied that filesystems are atomic &amp; durable, but those guarantees only apply at the filesystem level. The filesystem is protected against corruption, but the individual files are not. This is something that has to be achieved at the application level!</p> <p>To begin explaining this, let&rsquo;s oversimplify how a database system might write to a file. We might have a &ldquo;users&rdquo; table, stored in the filesystem as <code>users.txt</code>:</p> <pre><code>id=1 username=joonas.fi registered=2017-02-01
</code></pre> <p>Now, let&rsquo;s do an insert into the database: <code>INSERT INTO users (id, username, registered) VALUES (2, &quot;hello&quot;, &quot;2017-02-23&quot;)</code></p> <p>After that write, the file should look like this:</p> <pre><code>id=1 username=joonas.fi registered=2017-02-01
id=2 username=hello registered=2017-02-23
</code></pre> <p>The database system uses pseudocode like this to write that change to the file:</p> <pre><code>fwrite(&quot;users.txt&quot;, &quot;id=2 username=hello registered=2017-02-23&quot;)
</code></pre> <p>But the system call used to write to the file is not atomic (the &ldquo;A&rdquo; in ACID). In short, atomicity means that either the whole write call succeeds or the whole write call fails. 
We don&rsquo;t have that guarantee, so in the face of a power loss or an application crash we can easily end up with this in <code>users.txt</code>:</p> <pre><code>id=1 username=joonas.fi registered=2017-02-01
id=2 userna
</code></pre> <p>Or even this:</p> <pre><code>id=1 username=joonas.fi registered=2017-02-01
me=hello registered=2017-02-23
</code></pre> <p>This is because nothing guarantees that the bytes are written to the file in the order that you specified.</p> <p>Writing to a file can always fail. There is no getting around this.</p> <h2 id="but-i-ve-heard-about-fsync-it-fixes-this-right">But I&rsquo;ve heard about fsync, it fixes this, right?</h2> <p>fsync alone does not fix this. <code>fsync()</code> is only a mechanism for asking the operating system to wait until all pending writes are actually confirmed as written to the disk.</p> <p>So, without fsync we have this pseudocode:</p> <pre><code>fwrite(&quot;users.txt&quot;, &quot;id=2 username=hello registered=2017-02-23&quot;)
print(&quot;Changes were successfully written to users.txt&quot;)
</code></pre> <p>This code is lying, since under the hood <code>fwrite()</code> may return long before the data actually reaches the disk. Usually, for performance reasons, the operating system either:</p> <ul> <li>Queues the write to the disk as a background task. It will finish soon, maybe.</li> <li>Or even batches/buffers the write and waits for more writes to come, so that it can write more data more efficiently as a single operation.</li> </ul> <p>This is where we can fix the above code with <code>fsync()</code>:</p> <pre><code>fwrite(&quot;users.txt&quot;, &quot;id=2 username=hello registered=2017-02-23&quot;)
fsync(&quot;users.txt&quot;)
print(&quot;Changes were successfully written to users.txt&quot;)
</code></pre> <p>Now the code is correct. <code>fsync()</code> tells the OS to wait until all the buffers are flushed to disk. When it returns successfully, we can be sure we&rsquo;re not being lied to.</p> <p>Okay, <code>fsync()</code> solves durability, i.e. 
confirmed data written to disk is durable!</p> <p>But what about atomicity? What if the <code>fwrite()</code> fails while we&rsquo;re waiting for the <code>fsync()</code> to finish?</p> <p>Good question! As explained before, nothing guarantees that the <code>fwrite()</code> call succeeds or fails as a whole.</p> <p>If the application or operating system crashes or the power goes out, the <code>fwrite()</code> can always land on the disk only partially. So with <code>fsync()</code> we get atomicity &amp; durability <strong>but only if it succeeds</strong>. It is not guaranteed to succeed, so we can still end up with corrupted data (no atomicity).</p> <p>In short, the sequence of events is:</p> <ol> <li><code>fwrite()</code> start</li> <li>&ndash; queue write op 1</li> <li>&ndash; queue write op 2</li> <li><code>fwrite()</code> returns</li> <li><code>fsync()</code> start</li> <li>&ndash; wait for write op 1 to finish</li> <li>&ndash; wait for write op 2 to finish</li> <li><code>fsync()</code> returns (atomicity &amp; durability boundary only achieved here)</li> </ol> <p>Remember: the power can go out in any of those states. Only in state #8 are we atomic &amp; durable. In any of the earlier states, we can end up with corrupted data if we&rsquo;re not careful.</p> <h2 id="ok-so-how-does-wal-achieve-atomicity-durability">Ok, so how does WAL achieve atomicity &amp; durability?</h2> <p>The problem boils down to:</p> <blockquote> <p>We don&rsquo;t want incomplete data in <code>users.txt</code>. 
We either want the write to fail as a whole or succeed as a whole.</p> </blockquote> <p>That is human speak for wanting atomicity and durability.</p> <p>So finally, this is where the subject, write-ahead logging, comes in!</p> <p>WAL in essence: before writing the line to <code>users.txt</code>, we first write to <code>write-ahead-log.txt</code>:</p> <pre><code>wal-entry line=2 content=[id=2 username=hello registered=2017-02-23]
</code></pre> <p>Sidenote: WALs usually use byte offsets instead of line numbers. Lines are just easier to explain.</p> <p>After we have <code>fsync()</code>ed the WAL, we can start writing to <code>users.txt</code>. If during that write the power goes out and we end up with:</p> <pre><code>line 1 | id=1 username=joonas.fi registered=2017-02-01
line 2 | id=2 userna
</code></pre> <p>After we restart the database system, it first scans <code>write-ahead-log.txt</code> and notices that, according to the WAL, <strong>line 2</strong> in <code>users.txt</code> is not what it should be, and repairs it:</p> <pre><code>line 1 | id=1 username=joonas.fi registered=2017-02-01
line 2 | id=2 username=hello registered=2017-02-23
</code></pre> <p>It doesn&rsquo;t matter if the repair operation fails because the power goes out again during the repairing <code>fwrite()</code>: after the database restarts, the repair operation starts again and eventually reaches a valid state.</p> <p>As long as the WAL is intact, we are safe.</p> <h2 id="okay-that-s-cool-but-what-if-write-to-the-wal-fails">Okay, that&rsquo;s cool, but what if the write to the WAL fails?</h2> <p>Excellent question again, champ! We know that every <code>fwrite()</code> can fail mid-write, so writing the &ldquo;repair entry&rdquo; to the WAL can fail as well, and no amount of <code>fsync()</code> fixes that?</p> <p>Correct. The trick is to implement the WAL in such a way that it is resilient to corruption, i.e. 
we need to be able to detect corrupted WAL entries: the ones that can occur when the power goes out before <code>fsync()</code> completes. A corrupted WAL could look like this:</p> <pre><code>wal-entry line=2 content=[id=2 username=hello registered=2017-02-23]
wal-entry line=3 content=[id=3 user
</code></pre> <p>OK, that is clearly corrupted. While writing the WAL entry for line 3, the <code>fsync()</code> never finished and we wound up with that.</p> <p>But because the <code>fsync()</code> didn&rsquo;t finish, we never confirmed to the user that the change was applied. Instead, the user got an error (or a timeout, because the power went out), so the write was never acknowledged. Or perhaps the application even automatically retries with a healthy server and the user never notices the error.</p> <p>Remember:</p> <blockquote> <p>We never acknowledge writes to the database before the WAL entry is <code>fsync()</code>ed.</p> <p>We never write to <code>users.txt</code> before the WAL entry is fully <code>fsync()</code>ed, because only durable &amp; atomic WAL entries can repair the tracked file.</p> </blockquote> <p>Moving on: because we didn&rsquo;t finish writing the WAL entry, we never even attempted to modify the original <code>users.txt</code>, so it is in an entirely valid state. Only successfully written WAL entries are applied to <code>users.txt</code>, because only those entries can be used for repair.</p> <p>Broken WAL entries are discarded (either removed or ignored), and writes to <code>users.txt</code> are correctly reported to clients as either succeeding or failing as a whole, atomically and durably.</p> <p>Atomicity and durability. Thanks to WAL! Congratulations, you now understand one of the most fundamental aspects of a database system! 
:)</p> <h2 id="recapping">Recapping</h2> <p>With WAL:</p> <ul> <li>The tracked file (<code>users.txt</code>) is effectively (after possible repair) always fully atomic &amp; durable.</li> <li>The log file (<code>write-ahead-log.txt</code>) as a whole is never atomic &amp; durable. <ul> <li>Individual (fully <code>fsync()</code>ed) WAL entries, though, are atomic &amp; durable.</li> <li>We just need to be able to detect and deal with corrupt WAL entries.</li> </ul></li> </ul> Reverse-engineering with strace https://function61.com/blog/2017/reverse-engineering-with-strace/ Thu, 12 Jan 2017 12:00:00 +0200 <p>Let&rsquo;s learn some reverse-engineering with <a href="https://strace.io/">strace</a> by inspecting how <a href="https://www.docker.com/">Docker</a> interacts with its server API (Docker uses a client-server model).</p> <p><a href="https://docs.docker.com/engine/reference/api/docker_remote_api/">Docker has API documentation</a>, so this isn&rsquo;t strictly necessary, but it serves as a good example of how to reverse-engineer black boxes.</p> <h2 id="install-strace">Install strace</h2> <p>First, make sure you have strace, curl and jq installed:</p> <pre><code>$ apt-get install -y strace curl jq
</code></pre> <h2 id="system-calls-what-and-why">System calls? What and why?</h2> <p>strace stands for &ldquo;system trace&rdquo;. With strace we can trace system calls. System calls are calls that programs make to communicate with the Kernel.</p> <p>System calls are needed for any I/O, such as disk or network access. Why are they needed? Because the only way for a program to read from or write to a disk or the network is to use disk/network drivers, and it would be stupid for each program to implement those drivers itself. Instead, the drivers are implemented only once and they are hosted in the Kernel. 
Userspace programs (= &ldquo;normal programs&rdquo;) communicate with those drivers by calling functions in the Kernel (&ldquo;syscall&rdquo;).</p> <p>Why would it be stupid to implement disk/network/etc drivers ourselves? If you used to play DOS games in ye olde (but golden) days, you probably recall that you had to <a href="http://www.flaterco.com/kb/audio/ISA/A3CONFIG_main.png">choose and configure a sound card driver</a> before you could play the game. That meant that each game had to contain sound card drivers for each type of device. That was inefficient in both time and money, as each game developer had to develop drivers for each sound card (or even video card) and probably also purchase those cards and test the drivers against them. It also slowed the pace of innovation and competition, as it took time for games to support the newer sound cards that came to the market.</p> <p>So in short: system calls are awesome because they let your program just abstractly ask the Kernel to read a file, and not care about whether:</p> <ul> <li>The disk is an HDD, SSD, CD-ROM or Blu-ray.</li> <li>It&rsquo;s attached via SATA, IDE, USB or even as a network drive.</li> <li>The filesystem is FAT32, NTFS, ext4 etc.</li> <li>The disk is encrypted at the block level or not.</li> <li>Etc etc.</li> </ul> <h2 id="quick-introduction-to-strace">Quick introduction to strace</h2> <p>Okay, let&rsquo;s first run a short program without using strace:</p> <pre><code>$ cat /etc/os-release
NAME=&quot;Ubuntu&quot;
VERSION=&quot;16.04.1 LTS (Xenial Xerus)&quot;
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME=&quot;Ubuntu 16.04.1 LTS&quot;
VERSION_ID=&quot;16.04&quot;
HOME_URL=&quot;http://www.ubuntu.com/&quot;
SUPPORT_URL=&quot;http://help.ubuntu.com/&quot;
BUG_REPORT_URL=&quot;http://bugs.launchpad.net/ubuntu/&quot;
UBUNTU_CODENAME=xenial
</code></pre> <p><code>cat</code> is a program that prints a file to <code>stdout</code>, which is usually connected to your screen.</p> <p>Now, 
let&rsquo;s run that very same program but with <code>strace</code>:</p> <pre><code>$ strace cat /etc/os-release
execve(&quot;/bin/cat&quot;, [&quot;cat&quot;, &quot;/etc/os-release&quot;], [/* 21 vars */]) = 0
... snipped ...
open(&quot;/etc/os-release&quot;, O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=274, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb6f4c4e000
read(3, &quot;NAME=\&quot;Ubuntu\&quot;\nVERSION=\&quot;16.04.1 L&quot;..., 131072) = 274
write(1, &quot;NAME=\&quot;Ubuntu\&quot;\nVERSION=\&quot;16.04.1 L&quot;..., 274NAME=&quot;Ubuntu&quot;
VERSION=&quot;16.04.1 LTS (Xenial Xerus)&quot;
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME=&quot;Ubuntu 16.04.1 LTS&quot;
VERSION_ID=&quot;16.04&quot;
HOME_URL=&quot;http://www.ubuntu.com/&quot;
SUPPORT_URL=&quot;http://help.ubuntu.com/&quot;
BUG_REPORT_URL=&quot;http://bugs.launchpad.net/ubuntu/&quot;
UBUNTU_CODENAME=xenial
) = 274
read(3, &quot;&quot;, 131072) = 0
munmap(0x7fb6f4c4e000, 139264) = 0
close(3) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
</code></pre> <p>strace writes its tracing output to <code>stderr</code>, while <code>stdout</code> stays connected directly to the traced program.</p> <p>That means that:</p> <ul> <li>By running <code>strace cat /etc/os-release</code> we see both the output from strace and the output of the original program (as shown above).</li> <li>By running <code>strace cat /etc/os-release &gt; /dev/null</code> we ignore <code>stdout</code> =&gt; we only see output from strace.</li> <li>By running <code>strace cat /etc/os-release 2&gt; /dev/null</code> we ignore <code>stderr</code> =&gt; we only see output from the original program (this is effectively useless).</li> </ul> <p>Demonstration:</p> <pre><code>$ strace cat /etc/os-release 2&gt; /dev/null
NAME=&quot;Ubuntu&quot;
VERSION=&quot;16.04.1 LTS (Xenial Xerus)&quot;
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME=&quot;Ubuntu 16.04.1 LTS&quot;
VERSION_ID=&quot;16.04&quot;
HOME_URL=&quot;http://www.ubuntu.com/&quot;
SUPPORT_URL=&quot;http://help.ubuntu.com/&quot;
BUG_REPORT_URL=&quot;http://bugs.launchpad.net/ubuntu/&quot;
UBUNTU_CODENAME=xenial

$ strace cat /etc/os-release 1&gt; /dev/null
execve(&quot;/bin/cat&quot;, [&quot;cat&quot;, &quot;/etc/os-release&quot;], [/* 21 vars */]) = 0
... snipped ...
open(&quot;/etc/os-release&quot;, O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=274, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f171a572000
read(3, &quot;NAME=\&quot;Ubuntu\&quot;\nVERSION=\&quot;16.04.1 L&quot;..., 131072) = 274
write(1, &quot;NAME=\&quot;Ubuntu\&quot;\nVERSION=\&quot;16.04.1 L&quot;..., 274) = 274
read(3, &quot;&quot;, 131072) = 0
</code></pre> <p>Ignoring stdout is useful when the original program produces a lot of output that we&rsquo;re not interested in.</p> <p>So, from the above output we can see that the <code>cat</code> program:</p> <ul> <li>Opens (<code>open</code>) the file <code>/etc/os-release</code> as read-only (<code>O_RDONLY</code>).</li> <li>Issues <code>read</code> calls to file descriptor #3, receiving the file contents (<code>NAME=&quot;Ubuntu&quot;\n...</code>).</li> <li>The other file descriptors are #0 for <code>stdin</code>, #1 for <code>stdout</code> and #2 for <code>stderr</code>; 3..N are other open file descriptors.</li> <li>In this case, the file given as the only argument to cat ended up as fd #3.</li> <li>Immediately after reading the content, cat outputs the same content to its own <code>stdout</code>: <code>write(1, same content as what was read...)</code>.</li> </ul> <p>Okay, now that we know the basics of strace, let&rsquo;s move on.</p> <h2 id="using-strace-with-real-life-programs">Using strace with real-life programs</h2> <p>Okay, we have this docker command that we want to reverse-engineer:</p> <pre><code>$ docker service ps whoami
ID                         NAME      IMAGE                     NODE    DESIRED STATE  CURRENT STATE              ERROR
aps1k88b6rv0dcxnh6tstgjc4  whoami.1  emilevauge/whoami:latest  master  Running        Running about an hour ago
</code></pre> <p>That command queries the Docker API to list the tasks (<code>whoami.1</code>, identified by <code>aps1k88b6rv0dcxnh6tstgjc4</code>) running for a specific service 
(<code>whoami</code>).</p> <p>Since we know that <code>docker</code> is the client in a client-server model, it must access the server API, probably over TCP or a Unix socket. We can crudely inspect both of those with strace.</p> <p>We know that Docker uses the HTTP protocol to communicate with the API, so we&rsquo;ll search for the string <code>GET</code> (the start of most HTTP requests) to validate that assumption:</p> <pre><code>$ strace docker service ps whoami | grep GET
execve(&quot;/usr/bin/docker&quot;, [&quot;docker&quot;, &quot;service&quot;, &quot;ps&quot;, &quot;whoami&quot;], [/* 21 vars */]) = 0
uname({sysname=&quot;Linux&quot;, nodename=&quot;master&quot;, ...}) = 0
brk(NULL) = 0x2360000
brk(0x23611c0) = 0x23611c0
... snipped ...
</code></pre> <p>Okay, we made our first mistake. We grepped for <code>GET</code>, but none of the output lines contain that keyword, so what&rsquo;s going on?</p> <p>Remember that strace (like every program) has two output streams: <code>stdout</code> and <code>stderr</code>. The log output we <strong>wanted</strong> to grep is in <code>stderr</code>, but the shell <code>|</code> operator pipes only <code>stdout</code> to the next program. So we were grepping the wrong stream, while strace&rsquo;s <code>stderr</code> went straight to the screen (which is why we still saw it). This drawing illustrates the problem:</p> <pre><code>+------------------------------+ +-------------+ +-------------------+ | +------+ |NAME=&quot;Ubuntu&quot;| +------+ | | | |stdout+----&gt;... +-----&gt;$ grep+-----&gt; | | +------+ +-------------+ +------+ | | | $ strace cat /etc/os-release | | Your screen | | +------+ +-------------+ | | | |stderr+----&gt;execve() +------------------&gt; | | +------+ |... 
| | | +------------------------------+ +-------------+ +-------------------+ </code></pre> <p>So, knowing that, let&rsquo;s redirect strace&rsquo;s <code>stderr</code> to <code>stdout</code> so that we can grep it (<code>2&gt;&amp;1</code>):</p> <pre><code>$ strace docker service ps whoami 2&gt;&amp;1 | grep GET
</code></pre> <p>Nothing! We were expecting a write call containing the string <code>GET</code>, but there were no matches. Let&rsquo;s try listing just the <code>write</code> calls:</p> <pre><code>$ strace docker service ps whoami 2&gt;&amp;1 | grep write
</code></pre> <p>Nothing, that&rsquo;s weird! But when we run it again:</p> <pre><code>$ strace docker service ps whoami 2&gt;&amp;1 | grep write
write(3, &quot;GET /v1.24/services/whoami HTTP/&quot;..., 95) = 95
</code></pre> <p>Why are those <code>write</code> calls <strong>randomly</strong> showing up and sometimes not? Well, that relates to the programming language Docker is written in, the <a href="https://golang.org/">Go language</a>. Go takes advantage of processors with multiple cores by running code on multiple OS threads, and without extra options strace only traces the initial thread of the program we launched. Sometimes that <code>write</code> call happens on that thread, and sometimes on another one.</p> <p>There&rsquo;s an option to <code>strace</code> that we can use:</p> <pre><code>-f follow forks
</code></pre> <p>This means that when Go spawns new threads (or child processes), strace monitors those as well. We should probably use that switch by default from now on, so that we don&rsquo;t have to guess about the inner workings of each program we debug. Continuing with that:</p> <pre><code>$ strace -f docker service ps whoami 2&gt;&amp;1 | grep write
[pid 9337] &lt;... write resumed&gt; ) = 95
[pid 9337] write(3, &quot;GET /v1.24/tasks?filters=%7B%22s&quot;..., 160 &lt;unfinished ...&gt;
[pid 9337] &lt;... write resumed&gt; ) = 160
[pid 9336] write(3, &quot;GET /v1.24/services/7g1xbqf95ly6&quot;..., 114 &lt;unfinished ...&gt;
[pid 9336] &lt;... 
write resumed&gt; ) = 114
[pid 9338] write(3, &quot;GET /v1.24/nodes/7xynabmkp3r117p&quot;..., 111 &lt;unfinished ...&gt;
... snipped ...
</code></pre> <p>Observation: the outbound HTTP requests were spread across several threads (the <code>-f</code> option prefixes each line with the ID of the process or thread that made the call). So Docker must make heavy use of <a href="https://www.goinggo.net/2014/01/concurrency-goroutines-and-gomaxprocs.html">Go&rsquo;s goroutines</a> for easy parallelization. :)</p> <p>Now, each time we run that same command, we consistently see all the writes. Awesome!</p> <p>Now, moving forward with our initial plan of scanning for <code>GET</code> strings to identify the calls relating to outbound HTTP GET requests:</p> <pre><code>$ strace -f docker service ps whoami 2&gt;&amp;1 | grep GET
[pid 9361] ioctl(2, TCGETSstrace: Process 9365 attached
[pid 9361] ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
[pid 9361] ioctl(1, TCGETS, 0x7ffecdce30c0) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 9364] write(3, &quot;GET /v1.24/services/whoami HTTP/&quot;..., 95 &lt;unfinished ...&gt;
[pid 9363] write(3, &quot;GET /v1.24/tasks?filters=%7B%22s&quot;..., 160 &lt;unfinished ...&gt;
[pid 9364] write(3, &quot;GET /v1.24/services/7g1xbqf95ly6&quot;..., 114) = 114
[pid 9364] write(3, &quot;GET /v1.24/nodes/7xynabmkp3r117p&quot;..., 111) = 111
</code></pre> <p>Okay, strace seems to truncate the strings passed to <code>write</code>: we&rsquo;re not seeing the entire URLs that we&rsquo;re interested in. 
There&rsquo;s a switch for that:</p> <pre><code>-s strsize limit length of print strings to STRSIZE chars (default 32)
</code></pre> <p>So:</p> <pre><code>$ strace -s 64 -f docker service ps whoami 2&gt;&amp;1 | grep GET
[pid 9377] ioctl(2, TCGETS &lt;unfinished ...&gt;
[pid 9377] ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
[pid 9377] ioctl(1, TCGETS &lt;unfinished ...&gt;
[pid 9381] write(3, &quot;GET /v1.24/services/whoami HTTP/1.1\r\nHost: docker\r\nUser-Agent: D&quot;..., 95) = 95
[pid 9381] write(3, &quot;GET /v1.24/tasks?filters=%7B%22service%22%3A%7B%227g1xbqf95ly6bt&quot;..., 160 &lt;unfinished ...&gt;
[pid 9380] write(3, &quot;GET /v1.24/services/7g1xbqf95ly6btrwyglymrigt HTTP/1.1\r\nHost: do&quot;..., 114) = 114
[pid 9380] write(3, &quot;GET /v1.24/nodes/7xynabmkp3r117pj9rxhln9k9 HTTP/1.1\r\nHost: docke&quot;..., 111) = 111

# better. let's also:
# - filter only to the `write` lines that contain keyword GET
# - increase string length to 96 because we were not seeing the /tasks?filters=... URL fully
$ strace -s 96 -f docker service ps whoami 2&gt;&amp;1 | grep write | grep GET
[pid 9442] write(3, &quot;GET /v1.24/services/whoami HTTP/1.1\r\nHost: docker\r\nUser-Agent: Docker-Client/1.12.3 (linux)\r\n\r\n&quot;, 95 &lt;unfinished ...&gt;
[pid 9439] write(3, &quot;GET /v1.24/tasks?filters=%7B%22service%22%3A%7B%227g1xbqf95ly6btrwyglymrigt%22%3Atrue%7D%7D HTTP&quot;..., 160 &lt;unfinished ...&gt;
[pid 9442] write(3, &quot;GET /v1.24/services/7g1xbqf95ly6btrwyglymrigt HTTP/1.1\r\nHost: docker\r\nUser-Agent: Docker-Client/&quot;..., 114 &lt;unfinished ...&gt;
[pid 9442] write(3, &quot;GET /v1.24/nodes/7xynabmkp3r117pj9rxhln9k9 HTTP/1.1\r\nHost: docker\r\nUser-Agent: Docker-Client/1.1&quot;..., 111 &lt;unfinished ...&gt;
</code></pre> <p>Okay, we know that Docker makes an HTTP request to path <code>/v1.24/services/whoami</code>. 
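</p> <p>Chaining greps works, but note that the plain <code>grep GET</code> also matched the <code>ioctl(..., TCGETS, ...)</code> lines. As an alternative, here is a minimal Go sketch (our own illustration; the regular expression and the name <code>extractRequest</code> are assumptions, not part of strace or Docker) that reads strace output from stdin, keeps only <code>write()</code> calls whose string starts with an HTTP method, and URL-decodes the request target so that the <code>filters=%7B%22service%22...</code> query becomes readable. It would be used along the lines of <code>strace -s 96 -f docker service ps whoami 2&gt;&amp;1 | go run extract.go</code> (the file name is hypothetical):</p>

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"os"
	"regexp"
)

// requestLine matches strace lines such as
//
//	[pid 9442] write(3, "GET /v1.24/services/whoami HTTP/1.1\r\n..."..., 95 <unfinished ...>
//
// capturing the HTTP method and the request target (path + query string).
var requestLine = regexp.MustCompile(`write\(\d+, "(GET|POST|PUT|DELETE|HEAD) ([^ "]+)`)

// extractRequest returns "METHOD target" for a line that writes an HTTP request,
// with the target URL-decoded for readability.
func extractRequest(line string) (string, bool) {
	m := requestLine.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	target, err := url.QueryUnescape(m[2])
	if err != nil {
		target = m[2] // strace may have cut the string mid-escape; keep it raw
	}
	return m[1] + " " + target, true
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) // strace lines can be long
	for scanner.Scan() {
		if req, ok := extractRequest(scanner.Text()); ok {
			fmt.Println(req)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

<p>For the <code>-s 96</code> output above, the <code>/tasks</code> line would come out as <code>GET /v1.24/tasks?filters={&quot;service&quot;:{&quot;7g1xbqf95ly6btrwyglymrigt&quot;:true}}</code>.</p> <p>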
Does it connect over TCP or a Unix socket?</p> <p>Searching the full strace output just before the first write, we find the answer (note the <code>2&gt;&amp;1 1&gt;/dev/null</code>: strace&rsquo;s output goes into the pipe, while docker&rsquo;s own output is discarded):</p> <pre><code>$ strace -s 64 -f docker service ps whoami 2&gt;&amp;1 1&gt;/dev/null | less -S
... snip ...
[pid 9401] setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 9401] connect(3, {sa_family=AF_LOCAL, sun_path=&quot;/var/run/docker.sock&quot;}, 23) = 0
... snip ...
[pid 9401] write(3, &quot;GET /v1.24/services/whoami HTTP/1.1\r\nHost: docker\r\nUser-Agent: D&quot;..., 95 &lt;unfinished ...&gt;
... snip ...
</code></pre> <p>Okay, so it uses a Unix socket, located in the filesystem at <code>/var/run/docker.sock</code>.</p> <p>Let&rsquo;s try connecting to the Docker API ourselves, without the official Docker client:</p> <pre><code>$ curl --unix-socket /var/run/docker.sock http:/v1.24/services/whoami
# snipped: non-pretty JSON output in one line

# if you have jq installed, you can pipe JSON into it and have it pretty-printed
$ curl --unix-socket /var/run/docker.sock http:/v1.24/services/whoami | jq .
{
  &quot;ID&quot;: &quot;7g1xbqf95ly6btrwyglymrigt&quot;,
  &quot;Version&quot;: {
    &quot;Index&quot;: 808
  },
  &quot;CreatedAt&quot;: &quot;2017-01-11T12:37:55.171414273Z&quot;,
  &quot;UpdatedAt&quot;: &quot;2017-01-11T13:19:20.515932494Z&quot;,
  &quot;Spec&quot;: {
    &quot;Name&quot;: &quot;whoami&quot;,
    &quot;Labels&quot;: {
      &quot;traefik.enable&quot;: &quot;true&quot;,
      ... snipped
}
</code></pre> <p>Now, checking out the <code>/tasks?filters...</code> endpoint:</p> <pre><code>$ curl --unix-socket /var/run/docker.sock http:/v1.24/tasks?filters=%7B%22service%22%3A%7B%227g1xbqf95ly6btrwyglymrigt%22%3Atrue%7D%7D | jq .
[
  {
    &quot;ID&quot;: &quot;aps1k88b6rv0dcxnh6tstgjc4&quot;,
    &quot;Version&quot;: {
      &quot;Index&quot;: 728
    },
    &quot;CreatedAt&quot;: &quot;2017-01-11T12:37:55.17994955Z&quot;,
    &quot;UpdatedAt&quot;: &quot;2017-01-11T12:38:02.142646711Z&quot;,
    &quot;Spec&quot;: {
      &quot;ContainerSpec&quot;: {
        &quot;Image&quot;: &quot;emilevauge/whoami:latest&quot;,
        &quot;Env&quot;: [
          &quot;VIRTUAL_HOST=whoami._CLUSTER_.fn61.net&quot;
        ]
      },
      ... snipped ...
    },
    &quot;ServiceID&quot;: &quot;7g1xbqf95ly6btrwyglymrigt&quot;,
    &quot;Slot&quot;: 1,
    &quot;NodeID&quot;: &quot;7xynabmkp3r117pj9rxhln9k9&quot;,
    &quot;Status&quot;: {
      &quot;Timestamp&quot;: &quot;2017-01-11T12:38:02.097532639Z&quot;,
      &quot;State&quot;: &quot;running&quot;,
      &quot;Message&quot;: &quot;started&quot;,
      &quot;ContainerStatus&quot;: {
        &quot;ContainerID&quot;: &quot;a748e970d01cf55fea3550c29f769f2ccd81f018e12b002b4cfbedb9f60a1f0c&quot;,
        &quot;PID&quot;: 27839
      }
    },
    &quot;DesiredState&quot;: &quot;running&quot;,
    ... snipped ...
  }
]
</code></pre> <p>And <code>/nodes/7xynabmkp3r117pj9rxhln9k9</code>:</p> <pre><code>$ curl --unix-socket /var/run/docker.sock http://localhost/v1.24/nodes/7xynabmkp3r117pj9rxhln9k9 | jq .
{
  &quot;ID&quot;: &quot;7xynabmkp3r117pj9rxhln9k9&quot;,
  ... snipped ...
  &quot;Description&quot;: {
    &quot;Hostname&quot;: &quot;master&quot;,
    &quot;Platform&quot;: {
      &quot;Architecture&quot;: &quot;x86_64&quot;,
      &quot;OS&quot;: &quot;linux&quot;
    },
    ... snipped ...
}
</code></pre> <p>Okay cool, so we managed to make the same requests the Docker client made, using nothing but the <code>curl</code> tool. HTTP is awesome!
:)</p> <p>So, recapping: from the strace output we reverse-engineered the logic behind <code>$ docker service ps whoami</code>:</p> <ul> <li>Make a request to <code>/services/whoami</code>.</li> <li>Find out that the service ID is <code>7g1xbqf95ly6btrwyglymrigt</code>.</li> <li>Make a request to <code>/tasks?filters={&quot;service&quot;:{&quot;7g1xbqf95ly6btrwyglymrigt&quot;:true}}</code> (for clarity I <a href="https://urldecode.org/">URL-decoded</a> the garbage that looked like this: <code>%7B%22service%22..</code>).</li> <li>Make a request to <code>/nodes/7xynabmkp3r117pj9rxhln9k9</code> to find out the details of the node the task is running on.</li> </ul> <p>From these HTTP requests we can locate the data in the JSON responses that ends up in the final <code>ps</code> output:</p> <pre><code>$ docker service ps whoami
ID                         NAME      IMAGE                     NODE    DESIRED STATE  CURRENT STATE        ERROR
aps1k88b6rv0dcxnh6tstgjc4  whoami.1  emilevauge/whoami:latest  master  Running        Running 3 hours ago
</code></pre> <ul> <li>The task ID <code>aps1k88b6rv0dcxnh6tstgjc4</code> comes from the <code>/tasks</code> response (<code>Task.ID</code>).</li> <li>The task name combines the service name and the task slot, <code>Service.Spec.Name</code> + <code>Task.Slot</code> (whoami + 1 =&gt; <code>whoami.1</code>).</li> <li>The image comes from <code>/tasks</code> (<code>Task.Spec.ContainerSpec.Image</code>).</li> <li>The node name comes from <code>/nodes/7xynabmkp3r117pj9rxhln9k9</code> (<code>Node.Description.Hostname</code>).</li> <li>The desired state comes from <code>/tasks</code> (<code>Task.DesiredState</code>).</li> <li>The current state comes from <code>/tasks</code> (<code>Task.Status.State</code>).</li> <li>An error would also come from <code>/tasks</code>. For a failed task you would see e.g. <code>Task.Status.State=failed</code> and <code>Task.Status.Err=task: non-zero exit (1)</code>.</li> </ul> <h2 id="closing-words">Closing words</h2> <p>strace is awesome, as it helps you understand how a program works behind the scenes (reverse engineering), without needing access to the source
code. And it doesn&rsquo;t matter whether the program you are inspecting is open or closed source, or written in Python, C++ or Go, because in the end almost everything a program does with the outside world (files, sockets, other processes) goes through syscalls.</p> <h2 id="further-reading">Further reading</h2> <ul> <li><a href="https://jvns.ca/blog/2016/11/10/a-few-drawings-about-linux/">Julia Evans&rsquo; drawings about syscalls and file descriptors</a></li> <li><a href="https://jvns.ca/blog/2015/04/14/strace-zine/">Julia Evans&rsquo; drawings about strace</a></li> <li><a href="https://jvns.ca/categories/strace/">Julia Evans&rsquo; other writings about strace</a></li> <li><a href="https://jvns.ca/blog/2016/06/15/using-ltrace-to-debug-a-memory-leak/">There&rsquo;s also ltrace (for program-internal library traces) - again an article by Julia</a></li> </ul> Running a company-internal certificate authority https://function61.com/blog/2017/running-a-company-internal-certificate-authority/ Mon, 09 Jan 2017 18:35:00 +0200 https://function61.com/blog/2017/running-a-company-internal-certificate-authority/ <p>If you have not read my previous post on <a href="https://function61.com/blog/2017/how-do-ssl-certificates-work/">How do SSL certificates work?</a>, go read it first!</p> <p>Moving on, this post is about setting up a certificate authority for company-internal use.</p> <h2 id="company-internal-ca">&ldquo;Company-internal&rdquo; CA?</h2> <p>Most CA setups are for company-internal use, unless you are planning on starting a <a href="https://en.wikipedia.org/wiki/Certificate_authority#Providers">multi-million-dollar CA business</a>. That takes years of investment and some serious cash to get through the start-of-operation audits and periodic audits, and to set up hardcore secure infrastructure with tight access controls, TPM modules and such. :)</p> <p>Company-internal essentially means that we use a self-signed root certificate to issue certificates that are used only inside our company.
Because we don&rsquo;t have to (or want to) meet all the requirements of the public CAs mentioned above, we can easily put all the benefits of public key infrastructure (like client certificate authentication) to use.</p> <p>We use this to provide authentication and TLS encryption for internal (&ldquo;backoffice&rdquo;) applications.</p> <h2 id="security-implications">Security implications?</h2> <p>Even though this is for internal use, our CA is used to authenticate to mission-critical internal services that must not be compromised. Thus, the private keys backing the certificates must be protected really well. In addition, one should consider the following:</p> <ul> <li>Keep the root CA offline and air-gapped (think a Raspberry Pi with no network).</li> <li>Use a multi-level CA architecture with intermediate CAs (online or offline, depending on needs), so that an intermediate CA can be revoked without replacing the root.</li> <li>Implement an online certificate revocation list, possibly hosted on AWS S3.</li> <li>Store the CA&rsquo;s private key in a TPM (or an equivalent hardware module).</li> </ul> <p>We don&rsquo;t currently implement these points, but we should look into them in the future.</p> <h2 id="this-sounds-hard-why-do-it">This sounds hard - why do it?</h2> <p>Good question! Not all companies use SSL certificates internally, probably because they are hard to set up and manage. But the tooling is getting better and easier to use, so I hope client cert authentication is here to stay. Here are a few benefits:</p> <ul> <li><p>Once the CA is set up, <a href="https://github.com/function61/traefik-fn61/blob/master/conf/traefik.toml">configuring server apps to use it is usually pretty damn easy</a>.</p></li> <li><p>Client certificate authentication is implemented in pretty much every programming language and HTTP server library there is.
You don&rsquo;t have to reinvent the authentication/authorization layers, and the existing implementations are probably more secure than anything you or anyone else would write from scratch.</p></li> <li><p>Say you have 10 services the user can log in to and you hire a new employee. Granting access does not have to mean adding the user manually to those 10 services. If those services use client cert authentication, all you have to do is issue a certificate to the user, and the services themselves stay untouched. Certificate authentication is effectively offline access control, because the server does not have to contact the CA to ask whether this certificate is authorized for this service. The CA signs this authorization once, and the user delivers the signature to the service herself.</p></li> <li><p>Since you&rsquo;re implementing authentication on top of TLS, you get guaranteed encryption for free. This means that even if your employees log in over an unsecured coffee-shop network, their traffic stays encrypted.</p></li> <li><p>Client certificate authentication protects you from keyloggers, as you don&rsquo;t have to type in a username and password. This benefit is somewhat diluted, though. If a keylogger has compromised your computer, another malicious program could just as well steal the private keys of your client certs (unless you keep them in a hardware security module).</p></li> </ul> <h2 id="how-do-i-use-the-certificates-issued-by-a-ca">How do I use the certificates issued by a CA?</h2> <p>For concrete usage, see our <a href="https://github.com/function61/traefik-fn61">loadbalancer configuration</a>.</p> <h2 id="choosing-a-toolkit-for-ca-management">Choosing a toolkit for CA management</h2> <p>Choosing a toolkit for this task was not easy. I was not fond of using OpenSSL, since it requires keeping more state than should be necessary, namely verbose, complex config files and serial number counters. More modern toolkits replace those counters statelessly with randomization.
OpenSSL did not feel as modern or as automation-friendly as I had hoped.</p> <p>Ultimately, Cloudflare&rsquo;s <a href="https://github.com/cloudflare/cfssl">cfssl</a> fit this task rather nicely. It supports automation out of the box via JSON, and it doesn&rsquo;t <strong>require</strong> any configuration if all you want to do is call it from the command line. It&rsquo;s also a nice bonus that I already trust Cloudflare as a long-time customer.</p> <p>However, their documentation was really lacking (thanks, CoreOS, for having a better tutorial!), and the compiled binaries on their official website were outdated. The <code>-initca</code> feature also had a bug (or a missing feature): it did not respect the expiration date in the certificate signing request. I had to <a href="https://github.com/function61/certificate-authority/tree/master/cfssl-builder">statically compile the binaries myself</a>. Why statically? So that they work on Alpine Linux, which uses a different C standard library (musl).</p> <p>More alternatives and research are documented in the repository mentioned below!</p> <h2 id="how-do-i-set-up-a-ca">How do I set up a CA?</h2> <p>Since I had to take notes while setting up a CA anyway, I figured I&rsquo;d open source my research in the form of an example CA (pretty much 1:1 with our production setup): <a href="https://github.com/function61/certificate-authority">github.com/function61/certificate-authority</a></p> <p>If you&rsquo;ve read this far, you are probably interested in this subject. Check out that repo to learn more!</p> How do SSL certificates work?
https://function61.com/blog/2017/how-do-ssl-certificates-work/ Wed, 04 Jan 2017 22:48:00 +0200 https://function61.com/blog/2017/how-do-ssl-certificates-work/ <p>In this article I try to explain how the increasingly encrypted internet works!</p> <p>Disclaimer: I am intentionally simplifying a few things, since this is already a complicated subject.</p> <h2 id="what-is-pki">What is PKI?</h2> <p>PKI (<a href="https://en.wikipedia.org/wiki/Public_key_infrastructure">public key infrastructure</a>) is the basis of TLS (<a href="https://en.wikipedia.org/wiki/Transport_Layer_Security">Transport Layer Security</a>, previously known as SSL, though we still speak of SSL certificates).</p> <p>PKI is basically a collection of conventions for dealing with <a href="https://en.wikipedia.org/wiki/Public-key_cryptography">public key cryptography</a> and trust relationships between public keys <strong>by using certificates</strong>. You could say that <code>PKI = public key crypto + certificates</code>.</p> <p>Okay, but what do we use PKI for?</p> <pre><code>+-------+   +------+   +-------+
|       |   |      |   |       |
| HTTPS |   | FTPS |   | IMAPS |
|       |   |      |   |       |
+-------+   +------+   +-------+
+------------------------------+
|                              |
|             TLS              |
|                              |
+------------------------------+
+------------------------------+
|                              |
|             PKI              |
|                              |
+------------------------------+
</code></pre> <p>From the above diagram, you can see the relationships between these technologies:</p> <ul> <li>TLS is what makes encrypted HTTP (called HTTPS), FTP and IMAP possible.</li> <li>Therefore, TLS is not just about HTTPS.</li> <li>PKI is what makes TLS possible.</li> </ul> <p>Taken further, public key crypto is not just for PKI and PKI is not just for TLS:</p> <pre><code>+-------+   +------+   +-------+
|       |   |      |   |       |
| HTTPS |   | FTPS |   | IMAPS |
|       |   |      |   |       |
+-------+   +------+   +-------+
+------------------------------+   +-----------+
|                              |   |           |
|             TLS              |   |    VPN    |
|                              |   |           |
+------------------------------+   +-----------+
+----------------------------------------------+   +-------+
|                                              |   |       |
|                     PKI                      |   |  SSH  |
|                                              |   |       |
+----------------------------------------------+   +-------+
+----------------------------------------------------------+
|                                                          |
|                 Public key cryptography                  |
|                                                          |
+----------------------------------------------------------+
</code></pre> <p>Observations:</p> <ul> <li>SSH usually does not use PKI (it typically trusts plain public keys rather than X.509 certificates).</li> <li>But PKI and SSH both use public key crypto, so they are related technologies.</li> <li>Many VPNs also use certificates, so they use PKI.</li> </ul> <h2 id="what-actually-is-a-certificate">What actually is a certificate</h2> <p>So, previously we learned that HTTPS is built on TLS, which is built on PKI, which means public key crypto + certificates.</p> <p>Certificates are basically just:</p> <ul> <li>A name (like www.google.com).</li> <li>Our public key.</li> <li>The issuer&rsquo;s name (like Comodo&rsquo;s CA), which identifies whose key signed the certificate.</li> <li>The issuer&rsquo;s signature (proves that the issuer vouches that this certificate is legit =&gt; details like our name and public key are true).</li> </ul> <p>Also directly related to our certificate is our private key, but as the name implies, it is always kept secret and never transmitted to anybody else. The public and private keys are mathematically related, so by having the private key we can prove that we own the public key. Knowing the public key, however, does not let you work out the private key. That essentially is the basis of public key cryptography.</p> <p>Each certificate (except root certificates, which we&rsquo;ll cover soon) has exactly one issuer, and each issuer has probably issued many certificates.
So you can think of certificates as a directory structure in a filesystem:</p> <pre><code>+---------+
|         |
| Root CA |
|         |
+----+----+
     |
     |    +-----------------+
     |    |                 |
     +----&gt; Intermediate CA |
          |                 |
          +-------------+---+
                        |
                        |   +---------------+
                        |   |               |
                        +---&gt; Server cert 1 |
                        |   |               |
                        |   +---------------+
                        |
                        |   +---------------+
                        |   |               |
                        +---&gt; Server cert 2 |
                        |   |               |
                        |   +---------------+
                        |
                        |   +---------------+
                        |   |               |
                        +---&gt; Client cert 1 |
                            |               |
                            +---------------+
</code></pre> <p>From the above picture:</p> <ul> <li>Issuer of &ldquo;Root CA&rdquo; = (nobody)</li> <li>Issuer of &ldquo;Intermediate CA&rdquo; = &ldquo;Root CA&rdquo;</li> <li>Issuer of &ldquo;Server cert 1&rdquo; = &ldquo;Intermediate CA&rdquo;</li> </ul> <p>The above picture also shows that there are a few types of certificates:</p> <ul> <li>Root certificates,</li> <li>Intermediate certificates,</li> <li>Server certificates and</li> <li>Client certificates.</li> </ul> <p>These types are basically just a convention, and technically they don&rsquo;t differ much from each other - they are all just certificates. Here are the differences:</p> <ul> <li>Server and client certificates cannot issue certificates (for security reasons, the issuer sets this restriction when signing a certificate).</li> <li>Server certificates use their hostname as the certificate name to prove that you are connecting to the correct server.
This means that the issuer also vouches that you own that hostname, which is why you have to prove domain ownership to SSL cert issuers when getting server certificates.</li> <li>A client certificate&rsquo;s name is not a hostname. The client presents the certificate to the server during the connection only to prove its identity (for example, which user it is). The client doesn&rsquo;t have a hostname, or at least not one that has any meaning.</li> <li>An intermediate certificate is basically like a server certificate, but it is given permission to issue sub certificates.</li> <li>A root certificate is like an intermediate certificate, but it is the only type of certificate that nobody has issued, i.e. it does not have a parent certificate. That&rsquo;s why it is called a root certificate. These are also called self-signed certificates.</li> </ul> <h2 id="so-how-do-ssl-certificates-work-with-https-connections">So how do SSL certificates work with HTTPS connections</h2> <p>When you connect to &ldquo;www.google.com&rdquo;, the server must present a server certificate saying that its name really is &ldquo;www.google.com&rdquo;. Your browser refuses to connect to the website and warns you that this seems dangerous in either of these cases:</p> <ul> <li>It receives a certificate for any other name (like bing.com).</li> <li>Something else is wrong with the certificate (like an invalid signature or an expired certificate).</li> </ul> <p>In addition to presenting the certificate, the server must also prove that it holds the private key matching the public key in the certificate, by using that private key during the TLS handshake (for example, to sign handshake data). Remember that certificates are public knowledge, so just by downloading Google&rsquo;s certificate you cannot claim to be Google (unless you have the private key as well)!</p> <p>Okay, but can&rsquo;t anybody just make a certificate with the name &ldquo;www.google.com&rdquo; but with a different public and private key? After all, the browser mainly cares about the name in the certificate, and the public key itself has no direct meaning.
Yes, anybody can make such a certificate, but no trustworthy certificate authority will sign it for you (signing = issuing). Therefore your malicious certificate could only be either:</p> <ul> <li>Self-signed (a root certificate), but your browser only trusts a small list of trustworthy roots =&gt; no trust, OR</li> <li>Properly signed by some malicious certificate authority, <strong>but</strong> that CA&rsquo;s certificate would not anchor to a trustworthy root =&gt; no trust.</li> </ul> <p>PKI boils down to this:</p> <blockquote> <p>You only trust certificates that anchor to trusted root certificates.</p> </blockquote> <p>Anchoring just means that when you follow the chain of issuers up to the root, you can trust the certificate if you trust that root certificate. The root certificates are owned by trusted certificate authorities. These CAs promise to issue certificates only to entities that prove they own the hostname in the requested certificate, and to verify that all other details in the certificate hold true.</p> <p>Thus, if www.google.com presents these certificates (this is called a trust chain):</p> <pre><code>www.google.com -&gt; Google Internet Authority -&gt; GeoTrust Global CA
</code></pre> <p>..
your browser goes through this thought process:</p> <ul> <li>I do not trust &ldquo;www.google.com&rdquo;, but it has &ldquo;Google Internet Authority&rdquo; as an issuer.</li> <li>I do not trust &ldquo;Google Internet Authority&rdquo;, but it has &ldquo;GeoTrust Global CA&rdquo; as an issuer.</li> <li>I <strong>do trust</strong> &ldquo;GeoTrust Global CA&rdquo;!</li> <li>Therefore I know that &ldquo;www.google.com&rdquo; can be trusted, awesome!</li> </ul> <p>This can be summed up with:</p> <pre><code>                            +--------------+
              trusts        |              |
    +-----------------------+ Your browser +--------------------+
    |                       |              |                    |
    |                       +--------------+                    | indirect trust
    |                                                           |
+---v---------+            +---------------+            +-------v--------+
|             |   trusts   |               |   trusts   |                |
| GeoTrust CA +------------&gt; Google CA     +------------&gt; www.google.com |
|             |            |               |            |                |
+-------------+            +---------------+            +----------------+
</code></pre> <p>This trust chain can technically be really long, but in practice it&rsquo;s only a few levels deep.</p> <h2 id="tell-me-more-about-root-certificates">Tell me more about root certificates</h2> <p>So, in the previous section&rsquo;s example I said that your computer trusts &ldquo;GeoTrust Global CA&rdquo;. Why is that?</p> <p>Your computer/device/whatever holds a static list of root CAs that are configured as trustworthy (= trust anchors).</p> <p>Take a look yourself. On Windows, open the Start menu, search for <code>Manage computer certificates</code> and go to <code>Trusted Root Certification Authorities &gt; Certificates</code>. You&rsquo;ll see this list:</p> <p><img src="../how-do-ssl-certificates-work-list-of-root-cas.png" alt="" /></p> <p>Unsurprisingly, &ldquo;GeoTrust Global CA&rdquo; was on that list. :)</p> <p>Where does that list come from, and who manages it?
The biggest software vendors each maintain their own list of root certificates, and certificate authorities have to apply to be included:</p> <ul> <li><a href="https://www.mozilla.org/en-US/about/governance/policies/security-group/certs/policy/">Mozilla CA Certificate program</a> - Firefox &amp; most of the open source ecosystem.</li> <li><a href="https://technet.microsoft.com/library/cc751157.aspx">Microsoft Trusted Root Certificate Program</a> - most applications that run on Windows use this list.</li> <li><a href="http://www.apple.com/certificateauthority/ca_program.html">Apple Root Certificate Program</a> - Apple&rsquo;s ecosystem.</li> </ul> <p>The process is quite heavy and costly, and it requires third-party audits, both initially and regularly. But it&rsquo;s a good business - SSL certs have been somewhat expensive and it&rsquo;s a huge industry.</p> <p>For PKI to work as a whole, there has to be some sort of consensus on which root CAs are included in the lists. If your root cert is on Mozilla&rsquo;s list but not on Microsoft&rsquo;s, nobody would buy certificates issued by your CA, because they would only work on some browsers and some devices. Different vendors also take different amounts of time to ship list updates to end products (operating system updates, device updates, software updates etc.). So even after you&rsquo;ve been accepted into the vendors&rsquo; root programs, it takes quite a while before your root is widespread enough that customers would buy your certificates.</p> <p>Sidebar: I&rsquo;ve used the terms &ldquo;CA&rdquo; and &ldquo;root certificate&rdquo; quite interchangeably. Technically, the root certificate represents the CA (Certificate Authority), and that certificate is used to issue sub certificates.
That certificate also represents the vendor that issues (and probably sells) the sub certificates, so the vendor itself could also be called a CA.</p> <h2 id="can-we-trust-the-cas-isn-t-pki-broken-by-design">Can we trust the CAs? Isn&rsquo;t PKI broken by design?</h2> <p>Good questions! So yes, the whole thing breaks down if even one of the CAs gets compromised or knowingly acts maliciously. And these things <strong>do happen</strong>:</p> <ul> <li>In 2015 <a href="http://www.pcworld.com/article/2999146/encryption/google-threatens-action-against-symantec-issued-certificates-following-botched-investigation.html">Google beat up and quite rightly humiliated Symantec</a> after discovering that during testing Symantec had generated certificates posing as Google services.</li> <li>In 2012 <a href="http://www.h-online.com/security/news/item/Trustwave-issued-a-man-in-the-middle-certificate-1429982.html">Trustwave issued a man-in-the-middle certificate</a> that compromised the integrity of the whole PKI ecosystem. Mozilla reacted with a <a href="http://www.h-online.com/security/news/item/Mozilla-takes-action-against-CAs-issuing-man-in-the-middle-certificates-1437332.html">patch denouncing any Trustwave-issued sub-CA certs</a>.</li> <li>In 2011 <a href="https://www.wired.com/2011/09/diginotar-bankruptcy/">DigiNotar went bankrupt</a> after a security incident that compromised the whole internet, and after being (quite rightly) booted off all vendors&rsquo; root CA programs - a clear death blow to any CA business.</li> </ul> <p>And if this many instances of misuse have been made public, think of all the malicious activity that has not, and think of all the government actors, NSA being a dick etc.</p> <p>I&rsquo;m with you if you&rsquo;re thinking that this system is broken by design. It seems so weird to build a centralized system of power that we have to trust blindly!</p> <p>The good thing is that the ecosystem is being actively improved by
introducing Google&rsquo;s <a href="https://www.certificate-transparency.org/">certificate transparency</a> (&ldquo;CT&rdquo;) project. CT essentially requires all CA-issued certificates to be publicly disclosed, so that any foul play will be noticed and dealt with accordingly.</p> <p>In the centralized design of PKI we cannot prevent malicious use, but by moving 100 % to CT we can at least detect malicious activity, kill malicious/incompetent CA vendors and start trusting PKI again.</p> <p>Of course, CT is still being rolled out and not all CAs implement it yet. But things will keep improving, and eventually all trusted CAs will either implement CT or be booted off all vendors&rsquo; lists.</p> <h2 id="how-much-do-ssl-certificates-cost">How much do SSL certificates cost?</h2> <p>Not much. The price depends on the certificate type (there are subtypes of server certificates, but that&rsquo;s not important here), and the cheapest go for ~8 USD/year.</p> <p>Of course the cost is relative. For a business that&rsquo;s nothing, but for a hobby site or a personal blog it might be too much. That&rsquo;s one big reason why most of the web is not encrypted yet and we are still using insecure connection channels.</p> <p>Progress is being made, though, and the landscape is changing: SSL certs are finally becoming free, making the web democratic again:</p> <ul> <li><a href="https://letsencrypt.org/">Let&rsquo;s Encrypt</a> - finally making SSL certs free and accessible to everyone.</li> <li><a href="https://cloudflare.com/">Cloudflare</a> - offering free SSL certificates even to their free customers.</li> </ul> <h2 id="summing-up">Summing up</h2> <p>There&rsquo;s no denying that PKI/SSL is a complicated subject. This is also part of the reason why not all websites are encrypted yet (it&rsquo;s too hard). And we didn&rsquo;t even discuss the encryption that actually secures your traffic between devices on the internet.</p> <p>Honestly, there is
probably a lot more about PKI/SSL certs that I don&rsquo;t know yet, and I&rsquo;ve probably made some mistakes trying to explain the mess that is this subject. But I hope you gained some insight, and if you have any comments/questions, drop them below! :)</p> Introducing Buildbot (open source) https://function61.com/blog/2016/introducing-buildbot/ Wed, 28 Dec 2016 13:51:00 +0300 https://function61.com/blog/2016/introducing-buildbot/ <p>Buildbot is a build server like Jenkins, but stateless, <strong>super lightweight</strong> and aimed at building Dockerized apps.</p> <p>Head over to our <a href="https://github.com/function61/buildbot">GitHub repository</a> for more details!</p> Website launch https://function61.com/blog/2016/website-launch/ Fri, 23 Dec 2016 16:38:53 +0300 https://function61.com/blog/2016/website-launch/ <p>function61.com has launched its website! Hope you enjoy it, and any feedback would be appreciated.</p> <p>Even Ron Swanson would approve of this site:</p> <p><img src="https://i.giphy.com/11QjaaofL7yP3W.gif" alt="" /></p> <p>Disclaimer: Ron Swanson might or might not approve of this site.</p> <p>Huge thanks to:</p> <ul> <li><a href="https://html5up.net/">HTML5 UP</a> for the design</li> <li><a href="https://gohugo.io">Hugo</a> for the modern static website engine</li> <li><a href="https://unsplash.com">Unsplash</a> for the imagery</li> </ul> About our values https://function61.com/about-our-values/ Thu, 22 Dec 2016 22:24:34 +0200 https://function61.com/about-our-values/ <h2 id="building-software-is-a-craft">Building software is a craft</h2> <p>Building great software is really hard. Software fails all the time, either to work at all or to keep up with expectations. Most people may not realize it, but software engineering is a craft with an almost endless amount of things to learn.
You can never make perfect software, but the real talent lies in trying your very best to be far above average, and in always learning how to improve things by keeping an open and critical mind.</p> <h2 id="user-experience">User experience</h2> <p>We are passionate about making the user experience so good that you&rsquo;ll like using our software. Especially in corporate settings, most software tends to be boring and even hated because of poor usability. We&rsquo;re trying our hardest to change that! Try our software to see for yourself!</p> <h2 id="security">Security</h2> <p>Concrete steps we take to respect your security and privacy:</p> <ul> <li>All of our websites are served over a <b>secure channel only</b>. Unencrypted http is auto-upgraded to https.</li> <li>We use Cloudflare for DDoS protection and additional security of our and our customers&rsquo; sites.</li> <li>We closely follow the work of security researchers like <a href="https://www.troyhunt.com/">Troy Hunt</a> to keep up with the ever-changing landscape.</li> <li>We understand how the vulnerabilities in the <a href="https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project">OWASP Top 10 project</a> work and thus know how to defend against them.</li> <li>Our user accounts in all important systems are protected with strong passwords.</li> <li>We&rsquo;ve designed our login systems in such a way that we may not need your password at all (if you sign in via Facebook, Twitter, Google etc.).</li> <li>If we need to store your password, we&rsquo;ll use the current state-of-the-art <a href="https://en.wikipedia.org/wiki/Bcrypt">bcrypt</a> with an appropriate cost factor. <a href="https://nakedsecurity.sophos.com/2016/12/15/yahoo-breach-ive-closed-my-account-because-it-uses-md5-to-hash-my-password/">Even Yahoo failed at this by using a weak algorithm (MD5)</a>.
We are not going to make the same mistake.</li> <li>If your use case requires stronger security, our systems support two-factor authentication (<a href="https://support.google.com/accounts/answer/1066447?hl=en" target="_blank">Google Authenticator</a>).</li> <li>We <a href="https://function61.com/blog/2016/every-website-will-get-hacked-how-to-prepare-for-it/">really, really care about security</a>.</li> <li>We use USB security keys (from <a href="https://www.yubico.com/">Yubico</a>) to protect our credentials, so even if an attacker got access to our laptops, they would not gain access to our production servers.</li> </ul> <p>We urge you to ask your other software vendors how they are working to protect your security and privacy. We as users deserve better.</p> <h2 id="trust-by-transparency">Trust by transparency</h2> <p>It&rsquo;s easy to sweep bad development practices, security issues, bad software design, uptime problems and user dissatisfaction under the carpet. Many companies have huge issues that their users might be unaware of.</p> <p>When a company actively makes these details public, it builds trust. Users know that whatever we publish is open to public scrutiny, and anything done improperly would hurt our image. What we&rsquo;ve already made transparent:</p> <ul> <li>The quality of our software, by publishing many open source projects.
Our software architecture and code quality are open for anybody to review.</li> <li>In fact, we have open sourced our most critical security infrastructure, like our <a href="https://github.com/function61/traefik-fn61">loadbalancer configuration</a> and <a href="https://github.com/function61/certificate-authority">certificate authority</a>.</li> <li>Our focus on and thoughts about the importance of security.</li> <li>Our promise that if we ever get hacked or anything important gets compromised, we will disclose it (<a href="https://open.buffer.com/buffer-has-been-hacked-here-is-whats-going-on/">Buffer handled their issue beautifully</a>).</li> <li><a href="https://status.function61.com/">Any issues with our services being unreachable</a>.</li> </ul> <p>In the future we&rsquo;d also like to be transparent about our finances. <a href="https://buffer.com/transparency">Buffer set the gold standard for this</a>, and we&rsquo;d love to do something similar.</p> Consulting https://function61.com/consulting/ Thu, 22 Dec 2016 22:24:34 +0200 https://function61.com/consulting/ <h2 id="need-help-with-something">Need help with something?</h2> <p>Contact us, and let&rsquo;s discuss solutions to your problems/projects!</p> <h2 id="our-experience">Our experience</h2> <p>Stuff we feel comfortable working with:</p> <ul> <li>Software integrations</li> <li>Most <a href="https://aws.amazon.com/">AWS services</a>.
In-depth experience with EC2, SQS, S3, IAM, SNS, Lambda, DynamoDB and Route 53.</li> <li>Printer drivers / printer infrastructure / label printing / barcodes</li> <li>Distributed systems (CoreOS, Docker Swarm, Consul)</li> <li>Container-based infrastructure and build systems</li> <li><a href="https://function61.com/blog/2016/every-website-will-get-hacked-how-to-prepare-for-it/">Security-oriented thinking</a></li> <li>Public key infrastructure + SSL</li> <li><a href="http://martinfowler.com/eaaDev/EventSourcing.html">Event sourcing</a>-based architecture + <a href="http://martinfowler.com/bliki/CQRS.html">CQRS</a></li> <li>JSON/REST</li> <li><a href="https://joonas.fi/2015/12/26/aforge.net-is-awesome/">Computer vision</a></li> <li>Reverse engineering (strace, Wireshark, Docker diff)</li> <li>Network protocols such as HTTP, HTTP/2, DNS, TCP/IP and UDP</li> <li>Software load balancers such as HAProxy and Nginx</li> <li>Laboratory information systems (healthcare)</li> </ul> NX Print https://function61.com/products/nxprint/ Thu, 22 Dec 2016 22:24:34 +0200 https://function61.com/products/nxprint/ <p>Printing infrastructure for browser-based apps.</p> <h2 id="use-cases">Use cases</h2> <ul> <li>Need to print from the browser automatically? Does your use case simply not allow your end users to open the print dialog, choose a printer and possibly adjust printer settings?</li> <li>NX Print gives you low-level access to Windows&rsquo; printing infrastructure from JavaScript.</li> <li>Print PDF-based reports/order confirmations/packing slips etc.
from the browser.</li> </ul> <h2 id="technical-details">Technical details</h2> <p>NX Print is built on our NX technology (an open source framework for securely calling native apps from JavaScript).</p> <h2 id="do-you-need-low-level-or-higher-level">Do you need low-level or higher-level?</h2> <p>If you don&rsquo;t need access to the low-level printing primitives from JavaScript, and/or you&rsquo;d rather just print some labels, we have a product for that as well: <a href="https://function61.com/products/labelcloud/">LabelCloud</a>. LabelCloud is basically NX Print plus extra functionality for printing labels.</p> <h2 id="documentation-integration-usage-example">Documentation + integration usage example</h2> <p>Integration is easy, with just a few lines of JavaScript. <a href="http://nx-print-docs.readthedocs.io/en/latest/readme/">Head over to the documentation</a>.</p> <p>Here&rsquo;s the rough API in pseudocode:</p> <pre><code>// list printers
nx.nxprint_1_0.list_printers()

// print PDF
nx.nxprint_1_0.print_pdf_http()

// raw print
nx.nxprint_1_0.print_file_http()
</code></pre> <p>For additional details, contact us!</p> <h2 id="licensing">Licensing</h2> <p>While you can download NX Print freely for evaluation, you need a license for production use. Contact us for licensing!</p> Every website will get hacked - how to prepare for it https://function61.com/blog/2016/every-website-will-get-hacked-how-to-prepare-for-it/ Fri, 23 Sep 2016 11:29:36 +0300 https://function61.com/blog/2016/every-website-will-get-hacked-how-to-prepare-for-it/ <h2 id="every-website-will-get-hacked">Every website will get hacked?</h2> <p>Yes. It is starting to look like every website, even the major ones, <strong>will get hacked</strong>.
It is not a question of &ldquo;if&rdquo;, but &ldquo;when&rdquo;.</p> <p>As of this writing, <a href="http://www.bbc.com/news/world-us-canada-37447016">Yahoo was compromised</a>, leaking the names, usernames, passwords and other personal data of 500 million accounts, more than one and a half times the population of the United States.</p> <p>Here are a few widely known breached websites (as of September 2016; this list grows constantly):</p> <table> <thead> <tr> <th>Website</th> <th>Breached accounts</th> </tr> </thead> <tbody> <tr> <td>Yahoo</td> <td>~500 M</td> </tr> <tr> <td>MySpace</td> <td>359.4 M</td> </tr> <tr> <td>LinkedIn</td> <td>164.6 M</td> </tr> <tr> <td>Adobe</td> <td>152.4 M</td> </tr> <tr> <td>VKontakte (Russian Facebook)</td> <td>93.3 M</td> </tr> <tr> <td>Dropbox</td> <td>68.6 M</td> </tr> <tr> <td>Tumblr</td> <td>65.4 M</td> </tr> <tr> <td>Last.fm</td> <td>37 M</td> </tr> <tr> <td>Ashley Madison (dating site for cheaters)</td> <td>30.8 M</td> </tr> <tr> <td>Snapchat</td> <td>4.6 M</td> </tr> <tr> <td>Trillian</td> <td>3.8 M</td> </tr> <tr> <td>Patreon</td> <td>2.3 M</td> </tr> <tr> <td>Forbes</td> <td>1 M</td> </tr> <tr> <td>Comcast</td> <td>0.6 M</td> </tr> <tr> <td>Yahoo (old breach)</td> <td>0.4 M</td> </tr> <tr> <td>Avast</td> <td>0.4 M</td> </tr> </tbody> </table> <p>This list only covers publicly known breaches. I believe there are plenty that go unreported, or that have already happened but are not yet public knowledge.
Many breaches have a lead time (the time from the breach until it becomes public knowledge) measured in years.</p> <p>A more complete list of breaches (with sources for the claims) can be found at <a href="https://haveibeenpwned.com/PwnedWebsites">haveibeenpwned.com</a>, operated by security researcher (and my personal hero) <a href="https://www.troyhunt.com/">Troy Hunt</a>.</p> <h2 id="have-my-details-been-leaked-online">Have my details been leaked online?</h2> <p><a href="https://haveibeenpwned.com/">HaveIBeenPwned</a> is a free website where you can enter your email address to see if your account details have been publicly leaked online.</p> <p>I know, in this day and age entering your primary email address into yet another website doesn&rsquo;t sound appealing, but the operator of the website is as <a href="https://www.troyhunt.com/tag/have-i-been-pwned-3f/">trustworthy and as responsible as they get</a>.</p> <h2 id="different-types-of-websites-have-different-risks">Different types of websites have different risks</h2> <table> <thead> <tr> <th>Type</th> <th>Example site</th> <th>Used to sign in to other sites</th> <th>Impact of breach</th> </tr> </thead> <tbody> <tr> <td>Blog</td> <td>Tumblr</td> <td></td> <td><strong>Low</strong>: people probably don&rsquo;t have much private content in Tumblr blogs.</td> </tr> <tr> <td>Social</td> <td>Twitter</td> <td></td> <td><strong>Low</strong>: tweets are public anyway (though some people use private messages as well).</td> </tr> <tr> <td>Social</td> <td>Twitter</td> <td>Yes</td> <td><strong>Severe</strong>: all websites you use &ldquo;Sign in via Twitter&rdquo; for are compromised as a result.</td> </tr> <tr> <td>Social</td> <td>LinkedIn</td> <td></td> <td><strong>Low</strong>: my profile and private messages there are not very sensitive.</td> </tr> <tr> <td>Social</td> <td>Facebook</td> <td></td> <td><strong>Severe</strong>: your private messages in Facebook should probably stay private.</td> </tr> <tr> <td>Social</td> <td>Facebook</td> <td>Yes</td>
<td><strong>Severe</strong>: all websites you use &ldquo;Sign in via Facebook&rdquo; for are compromised as a result.</td> </tr> <tr> <td>Dating</td> <td>match.com</td> <td></td> <td><strong>Severe</strong>: you probably have privacy-sensitive conversations there.</td> </tr> <tr> <td>Dating</td> <td>Ashley Madison</td> <td></td> <td><strong>Catastrophic</strong>: the fact that you are a cheater, and your sexual preferences, become public knowledge.</td> </tr> <tr> <td>Email</td> <td>Gmail</td> <td>Yes</td> <td><strong>Catastrophic</strong>: &ldquo;Reset password&rdquo; links are sent to email, so your email gives full access to all your other services.</td> </tr> <tr> <td>Health</td> <td>E-health provider</td> <td></td> <td><strong>Catastrophic</strong>: your health data, possibly including your sexual history and embarrassing illnesses, becomes public.</td> </tr> </tbody> </table> <p><strong>BUT</strong>: if you are one of the <a href="https://www.troyhunt.com/what-do-sony-and-yahoo-have-in-common/">59% who use the same password for different sites</a>, then the impact of even a Tumblr breach could turn out to be catastrophic, because the same password lets hackers into your Gmail as well (and from there, into other websites).</p> <p>Note: the above are generalizations. People use websites differently; for example, you might use Twitter&rsquo;s private messages a lot more than I do, and so be much more concerned about a breach than I would be. Or you might not have any sensitive private messages in Facebook.</p> <p>Don&rsquo;t believe me about the catastrophic impact? <a href="https://www.troyhunt.com/ashley-madison-search-sites-like/">Read more</a> to learn, for example, how vulture-like people made the Ashley Madison records (sexual preferences etc.) publicly searchable and extorted money from the victims.
Yes, you could argue that the victims, being cheaters, deserved it, and you&rsquo;d probably be right.</p> <p>I believe it&rsquo;s only a matter of time before a huge breach reveals people&rsquo;s truly intimate data, like health records. <a href="https://www.troyhunt.com/when-nation-is-hacked-understanding/">Philippines voting data</a> and <a href="https://en.wikipedia.org/wiki/Office_of_Personnel_Management_data_breach">US government personnel data</a> have already been exposed. Health record data will be breached, and it will get ugly.</p> <h2 id="as-a-user-how-do-i-prepare-for-this">As a user, how do I prepare for this?</h2> <ul> <li>Super important: use a different password for every service. Password managers such as the free <a href="http://keepass.info/">KeePass</a> help with this.</li> <li>Use strong passwords. Or better yet, <a href="https://xkcd.com/936/">passphrases</a>.</li> <li>Use <a href="https://www.youtube.com/watch?v=zMabEyrtPRg">multi-factor authentication</a> where possible. <a href="https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2&amp;hl=en">Google Authenticator</a> is awesome and works for non-Google websites as well!</li> <li>Know the risks (the data might, or will, become public) when giving personal data to websites.</li> <li>If your software vendor doesn&rsquo;t do things responsibly, ask them for improvements.</li> </ul> <h2 id="as-a-website-operator-how-do-we-prevent-this-and-or-prepare-for-this">As a website operator, how do we prevent this and/or prepare for this?</h2> <ul> <li>Take your own and your users&rsquo; security and privacy very seriously. Your users deserve better.</li> <li>Support HTTPS. Force HTTPS. CloudFlare handles it all for you, or if you don&rsquo;t use CloudFlare, <a href="https://letsencrypt.org/">SSL certs are free nowadays</a> anyway. No excuses!</li> <li>Don&rsquo;t use company-wide accounts for any sensitive systems.
Give each employee a unique account so access can be audited and privileges managed granularly. If this is painful, you are lacking automation.</li> <li>Enforce strong and unpredictable passwords for all employees who have access to critical systems. I&rsquo;ve witnessed firsthand companies using systematic (that is, predictable) master passwords for critical production systems. Even former employees knew the new passwords, because they knew the formula the passwords were generated with. A huge no-no!</li> <li>There are many <a href="https://www.troyhunt.com/everything-you-ever-wanted-to-know/">important issues you need to be aware of</a> when building a password reset feature (which you probably need if you are handling user accounts).</li> <li>Support multi-factor authentication.</li> <li>If your team isn&rsquo;t aware of issues like SQL injection, cross-site scripting and CSRF, or isn&rsquo;t familiar with the issues documented in the <a href="https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project">OWASP Top Ten</a>, you are not being responsible and should not be developing software for other people&rsquo;s use.</li> <li>Keep your systems up-to-date and patched. Huge issues like <a href="https://www.troyhunt.com/everything-you-need-to-know-about3/">Heartbleed</a> or <a href="https://www.troyhunt.com/everything-you-need-to-know-about2/">Shellshock</a> pop up, and it&rsquo;s your job to follow the changing landscape and react to issues FAST. Even keeping up doesn&rsquo;t guarantee immunity: see <a href="https://en.wikipedia.org/wiki/Zero-day_(computing)">zero-day vulnerabilities</a>.</li> <li>Implement rate limiting for logins to prevent brute-forcing passwords online. (Adaptive hashing such as PBKDF2 sort of helps here too, since each guess becomes slower to check.)</li> <li>Prepare for the inevitable and protect your users&rsquo; passwords with current best practices. <a href="https://www.troyhunt.com/our-password-hashing-has-no-clothes/">A fast hash plus salt is nearly useless nowadays</a>.
Look into adaptive hashing functions like <a href="https://en.wikipedia.org/wiki/PBKDF2">PBKDF2</a>.</li> </ul> <p>And most importantly: foster a culture of security in your team. Many companies don&rsquo;t really care, and I hope those companies go out of business once they lose their reputation, which can only be lost once.</p> <p>One example of downplaying important issues: Snapchat was warned about a security issue and called the attack vector &ldquo;theoretical&rdquo;. Theory turned into practice, and <a href="https://haveibeenpwned.com/PwnedWebsites#Snapchat">4.6 million usernames and phone numbers were exposed</a>. Snapchat deserved the hack, but its users didn&rsquo;t. Be responsible!</p>
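<p>To make the password-hashing advice in the operator checklist more concrete, here is a minimal sketch of salted PBKDF2 password storage using only Python&rsquo;s standard library (<code>hashlib.pbkdf2_hmac</code>, <code>secrets</code>, <code>hmac.compare_digest</code>). The record format, the iteration count and the salt size are illustrative assumptions, not a vetted policy; tune the iteration count to your own hardware.</p>

```python
import hashlib
import hmac
import secrets

# Illustrative parameters (assumptions, not a vetted policy).
HASH_NAME = "sha256"
ITERATIONS = 600_000  # tune so one hash takes a noticeable fraction of a second
SALT_BYTES = 16


def hash_password(password, iterations=ITERATIONS):
    """Return a storable record: 'pbkdf2_sha256$iterations$salt_hex$hash_hex'."""
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(HASH_NAME, password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_{HASH_NAME}${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, record):
    """Re-derive the hash with the stored parameters and compare in constant time."""
    algorithm, iterations, salt_hex, digest_hex = record.split("$")
    hash_name = algorithm.split("_", 1)[1]  # 'pbkdf2_sha256' -> 'sha256'
    candidate = hashlib.pbkdf2_hmac(
        hash_name, password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

<p>The salt and iteration count are stored alongside the hash, so each record can still be verified after you raise the iteration count for new passwords. Because every guess has to run all the iterations, an attacker working through a leaked database gets far fewer guesses per second than against a fast hash plus salt.</p>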
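<p>The login rate-limiting item in the operator checklist can be illustrated with a small sketch. Below is a minimal, in-memory, single-process limiter in Python that refuses further password checks for an account once too many failures fall within a sliding time window. The class name, method names, the <code>try_login</code> helper and the default limits (5 failures per 5 minutes) are hypothetical choices for illustration only.</p>

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Hypothetical sliding-window limiter: at most max_failures failed
    logins per account within window_seconds."""

    def __init__(self, max_failures=5, window_seconds=300.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock  # injectable clock, so tests can control time
        self._failures = defaultdict(deque)  # account -> timestamps of recent failures

    def _recent_failures(self, account):
        """Drop failures that fell out of the window and return the rest."""
        now = self.clock()
        failures = self._failures[account]
        while failures and now - failures[0] >= self.window_seconds:
            failures.popleft()
        return failures

    def is_allowed(self, account):
        """True if a login attempt for this account may be checked right now."""
        return len(self._recent_failures(account)) < self.max_failures

    def record_failure(self, account):
        self._failures[account].append(self.clock())

    def record_success(self, account):
        # A successful login clears the failure history for that account.
        self._failures.pop(account, None)


def try_login(limiter, account, password_is_correct):
    """Hypothetical login flow: returns 'locked', 'ok' or 'denied'."""
    if not limiter.is_allowed(account):
        return "locked"  # refuse before even checking the password
    if password_is_correct:
        limiter.record_success(account)
        return "ok"
    limiter.record_failure(account)
    return "denied"
```

<p>In a real deployment the counters would likely live in shared storage so every web server sees them, and you may want to limit per IP address as well, since a purely per-account limit lets an attacker deliberately lock a victim out.</p>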