The Roycoding Bloghttps://roycoding.com/blogOccasional blog posts from roycodinghttp://www.rssboard.org/rss-specificationpython-feedgenhttps://www.gravatar.com/avatar/0b896dff2667cbfee4237e889f6d4446.jpgThe Roycoding Bloghttps://roycoding.com/blogenSat, 19 Nov 2022 22:25:11 +0000The shortest definition of embeddings?https://roycoding.com/blog/2022/embeddings.html<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <title>The shortest definition of embeddings&quest;</title> <style> .vscode-dark img[src$=\#gh-light-mode-only], .vscode-light img[src$=\#gh-dark-mode-only] { display: none; } </style> <link rel="stylesheet" href="proxy.php?url=https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/markdown.css"> <link rel="stylesheet" href="proxy.php?url=https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/highlight.css"> <style> body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe WPC', 'Segoe UI', system-ui, 'Ubuntu', 'Droid Sans', sans-serif; font-size: 14px; line-height: 1.6; } </style> <style> .task-list-item { list-style-type: none; } .task-list-item-checkbox { margin-left: -20px; vertical-align: middle; } </style> </head> <body class="vscode-body vscode-light"> <h1 id="the-shortest-definition-of-embeddings">The shortest definition of embeddings?</h1> <p><strong><a href="proxy.php?url=https://roycoding.com">Roy Keyes</a></strong></p> <blockquote> <p><em>19 Nov 2022</em> - This is a post on my <a href="proxy.php?url=https://roycoding.com/blog">blog</a>. Get the <a href="proxy.php?url=https://roycoding.com/blog/rss.xml">RSS feed</a>!</p> </blockquote> <p><img src="proxy.php?url=https://roycoding.com/blog/2022/images/embeddings.jpg" alt="A Stable Diffusion generated image of an embedding space"></p> <p>Embeddings are a very important concept in deep learning and several related techniques. They are powerful and flexible and at the core of many modern machine learning systems. But, they are also tricky to describe.</p> <p>Here is my attempt to create a very succinct definition and description of embeddings.</p> <blockquote> <h2 id="embeddings-are-learned-transformations-to-make-data-more-useful">Embeddings are learned transformations to make data more useful</h2> </blockquote> <p>What does this mean?</p> <ul> <li> <p><strong>Learned</strong> means that the parameters of the transformation are determined using the data, rather than just a fixed algorithm. The parameters are optimized for some task using the data as input. E.g. teaching a small neural network to determine if words found in a text dataset are from the same sentence. The network needs to transform the input text data in a way that makes this task as easy as possible (this is basically word2vec).</p> </li> <li> <p><strong>Transformations</strong> are how data is changed from being represented in one way into another. This typically means mapping data from one vector space to another, such as taking words from a vocabulary of 10k words that are one-hot-encoded in a 10k dimensional space to a 100 dimensional &quot;dense&quot; vector space.</p> </li> <li> <p><strong>Data</strong> can be any type of information: words, sentences, images, audio, webpages, time series. Often the input data is considered a &quot;feature&quot; for machine learning.</p> </li> <li> <p><strong>Useful</strong> means that the new representation of the data contains more information in a way that makes the ultimate task easier. Useful can mean:</p> <ul> <li>Grouping similar items together or warping the representations to make existing groups more easily identifiable (e.g. linearly separable).</li> <li>Representing the data in fewer dimensions, whether to more easily encode relationships between the data or for computational efficiency.</li> <li>Being reusable for many closely related tasks.</li> </ul> </li> </ul> <p>Embeddings are incredibly useful for many data related tasks. They are especially useful for things like search (e.g. what other images are like this one?) and recommendations. More generally they are useful for working with data that is not inherently numeric, such as text, images, or audio.</p> <p>While I didn't touch on the technical aspects of creating or using embeddings, hopefully this is helpful as a conceptual overview for you.</p> <ul> <li>Note: My definition is slightly different from others, because I don't think that embeddings strictly need to be in a lower dimensional space, but that is often the best choice for the task at hand. What exactly qualifies as an embedding can get a little fuzzy when considering basic, non-learned encodings (does one-hot encoding count) or learned transformations that are only used as internal (latent) representations, etc.</li> </ul> </body> </html>[email protected] (Roy Keyes)https://roycoding.com/blog/2022/embeddings.htmlSat, 19 Nov 2022 00:00:00 +0000Why I don’t screen resumeshttps://roycoding.com/blog/2021/not_using_resumes.html<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title>Why I don&apos;t screen resumes</title> <link rel="stylesheet" href="proxy.php?url=https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/markdown.css"> <link rel="stylesheet" href="proxy.php?url=https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/highlight.css"> <style> body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe WPC', 'Segoe UI', system-ui, 'Ubuntu', 'Droid Sans', sans-serif; font-size: 14px; line-height: 1.6; } </style> <style> .task-list-item { list-style-type: none; } .task-list-item-checkbox { margin-left: -20px; vertical-align: middle; } </style> </head> <body class="vscode-body vscode-light"> <h1 id="why-i-dont-screen-resumes">Why I don't screen resumes</h1> <p><strong><a href="proxy.php?url=https://roycoding.com">Roy Keyes</a></strong></p> <blockquote> <p><em>2 Oct 2021</em> - This is a post on my <a href="proxy.php?url=https://roycoding.com/blog">blog</a>. Get the <a href="proxy.php?url=https://roycoding.com/blog/rss.xml">RSS feed</a>!</p> </blockquote> <p>Probably the most controversial recommendation I make about hiring for data roles is to ignore resumes. In this post I'll talk about what that really means and why (and when) I recommend that.</p> <p><img src="proxy.php?url=https://roycoding.com/blog/2021/images/resumes.jpg" alt="Resume screening meme"></p> <h2 id="too-many-damn-resumes">Too many damn resumes</h2> <p>Back around 2017 something changed. I was hiring data scientists, as I had done before, but there was one big difference: simply too many applicants! I was (again) working at a young startup with little to no name recognition, but applications were flooding in. The volume of candidates had grown incredibly in just a couple years. My previous process could not handle this new scale. In order to handle this increased volume, I implemented a resume scoring process designed to be fast and as fair as possible.</p> <p>I soon found that even this was not efficient enough to handle the growing volume. I decided to take the more radical step of ignoring resumes completely. And it seemed to work.</p> <h2 id="why-ignore-resumes">Why ignore resumes?</h2> <p>Yes I'm suggesting that you ignore the carefully crafted career summary that your job hopefuls are sending in. Despite the hours (and even money) that they've poured into creating those resumes[1], one thing can make them irrelevant: applicant volume.</p> <p>But let's talk about what this actually means. What do I mean by excessive applicant volume? If you are in a situation where your applicant numbers greatly outpace your hiring resources, resume screening is likely to be a bottleneck. For me that meant hundreds or thousands of applicants, where I was the only one who could dedicate time (and expertise) to evaluating resumes.</p> <p>Sure, you can have people glance at resumes or use keyword filtering (or worse, some sort of &quot;AI&quot; resume filter), but this is not going to give you the effective and fair screening that you want. Resumes themselves are often messy and also allow you to over-index on certain things, such as schools, past employer names, and even more insidious signals to bias against.</p> <h2 id="so-what-do-you-do-if-you-dont-look-at-resumes">So what do you do, if you don't look at resumes?</h2> <p>Resumes are nice, if quite imperfect, summaries and advertisements of a candidates skills and experience. The whole point is that a short document can convey their fit for the roles they're applying for. So what's a reasonable way to not actually look at resumes?</p> <p>If you've read my <a href="proxy.php?url=https://dshiring.com">book</a> or other <a href="proxy.php?url=https://roycoding.com/blog">blog posts</a>, you'll know that the first steps in hiring are to understand your needs, identify the tasks and skills needed, and the skills you want to evaluate. In a situation with a very high candidate volume, I recommend using a technical skills assessment as a first step screen. Typically I'll use a relatively short, automated, online screen that covers basic skills needed for the role[2].</p> <p>I give this assessment to almost all applicants, sometimes (cheating and ) briefly checking to see if the candidate was in the wrong place (e.g. machinists applying for ML roles, which I've seen before!). This assessment covers skills that would be otherwise probed in the interview process, but in this case are just moved to the first step.</p> <p>The whole point of this is to reduce the size of your candidate funnel. That tends to happen in a couple ways. Some people simply won't take a skills assessment, because either they refuse to out of principle (probably a bad idea in this industry) or because they see the assessment as too much friction for this specific role [3]. The other way is of course candidates not doing well enough on the assessment, relative to your perceived needs or simply the rest of the candidate pool.</p> <h2 id="ok-but-what-about-the-resumes">OK, but what about the resumes?</h2> <p>The skills assessment only covers a portion of the total evaluation of a candidate. You're still going to interview them by talking to them and going more in depth on their background and technical skills.</p> <p>For this process, I actually do still use people's resumes, but it's more of an additional piece of information, rather than a something to screen against. Imagine two otherwise identical candidates, the one with more time spent in your domain or using the techniques you need, as reflected in their resume, is probably the one you'd choose.</p> <p>So the headline is somewhat misleading, but flipping this process has proven useful if dealing with the candidate volumes that I've seen, especially when hiring for early career data roles in particular[4].</p> <p>Hiring strategy and process are incredibly difficult and controversial things. My main advice is to put more effort into it than you think you might need to, create a plan, and be ready to change as needed. In this case, I ended up making a somewhat radical change, but this was in response to the seemingly exponential change happening in the applicant pool. It seemed to work.</p> <p>For more on this topic, please check out my <a href="proxy.php?url=https://dshiring.com">book</a>, subscribe to my <a href="proxy.php?url=https://roycoding.com/blog">blog</a> (--&gt; the <a href="proxy.php?url=https://roycoding.com/blog/rss.xml">RSS feed</a>), and follow me on Twitter, <a href="proxy.php?url=https://twitter.com/roycoding">@roycoding</a>.</p> <hr> <p>[1] And cover letters</p> <p>[2] I plan to cover this in another blog post, otherwise, please check out the chapter in my <a href="proxy.php?url=https://dshiring.com">book</a>.</p> <p>[3] I do my best to accommodate candidates that have logistical difficulties with the assessment. You want your candidates to have as positive an experience as possible, so flexibility is always good.</p> <p>[4] In situations where you have a lower ratio of applicants to resources (e.g. more senior level roles or working with a large HR/recruiting team), looking at resumes is totally reasonable. On the other hand, having used the &quot;no resumes&quot; process, questions about how fair it is to use resumes to make initial decisions seem more tangible.</p> </body> </html>[email protected] (Roy Keyes)https://roycoding.com/blog/2021/not_using_resumes.htmlSat, 02 Oct 2021 00:00:00 +0000How I wrote my book – the technical detailshttps://roycoding.com/blog/2021/writing.html<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title>How I wrote my book - the technical details</title> <link rel="stylesheet" href="proxy.php?url=https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/markdown.css"> <link rel="stylesheet" href="proxy.php?url=https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/highlight.css"> <style> body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe WPC', 'Segoe UI', system-ui, 'Ubuntu', 'Droid Sans', sans-serif; font-size: 14px; line-height: 1.6; } </style> <style> .task-list-item { list-style-type: none; } .task-list-item-checkbox { margin-left: -20px; vertical-align: middle; } </style> </head> <body class="vscode-body vscode-light"> <h1 id="how-i-wrote-my-book---the-technical-details">How I wrote my book - the technical details</h1> <p><strong><a href="proxy.php?url=https://roycoding.com">Roy Keyes</a></strong></p> <blockquote> <p><em>18 Aug 2021</em> - This is a post on my <a href="proxy.php?url=https://roycoding.com/blog">blog</a>. Get the <a href="proxy.php?url=https://roycoding.com/blog/rss.xml">RSS feed</a>!</p> </blockquote> <p><img src="proxy.php?url=https://roycoding.com/blog/2021/images/writing-hand.jpg" alt="writing hand"></p> <p>I recently released a book called <a href="proxy.php?url=http://dshiring.com">Hiring Data Scientists and Machine Learning Engineers</a>. I self-published the book via a platform called <a href="proxy.php?url=https://leanpub.com">Leanpub</a>. This post is about how I actually did my writing, primarily the technical details of my setup and workflow.</p> <h2 id="lets-say-youve-decided-to-self-publish-a-book">Let's say you've decided to self-publish a book...</h2> <p>When I decided to actually write a book, my main criteria was being able to create and distribute something high quality with the least amount of effort, while retaining a lot of control over the final product. As mentioned in my early blog post that talked about my <a href="proxy.php?url=https://roycoding.com/blog/2021/book.html">motivation for writing the book</a>, I came across Leanpub because a friend had published a <a href="proxy.php?url=http://fizzbuzzbook.com/">book</a> there.</p> <h3 id="why-leanpub-in-particular">Why Leanpub in particular?</h3> <p>In the era of social media, blogs, podcasts, and streaming video creators can essentially deliver directly to their audiences. Similarly, self-publishing is not only easier than ever, but also just makes sense for a lot of people. This is the age of fast turnaround and disintermediation, which is just a fancy term for removing the middle man (and usually replacing them with a &quot;platform&quot;).</p> <p>But why Leanpub? There are a lot of choices for self-publishing a book these days. I chose Leanpub for a few specific reasons:</p> <ul> <li>It's aimed primarily at technical content (computery stuff).</li> <li>It already has a lot of good data science and machine learning-related books and courses.</li> <li>It provides several nice authoring workflows (see below).</li> <li>It pays a good royalty rate (80% to the author).</li> <li>It allows you to publish and sell in-progress works.</li> <li>It allows you to set a minimum price and allows readers to pay what they want over that.</li> <li>It allows you to create coupons (including programmatically via API).</li> </ul> <p>Overall I like how pro-author they are.</p> <h2 id="how-i-wrote-my-book">How I wrote my book</h2> <p>Leanpub has several authoring workflows available, including fully in-browser authoring, but I decided to use a their more developer-style workflow. It starts with writing the content locally in <a href="proxy.php?url=http://markua.com/">Markua</a>, Leanpub's book oriented dialect of Markdown, using git for version control, pushing changes to a Github repo, and then finally Leanpub pulling the latest version of the source and building the ebook files in several formats for you.</p> <p>I really liked writing that way, because it's similar to how I already write notes, documentation, and blog posts in my text editor (VS Code these days). Markua is close enough to vanilla Markdown that the Markdown extension I use with VS Code renders it well enough for writing. In the end you need to have Leanpub build updated ebook files to know for sure how things will look, but this is definitely not a deal breaker.</p> <h2 id="pretending-to-be-a-designer">Pretending to be a designer</h2> <p>I enjoy making graphics, and even though I am a certified amateur, I decided to try my hand at creating the cover for my book. I created the cover and other graphics using <a href="proxy.php?url=https://inkscape.org/">Inkscape</a>, which is a pretty incredible open source vector graphics editor.</p> <p>The graphics I used inside the book, which were primarily decorative, were a mix of vector graphics I created by hand, modified public domain vector graphics, and equations typeset using LaTeX within Inkscape.</p> <p>The build process Leanpub uses allows you to use SVG files directly, so instead of exporting my graphics to a bitmap format I used SVG, with the exception of the cover graphic.</p> <h2 id="writing-on-a-6-screen">Writing on a 6&quot; screen</h2> <p>At some point while writing my book, it occurred to me that I could probably write directly on my phone. This is probably viewed as masochistic by many, but it seemed like a fun thing to try &quot;because it was there&quot;.</p> <p>To get this to work I installed an app called <a href="proxy.php?url=https://termux.com/">Termux</a>, which is a terminal emulator and Linux environment for Android, available via F-Droid, an Android app store with only open source apps. Termux allows you to install and run a lot of the programs that Linux/Unix users expect to be able to run from the command line, including git.</p> <p><img src="proxy.php?url=https://roycoding.com/blog/2021/images/termux.png" alt="Termux"></p> <p>Next up I installed git and ssh via the Termux package manager:</p> <blockquote> <p>pkg install git</p> </blockquote> <blockquote> <p>pkg install openssh</p> </blockquote> <p>Once I installed git, I set up a new Github key on my phone following the <a href="proxy.php?url=https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh">usual process</a>, pulled down my book repo and was ready to start writing. Since I was writing in a Markdown dialect, I looked for a Markdown editing app, ultimately settling on an app called <a href="proxy.php?url=https://gsantner.net/project/markor.html">Markor</a>. The workflow is then identical to writing on a full-sized computer.</p> <ol> <li>Write some.</li> <li>Commit the changes in git.</li> <li>Push the changes to Github.</li> </ol> <p>As you might imagine, long-form writing on a virtual phone keyboard is not the funnest way to write. The nice thing about a phone is that they tend to handle voice typing well. While I did that some, I prefer to type when writing. Fortunately the keyboard that I normally use is very small and designed with portability in mind: the <a href="proxy.php?url=https://shop.keyboard.io/products/keyboardio-atreus">Keyboardio Atreus</a>. The Atreus is a small, split, columnar layout, mechanical keyboard. Yes, you need to be a keyboard geek to use the Atreus, but look, you've read this far for a reason.</p> <p><img src="proxy.php?url=https://roycoding.com/blog/2021/images/atreus.jpg" alt="Keyboardio Atreus" title="The Keyboardio Atreus (not my hands)"></p> <p>So aside from small edits, I did most writing on my phone with the same keyboard I used the rest of the time on my computer.</p> <h2 id="what-might-be-improved">What might be improved</h2> <p>Overall I've been happy with the choices I made and the process, but it was certainly not without room for improvement.</p> <p>It would be nice to be able to build the book completely locally, if only for the sake of portability. What if I decide I want to take my Markua source elsewhere? Would I need to convert to a completely different format?</p> <p>Another issue was sharing my work with others for editing and commentary. For people that are used to a Github centric workflow, I could share my source text with them and they could comment or make proposed changes just like they would with code. But for other people who helped me with the book, but who were not used to working with git, I would send them PDF or raw text, which was not ideal. It was not a showstopper, but it could certainly be improved.</p> <h2 id="what-does-it-all-mean">What does it all mean?</h2> <p>Writing a book can be a long, often grueling process. It was nice to be able to use the tools I'm used to and to branch off into the (practical[?], but) whimsical as well.</p> <p>If you have any questions about this process, please email/message me.</p> <p>And don't forget to check out my hiring book at <a href="proxy.php?url=http://dshiring.com">dshiring.com</a>.</p> </body> </html>[email protected] (Roy Keyes)https://roycoding.com/blog/2021/writing.htmlWed, 18 Aug 2021 00:00:00 +0000What’s hard about hiring data roleshttps://roycoding.com/blog/2021/hiring.html<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title>What&apos;s hard about hiring data roles</title> <link rel="stylesheet" href="proxy.php?url=https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/markdown.css"> <link rel="stylesheet" href="proxy.php?url=https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/highlight.css"> <style> body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe WPC', 'Segoe UI', system-ui, 'Ubuntu', 'Droid Sans', sans-serif; font-size: 14px; line-height: 1.6; } </style> <style> .task-list-item { list-style-type: none; } .task-list-item-checkbox { margin-left: -20px; vertical-align: middle; } </style> </head> <body class="vscode-body vscode-light"> <h1 id="whats-hard-about-hiring-data-roles">What's hard about hiring data roles</h1> <p><strong><a href="proxy.php?url=https://roycoding.com">Roy Keyes</a></strong></p> <p><em>9 Aug 2021</em> - This is a post on my <a href="proxy.php?url=https://roycoding.com/blog">blog</a>. Get the <a href="proxy.php?url=https://roycoding.com/blog/rss.xml">RSS feed</a>!</p> <p><img src="proxy.php?url=https://user-images.githubusercontent.com/747219/128797161-d5c8fb32-e39e-40aa-b454-3ded365c4786.jpg" alt="children-fishing.jpg"></p> <h2 id="hiring-is-hard-and-data-hiring-has-special-challenges">Hiring is hard and data hiring has special challenges</h2> <p>I recently released a book about hiring called <a href="proxy.php?url=http://dshiring.com">Hiring Data Scientists and Machine Learning Engineers</a>. Some of the questions I've fielded related to my book are:</p> <ul> <li>Why is hiring data roles hard?</li> <li>What makes DS and MLE hiring specifically difficult?</li> </ul> <p>These questions are at the core of why I wrote my book. Hiring in general is difficult, but hiring for data roles comes with some special challenges. In this post I will lay out my view on why data roles are specifically hard to hire for.</p> <h2 id="hype-and-hope">Hype and hope</h2> <p>Data science and machine learning have emerged in the last ten years as particularly promising approaches to a number of business problems. Organizations have more access to more data than ever and the hope that business value can be extracted from that data has reached a fever pitch.</p> <p>The hype around data science and machine learning (and something called &quot;artificial intelligence&quot;) has driven all sorts of organizations to build out data teams. The hype has also led a huge number of people to enter the field in the hopes of landing an interesting and highly compensated career (remember the &quot;<a href="proxy.php?url=https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century">sexiest job of the 21st century</a>&quot;?).</p> <p>This is still the early days of this era of data science and machine learning. While there has been some evolution of roles and titles, the field is extremely broad and there is still a lot of disagreement on what exactly data scientists, machine learning engineers, and related roles actually do. A data scientist at one company may have a very different skill set and set of duties than a data scientist at another company, despite the shared title.</p> <h2 id="hiring-amid-the-hype">Hiring amid the hype</h2> <p>The newness, hype, and lack of clarity create a number of challenges for people trying to hire for these roles.</p> <p>The fact that these fields are relatively new means that there are few really experienced data scientists and machine learning engineers, but also there are few people that are really experienced in hiring for these roles. The newness means that the easiest thing to do has been to simply use or adapt hiring strategies from more established roles, such as for software engineering. This creates challenges, as the wrong assessment strategies will lead to more false positives and false negatives in the process.</p> <p>The newness also means that finding those experienced candidates can be very difficult and ultimately very expensive when you do find them, since the supply is so low. Organizations are faced with the fact that experienced talent may simply be outside their budgets.</p> <p>The wide breadth of topics, skills, and technologies that fall under data science and machine learning means that there is often a mismatch between the open role and the candidates. It's often not clear to either side what the other brings to the hiring table, and this can often lead to confusion and frustration.</p> <blockquote> <p>&quot;I thought this was a data science role!&quot;</p> <p>&quot;It is!&quot;</p> </blockquote> <p>This lack of alignment and understanding on the basic role definitions is a common source of issues.</p> <p>The hype has led to a huge influx of people joining the field, leading to a glut of early career people applying for DS and MLE roles. This means that a hiring manager looking to fill more junior level roles is likely to be overwhelmed with the volume of applicants. A team looking to fill both junior level and senior level positions may be faced with two extremely different sourcing problems: too many and too few applicants at the same time.</p> <h2 id="overcoming-these-challenges">Overcoming these challenges</h2> <p>To hire effectively and efficiently, you need to understand these challenges and design a hiring strategy to address them. Core aspects of such a strategy are:</p> <ul> <li>Understanding your business needs and determining the specific roles that match.</li> <li>Creating a job ad that crisply describes the role you are hiring for.</li> <li>Understanding your resources, your constraints, and the market overall.</li> <li>Designing a recruiting, assessment, and interviewing process that's optimized with your resources and constraints in mind.</li> </ul> <p>You can read more in depth on these issues in my book, which is available at <a href="proxy.php?url=http://dshiring.com">dshiring.com</a>.</p> </body> </html> [email protected] (Roy Keyes)https://roycoding.com/blog/2021/hiring.htmlMon, 09 Aug 2021 00:00:00 +0000I wrote a bookhttps://roycoding.com/blog/2021/book.html<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title>I wrote a book</title> <style> </style> <link rel="stylesheet" href="proxy.php?url=https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/markdown.css"> <link rel="stylesheet" href="proxy.php?url=https://cdn.jsdelivr.net/gh/Microsoft/vscode/extensions/markdown-language-features/media/highlight.css"> <style> body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe WPC', 'Segoe UI', system-ui, 'Ubuntu', 'Droid Sans', sans-serif; font-size: 14px; line-height: 1.6; } </style> <style> .task-list-item { list-style-type: none; } .task-list-item-checkbox { margin-left: -20px; vertical-align: middle; } </style> </head> <body class="vscode-body vscode-light"> <h1 id="i-wrote-a-book">I wrote a book</h1> <p><strong><a href="proxy.php?url=https://roycoding.com">Roy Keyes</a></strong></p> <p><em>14 July 2021</em> - This is a post on my <a href="proxy.php?url=https://roycoding.com/blog">blog</a>. Get the <a href="proxy.php?url=https://roycoding.com/blog/rss.xml">RSS feed</a>!</p> <h2 id="i-never-really-set-out-to-write-a-book">I never really set out to write a book.</h2> <p>I wrote a book. A book about hiring. You can buy it at <a href="proxy.php?url=http://dshiring.com">dshiring.com</a>.</p> <p>Back in 2020 we had a global pandemic. Like many people, I was laid off (along with most of my awesome team). Like some people, I decided to try to do something a little different during that time. Somehow I ended up writing a book. This blog post is about that book and how it came into existence.</p> <p>When not working in a full-time role (or infrequently, in addition to) I usually take on consulting projects. Earlier in my career those projects where mostly about helping companies implement data science projects, but more recently the projects I've worked on have centered more on strategic decisions related to data-related product and project planning, and more and more often about team planning and hiring strategy.</p> <p>In parallel with consulting, last summer I also decided to see if I could land on an idea for creating my own software business. Given my experience and interest, one of the more obvious ideas was building software to improve the process for companies hiring data scientists and machine learning practitioners.</p> <p>Hiring for data roles can be difficult. One of the main sources of difficulty is often the shear scale of the problem. Especially when hiring for more junior roles, the number of applicants you need to process can be overwhelming. I was looking for an idea to build something that would help improve that common situation.</p> <p>While investigating business ideas and also talking to clients, some of the same themes kept coming up over and over:</p> <ul> <li>How do you know which roles to hire?</li> <li>How to you find the right people?</li> <li>How can you tell who knows what they're doing?</li> <li>How do you deal with the large volumes of candidates?</li> </ul> <p>On the one hand, people were willing to pay me to answer these questions, but on the other hand, I was hearing these same questions enough, that it made sense to write them down.</p> <p>So I decided to write some <em>blog posts</em>.</p> <h2 id="what-about-the-book">What about the book?</h2> <p>My initial plan was to write some blog posts that I could then send to people who asked me questions about hiring. At the same time, I was struggling to find an product idea that really felt like it would make sense. Two things happened. The first was I came across some advice that the first &quot;product&quot; of your software business need not be actual software, but could be a newsletter, or something else that would help you grow mindshare, while you were working on your software. The other thing was that I saw Joel Grus release his latest book <a href="proxy.php?url=http://fizzbuzzbook.com">Ten Essays on FizzBuzz</a>. Unlike his last book, <a href="proxy.php?url=https://www.oreilly.com/library/view/data-science-from/9781492041122/">Data Science from Scratch in Python</a>, he decided to self publish this book on a platform called <a href="proxy.php?url=https://leanpub.com">Leanpub</a>.</p> <p>Naturally I thought, &quot;oh, maybe I should just write a book instead of boring old blog posts. How long could it possibly take?&quot;. The reader will of course know that that was one of those fantasies that your brain likes to tell itself sometimes. In reality, instead of a couple months, the book took me closer to eight months to fully complete.</p> <h2 id="self-publishing-in-the-internet-age">Self publishing in the internet age</h2> <p>When my dad was not much older than I am now, he decided to write a very niche book and self publish it. Needless to say, at that time things were much different. I remember stacks of boxes of his book sitting in the garage for years, as he slowly sold them, eventually giving many copies away to win back valuable space[1].</p> <p>Today the world works very differently. My main reason to self-publish was because I assumed (I still think correctly) that it would be much quicker and take less effort on my part. Depending on your goals, self publishing today is probably the best choice for most people who have the inclination to write a book. The barrier to entry is lower than it has ever been, just was with most media publishing (video, audio, etc).</p> <p>I ended up publishing my book via Leanpub, a publishing platform geared primarily towards computing related technical books. Leanpub has very author friendly terms (e.g. 80% of the purchase price goes to the author) and also is geared towards people like myself, who would prefer to write in something simple like Markdown[2]. I'm planning another post detailing the specific workflow I used to write my book.</p> <p>Self publishing is not without its drawbacks. The royalty structure from a publishing platform like Leanpub is much better than with a traditional publisher, but marketing the book falls almost entirely on the author's shoulders. Somewhere an economist is doing a calculation to find how these scenarios compare, but it's certainly not clear to me how the ultimate expected compensation compares.</p> <p>In the end you probably shouldn't write a book with the expectation of making a lot of money. I suspect that, overall, I will come out better monetarily than my dad's effort did, though (sorry, Dad!).</p> <h2 id="im-now-a-published-author">I'm now a published author</h2> <p>Well, sort of. Self published, anyway. But that's fine and my book probably wouldn't exist if I'd decided to try to publish with a traditional publisher, as that sounds like an order of magnitude more effort[3]. I'm not sure I would have gone through with it if I had decided that that was the only &quot;correct&quot; way to do it.</p> <p>I'm glad I wrote this book, as I ended up having a lot of great conversations while making this, including the interviews that I included in the book. It's also been great to hear feedback (and even some appreciation) for the book. I hope I've provided readers with some useful insight and ways to think about the process of hiring data science and machine learning people.</p> <p>If you haven't checked my book out yet, please take a look at <a href="proxy.php?url=http://dshiring.com">dshiring.com</a>. And follow me on Twitter <a href="proxy.php?url=https://twitter.com/roycoding">@roycoding</a>!</p> <h3 id="footnotes">Footnotes</h3> <ol> <li>On occasion a listing for his long out-of-print book will come up online with an absurdly inflated price due to unrealistic pricing algorithms.</li> <li>Leanpub uses it's own version of Markdown called Markua, which is designed for writing books.</li> <li>If you're nonetheless interested in offering me a publishing contract, my emails are open.</li> </ol> </body> </html>[email protected] (Roy Keyes)https://roycoding.com/blog/2021/book.htmlTue, 13 Jul 2021 00:00:00 +0000Notifications with Keybase: You get a puppy!https://gist.github.com/roycoding/87ec2c423c54300ead2fdf97fb1dd939[email protected] (Roy Keyes)https://gist.github.com/roycoding/87ec2c423c54300ead2fdf97fb1dd939Wed, 22 Jul 2020 00:00:00 +0000It's 2020 and I made an RSS feed for my blog (with Python)https://gist.github.com/roycoding/c4af3c1ea4ae2d9b4ce9536dab20d3da[email protected] (Roy Keyes)https://gist.github.com/roycoding/c4af3c1ea4ae2d9b4ce9536dab20d3daThu, 21 May 2020 00:00:00 +0000Multi-armed banditry in Python with slotshttps://gist.github.com/roycoding/fc430c360c87a755047185b796c10a5e[email protected] (Roy Keyes)https://gist.github.com/roycoding/fc430c360c87a755047185b796c10a5eMon, 22 Aug 2016 00:00:00 +0000A year of working remotelyhttps://gist.github.com/roycoding/59e209032abc8bd3473b[email protected] (Roy Keyes)https://gist.github.com/roycoding/59e209032abc8bd3473bMon, 29 Feb 2016 00:00:00 +00004 Core attributes of a data scientisthttps://gist.github.com/roycoding/3739c6b3b54c27ba6ca3[email protected] (Roy Keyes)https://gist.github.com/roycoding/3739c6b3b54c27ba6ca3Sun, 17 Jan 2016 00:00:00 +0000Computer generated art (in Python)http://nbviewer.ipython.org/gist/roycoding/13e70595e9c74e735c86[email protected] (Roy Keyes)http://nbviewer.ipython.org/gist/roycoding/13e70595e9c74e735c86Sun, 21 Sep 2014 00:00:00 +0000MapReduce Patterns (in Python)https://gist.github.com/roycoding/b11b650d3ed8d4e86c39[email protected] (Roy Keyes)https://gist.github.com/roycoding/b11b650d3ed8d4e86c39Wed, 17 Sep 2014 00:00:00 +0000Klackers strategy: Monte Carlo FTWhttps://gist.github.com/roycoding/6e4dc97ecba3edf5a6fb[email protected] (Roy Keyes)https://gist.github.com/roycoding/6e4dc97ecba3edf5a6fbMon, 19 May 2014 00:00:00 +0000Github Gists: Blogging for the lazyhttps://gist.github.com/roycoding/9500675[email protected] (Roy Keyes)https://gist.github.com/roycoding/9500675Tue, 11 Mar 2014 00:00:00 +0000