Robin Linacre's homepage: Probabilistic record linkage, Data Science and Data Engineering
Probabilistic record linkage, Data Deduplication, Data Science, Engineering and the Environment
https://www.robinlinacre.com/

Letterpaths - a free library for teaching cursive writing
https://www.robinlinacre.com/letterpaths_blog/
Letterpaths is a free and open source library for powering educational apps
Sun, 19 Apr 2026 00:00:00 GMT

Splink or Swim: The Benefits of Being Small Enough To Fail
https://www.robinlinacre.com/splink_or_swim/
A behind the scenes look at winning the Civil Service innovation award
Tue, 24 Feb 2026 00:00:00 GMT

Respectful use of AI in software development teams
https://www.robinlinacre.com/respectful_use_of_ai/
How to use LLMs in team settings without damaging team health
Mon, 12 Jan 2026 00:00:00 GMT

Measuring the accuracy of record linkage
https://www.robinlinacre.com/measuring_data_linking_accuracy/
Some thoughts on how to measure the accuracy
Tue, 18 Nov 2025 00:00:00 GMT

Using a fault tolerant trie for address matching
https://www.robinlinacre.com/fault_tolerant_trie/
An interactive explanation of how a fault tolerant trie can be used for address matching
Tue, 23 Sep 2025 00:00:00 GMT

Building Accurate Address Matching Systems
https://www.robinlinacre.com/address_matching/
A bag of tricks to improve the accuracy of geocoding
Sun, 29 Jun 2025 00:00:00 GMT

Putting Scaffolding Around Vibe Coding to Build More Complex Apps
https://www.robinlinacre.com/structured_vibe_coding/
How to work past the limits of LLMs to build more complex apps
Sun, 18 May 2025 00:00:00 GMT

Why DuckDB is my first choice for data processing
https://www.robinlinacre.com/recommend_duckdb/
Why DuckDB has become my go-to tool for data processing, offering simplicity, speed, and powerful features.
Sun, 16 Mar 2025 00:00:00 GMT

An alternative way to think about predicted probabilities in the Fellegi Sunter model
https://www.robinlinacre.com/alternative_prob_random_match/
A second equivalent mental model to help think about how we arrive at predicted probabilities in the Fellegi Sunter model
Tue, 18 Feb 2025 00:00:00 GMT

Live DuckDB WASM Splink model
https://www.robinlinacre.com/live_splink/
A demo of a live Splink model in the browser
Mon, 03 Feb 2025 00:00:00 GMT

Graph editor for illustrating clustering concepts (graph playground)
https://www.robinlinacre.com/graph_playground_iframe/
Graph editor for illustrating clustering concepts
Fri, 31 Jan 2025 00:00:00 GMT

AI probably won't replace me in 2025
https://www.robinlinacre.com/llms_in_2025/
My mental model of LLMs, their strengths and shortcomings
Wed, 01 Jan 2025 00:00:00 GMT

The emerging impact of LLMs on my productivity
https://www.robinlinacre.com/two_years_of_llms/
The emerging impact of LLMs on productivity
Sun, 08 Dec 2024 00:00:00 GMT

Splink: Transforming data linking through open source collaboration
https://www.robinlinacre.com/transforming_data_linking_open_source/
How a small team of MoJ analysts built data linking software that’s used by governments across the world
Wed, 02 Oct 2024 00:00:00 GMT

Connected components visualisation
https://www.robinlinacre.com/connected_components/
A visualisation of how the connected components algorithm works
Sat, 21 Sep 2024 00:00:00 GMT

Match weight calculator
https://www.robinlinacre.com/match_weight_calculator/
A calculator for converting between match weights, probabilities, and Bayes factors
Mon, 02 Sep 2024 00:00:00 GMT

Super-fast deduplication of large datasets using Splink and DuckDB
https://www.robinlinacre.com/fast_deduplication/
Evaluating 1 billion record comparisons to deduplicate 7 million records in two minutes
Thu, 18 Jan 2024 00:00:00 GMT

Why Probabilistic Linkage is More Accurate than Fuzzy Matching For Data Deduplication
https://www.robinlinacre.com/fellegi_sunter_accuracy/
How to ensure that all available information is used to make predictions
Tue, 24 Oct 2023 00:00:00 GMT

Thoughts and questions about the short term impact of LLMs on knowledge workers
https://www.robinlinacre.com/llm_short_term_thoughts_questions/
What will be the impact of LLMs on knowledge workers
Thu, 19 Oct 2023 00:00:00 GMT

Visualising updating a prior
https://www.robinlinacre.com/posterior_treemap/
Using treemaps to visualise updating the prior with information about a scenario in the Fellegi Sunter model
Wed, 18 Oct 2023 00:00:00 GMT

Computing the Fellegi Sunter model
https://www.robinlinacre.com/computing_fellegi_sunter/
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. This article shows how to compute the model from an algorithmic perspective
Mon, 02 Oct 2023 00:00:00 GMT

m and u values in the Fellegi-Sunter model
https://www.robinlinacre.com/m_and_u_values/
Deep dive into the role and interpretation of m and u probabilities in the Fellegi-Sunter model for probabilistic linkage. Learn how these probabilities impact match weights and how to quantify the strength of evidence in favor of or against a record match.
Fri, 22 Sep 2023 00:00:00 GMT

Partial match weights
https://www.robinlinacre.com/partial_match_weights/
Partial match weights in the Fellegi-Sunter model. Part of an explorable, interactive introduction to probabilistic record linkage (data deduplication) theory
Wed, 20 Sep 2023 00:00:00 GMT

The relationship between probabilities, match weights and Bayes factors
https://www.robinlinacre.com/prob_bf_mw/
Visualising the correspondence between match weights, probabilities, Bayes factors and their intuitive explanations
Fri, 07 Jul 2023 00:00:00 GMT

Splink and the Open Source Dividend
https://www.robinlinacre.com/open_source_dividend/
Splink and the open source dividend
Thu, 09 Mar 2023 00:00:00 GMT

SQL should be the default choice for data transformation logic
https://www.robinlinacre.com/recommend_sql/
SQL should be the first option considered for new data engineering work. It’s robust, fast, future-proof and testable. With a bit of care, it’s clear and readable.
Mon, 30 Jan 2023 00:00:00 GMT

Why parquet files are my preferred API for bulk open data
https://www.robinlinacre.com/parquet_api/
Open data should be served as CORS-enabled parquet files rather than using a custom API
Mon, 09 Jan 2023 00:00:00 GMT

Why don't you just
https://www.robinlinacre.com/just/
The phrase 'why don't you just' is problematic
Fri, 11 Nov 2022 00:00:00 GMT

The Intuition Behind the Use of Expectation Maximisation to Train Record Linkage Models
https://www.robinlinacre.com/em_intuition/
An intuitive explanation for how the Expectation Maximisation algorithm is able to produce unsupervised estimates of Splink model parameters
Fri, 14 Oct 2022 00:00:00 GMT

Splink 3: Fast, accurate and scalable linkage in Python
https://www.robinlinacre.com/splink_3/
Splink 3 now offers support for Python and AWS Athena backends, in addition to Spark. It's now easier to use, faster and more flexible, and can be used for close to real time linkage.
Fri, 05 Aug 2022 00:00:00 GMT

m and u probability generator with starting values
https://www.robinlinacre.com/m_u_generator_starting/
Generate m and u probabilities to input into Splink. Part of the introduction to Fellegi Sunter series.
Mon, 15 Nov 2021 00:00:00 GMT

Are more complex probabilistic linkage models more accurate? Part 2, unsupervised learning
https://www.robinlinacre.com/comparing_splink_models_unsupervised/
How good is Splink: Are more complex probabilistic linkage models more accurate?
Fri, 05 Nov 2021 00:00:00 GMT

Are more complex probabilistic linkage models more accurate? Part 1, supervised learning
https://www.robinlinacre.com/comparing_splink_models/
How good is Splink: Are more complex probabilistic linkage models more accurate?
Mon, 01 Nov 2021 00:00:00 GMT

The Thorniest Problem of Building an Analytical Platform
https://www.robinlinacre.com/thorniest_problem_of_analytical_platforms/
The Thorniest Problem of Building an Analytical Platform: Enabling collaborative development of the platform itself without losing control of complexity.
Fri, 29 Oct 2021 00:00:00 GMT

The carbon impact of switching to an electric car
https://www.robinlinacre.com/carbon_electric_car/
What is the comparative carbon footprint of electric cars? As an existing petrol ICE car owner, should you switch to an electric car?
Fri, 03 Sep 2021 00:00:00 GMT

m and u probability generator
https://www.robinlinacre.com/m_u_generator/
Generate m and u probabilities to input into Splink. Part of the introduction to Fellegi Sunter series.
Thu, 10 Jun 2021 00:00:00 GMT

Dependencies between match weights
https://www.robinlinacre.com/match_weight_dependencies/
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. The dependencies between match weights.
Thu, 10 Jun 2021 00:00:00 GMT

Understanding match weights in the Fellegi Sunter model
https://www.robinlinacre.com/understanding_match_weights/
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. This article discusses match weights.
Sun, 23 May 2021 00:00:00 GMT

Visualising the Fellegi Sunter model
https://www.robinlinacre.com/visualising_fellegi_sunter/
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. This article presents a way of visualising how the model works.
Sat, 22 May 2021 00:00:00 GMT

Maths of Fellegi Sunter (old version)
https://www.robinlinacre.com/archived_maths_fellegi_sunter/
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. This article shows how to compute the model
Fri, 21 May 2021 00:00:00 GMT

The mathematics of the Fellegi Sunter model
https://www.robinlinacre.com/maths_of_fellegi_sunter/
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. This article shows the derivation of the mathematical formulation of the model
Fri, 21 May 2021 00:00:00 GMT

An Interactive Introduction to Record Linkage (Data Deduplication) in the Fellegi-Sunter framework
https://www.robinlinacre.com/intro_to_probabilistic_linkage/
The first in a series of interactive, explorable explanations of the Fellegi-Sunter model, providing an introduction to probabilistic record linkage (data deduplication).
Thu, 20 May 2021 00:00:00 GMT

The Downfall of Command and Control Data Leadership
https://www.robinlinacre.com/command_control/
The Downfall of Command and Control Data Leadership - why new big bang data platforms fail
Sat, 07 Nov 2020 00:00:00 GMT

Demystifying Apache Arrow
https://www.robinlinacre.com/demystifying_arrow/
Demystifying Apache Arrow - some observations from a data scientist. Learning more about a tool that can filter and aggregate two billion rows on a laptop in two seconds
Thu, 22 Oct 2020 00:00:00 GMT

Birdsong quiz
https://www.robinlinacre.com/bird_quiz/
Test how good you are at identifying UK birdsong recordings
Sun, 26 Apr 2020 00:00:00 GMT

Birdsong recording finder
https://www.robinlinacre.com/birdsong/
Listen to UK birdsong using the xeno-canto API
Sat, 25 Apr 2020 00:00:00 GMT

Comparing energy usage across countries
https://www.robinlinacre.com/country_energy_usage/
Fri, 17 Apr 2020 00:00:00 GMT

Filling the country with solar panels
https://www.robinlinacre.com/fill_country_solar/
Fri, 17 Apr 2020 00:00:00 GMT

Fuzzy Matching and Deduplicating Hundreds of Millions of Records with Splink
https://www.robinlinacre.com/introducing_splink/
Introducing Splink, a fast, accurate and scalable fuzzy record matching library that supports multiple SQL backends
Thu, 16 Apr 2020 00:00:00 GMT

Why you should open source your analytical work
https://www.robinlinacre.com/open_sourcing_analytical_work/
Sat, 22 Feb 2020 00:00:00 GMT

Understanding the Spark UI by example: sorting data
https://www.robinlinacre.com/spark_sort/
Sun, 08 Dec 2019 00:00:00 GMT

Understanding the Spark UI by example: the Left Join
https://www.robinlinacre.com/left_join/
Sun, 01 Dec 2019 00:00:00 GMT

Spark UI SQL detailed annotator
https://www.robinlinacre.com/spark_explain/
Fri, 15 Nov 2019 00:00:00 GMT

Unsupervised probabilistic data matching using the Expectation Maximisation algorithm
https://www.robinlinacre.com/em_algorithm_interactive/
Sun, 03 Nov 2019 00:00:00 GMT

Carbon offsetting vs. the cost of renewable energy
https://www.robinlinacre.com/offsetting_renewables/
Sun, 13 Oct 2019 00:00:00 GMT

Interactive blogging with Observable Notebooks and gatsby.js
https://www.robinlinacre.com/interactive_blogging/
Fri, 11 Oct 2019 00:00:00 GMT

Flight distance calculator
https://www.robinlinacre.com/flight_distance/
Simple flight distance calculator. Advert free. Export data to spreadsheet.
Wed, 09 Oct 2019 00:00:00 GMT

Energy usage ready reckoner
https://www.robinlinacre.com/energy_usage/
Energy usage calculator for everyday activities
Sat, 05 Oct 2019 00:00:00 GMT

My flights
https://www.robinlinacre.com/flights/
A history of my flights
Sat, 05 Oct 2019 00:00:00 GMT

Effective testing of analytical models using automated sense checks
https://www.robinlinacre.com/effective_testing/
Mon, 26 Aug 2019 00:00:00 GMT

Questions Senior Leaders Should Ask Their Data Delivery Teams
https://www.robinlinacre.com/questions_senior_leaders/
How to improve the likelihood of success whilst reducing the governance burden on teams
Thu, 14 Mar 2019 00:00:00 GMT

Why I’m backing Vega-Lite as our default tool for data visualisation
https://www.robinlinacre.com/backing_vega_lite/
Wed, 22 Aug 2018 00:00:00 GMT

Transforming analytical functions by mainstreaming data science
https://www.robinlinacre.com/transforming_analytical_functions/
Sat, 11 Aug 2018 00:00:00 GMT

Pushing the boundaries of data science with the MOJ Analytical Platform
https://www.robinlinacre.com/pushing_boundaries_data_science_analytical_platform/
How the MoJ Analytical Platform gives analysts access to cutting edge open source tools
Thu, 05 Apr 2018 00:00:00 GMT