[{"content":"I posted a couple years ago about my adventures applying various graph visualization tools to Decision Diagrams (DD). This is an interesting problem because DDs have characteristics that don\u0026rsquo;t apply to other graphs. They are layered1 and each arc from a layer n-1 to layer n has the same direction. Nodes in a layer can be merged or split apart, with those operations generally staying within the same layer2. Sub-diagrams can even be peeled off of parent diagrams during branch-and-bound, while maintaining much of their original structure.\nAt the time, I settled on using Mermaid for automated DD rendering, but that still had a few issues. The rendering itself was nice, but it was hard to keep labels readable. Mermaid uses its own data format for graph representation. I\u0026rsquo;d rather draw graphs based on something like a Python data structure without translating that data into an intermediate format.\nSince then, I\u0026rsquo;ve found myself stepping iteratively through processes that modify DDs in various ways for a side research project3. None of the options I looked at before are quite suitable. I need to visually inspect the impacts of DD operations like reduction and restriction, and to separate specific arcs in an iterative process that I can drive interactively. This led me to experiment with Dash and, by extension, Cytoscape4.\nDash \u0026amp; Cytoscape Let\u0026rsquo;s look at the same example diagram as before using Dash. First we initialize a Python list of graph elements to display. 
We\u0026rsquo;ll feed this list directly into Dash\u0026rsquo;s Cytoscape layout.\nELEMENTS = [ # nodes: layer 0 {\u0026#34;data\u0026#34;: {\u0026#34;id\u0026#34;: \u0026#34;r\u0026#34;, \u0026#34;label\u0026#34;: \u0026#34;r\u0026#34;}}, # arcs: layer 0 -\u0026gt; layer 1 {\u0026#34;data\u0026#34;: {\u0026#34;source\u0026#34;: \u0026#34;r\u0026#34;, \u0026#34;target\u0026#34;: \u0026#34;1\u0026#34;}}, {\u0026#34;data\u0026#34;: {\u0026#34;source\u0026#34;: \u0026#34;r\u0026#34;, \u0026#34;target\u0026#34;: \u0026#34;2\u0026#34;}}, {\u0026#34;data\u0026#34;: {\u0026#34;source\u0026#34;: \u0026#34;r\u0026#34;, \u0026#34;target\u0026#34;: \u0026#34;3\u0026#34;}}, # nodes: layer 1 {\u0026#34;data\u0026#34;: {\u0026#34;id\u0026#34;: \u0026#34;1\u0026#34;, \u0026#34;label\u0026#34;: \u0026#34;0\u0026#34;}}, {\u0026#34;data\u0026#34;: {\u0026#34;id\u0026#34;: \u0026#34;2\u0026#34;, \u0026#34;label\u0026#34;: \u0026#34;[[1,2],4]\u0026#34;}}, {\u0026#34;data\u0026#34;: {\u0026#34;id\u0026#34;: \u0026#34;3\u0026#34;, \u0026#34;label\u0026#34;: \u0026#34;3\u0026#34;}}, # arcs: layer 1 -\u0026gt; layer 2 {\u0026#34;data\u0026#34;: {\u0026#34;source\u0026#34;: \u0026#34;2\u0026#34;, \u0026#34;target\u0026#34;: \u0026#34;4\u0026#34;}}, {\u0026#34;data\u0026#34;: {\u0026#34;source\u0026#34;: \u0026#34;2\u0026#34;, \u0026#34;target\u0026#34;: \u0026#34;5\u0026#34;}}, {\u0026#34;data\u0026#34;: {\u0026#34;source\u0026#34;: \u0026#34;3\u0026#34;, \u0026#34;target\u0026#34;: \u0026#34;6\u0026#34;}}, # nodes: layer 2 {\u0026#34;data\u0026#34;: {\u0026#34;id\u0026#34;: \u0026#34;4\u0026#34;, \u0026#34;label\u0026#34;: \u0026#34;10\u0026#34;}}, {\u0026#34;data\u0026#34;: {\u0026#34;id\u0026#34;: \u0026#34;5\u0026#34;, \u0026#34;label\u0026#34;: \u0026#34;20\u0026#34;}}, {\u0026#34;data\u0026#34;: {\u0026#34;id\u0026#34;: \u0026#34;6\u0026#34;, \u0026#34;label\u0026#34;: \u0026#34;100\u0026#34;}}, # arcs: layer 2 -\u0026gt; layer 3 {\u0026#34;data\u0026#34;: 
{\u0026#34;source\u0026#34;: \u0026#34;4\u0026#34;, \u0026#34;target\u0026#34;: \u0026#34;t\u0026#34;}}, {\u0026#34;data\u0026#34;: {\u0026#34;source\u0026#34;: \u0026#34;5\u0026#34;, \u0026#34;target\u0026#34;: \u0026#34;t\u0026#34;}}, {\u0026#34;data\u0026#34;: {\u0026#34;source\u0026#34;: \u0026#34;6\u0026#34;, \u0026#34;target\u0026#34;: \u0026#34;t\u0026#34;}}, # nodes: layer 3 {\u0026#34;data\u0026#34;: {\u0026#34;id\u0026#34;: \u0026#34;t\u0026#34;, \u0026#34;label\u0026#34;: \u0026#34;t\u0026#34;}}, ] Already this is pretty nice. This list is easy to generate from any graph data model. It\u0026rsquo;s just a flat list of nodes and arcs. If we need to, we can add additional information directly to the data dictionary.\n{\u0026#34;data\u0026#34;: {\u0026#34;source\u0026#34;: \u0026#34;6\u0026#34;, \u0026#34;target\u0026#34;: \u0026#34;t\u0026#34;, \u0026#34;xyzzy\u0026#34;: \u0026#34;plugh\u0026#34;}}, Next we initialize Cytoscape layouts and create a Dash app.\nimport dash_cytoscape as cyto from dash import Dash cyto.load_extra_layouts() app = Dash() The app is responsible for serving a web page containing our Cytoscape layout. We can add more data and layouts, interactive elements such as buttons, and logic through callbacks to the app as well5.\nNow we just add a Cytoscape layout to the app and run it. Note the dagre layout name renders the diagram top down, while klay renders it left to right.\napp.layout = cyto.Cytoscape( layout={ \u0026#34;name\u0026#34;: \u0026#34;dagre\u0026#34; }, elements=ELEMENTS, ) app.run() That\u0026rsquo;s it! So what does our beautiful diagram look like?\nStyling Wait, that\u0026rsquo;s not very good, is it? If anything, it\u0026rsquo;s at least as bad as any of the other options, right?\nAt this point, yes, but one of the qualities that separates Cytoscape from other graph visualization options is its capacity for element styling. 
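Because the element list is just flat dictionaries, it can be produced mechanically from whatever data model holds the diagram. As a sketch of that idea (the to_elements helper and its input layout are hypothetical, not from this post's code):

```python
# Hypothetical helper: flatten a node/arc data model into Cytoscape elements.
def to_elements(labels, arcs):
    """labels maps node id -> display label; arcs is a list of (source, target) pairs."""
    elements = [{"data": {"id": i, "label": l}} for i, l in labels.items()]
    elements += [{"data": {"source": s, "target": t}} for s, t in arcs]
    return elements

# A tiny two-layer diagram: root -> one node -> terminal.
labels = {"r": "r", "1": "0", "t": "t"}
arcs = [("r", "1"), ("1", "t")]
ELEMENTS = to_elements(labels, arcs)
```

Any extra fields on nodes or arcs can be passed through into each data dictionary the same way.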
Let\u0026rsquo;s improve on this visualization by adding some styles.\nSTYLES = [ { \u0026#34;selector\u0026#34;: \u0026#34;edge\u0026#34;, \u0026#34;style\u0026#34;: { \u0026#34;curve-style\u0026#34;: \u0026#34;bezier\u0026#34;, \u0026#34;target-arrow-shape\u0026#34;: \u0026#34;triangle\u0026#34;, }, }, { \u0026#34;selector\u0026#34;: \u0026#34;node\u0026#34;, \u0026#34;style\u0026#34;: { \u0026#34;shape\u0026#34;: \u0026#34;rectangle\u0026#34;, \u0026#34;width\u0026#34;: \u0026#34;label\u0026#34;, \u0026#34;height\u0026#34;: \u0026#34;label\u0026#34;, \u0026#34;content\u0026#34;: \u0026#34;data(label)\u0026#34;, \u0026#34;font-family\u0026#34;: \u0026#34;monospace\u0026#34;, \u0026#34;text-valign\u0026#34;: \u0026#34;center\u0026#34;, \u0026#34;padding\u0026#34;: \u0026#34;10px\u0026#34;, }, }, ] app.layout = cyto.Cytoscape( layout={ \u0026#34;name\u0026#34;: \u0026#34;dagre\u0026#34; }, elements=ELEMENTS, stylesheet=STYLES, ) app.run() Already, this is significantly better than the default style.\nSince the elements and styles are cleanly separated, it\u0026rsquo;s convenient to style nodes and arcs based on aspects of their data. To give you a sense of what this means for DDs, here is a screenshot from that research project I mentioned.\nIn this case, border colors, background colors, and line styles have different meanings. It\u0026rsquo;s easy to add interactivity like toggling on more information in the node labels, or restructuring the diagram and its representation based on user input. 
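Data-driven styling works through Cytoscape's attribute selectors, which match on fields inside each element's data dictionary. Here is a minimal sketch of the idea (the status field and the color choices are hypothetical, not taken from the screenshot):

```python
# Hypothetical: color node borders by a "status" field stored in each node's data.
STATUS_COLORS = {"merged": "red", "exact": "black"}

STYLES = [
    {
        # Cytoscape attribute selector: matches nodes whose data carries this status.
        "selector": f"node[status = '{status}']",
        "style": {"border-color": color, "border-width": "2px"},
    }
    for status, color in STATUS_COLORS.items()
]
```

Nodes would then be emitted with, e.g., {"data": {"id": "2", "label": "3", "status": "merged"}}, and the stylesheet picks up the rest.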
Try out the example to see how Dash is built from the ground up for interactivity.\nResources dash-cytoscape.py provides the full example visualization Though this is less the case as DD implementations become more like Dynamic Programming and abandon layers.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nIbid.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nNot to make excuses, but research projects go a lot slower lately than they used to.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nDash delegates all of its graph rendering functionality to Cytoscape, and provides an API layer for graph data management and interactivity.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThese are out of scope for this post.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ryanjoneil.dev/posts/2025-10-18-visualizing-decision-diagrams-with-dash-and-cytoscape/","summary":"\u003cp\u003e\u003ca href=\"../2023-09-13-visualizing-decision-diagrams/\"\u003eI posted a couple years ago\u003c/a\u003e about my adventures applying various graph visualization tools to Decision Diagrams (DD). This is an interesting problem because DDs have characteristics that don\u0026rsquo;t apply to other graphs. They are layered\u003csup id=\"fnref:1\"\u003e\u003ca href=\"#fn:1\" class=\"footnote-ref\" role=\"doc-noteref\"\u003e1\u003c/a\u003e\u003c/sup\u003e and each arc from a layer \u003ccode\u003en-1\u003c/code\u003e to layer \u003ccode\u003en\u003c/code\u003e has the same direction. Nodes in a layer can be merged or split apart, with those operations generally staying within the same layer\u003csup id=\"fnref:2\"\u003e\u003ca href=\"#fn:2\" class=\"footnote-ref\" role=\"doc-noteref\"\u003e2\u003c/a\u003e\u003c/sup\u003e. 
Sub-diagrams can even be \u003ca href=\"https://arxiv.org/abs/2302.05483\"\u003epeeled off of parent diagrams during branch-and-bound\u003c/a\u003e, while maintaining much of their original structure.\u003c/p\u003e","title":"🖍 Visualizing Decision Diagrams with Dash and Cytoscape"},{"content":"Some years ago, I worked on real-time meal delivery at Zoomer, a YC startup based out of Philadelphia. Zoomer\u0026rsquo;s production tech stack was primarily Ruby. As it grew we moved from using heuristics for things like routing and scheduling to open source optimization solvers.\nLike most languages that aren\u0026rsquo;t Python, Ruby doesn\u0026rsquo;t have an especially mature ecosystem for optimization (or data science, or machine learning, for that matter). For some use cases that didn\u0026rsquo;t matter. When we upgraded the routing engine, we built a model in C++ using Gecode and wrapped a Ruby gem around a SWIG wrapper. But when we wanted to use integer programming to build schedules, the lack of solver APIs proved inconvenient.1\nAt the time, PuLP was probably the most commonly used open source multi-solver Python library for linear and integer programming.2 This gave me the opportunity to develop RAMS, a PuLP-inspired library for basic MILP modeling in Ruby.\nThen the Zoomer team became part of Grubhub. We moved to a Java stack and a commercial optimization solver. Improvements to the RAMS project languished on my todo list. It lagged behind major versions of Ruby, optimization solvers, and dependencies, painfully out of date and unmaintained.\nThen, last month, GitHub released its Copilot agent. Unlike vibe coding directly in the editor, which sounds like speeding maniacally through a bad acid trip, the idea here is more like running a project: create issues, receive and comment on pull requests, iterate.\nI figured the grunt work of library upgrades should be perfect fodder to try out an AI developer assistant. RAMS is already well structured and tested. 
The upgrade is well defined. No creativity required.\nA RAMS modeling example This post meanders through two topics: solving optimization models with Ruby and RAMS, and my experiences maintaining that library using Copilot. I could have split this into two posts, but that didn\u0026rsquo;t feel right. So let\u0026rsquo;s show what building a model in RAMS looks like first.\nI don\u0026rsquo;t use Ruby with any regularity these days3, but modeling with RAMS reminded me how elegant Ruby DSLs can be. Here\u0026rsquo;s a simple example of a binary integer program.\n#!/usr/bin/env ruby require \u0026#39;rams\u0026#39; m = RAMS::Model.new x1 = m.variable type: :binary x2 = m.variable type: :binary x3 = m.variable type: :binary m.constrain(x1 + x2 + x3 \u0026lt;= 2) m.constrain(x2 + x3 \u0026lt;= 1) m.sense = :max m.objective = 1 * x1 + 2 * x2 + 3 * x3 solution = m.solve puts \u0026lt;\u0026lt;-HERE objective: #{solution.objective} x1 = #{solution[x1]} x2 = #{solution[x2]} x3 = #{solution[x3]} HERE I think that\u0026rsquo;s rather nice, and very clean.\nRAMS enhancements The biggest change in RAMS is that it now supports the HiGHS optimization solver. Prior to v0.2.0, GLPK was the default solver, but now it is HiGHS. There are a number of smaller changes as well.\nRAMS requires Ruby v3.1. CPLEX support was removed since I can\u0026rsquo;t test it.4 One can set solver paths using environment variables (e.g. RAMS_SOLVER_PATH_CBC). Improved documentation and a logo! The Copilot agent as coding companion While I tend to err on the side of LLM skepticism, working with the Copilot agent for this upgrade was generally positive. It was a bit like working with a fast, responsive, and inexperienced developer. The issues it ran into were pretty much the same, but the time scale was compressed.\nI had it open three pull requests for me.\n🤨 PR 29: Upgrade Ruby and dependencies Performance here was middling. Copilot got through some of the task without assistance. 
It also made a number of changes that were unhelpful and irrelevant to the request.\nOn a positive note, I forgot to ask it to change from CircleCI to GitHub Actions for testing. This gave me the opportunity to test its response to feature creep. It responded with a partially working GitHub Actions workflow (and no grumbling!).\nCopilot made a number of errors and wasn\u0026rsquo;t able to finish the upgrade on its own.\nIt decided to build the optimizers from source instead of simply installing binary packages using apt or dnf. Not only was this wasteful and overly complicated, Copilot ultimately wasn\u0026rsquo;t able to build and install them from source. Once I told it to use a Fedora 42 base image, this improved, but it couldn\u0026rsquo;t figure out what package to use for the CBC solver. It switched back and forth without prompting between cbc (incorrect) and coin-or-Cbc (correct). It inexplicably couldn\u0026rsquo;t figure out the latest stable version of Ruby. It added a bunch of architecture-specific package definitions to the build, unprompted. This was unnecessary given that RAMS is a vanilla Ruby project. I had to help it figure out that the CBC binary is now called coin.cbc on Fedora. This wasn\u0026rsquo;t entirely surprising. 🤩 PR 32: Add environment variables for solver paths Copilot did a great job on this task. I had no issue with the code it wrote. It followed the style of the rest of the package nicely. It added appropriate documentation and unit tests.\n👌 PR 34: Support the HiGHS optimization solver Copilot did pretty well here, even though it didn\u0026rsquo;t get the feature working. It was able to create a new solver interface and get most of the logic for solution parsing right. I was a little surprised that it forgot to test the new solver integration in GitHub Actions. 
The biggest issue it needed my help on was solution status parsing, where it didn\u0026rsquo;t realize that the second condition here will never trigger, because /feasible/i also matches the substring inside \u0026ldquo;infeasible\u0026rdquo;.\nreturn :feasible if status =~ /feasible/i return :infeasible if status =~ /infeasible/i This should have been the following (note the ^, which anchors the first match).\nreturn :feasible if status =~ /^feasible/i return :infeasible if status =~ /infeasible/i I don\u0026rsquo;t remember finding any MILP modeling interfaces for Ruby like PuLP in 2016-17. More recently, Rulp and Opt have been developed.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nPuLP is still heavily used and developed.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nOnce upon a time I was a Perl programmer. Ruby was originally written to be a better Perl. I\u0026rsquo;ve long since given up the old ways.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nFor now, RAMS is focussing on open source solvers. Maintaining commercial solver licenses can be challenging when you\u0026rsquo;re not part of academia. PRs welcome.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ryanjoneil.dev/posts/2025-06-25-rams-reboot/","summary":"\u003cp\u003eSome years ago, I worked on real-time meal delivery at Zoomer, a YC startup based out of Philadelphia. Zoomer\u0026rsquo;s production tech stack was primarily Ruby. As it grew we moved from using heuristics for things like routing and scheduling to open source optimization solvers.\u003c/p\u003e\n\u003cp\u003eLike most languages that aren\u0026rsquo;t Python, Ruby doesn\u0026rsquo;t have an especially mature ecosystem for optimization (or data science, or machine learning, for that matter). For some use cases that didn\u0026rsquo;t matter. When we upgraded the routing engine, we built a model in C++ using \u003ca href=\"https://www.gecode.org/\"\u003eGecode\u003c/a\u003e and wrapped a Ruby gem around a \u003ca href=\"https://www.swig.org/\"\u003eSWIG\u003c/a\u003e wrapper. 
But when we wanted to use integer programming to build schedules, the lack of solver APIs proved inconvenient.\u003csup id=\"fnref:1\"\u003e\u003ca href=\"#fn:1\" class=\"footnote-ref\" role=\"doc-noteref\"\u003e1\u003c/a\u003e\u003c/sup\u003e\u003c/p\u003e","title":"🐏 RAMS Reboot"},{"content":"I build decision science and optimization software.\nBy day, I am an optimization engineer, coder, and co-founder of Nextmv. I\u0026rsquo;m interested in hybrid optimization, decision diagrams, and mixed integer programming. My applications skew toward logistics for delivery platforms, with detours into cutting and packing. Lately I\u0026rsquo;ve been embedding a lot of trained machine learning models in optimization problems, and exploring applications of inverse optimization.\nFor the past several years, I\u0026rsquo;ve worked in real-time optimization for on-demand delivery, scheduling, forecasting, and simulation. I received a MS in Operations Research by night at George Mason University, then a PhD in the same department under the advisement of Karla Hoffman.\nAppearances This is a running list of talks I\u0026rsquo;ve given and am scheduled to give. It probably isn\u0026rsquo;t exhaustive. 
Some of them have slides or videos available.\n2025 Jun 26 🎥 Nextmv Videos Operationalize a Gurobi price optimization notebook: Deploy, collaborate, test, and visualize with Nextmv May 15 ODSC East 2025 Predict \u0026amp; Prescribe: Combining forecasts with optimized plans Apr 8 University of Luxembourg Decision model, meet reality: Testing lessons from food logistics and delivery operations Mar 14-16 📄 📄 INFORMS Computing Society Conference 2025 Chair and organizer of the solvers cluster Mar 6 🎥 Nextmv Videos Nextmv Hexaly Integration: How to run, test, and manage with DecisionOps workflows 2024 Nov 7 🎥 Nextmv Videos Nextmv ML/OR connectors: A price optimization example with Gurobipy, Gurobi ML, and Gurobipy Pandas Oct 21 📄 💻 INFORMS Annual Meeting 2024 Solving the Weapon Target Assignment Problem with Decision Diagrams Oct 3 🎥 Nextmv Videos Uncertainty, ML + OR, and stochastic optimization: Demo and Q\u0026amp;A with Seeker creator Jul 30 🎥 Nextmv Videos Operationalizing HiGHS-based MIP models and Q\u0026amp;A with project developers Jun 27 💻 HiGHS Workshop 2024 Symphonic HiGHS: Operationalizing next moves with DecisionOps Jun 18 🎥 Nextmv Videos Combining machine learning (ML) and operations research (OR) through horizontal computing Jun 7 📄 💻 🎥 EURO Practitioners\u0026rsquo; Forum Three model problem: Combining machine learning (ML) and operations research (OR) through horizontal computing Apr 14 📄 INFORMS Analytics Conference 2024 The sushi is ready. How do I deliver it? Forecast, schedule, route with DecisionOps Apr 10 🎥 Nextmv Videos Getting started with DecisionOps for decision science models using Gurobi 2023 Dec 6 📄 🧑‍💻️ 💻 🎥 PyData Global 2023 Order up! How do I deliver it? 
Build on-demand logistics apps with Python, OR-Tools, and DecisionOps Nov 16 🎥 Nextmv Videos Forecast, schedule, route: 3 starter models for on-demand logistics Oct 17 📄 INFORMS Annual Meeting 2023 Adapting to Change in On-Demand Delivery: Unpacking a Suite of Testing Methodologies Sep 20 📄 💻 🎥 DecisionCAMP 2023 Decision model, meet the real world: Testing optimization models for use in production environments Aug 27 💻 DPSOLVE 2023 Implementing Decision Diagrams in Production Systems May 11 🎥 Nextmv Videos Several people are optimizing: Collaborative workflows for decision model operations Apr 17 📄 INFORMS Analytics Conference 2023 Decision Model, Meet Production: A Collaborative Workflow for Optimizing More Operations Feb 16 🎥 Nextmv Videos Decision diagrams in operations research, optimization, vehicle routing, and beyond Jan 18 🎥 Nextmv Videos In conversation with Karla Hoffman 2022 Nov 16 🎥 Nextmv Videos Decision model, meet production 2020 Oct 5 🎥 INFORMS Philadelphia Chapter Real-Time Routing for On-Demand Delivery 2019 Oct 22 💻 INFORMS Annual Meeting 2019 Decision Diagrams for Real-Time Routing 2017 July 6 📄 🎥 PyData Seattle 2017 Practical Optimization for Stats Nerds Mar 5 💻 Data Science DC Practical Optimization for Stats Nerds 2015 Dec 4 💻 🎥 PyData NYC 2015 Optimize your Docker Infrastructure with Python 2014 Jul 17 📄 💻 IFORS 2014 A MIP-Based Dual Bounding Technique for the Irregular Nesting Problem 2010 Feb 19 🎥 PyCon 2010 Optimal Resource Allocation using Python Articles, papers \u0026amp; patents I\u0026rsquo;m a desultory blogger and intermittent academic. Most of my current and old posts live here. Some of my other content is below.\n2024 Dec 4 ORMS Today Ops Researchers, It’s Time to Git with the Flow Mar 26 🖨️ USPTO Fast computational generation of digital pickup and delivery plans describes algorithms for fast on-demand routing in pickup and delivery problems. 
Oct 31 Nextmv Blog 5 things software teams should know about operations research and decision science Oct 17 Nextmv Blog New integration: Bring your Hexaly decision model to Nextmv Mar 7 Nextmv Blog Nextmv Gurobi integration: Build, test, deploy decision models using Gurobi and DecisionOps Feb 13 Nextmv Blog CI/CD for decision science: What is it, how does it work, and why does it matter? Feb 1 Nextmv Blog New decision apps, an open source decision model hub, and an individual plan 2023 Dec 26 🖨️ USPTO Prediction of travel time and determination of prediction interval describes technology for predicting travel times for on-demand delivery platforms. Dec 19 Nextmv Blog Shift scheduling optimization: Generating shift types, planning for demand, and assigning workers Jun 13 🖨️ USPTO Runners for optimization solvers and simulators describes technology for creating and executing Decision Diagram-based optimization solvers and state-based simulators in cloud environments. 2022 Apr 20 Nextmv Blog You need a solver. What is a solver? 2021 Mar 2 Nextmv Blog Binaries are beautiful 2020 Sep 11 🖨️ Operations Research Forum MIPLIBing: Seamless Benchmarking of Mathematical Optimization Problems and Metadata Extensions presents a Python library that automatically downloads queried subsets from the current versions of MIPLIB, MINLPLib, and QPLIB, provides a centralized local cache across projects, and tracks the best solution values and bounds on record for each problem. Mar 2 Nextmv Blog How Hop Hops 2019 May 🖨️ Operations Research Letters Decision diagrams for solving traveling salesman problems with pickup and delivery in real time explores the use of Multivalued Decision Diagrams and Assignment Problem inference duals for real-time optimization of TSPPDs. 
2018 Oct 2 🖨️ Optimization Online Integer Models for the Asymmetric Traveling Salesman Problem with Pickup and Delivery proposes a new ATSPPD model, new valid inequalities for the Sarin-Sherali-Bhootra ATSPPD, and studies the impact of relaxing complicating constraints in these. Sep 13 Grubhub Bytes Decisions are first class citizens: an introduction to Decision Engineering Sep 2 🖨️ Optimization Online Exact Methods for Solving Traveling Salesman Problems with Pickup and Delivery in Real Time examines exact methods for solving TSPPDs with consolidation in real-time applications. It considers enumerative, Mixed Integer Programming, Constraint Programming, and hybrid optimization approaches under various time budgets. Apr 10 🖨️ Optimization Online The Meal Delivery Routing Problem introduces the MDRP to formalize and study an important emerging class of dynamic delivery operations. It also develops optimization-based algorithms tailored to solve the courier assignment (dynamic vehicle routing) and capacity management (offline shift scheduling) problems encountered in meal delivery operations. 2015 Jan 5 The Yhat Blog Currency Portfolio Optimization Using ScienceOps 2014 Nov 10 The Yhat Blog How Yhat Does Cloud Balancing: A Case Study Software Most of my work is proprietary, but some of it is open. Here are a few projects I\u0026rsquo;ve built or made significant contributions to. I\u0026rsquo;ve also done some work on projects such as PuLP, MIPLIBing, and MDRPlib.\nActive(ish) projects The Ruby Algebraic Modeling System is a simple modeling tool for formulating and solving MILPs in Ruby. ap.cpp is an incremental primal-dual assignment problem solver written in C++. It can vastly improve propagation in hybrid optimization models that use AP relaxations. I use it within custom propagators in Gecode and in Decision Diagrams for solving the Traveling Salesman Problem with side constraints. ap is a Go version of ap.cpp. 
TSPPD Hybrid Optimization Code and TSPPD Decision Diagram Code are both used in my dissertation. The former contains C++14 code for hybrid CP and MIP models for solving TSPPDs. The latter uses a hybridized Decision Diagram implementation with an Assignment Problem inference dual inside a branch-and-bound. TSPPDlib is a standard test set for TSPPDs. The instances are based on observed meal delivery data at Grubhub. Defunct projects python-zibopt was a Python interface to the SCIP Optimization Suite. This was no longer necessary once PySCIPOpt emerged. Chute was a simple, lightweight tool for running discrete event simulations in Python. PyGEP was a simple library suitable for academic study of GEP (Gene Expression Programming) in Python 2. Et al In my spare time, I\u0026rsquo;m a cat and early music enthusiast, plus\u0026hellip;\na board member of Classical Uprising, a mentor of startup founders at the Roux Institute, chair of the INFORMS Membership Committee, and a cellist in the Southern Maine Symphony Orchestra. Iconography 📄 = abstract 🧑‍💻️ = code 🖨️ = pdf 🎟 = registration 💻 = slides 🎥 = video ","permalink":"https://ryanjoneil.dev/about/","summary":"\u003cp\u003eI build decision science and optimization software.\u003c/p\u003e\n\u003cp\u003eBy day, I am an optimization engineer, coder, and co-founder of \u003ca href=\"https://nextmv.io\"\u003eNextmv\u003c/a\u003e. I\u0026rsquo;m interested in hybrid optimization, \u003ca href=\"https://www.andrew.cmu.edu/user/vanhoeve/mdd/\"\u003edecision diagrams\u003c/a\u003e, and \u003ca href=\"https://en.wikipedia.org/wiki/Integer_programming\"\u003emixed integer programming\u003c/a\u003e. My applications skew toward logistics for delivery platforms, with detours into \u003ca href=\"https://www.euro-online.org/websites/esicup/\"\u003ecutting and packing\u003c/a\u003e. 
Lately I\u0026rsquo;ve been embedding a lot of trained machine learning models in optimization problems, and exploring applications of \u003ca href=\"https://pubsonline.informs.org/doi/10.1287/opre.2022.0382\"\u003einverse optimization\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eFor the past several years, I\u0026rsquo;ve worked in real-time optimization for on-demand delivery, scheduling, forecasting, and simulation. I received a MS in Operations Research by night at \u003ca href=\"https://seor.gmu.edu/\"\u003eGeorge Mason University\u003c/a\u003e, then a PhD in the same department under the advisement of \u003ca href=\"https://seor.vse.gmu.edu/~khoffman/\"\u003eKarla Hoffman\u003c/a\u003e.\u003c/p\u003e","title":"🖖 Hi, I'm Ryan."},{"content":"I just returned from the 2025 INFORMS Computing Society conference, where I had the privilege of organizing a cluster on optimization solvers. The cluster had two sessions, Solvers I and Solvers II, and focussed on new developments in the implementation of optimization solvers.\nIn the coming days, I\u0026rsquo;m going to explore some of these solvers in more depth. For now, I wanted to give a few hot takes from the sessions while they are still fresh in my mind.\nHybrid optimization is everywhere Hybrid optimization combines multiple techniques to solve a given problem. Most of the hybrid optimization literature focuses on leveraging strengths of different techniques to solve a particular well-defined problem, such as a routing problem with time windows, but it can also provide clear benefits to general-purpose solvers.\nOR-Tools, which is likely the most commonly used open source solver, gave a talk on the design of their CP-SAT[-LP] algorithm. To users interacting with OR-Tools through its APIs, CP-SAT looks like an ordinary constraint programming (CP) solver. Internally, however, they boost its CP solver with techniques from satisfiability (SAT) and linear programming (LP). 
This gives a whole that is much more powerful than its parts, as shown below.\n☝ Please pardon the poor quality of this photo I took during their ICS talk.\nMeanwhile, the commercial optimizer Hexaly incorporates a basketful of technologies under the hood. These include techniques from exact methods, heuristics like large neighborhood search, and even some ideas from Decision Diagrams (DD).\nInterestingly, both solvers admit to some component algorithms being well behind leading implementations. OR-Tools\u0026rsquo;s SAT and LP solvers are somewhat rudimentary, and Hexaly\u0026rsquo;s simplex and interior point algorithms would not be competitive on their own. It is the combination of multiple algorithms and approaches that makes the solvers powerful.\nState-based modeling has a big opportunity Mixed Integer Programming (MIP) (and other math programming classes), CP, and Dynamic Programming (DP) have all been standard techniques in the optimization toolkit for decades. While MIP and CP both benefit from standard formats and solver interoperability through systems like MiniZinc, AMPL, and other projects, that never really happened for DP. Even now, DP models are usually bespoke and lack both modeling standards and standard solvers.\nThat is rapidly changing with the development of both Domain-Independent Dynamic Programming (DIDP) and new DD solvers like CODD. These efforts are still nascent, but there is growing momentum toward building both domain-independent solvers and modeling languages for state-based models. 
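For a sense of what bespoke DP looks like in practice, a hand-rolled model is typically just a memoized recursion over problem-specific states. This illustrative sketch is my own, not code from any of the talks; it solves a tiny knapsack where the state is (item index, remaining capacity):

```python
from functools import lru_cache

# Hypothetical toy instance: three items with values and weights.
VALUES = (6, 10, 12)
WEIGHTS = (1, 2, 3)

@lru_cache(maxsize=None)
def best(i, capacity):
    """Max value achievable with items i..end given remaining capacity (the DP state)."""
    if i == len(VALUES):
        return 0
    skip = best(i + 1, capacity)
    if WEIGHTS[i] <= capacity:
        take = VALUES[i] + best(i + 1, capacity - WEIGHTS[i])
        return max(skip, take)
    return skip

# best(0, 5) == 22: take the second and third items.
```

Everything here — the state definition, the transitions, the memoization — is written by hand for this one problem, which is exactly what domain-independent solvers and modeling languages aim to replace.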
If this succeeds, DP and state-based models have the potential to become similar to MIP and CP in power, portability, and expressiveness.\nEstablished technologies are rapidly innovating, too Other talks in the cluster showed MaxiCP, a CP solver with roots in MiniCP that is suitable for real-life use, recent developments in proving global optimality for Mixed Integer Non-Linear Programs (MINLP) in Xpress, and an interesting new heuristic solver based on a technique called Random-Key Optimization (RKO), which represents solutions as vectors between 0 and 1 and changes the modeling exercise into solution decoding.\nDuring an interview several years ago, an optimization team leader at a major logistics company told me that \u0026ldquo;optimization is a solved problem\u0026rdquo; and that new solver development was therefore not interesting. That isn\u0026rsquo;t what I see, though. Instead, I see the practical application of optimization continuing to grow beyond the boundaries of what today\u0026rsquo;s solvers can handle, and a ton of activity in development of those solvers to make them ever more powerful and flexible.\n","permalink":"https://ryanjoneil.dev/posts/2025-03-18-ics-2025-solvers-cluster-takeaways/","summary":"\u003cp\u003eI just returned from the \u003ca href=\"https://sites.google.com/view/ics-2025/home\"\u003e2025 INFORMS Computing Society conference\u003c/a\u003e, where I had the privilege of organizing a cluster on optimization solvers. The cluster had two sessions, \u003ca href=\"https://symposia.gerad.ca/ICS2025/en/schedule?slot_id=2418\"\u003eSolvers I\u003c/a\u003e and \u003ca href=\"https://symposia.gerad.ca/ICS2025/en/schedule?slot_id=2419\"\u003eSolvers II\u003c/a\u003e, and focussed on new developments in the implementation of optimization solvers.\u003c/p\u003e\n\u003cp\u003eIn the coming days, I\u0026rsquo;m going to explore some of these solvers in more depth. 
For now, I wanted to give a few hot takes from the sessions while they are still fresh in my mind.\u003c/p\u003e","title":"🧺 ICS 2025 Solvers Cluster Takeaways"},{"content":"In the last post, we used Gurobi\u0026rsquo;s hierarchical optimization features to compute the Pareto front for primary and secondary objectives in an assignment problem. This relied on Gurobi\u0026rsquo;s setObjectiveN method and its internal code for managing hierarchical problems.\nSome practitioners may need to do this without access to a commercial license. This post adapts the previous example to use HiGHS and its native Python interface, highspy. It\u0026rsquo;s also useful to see what the procedure is in order to understand it better. This isn\u0026rsquo;t exactly what I\u0026rsquo;d call hard, but it is easy to mess up.1\nCode The mathematical models are available in the last post, so I won\u0026rsquo;t restate them here. We start in roughly the same manner as before2: create a binary variable for each worker-patient pair, add assignment problem constraints, and state the primary objective.\nfrom itertools import product import highspy n = len(data[\u0026#34;cost\u0026#34;]) workers = range(n) patients = range(n) workers_patients = list(product(workers, patients)) h = highspy.Highs() # x[w,p] = 1 if worker w is assigned to patient p. x = {(w, p): h.addBinary(obj=data[\u0026#34;cost\u0026#34;][w][p]) for w, p in workers_patients} # Each worker is assigned to one patient. h.addConstrs(sum(x[w, p] for p in patients) == 1 for w in workers) # Each patient is assigned one worker. h.addConstrs(sum(x[w, p] for w in workers) == 1 for p in patients) # Primary objective: minimize cost. h.setMinimize() h.solve() cost = h.getObjectiveValue() Note that if the costs and affinities were lists instead of matrices, we could have used h.addBinaries instead of h.addBinary.\nFrom here we\u0026rsquo;ll be solving the model twice for every value of alpha. 
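The post does not show how the alpha grid itself is built. A minimal sketch consistent with the results table (an assumption on my part; model.py may construct it differently):

```python
# Hypothetical alpha grid: 0.0 to 1.0 in increments of 0.05, matching the
# results table below. The actual model.py may build this differently.
alphas = [round(0.05 * i, 2) for i in range(21)]
```

This yields 21 values, from 0.0 up to and including 1.0.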
These expressions for total cost and affinity will make the code a little cleaner.\ncost_expr = sum(data[\u0026#34;cost\u0026#34;][w][p] * x[w, p] for w, p in workers_patients) affinity_expr = sum(data[\u0026#34;affinity\u0026#34;][w][p] * x[w, p] for w, p in workers_patients) Now comes the hierarchical optimization logic. For every value of alpha, we find the best affinity possible while keeping cost within alpha of its best possible value.\nUpdate the objective function to maximize affinity (see the calls to h.changeColCost and h.setMaximize). Constrain the cost to be within alpha of the original optimal cost (see cost_cons). Re-optimize and save the maximal affinity. Now we constrain the affinity and re-optimize cost.3\nUpdate the objective function to minimize cost again. Constrain the affinity. Once that\u0026rsquo;s done, we remove the additional constraints and repeat for a new value of alpha.\nfor alpha in alphas: # Secondary objective: maximize affinity. for (w, p), x_wp in x.items(): h.changeColCost(x_wp.index, data[\u0026#34;affinity\u0026#34;][w][p]) # Constrain cost to be within alpha of its minimum. cost_cons = h.addConstr(cost_expr \u0026lt;= (1 + alpha) * cost) h.setMaximize() h.solve() affinity = h.getObjectiveValue() # Re-optimize with original cost objective, constraining affinity. for (w, p), x_wp in x.items(): h.changeColCost(x_wp.index, data[\u0026#34;cost\u0026#34;][w][p]) affinity_cons = h.addConstr(affinity_expr \u0026gt;= affinity) h.setMinimize() h.solve() yield alpha, h.getObjectiveValue(), affinity # Remove cost and affinity constraints for the next iteration. h.removeConstr(cost_cons) h.removeConstr(affinity_cons) Encouragingly, running this using the model.py linked below gives the same values as the Gurobi model, albeit not as quickly. 
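Footnote 3 asks why we re-optimize cost at all. A tiny brute-force sketch (hypothetical 2x2 data, not from the post) shows the issue: two assignments can attain the same maximal affinity at very different costs, so after fixing affinity we minimize cost again.

```python
from itertools import permutations

# Hypothetical data: both assignments have affinity 6, but different costs.
cost = [[1, 5], [5, 1]]
affinity = [[3, 3], [3, 3]]

def evaluate(perm):
    # perm[w] is the patient assigned to worker w.
    c = sum(cost[w][p] for w, p in enumerate(perm))
    a = sum(affinity[w][p] for w, p in enumerate(perm))
    return c, a

best_affinity = max(evaluate(p)[1] for p in permutations(range(2)))
candidates = [p for p in permutations(range(2)) if evaluate(p)[1] == best_affinity]
# Both assignments attain affinity 6, but their costs are 2 and 10.
# Without the second solve, the optimizer is free to return either one.
best_cost = min(evaluate(p)[0] for p in candidates)
```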
Floating point values are rounded for readability.\n| alpha | cost | affinity |\n| ----- | -------- | -------- |\n| 0.0 | 11212.0 | 53816.0 |\n| 0.05 | 11761.0 | 74001.0 |\n| 0.1 | 12332.0 | 79981.0 |\n| 0.15 | 12886.0 | 83103.0 |\n| 0.2 | 13454.0 | 85394.0 |\n| 0.25 | 13996.0 | 87136.0 |\n| 0.3 | 14557.0 | 88546.0 |\n| 0.35 | 15125.0 | 89751.0 |\n| 0.4 | 15670.0 | 90664.0 |\n| 0.45 | 16255.0 | 91345.0 |\n| 0.5 | 16816.0 | 91997.0 |\n| 0.55 | 17370.0 | 92537.0 |\n| 0.6 | 17924.0 | 93012.0 |\n| 0.65 | 18495.0 | 93491.0 |\n| 0.7 | 19055.0 | 93829.0 |\n| 0.75 | 19591.0 | 94228.0 |\n| 0.8 | 20167.0 | 94530.0 |\n| 0.85 | 20737.0 | 94833.0 |\n| 0.9 | 21295.0 | 95114.0 |\n| 0.95 | 21812.0 | 95361.0 |\n| 1.0 | 22402.0 | 95613.0 |\nResources model.py hierarchical objectives HiGHS model It gets even easier to mess up with more than two objectives.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nIsn\u0026rsquo;t it nice that MIP modeling is similar across different APIs?\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nExercise for the reader: why do we need to re-optimize cost?\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ryanjoneil.dev/posts/2024-11-11-hierarchical-optimization-with-highs/","summary":"\u003cp\u003eIn the \u003ca href=\"../2024-11-08-hierarchical-optimization-with-gurobi/\"\u003elast post\u003c/a\u003e, we used Gurobi\u0026rsquo;s hierarchical optimization features to compute the Pareto front for primary and secondary objectives in an assignment problem. This relied on Gurobi\u0026rsquo;s \u003ccode\u003esetObjectiveN\u003c/code\u003e method and its internal code for managing hierarchical problems.\u003c/p\u003e\n\u003cp\u003eSome practitioners may need to do this without access to a commercial license. This post adapts the previous example to use HiGHS and its native Python interface, \u003ca href=\"https://pypi.org/project/highspy/\"\u003e\u003ccode\u003ehighspy\u003c/code\u003e\u003c/a\u003e. It\u0026rsquo;s also useful to see what the procedure is in order to understand it better. 
This isn\u0026rsquo;t exactly what I\u0026rsquo;d call \u003cem\u003ehard\u003c/em\u003e, but it is \u003cem\u003eeasy to mess up\u003c/em\u003e.\u003csup id=\"fnref:1\"\u003e\u003ca href=\"#fn:1\" class=\"footnote-ref\" role=\"doc-noteref\"\u003e1\u003c/a\u003e\u003c/sup\u003e\u003c/p\u003e","title":"👔 Hierarchical Optimization with HiGHS"},{"content":"One of the first technology choices to make when setting up an optimization stack is which modeling interface to use. Even if we restrict our choices to Python interfaces for MIP modeling, there are lots of options to consider.\nIf you use a specific solver, you can opt for its native Python interface. Examples include libraries like gurobipy, Fusion, highspy, or PySCIPOpt. This approach provides access to important solver-specific features such as lazy constraints, heuristics, and various solver settings. However, it can also lock you into a solver before you\u0026rsquo;re ready for that.\nYou can also choose a modeling API that targets multiple solvers. In the Python ecosystem, these are libraries like amplpy, Pyomo, PyOptInterface, and linopy. These interfaces target multiple solver backends (both open source and commercial) and provide a subset of the functionality of each. Since they make it easy to switch between solvers, this is usually where I start.1\nHierarchical assignment However, there are plenty of times when solver-specific APIs are useful, or even critical. One example is hierarchical optimization. This is a simple technique for managing trade-offs between multiple objectives in a problem. Let\u0026rsquo;s look at an example.\nImagine we are assigning in-home health care workers ($w \\in W$) to patients ($p \\in P$). For simplicity, let\u0026rsquo;s say we have $n$ workers and $n$ patients, and we are assigning them one-to-one. Each worker has a given cost ($c_{wp}$) of assignment to each patient, which may reflect something like the travel time to get to them. 
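On a toy scale, this kind of one-to-one assignment can be brute-forced over permutations to see exactly what the optimizer must choose between (a hypothetical 3x3 cost matrix; the posts use generated 100x100 data):

```python
from itertools import permutations

# Hypothetical 3x3 cost matrix: cost[w][p] is the cost of assigning
# worker w to patient p. perm[w] is the patient assigned to worker w.
cost = [[10, 20, 30], [30, 40, 10], [20, 10, 40]]
best = min(
    permutations(range(3)),
    key=lambda perm: sum(cost[w][p] for w, p in enumerate(perm)),
)
z_star = sum(cost[w][p] for w, p in enumerate(best))
```

Brute force is only viable for tiny instances, of course; that is what the solver is for.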
We want to assign each worker to exactly one patient while minimizing the overall cost.\nModel So far, what we have is a simple linear sum assignment problem.\n$$ \\begin{align*} \u0026amp; \\text{min} \u0026amp;\u0026amp; z = \\sum_{wp} c_{wp} x_{wp} \\\\ \u0026amp; \\text{s.t.} \u0026amp;\u0026amp; \\sum_w x_{wp} = 1 \u0026amp;\u0026amp; \\forall \\quad p \\in P \\\\ \u0026amp; \u0026amp;\u0026amp; \\sum_p x_{wp} = 1 \u0026amp;\u0026amp; \\forall \\quad w \\in W \\\\ \u0026amp; \u0026amp;\u0026amp; x \\in \\{0,1\\}^{|W \\times P|} \\end{align*} $$\nSolving this model gives us the minimum cost assignment. That\u0026rsquo;s all well and good, but now say we have a secondary objective of maximizing affinity of workers to patients ($a_{wp}$). That is, we want to prefer assignments that increase overall affinity while still minimizing cost. This is actually a common goal in health care scheduling: if possible, send a given patient the same worker you usually send them.\nHierarchical optimization gives us a simple way to solve this problem. First, we optimize the model as stated above. This gives us an optimal objective value $z^*$. Then we re-solve the same optimization model, while constraining the cost to be at most $z^*$ and using the secondary objective function. This says to the optimizer, \u0026ldquo;improve the affinity as much as you can, but keep the cost optimal.\u0026rdquo;\n$$ \\begin{align*} \u0026amp; \\text{max} \u0026amp;\u0026amp; w = \\sum_{wp} a_{wp} x_{wp} \\\\ \u0026amp; \\text{s.t.} \u0026amp;\u0026amp; \\sum_{wp} c_{wp} x_{wp} \\le z^* \\\\ \u0026amp; \u0026amp;\u0026amp; \\sum_w x_{wp} = 1 \u0026amp;\u0026amp; \\forall \\quad p \\in P \\\\ \u0026amp; \u0026amp;\u0026amp; \\sum_p x_{wp} = 1 \u0026amp;\u0026amp; \\forall \\quad w \\in W \\\\ \u0026amp; \u0026amp;\u0026amp; x \\in \\{0,1\\}^{|W \\times P|} \\end{align*} $$\nFrom here, the natural question becomes: what if we trade off some cost for affinity? 
If we\u0026rsquo;re willing to increase cost by some percentage, how much more affinity do we get? We can do this by setting a constant $\\alpha \\ge 0$ and solving the model a number of times.2\n$$ \\begin{align*} \u0026amp; \\text{max} \u0026amp;\u0026amp; w = \\sum_{wp} a_{wp} x_{wp} \\\\ \u0026amp; \\text{s.t.} \u0026amp;\u0026amp; \\sum_{wp} c_{wp} x_{wp} \\le (1 + \\alpha) z^* \\\\ \u0026amp; \u0026amp;\u0026amp; \\sum_w x_{wp} = 1 \u0026amp;\u0026amp; \\forall \\quad p \\in P \\\\ \u0026amp; \u0026amp;\u0026amp; \\sum_p x_{wp} = 1 \u0026amp;\u0026amp; \\forall \\quad w \\in W \\\\ \u0026amp; \u0026amp;\u0026amp; x \\in \\{0,1\\}^{|W \\times P|} \\end{align*} $$\nFor example, if $\\alpha = 0.05$, then we\u0026rsquo;re willing to accept a 5% increase in overall cost to improve affinity. Setting different values of $\\alpha$ lets us explore the space of that trade-off and its impact on cost and affinity.\nOnce we solve this and get the optimal affinity ($w^*$), we should re-optimize for the primary objective again while constraining the secondary one.\n$$ \\begin{align*} \u0026amp; \\text{min} \u0026amp;\u0026amp; \\sum_{wp} c_{wp} x_{wp} \\\\ \u0026amp; \\text{s.t.} \u0026amp;\u0026amp; \\sum_{wp} a_{wp} x_{wp} \\ge w^* \\\\ \u0026amp; \u0026amp;\u0026amp; \\sum_w x_{wp} = 1 \u0026amp;\u0026amp; \\forall \\quad p \\in P \\\\ \u0026amp; \u0026amp;\u0026amp; \\sum_p x_{wp} = 1 \u0026amp;\u0026amp; \\forall \\quad w \\in W \\\\ \u0026amp; \u0026amp;\u0026amp; x \\in \\{0,1\\}^{|W \\times P|} \\end{align*} $$\nCode So the math looks reasonable. How do we implement it? If we have a Gurobi license, we can use its built-in facilities for multiobjective optimization. This means that, instead of solving a model multiple times and adding constraints to keep cost within $\\alpha$ of its optimal value, we can create a single model that does all of this for us.\nAssume we have input data which looks like this.\n{ \u0026#34;cost\u0026#34;: [ [10, 20, ...], [30, 40, ...], ... 
], \u0026#34;affinity\u0026#34;: [ [25, 15, ...], [35, 25, ...], ... ] } We start with a simple assignment problem formulation.\nimport gurobipy as gp n = len(data[\u0026#34;cost\u0026#34;]) workers = range(n) patients = range(n) m = gp.Model() m.ModelSense = gp.GRB.MINIMIZE # x[w,p] = 1 if worker w is assigned to patient p. x = m.addVars(n, n, vtype=gp.GRB.BINARY) for i in range(n): # Each worker is assigned to one patient. m.addConstr(gp.quicksum(x[i, p] for p in patients) == 1) # Each patient is assigned one worker. m.addConstr(gp.quicksum(x[w, i] for w in workers) == 1) We add primary and secondary objectives, and call optimize. The objectives are solved in descending order of the priority flag for Model.setObjectiveN. reltol allows us to degrade the primary objective by some amount (e.g. 5%) to improve the secondary objective.\nOne catch is that the model only has one objective sense. Since we are minimizing the primary objective, we give the secondary objective a weight of -1 in order to maximize it.\nfrom itertools import product # Primary objective: minimize cost. z = (data[\u0026#34;cost\u0026#34;][w][p] * x[w, p] for w, p in product(workers, patients)) m.setObjectiveN(expr=gp.quicksum(z), index=0, name=\u0026#34;cost\u0026#34;, priority=1, reltol=alpha) # Secondary objective: maximize affinity. Since the model sense is minimize, # we negate the secondary objective in order to maximize it. w = (data[\u0026#34;affinity\u0026#34;][w][p] * x[w, p] for w, p in product(workers, patients)) m.setObjectiveN( expr=gp.quicksum(w), index=1, name=\u0026#34;affinity\u0026#34;, priority=0, weight=-1 ) m.optimize() Then we use this magic syntax to pull out the optimal cost and affinity.\nm.params.ObjNumber = 0 cost = m.ObjNVal m.params.ObjNumber = 1 affinity = m.ObjNVal Results If we solve this in a loop with alpha values from 0 to 1 in increments of 0.05, we can plot the trade-off between cost and affinity. 
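The shape of that trade-off can be checked numerically. Here is a sketch using the first few rows of the results table published in the companion HiGHS post (which matches the Gurobi results):

```python
# (alpha, affinity) pairs copied from the published results table.
results = [
    (0.0, 53816.0),
    (0.05, 74001.0),
    (0.1, 79981.0),
    (0.15, 83103.0),
    (0.2, 85394.0),
]

# Marginal affinity gained per additional 5% of allowed cost increase.
gains = [b[1] - a[1] for a, b in zip(results, results[1:])]
```

The first increment buys far more affinity than the later ones, which is the leveling-off behavior the plot shows.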
Going from $\\alpha = 0$ to $\\alpha = 0.05$ or $\\alpha = 0.1$ gives a pretty sizable improvement in affinity. After that, the return starts to gradually level off. This allows us to make a more informed choice about these two objectives.\nResources generate.py generates input data input-100x100.json contains input data model.py hierarchical objectives Gurobi model While commercial libraries like AMPL have always focussed on modeling performance, some of the open source options targeting multiple solvers come with significant performance penalties during formulation and model handoff to the solver. Newer options like linopy (benchmarks) and PyOptInterface (benchmarks) don\u0026rsquo;t have that issue.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThis gives us a Pareto front, which explores the trade-offs between different objectives.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ryanjoneil.dev/posts/2024-11-08-hierarchical-optimization-with-gurobi/","summary":"\u003cp\u003eOne of the first technology choices to make when setting up an optimization stack is which modeling interface to use. Even if we restrict our choices to Python interfaces for MIP modeling, there are lots of options to consider.\u003c/p\u003e\n\u003cp\u003eIf you use a specific solver, you can opt for its native Python interface. Examples include libraries like \u003ca href=\"https://pypi.org/project/gurobipy/\"\u003e\u003ccode\u003egurobipy\u003c/code\u003e\u003c/a\u003e, \u003ca href=\"https://docs.mosek.com/latest/pythonfusion/index.html\"\u003eFusion\u003c/a\u003e, \u003ca href=\"https://pypi.org/project/highspy/\"\u003e\u003ccode\u003ehighspy\u003c/code\u003e\u003c/a\u003e, or \u003ca href=\"https://github.com/scipopt/PySCIPOpt\"\u003e\u003ccode\u003ePySCIPOpt\u003c/code\u003e\u003c/a\u003e. This approach provides access to important solver-specific features such as lazy constraints, heuristics, and various solver settings. 
However, it can also lock you into a solver before you\u0026rsquo;re ready for that.\u003c/p\u003e","title":"👔 Hierarchical Optimization with Gurobi"},{"content":"At a Nextmv tech talk a couple weeks ago, I showed a least absolute deviations (LAD) regression model using OR-Tools. This isn\u0026rsquo;t new \u0026ndash; I pulled the formulation from Rob Vanderbei\u0026rsquo;s \u0026ldquo;Local Warming\u0026rdquo; paper, and I\u0026rsquo;ve shown similar models at conference talks in the past using other modeling APIs and solvers.\nThere are a couple reasons I keep coming back to this problem. One is that it\u0026rsquo;s a great example of how to build a machine learning model using an optimization solver. Unless you have an optimization background, it\u0026rsquo;s probably not obvious you can do this. Building a regression or classification model with a solver directly is a great way to understand the model better. And you can customize it in interesting ways, like adding epsilon insensitivity.\nAnother is that least squares, while the most commonly used regression form, has a fatal flaw: it isn\u0026rsquo;t robust to outliers in the input data. This is because least squares minimizes the sum of squared residuals, as shown in the formulation below. Here, $A$ is an $m \\times n$ matrix of feature data, $b$ is a vector of observations to fit, and $x$ is a vector of coefficients the optimizer must find.\n$$ \\min f(x) = \\Vert Ax-b \\Vert^2 $$\nSince the objective function minimizes squared residuals, outliers have a much bigger impact than other data. LAD regression solves this by simply summing the absolute values of the residuals instead.\n$$ \\min f(x) = \\Vert Ax-b \\Vert_1 $$\nSo why isn\u0026rsquo;t this used more? Simple \u0026ndash; least squares has a convenient analytical solution, while LAD requires an algorithm to solve. 
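To make the contrast concrete, here is a pure-Python sketch (hypothetical data, not from the talk or paper) of the least-squares closed form for a one-feature fit, showing how a single outlier distorts the slope:

```python
# Least squares has a closed form; LAD does not. Fit y = m*x + c by the
# usual normal-equation formulas. Hypothetical data: the clean trend is
# y = x, but the last observation is an outlier.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 30.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
c = y_bar - m * x_bar
# One outlier drags the fitted slope from 1.0 all the way to 9.1.
```

A LAD fit of the same data would stay much closer to the underlying trend, which is exactly the robustness property discussed above.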
For instance, you can formulate LAD regression as a linear program, but now you need a solver.\n$$ \\begin{align*} \\min \\quad \u0026amp; 1\u0026rsquo;z \\\\ \\text{s.t.}\\ \\quad \u0026amp; z \\ge Ax - b \\\\ \u0026amp; z \\ge b - Ax \\end{align*} $$\nWhile I like using this example, it paints a rather negative picture of squaring. If it does funny things to solvers, is there any good reason to square? Thus I\u0026rsquo;ve been on the lookout for a practical example where squaring a variable or expression makes a model more useful.\nLuckily for me, Erwin Kalvelagen recently posted about using optimization to schedule team meetings. This is an application where minimizing squared values of overbooking can be beneficial \u0026ndash; it may be worse to be triple booked than double booked.\nI won\u0026rsquo;t recreate the reasoning behind Erwin\u0026rsquo;s post here. You can read his blog for that. What we\u0026rsquo;ll do is look at both the formulations in his post, along with a couple extras using Julia for code, JuMP for modeling, SCIP for optimization, and Gadfly for visualization. All model code and data are linked in the resources section at the end.\nMaximize attendance To start off, I built a new data set, which you can find in the resources section. This differentiates team membership between two types of employees: individual contributors (starting with ic in the data), who attend meetings for 1 or 2 teams, and managers (prefixed with mgr), who attend meetings to coordinate across multiple teams. We schedule meetings for 10 teams (prefix t) into 3 time slots (s).\nThe first model in Erwin\u0026rsquo;s post maximizes attendance. This means it tries to schedule team members for as many unique time slots as possible. 
It doesn\u0026rsquo;t consider overbooking.\n$$ \\begin{align*} \\max\\quad \u0026amp; \\sum_{i,s} y_{i,s} \\\\ \\text{s.t.}\\quad\u0026amp; \\sum_{s} x_{t,s} = 1 \u0026amp;\\quad\\forall\u0026amp;\\ t \u0026amp; \\text{schedule each team meeting once}\\\\ \u0026amp; y_{i,s} \\le \\sum_{t} m_{i,t}\\ x_{t,s} \u0026amp;\\quad\\forall\u0026amp;\\ i,s \u0026amp; \\text{individuals attend team meetings}\\\\ \u0026amp; x_{t,s} \\in \\{0,1\\} \u0026amp;\\quad\\forall\u0026amp;\\ t,s\\\\ \u0026amp; y_{i,s} \\in \\{0,1\\} \u0026amp;\\quad\\forall\u0026amp;\\ i,s \\end{align*} $$\nThis yields the following team schedule, with red representing a scheduled team meeting.\nIf we look at the manager schedules, we\u0026rsquo;ll see that every manager is completely booked. This makes sense. That\u0026rsquo;s what managers do, right? Go to meetings?\nMinimize overbooking The model gets more interesting once we account for overbooking. Erwin\u0026rsquo;s post has a model that minimizes overbooking, where overbooking is the number of additional meetings in a time slot. If a team member is double booked, that\u0026rsquo;s 1 overbooking. If they are triple booked, that\u0026rsquo;s 2 overbookings.\nSum of overbooking The second model in Erwin\u0026rsquo;s post minimizes the sum of all overbookings. 
He does this by adding a continuous c vector that only incurs value once a team member goes over a single meeting in a given time slot.\n$$ \\begin{align*} \\min\\quad \u0026amp; \\sum_{i,s} c_{i,s} \\\\ \\text{s.t.}\\quad\u0026amp; \\sum_{s} x_{t,s} = 1 \u0026amp;\\quad\\forall\u0026amp;\\ t \u0026amp; \\text{schedule each team meeting once}\\\\ \u0026amp; c_{i,s} \\ge \\sum_{t} m_{i,t}\\ x_{t,s} - 1 \u0026amp;\\quad\\forall\u0026amp;\\ i,s \u0026amp; \\text{measure overbooking}\\\\ \u0026amp; x_{t,s} \\in \\{0,1\\} \u0026amp;\\quad\\forall\u0026amp;\\ t,s\\\\ \u0026amp; c_{i,s} \\ge 0 \u0026amp;\\quad\\forall\u0026amp;\\ i,s \\end{align*} $$\nGiven our data this results in the following team schedule, which is probably not all that interesting. I\u0026rsquo;ll leave this visualization out from now on.\nWhere it gets interesting is plotting overbookings for the managers. Here we see that 3 manager time slots are triple booked (red), while 8 are double booked (gray).\nSum of squared overbooking Let\u0026rsquo;s say it\u0026rsquo;s worse to triple book (or, gasp, quadruple book) than to double book. How can the model account for this? One answer, if you have a MIQP-enabled solver, is to simply square the c values.\n$$ \\begin{align*} \\min\\quad \u0026amp; \\sum_{i,s} c_{i,s}^2 \\\\ \\text{s.t.}\\quad\u0026amp; \\sum_{s} x_{t,s} = 1 \u0026amp;\\quad\\forall\u0026amp;\\ t \u0026amp; \\text{schedule each team meeting once}\\\\ \u0026amp; c_{i,s} \\ge \\sum_{t} m_{i,t}\\ x_{t,s} - 1 \u0026amp;\\quad\\forall\u0026amp;\\ i,s \u0026amp; \\text{measure overbooking}\\\\ \u0026amp; x_{t,s} \\in \\{0,1\\} \u0026amp;\\quad\\forall\u0026amp;\\ t,s\\\\ \u0026amp; c_{i,s} \\ge 0 \u0026amp;\\quad\\forall\u0026amp;\\ i,s \\end{align*} $$\nThis completely eliminates triple booking, as shown below. No manager is worse off than being double booked, which seems normal given my experiences.\nThe problem with this is that the solver now takes a lot longer. 
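Before looking at solve times, a toy check (hypothetical overbooking counts, using the convention above that double booked = 1 and triple booked = 2) of what squaring buys us: a linear objective cannot distinguish one triple booking from two double bookings, but a squared objective prefers the double bookings.

```python
# Overbooking counts per (member, slot) pair, hypothetical data.
plan_a = [2, 0]  # one member triple booked
plan_b = [1, 1]  # two members double booked

linear = (sum(plan_a), sum(plan_b))                 # tied
squared = (sum(c * c for c in plan_a),
           sum(c * c for c in plan_b))              # plan_b wins
```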
It\u0026rsquo;s not bad for the data in this example, but if you try it with something larger you\u0026rsquo;ll see what I mean. You can find the data generator code in the resources section.\nConstrained bottleneck So how can we do something similar without the computational cost? One option is to continue using MILP formulations, but in the context of hierarchical optimization. This means splitting the model into two. First, we try to minimize the maximum overbookings for any team member (the bottleneck, if you will). This involves adding a variable $b$ representing that maximum.\n$$ b = \\max\\Bigl\\{\\sum_{t} m_{i,t}\\ x_{t,s} - 1 : i \\in I, s \\in S \\Bigr\\} $$\nNow we can simply minimize $b$ using a MILP instead of a MIQP.\n$$ \\begin{align*} \\min\\quad \u0026amp; b \\\\ \\text{s.t.}\\quad\u0026amp; \\sum_{s} x_{t,s} = 1 \u0026amp;\\quad\\forall\u0026amp;\\ t \u0026amp; \\text{schedule each team meeting once}\\\\ \u0026amp; b \\ge \\sum_{t} m_{i,t}\\ x_{t,s} - 1 \u0026amp;\\quad\\forall\u0026amp;\\ i,s \u0026amp; \\text{maximum overbooking}\\\\ \u0026amp; x_{t,s} \\in \\{0,1\\} \u0026amp;\\quad\\forall\u0026amp;\\ t,s \\end{align*} $$\nOnce we solve the first model, we get the minimal value of $b$, which we call $b^*$. 
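As a tiny illustration (hypothetical meeting counts, not from the linked data), $b$ is just the worst overbooking across all member and slot pairs:

```python
# meetings[i][s] = number of meetings member i has in time slot s.
# Overbooking for a pair is max(0, meetings - 1).
meetings = [
    [2, 1, 0],  # member 0: double booked in slot 0
    [3, 0, 1],  # member 1: triple booked in slot 0
    [1, 1, 1],  # member 2: never overbooked
]
b = max(max(0, m - 1) for row in meetings for m in row)
```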
We can simply use $b^*$ as an upper bound for overbookings in a second solve of the original model.\n$$ \\begin{align*} \\min\\quad \u0026amp; \\sum_{i,s} c_{i,s} \\\\ \\text{s.t.}\\quad\u0026amp; \\sum_{s} x_{t,s} = 1 \u0026amp;\\quad\\forall\u0026amp;\\ t \u0026amp; \\text{schedule each team meeting once}\\\\ \u0026amp; c_{i,s} \\ge \\sum_{t} m_{i,t}\\ x_{t,s} - 1 \u0026amp;\\quad\\forall\u0026amp;\\ i,s \u0026amp; \\text{measure overbooking}\\\\ \u0026amp; x_{t,s} \\in \\{0,1\\} \u0026amp;\\quad\\forall\u0026amp;\\ t,s\\\\ \u0026amp; 0 \\le c_{i,s} \\le b^* \u0026amp;\\quad\\forall\u0026amp;\\ i,s \\end{align*} $$\nAs we see below, this model also eliminates triple bookings, and it\u0026rsquo;s quite a bit faster to solve than the MIQP.\nResources main.go generates input data membership.csv contains input data maximize-attendance.jl MILP model minimize-overbooking.jl MILP model minimize-overbooking-squared.jl MIQP model minimize-bottleneck.jl hierarchical MILP models ","permalink":"https://ryanjoneil.dev/posts/2023-11-26-reducing-overscheduling/","summary":"\u003cp\u003eAt a \u003ca href=\"https://nextmv.io\"\u003eNextmv\u003c/a\u003e \u003ca href=\"https://www.youtube.com/watch?v=XTeit7TAWj4\"\u003etech talk\u003c/a\u003e a couple weeks ago, I showed a \u003ca href=\"https://en.wikipedia.org/wiki/Least_absolute_deviations\"\u003eleast absolute deviations\u003c/a\u003e (LAD) regression model using OR-Tools. This isn\u0026rsquo;t new \u0026ndash; I pulled the formulation from Rob Vanderbei\u0026rsquo;s \u0026ldquo;\u003ca href=\"https://vanderbei.princeton.edu/tex/LocalWarming/LocalWarmingSIREVrev.pdf\"\u003eLocal Warming\u003c/a\u003e\u0026rdquo; paper, and I\u0026rsquo;ve shown similar models at conference talks in the past using other modeling APIs and solvers.\u003c/p\u003e\n\u003cp\u003eThere are a couple reasons I keep coming back to this problem. One is that it\u0026rsquo;s a great example of how to build a machine learning model using an optimization solver. 
Unless you have an optimization background, it\u0026rsquo;s probably not obvious you can do this. Building a regression or classification model with a solver directly is a great way to understand the model better. And you can customize it in interesting ways, like adding \u003ca href=\"https://www.robots.ox.ac.uk/~az/lectures/ml/2011/lect6.pdf\"\u003eepsilon insensitivity\u003c/a\u003e.\u003c/p\u003e","title":"📅 Reducing Overscheduling"},{"content":"I attended DPSOLVE 2023 recently and found lots of good inspiration for the next version of Nextmv\u0026rsquo;s Decision Diagram (DD) solver, Hop. It\u0026rsquo;s a few years old now, and we learned a lot applying it in the field. Hop formed the basis for our first routing models. While those models moved to a different structure in our latest routing code, the first version broke ground combining DDs with Adaptive Large Neighborhood Search (ALNS), and its use continues to grow organically.\nA feature I\u0026rsquo;d love for Hop is the ability to visualize DDs and monitor the search. That could work interactively, like Gecode\u0026rsquo;s GIST, or passively during the search process. This requires automatic generation of images representing potentially large diagrams. So I spent a few hours looking at graph rendering options for DDs.\nManual rendering We\u0026rsquo;ll start with examples of visualizations built by hand. These form a good standard for how we want DDs to look if we automate rendering. We\u0026rsquo;ll start with some examples from academic literature, look at some we\u0026rsquo;ve used in Nextmv presentations, and show an interesting example that embeds in Hugo, the popular static site generator I use for this blog.\nAll the literature on using Decision Diagrams (DD) for optimization that I\u0026rsquo;m aware of depicts DDs as top-down, layered, directed graphs (digraphs). 
Some of the diagrams we come across appear to be coded and rendered, while some are fussily created by hand with a diagramming tool.\nAcademia I believe most of the examples we find in academic literature are coded by hand and rendered using the LaTeX TikZ package. Below is one of the first diagrams that newcomers to DDs encounter. It\u0026rsquo;s from Decision Diagrams for Optimization by Bergman et al., 2016.\nIt doesn\u0026rsquo;t matter here what model this represents. It\u0026rsquo;s a Binary Decision Diagram (BDD), which means that each variable can be $0$ or $1$. The BDD on the left is exact, while the BDD on the right is a relaxed version of the same.\nThere\u0026rsquo;s quite a bit going on, so it\u0026rsquo;s worth an explanation. Let\u0026rsquo;s look at the \u0026ldquo;exact\u0026rdquo; BDD on the left first.\nHorizontal layers group arcs with a binary variable (e.g. $x_1$, $x_2$). Arcs assign either the value $0$ or $1$ to their layer\u0026rsquo;s variable. Dotted lines assign $0$ while solid lines assign $1$. Arc labels specify their costs. The BDD searches for a longest (or shortest) path from the root node $r$ to the terminal node $t$. The \u0026ldquo;relaxed\u0026rdquo; BDD on the right overapproximates both the objective value and the set of feasible solutions of the exact BDD on the left.\nThe diagram is limited to a fixed width (2, in this case) at each layer. To achieve this, the DD merges exact nodes together. Thus, on the left of the relaxed BDD, there is a single node in which $x_2$ can be $0$ or $1$. Here\u0026rsquo;s another example of an exact BDD from the same book.\nIn this diagram, each node has a state. For example, the state of $r$ is $\\{1,2,3,4,5\\}$. If we start at the root node $r$ and assign $x_1 = 0$, we end up at node $u_1$ with state $\\{2,3,4,5\\}$.\nMost other academic literature about DDs uses images similar to these.\nNextmv We\u0026rsquo;ve rendered a number of DDs over the years at Nextmv. 
Most of these images demonstrate a concept instead of a particular model. We usually create them by hand in a diagramming tool like Whimsical, Lucidchart, or Excalidraw. I built the diagrams below by hand in Whimsical. I think the result is nice, if time-consuming and fussy.\nThis is a representation of an exact DD. It doesn\u0026rsquo;t indicate whether this is a BDD or a Multivalued Decision Diagram (MDD). It doesn\u0026rsquo;t have any labels or variable names. It just shows what a DD search might look like in the abstract.\nThe restricted DD below is more involved. In addition to horizontal layers, it divides nodes into explored and deferred groups. Most of the examples I\u0026rsquo;ve seen mix different types of nodes, like exact and relaxed. I really like differentiating node types like this.\nIn this representation, deferred nodes are in Hop\u0026rsquo;s queue for later exploration. Thus they don\u0026rsquo;t connect to any child nodes yet. This is the kind of thing I\u0026rsquo;d like to generate with real diagrams during search so I can examine the state of the solver.\nMy favorite of my DD renderings so far is the next one. This shows a single-vehicle pickup-and-delivery problem. The arc labels are stops (e.g. 🐶, 🐱). The path the 🚗 follows to the terminal node is the route. The gray boxes group together nodes to be merged based on state, factoring isomorphisms out of the diagram.\nWe also coded some images by hand, like those in our post on expanders. As you can see, coding these by hand gets tedious.\nGoAT TikZ is a program that renders manually coded graphics, while Whimsical is a WYSIWYG diagram editor. I like the Whimsical images a lot better \u0026ndash; they feel cleaner and easier to understand.\nHugo supports GoAT diagrams by default, so I tried that out too. Here is an arbitrary MDD with two layers. 
The $[[1,2],4]$ node is a relaxed node; it doesn\u0026rsquo;t really matter here what the label means.\n[GoAT rendering of the MDD: layers x1 and x2, with a root, labeled arcs, and a shared terminal node.]\nI like the way GoAT renders this diagram. It\u0026rsquo;s very readable. Unfortunately, it isn\u0026rsquo;t easy to automate. Creating a GoAT diagram is like using ASCII as a WYSIWYG diagramming tool, as you can see from the code for that image.\n.-. .-----------+ o +-----------. | \u0026#39;+\u0026#39; | | | | v v v .-. .---------. .-. x1 | 0 | | [[1,2],4] | | 3 | \u0026#39;-\u0026#39; \u0026#39;----+----\u0026#39; \u0026#39;+\u0026#39; | | .------------+ | | | | v v v .--. .--. .---. x2 | 10 | | 20 | | 100 | \u0026#39;-+\u0026#39; \u0026#39;-+\u0026#39; \u0026#39;-+-\u0026#39; | | | | v | | .-. | \u0026#39;---------\u0026gt;| * |\u0026lt;----------\u0026#39; \u0026#39;-\u0026#39; Automated rendering Now we\u0026rsquo;ll look at a couple options for automatically generating visualizations of DDs. These convert descriptions of graphs into images.\nGraphviz Graphviz is the tried and true graph visualizer. It\u0026rsquo;s used in the Go pprof library for examining CPU and memory profiles, and lots of other places.\nGraphviz accepts a language called DOT. It uses different layout engines to convert input into a visual representation. The user doesn\u0026rsquo;t have control over node position. That\u0026rsquo;s the job of the layout engine.\nHere\u0026rsquo;s the same MDD as written in DOT. The start -\u0026gt; end lines specify arcs in the digraph. The subgraphs organize nodes into layers. We add a dotted border around each layer and a label to say which variable it assigns. 
There isn\u0026rsquo;t any way of vertically centering and horizontally aligning the layer labels, so I thought it made more sense this way.\n
digraph G {
    s1 [label = 0]
    s2 [label = "[[1,2],4]"]
    s3 [label = 3]
    s4 [label = 10]
    s5 [label = 20]
    s6 [label = 100]

    r -> s1 [label = 2]
    r -> s2 [label = 4]
    r -> s3 [label = 1]
    s2 -> s4 [label = 10]
    s2 -> s5 [label = 4]
    s3 -> s6 [label = 2]

    subgraph cluster_0 {
        label = "x1"
        labeljust = "l"
        style = "dotted"
        s1 s2 s3
    }

    subgraph cluster_1 {
        label = "x2"
        labeljust = "l"
        style = "dotted"
        s4 s5 s6
    }

    s4 -> t
    s5 -> t
    s6 -> t
}
The result is comprehensible if not very attractive. With some fiddling, it\u0026rsquo;s possible to improve things like the spacing around arc labels. I couldn\u0026rsquo;t figure out how to align the layer labels and boxes. It doesn\u0026rsquo;t seem possible to move the relaxed nodes into their own column either, but that limitation isn\u0026rsquo;t unique to Graphviz.\nMermaid Mermaid is a JavaScript library for diagramming and charting. One can use it on the web or, presumably, embed it in an application.\nMermaid is similar to Graphviz in many ways, but it supports more diagram types. The input for that MDD in Mermaid is a bit simpler. Labels go inside arcs (e.g. -- 2 --\u0026gt;), and there are more sensible rendering defaults.\n
graph TD
    start((( )))
    stop((( )))
    A(0)
    B("[[1,2],4]")
    C(3)
    D(10)
    E(20)
    F(100)
    start -- 2 --> A
    start -- 4 --> B
    start -- 1 --> C
    B -- 10 --> D
    B -- 4 --> E
    C -- 2 --> F
    D --> stop
    E --> stop
    F --> stop
    subgraph "x1 "
        A; B; C
    end
    subgraph "x2"
        D; E; F
    end
The result has a lot of the same limitations as the Graphviz version, but it looks more like the GoAT version. 
The biggest problem, as we see below, is that it\u0026rsquo;s not possible to left-align the layer labels. They can be obscured by arcs.\n
graph TD
    start((( )))
    stop((( )))
    A(0)
    B("[[1,2],4]")
    C(3)
    D(10)
    E(20)
    F(100)
    start -- 2 --> A
    start -- 4 --> B
    start -- 1 --> C
    B -- 10 --> D
    B -- 4 --> E
    C -- 2 --> F
    D --> stop
    E --> stop
    F --> stop
    subgraph "x1 "
        A; B; C
    end
    subgraph "x2"
        D; E; F
    end
This got me thinking that there isn\u0026rsquo;t a strong reason DDs have to progress downward layer by layer. They could just as easily go from left to right. If we change the opening line from graph TD to graph LR, then we get the following image.\n
graph LR
    start((( )))
    stop((( )))
    A(0)
    B("[[1,2],4]")
    C(3)
    D(10)
    E(20)
    F(100)
    start -- 2 --> A
    start -- 4 --> B
    start -- 1 --> C
    B -- 10 --> D
    B -- 4 --> E
    C -- 2 --> F
    D --> stop
    E --> stop
    F --> stop
    subgraph "x1 "
        A; B; C
    end
    subgraph "x2"
        D; E; F
    end
I think that\u0026rsquo;s pretty nice for a generated image.\n","permalink":"https://ryanjoneil.dev/posts/2023-09-13-visualizing-decision-diagrams/","summary":"\u003cp\u003eI attended \u003ca href=\"https://sites.google.com/view/dpsolve2023/\"\u003eDPSOLVE 2023\u003c/a\u003e recently and found lots of good inspiration for the next version of Nextmv\u0026rsquo;s Decision Diagram (DD) solver, \u003ca href=\"https://www.nextmv.io/blog/how-hop-hops\"\u003eHop\u003c/a\u003e. It\u0026rsquo;s a few years old now, and we learned a lot applying it in the field. Hop formed the basis for our first routing models. 
While those models moved to a different structure in our \u003ca href=\"https://www.nextmv.io/docs/vehicle-routing/get-started\"\u003elatest routing code\u003c/a\u003e, the \u003ca href=\"https://www.nextmv.io/docs/vehicle-routing/legacy/routing\"\u003efirst version\u003c/a\u003e broke ground combining DDs with Adaptive Large Neighborhood Search (ALNS), and its use continues to grow organically.\u003c/p\u003e","title":"🖍 Visualizing Decision Diagrams"},{"content":"Note: This post has been updated to work with HiGHS.\nA fun geometry problem to think about is: given two polygons, do they intersect? That is, do they touch on the border or overlap? Does one reside entirely within the other? While this question has obvious applications in computer graphics (see: arcade games of the 1980s), it\u0026rsquo;s also important in areas such as cutting and packing problems.\nThere are a number of ways to answer this. In computer graphics, the problem is often approached using a clipping algorithm. This post examines a couple of simpler techniques using linear inequalities and properties of convexity. To simplify the presentation, we assume we\u0026rsquo;re only interested in convex polygons in two dimensions. We also assume that rotation is not an issue. That is, if one of the polygons is rotated, we can simply re-test to see if they overlap.\nProblem Let\u0026rsquo;s say we have two objects: a right triangle and a square. We can place them anywhere inside a larger rectangle. The triangle has vertices:\n$$\\{\\left(x_t, y_t\\right), \\left(x_t, y_t + a\\right), \\left(x_t + a, y_t\\right)\\}$$\nThe square has vertices:\n$$\\{\\left(x_s, y_s\\right), \\left(x_s, y_s + a\\right), \\left(x_s + a, y_s + a\\right), \\left(x_s + a, y_s\\right)\\}$$\nWe will be given $\\left(x_t, y_t\\right)$, $\\left(x_s, y_s\\right)$, and $a$, but we do not know them a priori. 
We would like to know, for any set of values these can take, whether or not the triangle and square they define intersect.\n$\left(x_t, y_t\right)$ and $\left(x_s, y_s\right)$ are the offsets of the triangle and square with respect to the bottom left corner of the rectangle. If they are far enough apart in any direction, the two objects do not intersect. The figure below shows such a case, with small gray circles representing $\left(x_t, y_t\right)$ and $\left(x_s, y_s\right)$.\nHowever, if they are too close in some manner, the objects will either touch or overlap, as shown below.\nThe two polygons can intersect in a few different ways. They may touch on their borders, in which case they will share a single point or line segment. They may overlap such that their intersecting region has nonzero relative interior but each polygon contains points outside the other. Or one of them might live entirely within the other, so that the former is a subset of the latter. Our goal is to determine if any of these cases are true given any $\left(x_t, y_t\right)$, $\left(x_s, y_s\right)$, and $a$.\nMethod 1. Define the intersecting polygon with linear inequalities The first method we use to detect intersection is based on the fact that our polygons themselves are the intersections of finite numbers of linear inequalities. Instead of defining them based on their vertices, we can equivalently represent them as the set of $\left(x, y\right)$ that satisfy a known inequality for each edge.\nLet $S_t$ be the set of points in our triangle. It can be defined as follows. $x$ must be greater than or equal to $x_t$. $y$ must be greater than or equal to $y_t$. And $x + y$ must be left of or lower than the triangle\u0026rsquo;s hypotenuse. 
There are three sides on the triangle, so we have three inequalities.\n$$ \\begin{array}{rcl} S_t = \\{\\,\\left(x, y\\right) \u0026amp; | \u0026amp; x \\ge x_t,\\\\ \u0026amp; \u0026amp; y \\ge y_t,\\\\ \u0026amp; \u0026amp; x + y \\le x_t + y_t + a \\,\\} \\end{array} $$\nSimilarly, let $S_s$ be the set of points in our square. This set is defined using four inequalities, which are shown in a slightly compacted form.\n$$ \\begin{array}{rcl} S_s = \\{\\,\\left(x, y\\right) \u0026amp; | \u0026amp; x_s \\le x \\le x_s + a,\\\\ \u0026amp; \u0026amp; y_s \\le y \\le y_s + a \\,\\} \\end{array} $$\nFinally, let $S_i = S_t \\cap S_s$ be the set of points that satisfy all seven inequalities.\n$$ \\begin{array}{rcl} S_i = \\{\\,\\left(x, y\\right) \u0026amp; | \u0026amp; x \\ge x_t,\\\\ \u0026amp; \u0026amp; y \\ge y_t,\\\\ \u0026amp; \u0026amp; x + y \\le x_t + y_t + a,\\\\ \u0026amp; \u0026amp; x_s \\le x \\le x_s + a,\\\\ \u0026amp; \u0026amp; y_s \\le y \\le y_s + a \\,\\} \\end{array} $$\nIf $S_i \\ne \\emptyset$, then there must exist some point that satisfies the inequalities of both the triangle and the square. This point resides in both of them, therefore they intersect. If $S_i = \\emptyset$, then there is no such point and they do not intersect.\nMethod 2. Use convex combinations of the polygon vertices Both of our polygons are convex. That is, they contain every convex combination of their vertices. 
So every point in the triangle, regardless of where it is located, can be represented as a convex combination of $\\{\\left(x_t, y_t\\right), \\left(x_t + a, y_t\\right), \\left(x_t, y_t + a\\right)\\}$, that is, a weighted sum of the vertices where $\\lambda_1, \\lambda_2, \\lambda_3 \\ge 0$ and $\\lambda_1 + \\lambda_2 + \\lambda_3 = 1$.\nWe can define the set $S_t$ equivalently using this concept.\n$$ S_t = \\{\\, \\lambda_1 \\left(\\begin{array}{c} x_t \\\\ y_t \\end{array}\\right) + \\lambda_2 \\left(\\begin{array}{c} x_t + a \\\\ y_t \\end{array}\\right) + \\lambda_3 \\left(\\begin{array}{c} x_t \\\\ y_t + a \\end{array}\\right) \\, | \\\\ \\lambda_1 + \\lambda_2 + \\lambda_3 = 1, \\\\ \\lambda_i \\ge 0, \\, i = 1, \\ldots, 3 \\, \\} $$\nSimilarly, the square is defined as the convex combination of its vertices.\n$$ S_s = \\{\\, \\lambda_4 \\left(\\begin{array}{c} x_s \\\\ y_s \\end{array}\\right) + \\lambda_5 \\left(\\begin{array}{c} x_s + a \\\\ y_s \\end{array}\\right) + \\lambda_6 \\left(\\begin{array}{c} x_s \\\\ y_s + a \\end{array}\\right) + \\lambda_7 \\left(\\begin{array}{c} x_s + a \\\\ y_s + a \\end{array}\\right) \\, | \\\\ \\lambda_4 + \\lambda_5 + \\lambda_6 + \\lambda_7 = 1, \\\\ \\lambda_i \\ge 0, \\, i = 4, \\ldots, 7 \\, \\} $$\nIf there exists a point inside both the triangle and the square, then it must satisfy both convex combinations. Thus we can define our intersecting set $S_i$ as follows. 
(This is a little loose with the notation, but I think it makes the point a bit better.)\n$$ \\begin{array}{rl} S_i = \\{\\, \u0026amp; \\\\ \u0026amp; \\lambda_1 \\left(\\begin{array}{c} x_t \\\\ y_t \\end{array}\\right) + \\lambda_2 \\left(\\begin{array}{c} x_t + a \\\\ y_t \\end{array}\\right) + \\lambda_3 \\left(\\begin{array}{c} x_t \\\\ y_t + a \\end{array}\\right) =\\\\ \u0026amp; \\lambda_4 \\left(\\begin{array}{c} x_s \\\\ y_s \\end{array}\\right) + \\lambda_5 \\left(\\begin{array}{c} x_s + a \\\\ y_s \\end{array}\\right) + \\lambda_6 \\left(\\begin{array}{c} x_s \\\\ y_s + a \\end{array}\\right) + \\lambda_7 \\left(\\begin{array}{c} x_s + a \\\\ y_s + a \\end{array}\\right),\\\\ \u0026amp; \\lambda_1 + \\lambda_2 + \\lambda_3 = 1,\\\\ \u0026amp; \\lambda_4 + \\lambda_5 + \\lambda_6 + \\lambda_7 = 1,\\\\ \u0026amp; \\lambda_i \\ge 0, \\, i = {1, \\ldots, 7}\\\\ \\,\\} \u0026amp; \\end{array} $$\nJust as before, if $S_i \\ne \\emptyset$, our polygons intersect.\nCode Both models are pretty easy to implement using an LP Solver. But they look very different. That\u0026rsquo;s because in the first method we\u0026rsquo;re thinking about the problem in terms of inequalities and in the second we\u0026rsquo;re thinking about it in terms of vertices. 
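As a sanity check on these formulations, this particular geometry is simple enough that the emptiness of $S_i$ can also be decided without a solver: $x$ and $y$ are each confined to an interval by the square (and bounded below by the triangle), and the only joint constraint is the hypotenuse inequality, whose left-hand side $x + y$ is minimized at the lower-left corner of that box. The following sketch is my own aside, not code from the original post:

```python
def intersects(xy_t, xy_s, a):
    """Closed-form check that S_i is nonempty for this triangle/square pair."""
    x_t, y_t = xy_t
    x_s, y_s = xy_s
    # x is confined to [max(x_t, x_s), x_s + a]; y to [max(y_t, y_s), y_s + a].
    lo_x, hi_x = max(x_t, x_s), x_s + a
    lo_y, hi_y = max(y_t, y_s), y_s + a
    if lo_x > hi_x or lo_y > hi_y:
        return False
    # x + y is minimized at the box's lower-left corner, so S_i is nonempty
    # exactly when that corner satisfies the hypotenuse inequality.
    return lo_x + lo_y <= x_t + y_t + a


print(intersects((0, 0), (0.5, 0.5), 1.0))  # overlapping -> True
print(intersects((0, 0), (5, 5), 1.0))      # far apart   -> False
```

This kind of direct check is handy for validating the LP-based methods on random instances, though it does not generalize the way the LP formulations do.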
The code below generates a thousand random instances of the problem and tests that each method produces the same result.\n
import highspy
import random


def method1(xy_t, xy_s, a):
    x_t, y_t = xy_t
    x_s, y_s = xy_s
    h = highspy.Highs()
    h.silent()
    x = h.addVariable()
    y = h.addVariable()
    h.addConstrs(
        x_t <= x <= x_t + a,
        x_s <= x <= x_s + a,
        y_t <= y <= y_t + a,
        y_s <= y <= y_s + a,
        x + y <= x_t + y_t + a,
    )
    return h


def method2(xy_t, xy_s, a):
    x_t, y_t = xy_t
    x_s, y_s = xy_s
    h = highspy.Highs()
    h.silent()
    lm = [h.addVariable(lb=0, ub=1) for _ in range(7)]
    conv_xt = lm[0] * x_t + lm[1] * (x_t + a) + lm[2] * x_t
    conv_xs = lm[3] * x_s + lm[4] * (x_s + a) + lm[5] * x_s + lm[6] * (x_s + a)
    conv_yt = lm[0] * y_t + lm[1] * y_t + lm[2] * (y_t + a)
    conv_ys = lm[3] * y_s + lm[4] * y_s + lm[5] * (y_s + a) + lm[6] * (y_s + a)
    h.addConstrs(
        conv_xt == conv_xs,
        conv_yt == conv_ys,
        sum(lm[:3]) == 1,
        sum(lm[3:]) == 1,
    )
    return h


if __name__ == "__main__":
    problems1 = []
    problems2 = []
    for _ in range(1000):
        a = random.random() * 2.5 + 1
        x_t = random.random() * 10
        y_t = random.random() * 10
        x_s = random.random() * 10
        y_s = random.random() * 10
        problems1.append(method1([x_t, y_t], [x_s, y_s], a))
        problems2.append(method2([x_t, y_t], [x_s, y_s], a))

    overlap1 = []
    for h in problems1:
        h.solve()
        overlap1.append(h.getModelStatus())

    overlap2 = []
    for h in problems2:
        h.solve()
        overlap2.append(h.getModelStatus())

    assert overlap1 == overlap2
These aren\u0026rsquo;t necessarily the best ways to solve this particular problem, but they are quick and flexible. And they leverage existing solver technology. One downside is that they aren\u0026rsquo;t easy to adapt to certain decision making contexts. That is, we can use them to determine whether objects overlap, but not to force objects not to overlap. 
In the next post, we\u0026rsquo;ll go over another tool from computational geometry that allows us to embed decisions about the relative locations of objects in our models.\nExercises We assumed convex polygons in this presentation. How might one extend the model to work on non-convex polygons? What problems does this introduce? The two methods shown above are equivalent. How can this be proven? This post only answers the question of whether two convex polygons intersect. Devise models for determining if they only touch, or if one is a subset of the other. ","permalink":"https://ryanjoneil.dev/posts/2015-09-27-detecting-polygon-intersections/","summary":"\u003cp\u003e\u003cem\u003eNote: This post has been updated to work with HiGHS.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eA fun geometry problem to think about is: given two polygons, do they intersect? That is, do they touch on the border or overlap? Does one reside entirely within the other? While this question has obvious applications in computer graphics \u003cem\u003e(see: arcade games of the 1980s)\u003c/em\u003e, it\u0026rsquo;s also important in areas such as \u003ca href=\"https://www.euro-online.org/websites/esicup/\"\u003ecutting and packing problems\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eThere are a number of ways to answer this. In computer graphics, the problem is often approached using a \u003ca href=\"https://en.wikipedia.org/wiki/Clipping_(computer_graphics)\"\u003eclipping algorithm\u003c/a\u003e. This post examines a couple of simpler techniques using linear inequalities and properties of convexity. To simplify the presentation, we assume we\u0026rsquo;re only interested in convex polygons in two dimensions. We also assume that rotation is not an issue. 
That is, if one of the polygons is rotated, we can simply re-test to see if they overlap.\u003c/p\u003e","title":"👾 Detecting Polygon Intersections"},{"content":"Note: This post was originally written using Julia v0.2, GLPK, and Hedonometer data through 2014. It has been updated to use Julia v1.11, HiGHS, and data through May 26, 2025.\nHedonometer popped onto my radar a couple weeks ago. It\u0026rsquo;s a nifty project, attempting to convert samples of words found in the Twitter Gardenhose feed into a time series of happiness.\nWhile I\u0026rsquo;m not a computational social scientist, I must say the data does have a nice intuitive quality to it. There are obvious trends in happiness associated with major holidays, days of the week, and seasons. It seems like the sort of data that could be decomposed into trends based on those various components. The Hedonometer group has, of course, done extensive analyses of their own data which you can find on their papers page.\nThis post examines another approach. It follows the structure of Robert Vanderbei\u0026rsquo;s excellent \u0026ldquo;Local Warming\u0026rdquo; project to separate out the Hedonometer averages into daily, seasonal, solar, and day-of-the-week trends. We\u0026rsquo;ll be using Julia with JuMP and HiGHS for linear optimization, Gadfly for graphing, and a few other libraries. If you haven\u0026rsquo;t installed Julia, first do that. Missing packages should be installed when you import them.\nData Hedonometer provides an API which we can use to pull daily happiness data in JSON format. We can request specific date ranges, or leave the dates off to retrieve the full data set.\nWe use the HTTP, JSON3, and DataFrames packages to read the Hedonometer data into a data frame. Calls to parse convert strings to date and float types. 
Finally, we sort the data frame in place by ascending date.\nusing DataFrames, HTTP, JSON3 url = \u0026#34;https://hedonometer.org/api/v1/happiness/?format=json\u0026amp;timeseries__title=en_all\u0026#34; response = HTTP.get(url) df = DataFrame(JSON3.read(response.body)[:objects]) df.date = parse.(Date, df.date) df.happiness = parse.(Float64, df.happiness) sort!(df, :date) 5367×4 DataFrame Row │ date frequency happiness timeseries │ Date Int64 Float64 String ──────┼───────────────────────────────────────────────────────── 1 │ 2008-09-09 2009276 6.042 /api/v1/timeseries/3/ 2 │ 2008-09-10 5263723 6.028 /api/v1/timeseries/3/ 3 │ 2008-09-11 5298101 6.02 /api/v1/timeseries/3/ 4 │ 2008-09-12 5351503 6.028 /api/v1/timeseries/3/ 5 │ 2008-09-13 5153710 6.035 /api/v1/timeseries/3/ 6 │ 2008-09-14 5170835 6.04 /api/v1/timeseries/3/ 7 │ 2008-09-15 5553350 6.004 /api/v1/timeseries/3/ 8 │ 2008-09-16 5421531 6.011 /api/v1/timeseries/3/ 9 │ 2008-09-17 5380008 6.02 /api/v1/timeseries/3/ 10 │ 2008-09-18 5591645 6.034 /api/v1/timeseries/3/ 11 │ 2008-09-19 5695345 6.063 /api/v1/timeseries/3/ 12 │ 2008-09-20 5291298 6.081 /api/v1/timeseries/3/ 13 │ 2008-09-21 5363113 6.066 /api/v1/timeseries/3/ ⋮ │ ⋮ ⋮ ⋮ ⋮ 5356 │ 2023-05-15 170487394 6.026 /api/v1/timeseries/3/ 5357 │ 2023-05-16 174192397 6.021 /api/v1/timeseries/3/ 5358 │ 2023-05-17 186034773 6.016 /api/v1/timeseries/3/ 5359 │ 2023-05-18 189092448 6.03 /api/v1/timeseries/3/ 5360 │ 2023-05-19 179957496 6.026 /api/v1/timeseries/3/ 5361 │ 2023-05-20 167540306 6.044 /api/v1/timeseries/3/ 5362 │ 2023-05-21 167091303 6.031 /api/v1/timeseries/3/ 5363 │ 2023-05-22 171660415 6.03 /api/v1/timeseries/3/ 5364 │ 2023-05-23 166443756 6.033 /api/v1/timeseries/3/ 5365 │ 2023-05-24 183687637 6.025 /api/v1/timeseries/3/ 5366 │ 2023-05-25 170265817 6.014 /api/v1/timeseries/3/ 5367 │ 2023-05-26 180664806 6.032 /api/v1/timeseries/3/ 5342 rows omitted Note that the data does seem to be missing a few days. 
This means we need to compute day offsets in our model instead of using row indices.\n
last(df.date) - first(df.date)  # 5372 days
nrow(df)                        # 5367
Now let\u0026rsquo;s take a look at happiness over time, as computed by Hedonometer.\n
function plot_happiness(df::DataFrame)
    plot(
        df,
        x=:date,
        y=:happiness,
        color=[colorant"darkblue"],
        Guide.xlabel("Date"),
        Guide.ylabel("Happiness"),
        Coord.cartesian(
            xmin=minimum(df.date),
            xmax=maximum(df.date)
        ),
        Theme(
            key_position=:none,
            line_width=0.75px,
            background_color=colorant"white"
        ),
        Geom.line
    )
end

plot_happiness(df)
The data looks right, so we\u0026rsquo;re off to a good start. Now we have to think about what sort of components we believe are important factors to this index. We\u0026rsquo;ll start with the same ones as in the Vanderbei model:\nA linear happiness trend describing how our overall happiness changes over time. Seasonal trends accounting for mood changes with weather. Solar cycle trends. We\u0026rsquo;ll add to this weekly trends, as zooming into the data shows we tend to be happier on the weekends than on work days. In the next section we\u0026rsquo;ll build a model to separate out the effects of these trends on the Hedonometer index.\nModel Vanderbei\u0026rsquo;s model analyzes daily temperature data for a particular location using least absolute deviations (LAD). This is similar to the well-known least squares approach, but while the latter penalizes the model quadratically more for bigger errors, the former does not. In mathematical notation, the least squares model takes in a known $m \\times n$ matrix $A$ and $m \\times 1$ vector $y$ of observed data, then searches for a vector $x$ such that $Ax = \\hat{y}$ and $\\sum_i \\left\\lVert y_i - \\hat{y}_i \\right\\rVert_2^2$ is minimized.\nThe LAD model is similar in that it takes in the same data, but instead of minimizing the sum of the squared $L^2$ norms, it minimizes the sum of the $L^1$ norms. 
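The practical difference is easiest to see in the simplest possible model: when fitting a single constant to data, least squares picks the mean while LAD picks the median, so one wild outlier shifts the least-squares fit substantially but barely moves the LAD fit. A tiny illustration of that (in Python rather than this post's Julia, purely as an aside with made-up numbers):

```python
from statistics import mean, median

clean = [6.01, 6.02, 6.03, 6.02, 6.01]
outlier = clean + [5.0]  # one very unhappy day

# A constant-only least squares fit is the mean; a constant-only LAD fit
# is the median. The outlier drags the mean but hardly moves the median.
print(mean(clean), mean(outlier))      # mean shifts noticeably
print(median(clean), median(outlier))  # median barely moves
```

This is why a robust fit makes sense for data full of holiday spikes: the fitted trend tracks typical days instead of being pulled around by exceptional ones.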
Thus we penalize our model using simply the absolute values of its errors instead of their squares. This makes the LAD model more robust, that is, less sensitive to outliers in our input data.\nUsing a robust model with this data set makes sense because it clearly contains a lot of outliers. While some of them, such as December 25th, may be recurrent, we\u0026rsquo;re going to ignore that detail for now. After all, not every day is Christmas.\nWe formulate our model below using JuMP with the HiGHS solver. The code works by defining a set of variables called coefficients that will converge to optimal values for $x$. For each observation we compute a row of the $A$ matrix that has the following components:\nLinear daily trend ($a_1$ = day number in the data set) Seasonal variation: $\\cos(2 \\pi a_1 / 365.25)$ and $\\sin(2 \\pi a_1 / 365.25)$ Solar cycle variation: $\\cos(2 \\pi a_1 / (10.66 \\times 365.25))$ and $\\sin(2 \\pi a_1 / (10.66 \\times 365.25))$ Weekly variation: $\\cos(2 \\pi a_1 / 7)$ and $\\sin(2 \\pi a_1 / 7)$ We then add a linear variable representing the residual, or error, of the fitted model for each observation. Constraints enforce that these variables always take the absolute values of those errors.\nMinimizing the sum of those residuals gives us a set of eight coefficients for the model. We return these and a function that predicts the happiness level for an offset from the first data record. (Note that the first record appears to be from Wednesday, September 9, 2008.)\n
function train(df::DataFrame)
    m = Model(HiGHS.Optimizer)

    # Define a linear variable for each of our regression coefficients.
    # Note that by default, JuMP variables are unrestricted in sign.
    @variable(m, x[1:8])

    # Residuals are the absolute values of the error comparing our
    # observed and fitted values.
    #
    # If alpha - beta = residual and alpha, beta >= 0, then we can min
    # alpha + beta to get the absolute value of the residual.
    @variable(m, alpha[1:nrow(df)] >= 0)
    @variable(m, beta[1:nrow(df)] >= 0)

    # This builds rows for determining fitted values. The first value is
    # 1 since it is multiplied by our trend line's offset. The other
    # values correspond to the trends described above. Sinusoidal elements
    # have two variables with sine and cosine terms.
    function constants(i)
        [
            1,                               # Offset
            i,                               # Daily trend
            cos(2pi * i / 365.25),           # Seasonal variation
            sin(2pi * i / 365.25),           #
            cos(2pi * i / (10.66 * 365.25)), # Solar cycle variation
            sin(2pi * i / (10.66 * 365.25)), #
            cos(2pi * i / 7),                # Weekly variation
            sin(2pi * i / 7)                 #
        ]
    end

    # This builds a linear expression as the dot product of a row's
    # constants and the coefficient variables.
    expression(i) = dot(constants(i), x)

    start = minimum(df.date)
    for (i, row) in enumerate(eachrow(df))
        days = (row.date - start).value
        @constraint(m, alpha[i] - beta[i] == expression(days) - row.happiness)
    end

    # Minimize the total sum of these residuals.
    @objective(m, Min, sum(alpha + beta))
    optimize!(m)

    # Return the model coefficients and a function that predicts happiness
    # for a given day, by index from the start of the data set.
    coefficients = value.(x)

    # And we would like our model to work over vectors.
    predict(i) = dot(constants(i), coefficients)

    return coefficients, predict
end

coefficients, predictor = train(df)
coefficients
This gives us the optimal value of $x$. The second value is the change in happiness per day. 
We can see from this that there does seem to be a small negative trend.\n8-element Vector{Float64}: 6.056241434337748 -1.2891297798930273e-5 -0.004956377505740697 0.00933036370632761 -0.014231170085464805 -0.01043882249306958 -0.01121031443373725 -0.003886782963711294 We can call our predictor function to obtain the fitted happiness level for any day number starting from September 9, 2008.\npredictor(1000) 6.0206559094198635 Similarly, we can compute a range of fitted happiness values.\npredictor.(1000:1009) 10-element Vector{Float64}: 6.0206559094198635 6.0133111627737295 6.01441070655176 6.0230645068990105 6.032696070016563 6.0359946962536455 6.030420605574473 6.020117462633424 6.012792131062921 6.013911265631212 Bootstrapping We now have a set of coefficients and a predictive model. That\u0026rsquo;s nice, but we\u0026rsquo;d like to have some sense of a reasonable range on our model\u0026rsquo;s coefficients. For instance, how certain are we that our daily trend is really even positive? To deal with these uncertainties, we use a method called bootstrapping.\nBootstrapping involves building fake observed data based on our fitted model and its associated errors. We then fit the model to our fake data to determine new coefficients. If we repeat this enough times, we may be able to generate decent confidence intervals around our model coefficients.\nFirst step: compute the errors between the observed and fitted data. We\u0026rsquo;ll construct a new data frame that contains everything we need to construct fake data.\n# Compute fitted data corresponding to our observations and their associated errors. 
start = minimum(df.date)
fitted = DataFrame(
    date=df.date,
    happiness=predictor.(map(d -> d.value, df.date .- start))
)
fitted.error = fitted.happiness - df.happiness

4×7 DataFrame
 Row │ variable  mean        min         median      max       nmissing  eltype
     │ Symbol    Union…      Any         Any         Any       Int64     DataType
─────┼─────────────────────────────────────────────────────────────────────────────
   1 │ date                  2008-09-09  2016-01-19  2023-05-26       0  Date
   2 │ observed  6.01506     5.628       6.016       6.376            0  Float64
   3 │ fitted    6.01858     5.95996     6.0245      6.06771          0  Float64
   4 │ error     0.00351745  -0.327932   0.0         0.353297         0  Float64
Note that the median for our errors is exactly zero. This is a good sign.\nNow we build a function that creates a fake input data set using the fitted values with randomly selected errors. That is, for each observation, we add a randomly selected error with replacement to its corresponding fitted value. Once we\u0026rsquo;ve done that for every observation, we have a complete fake data set.\n
function fake_data(fitted::DataFrame)
    indices = rand(1:nrow(fitted), nrow(fitted))
    DataFrame(
        date=fitted.date,
        happiness=fitted.happiness + fitted.error[indices]
    )
end
Let\u0026rsquo;s plot some fake data to see if it looks similar.\nplot_happiness(fake_data(fitted)) Visually, the plot of an arbitrary fake data set looks a lot like our original data, but not exactly.\nNow we generate 199 fake data sets and run them through our optimization function above. This generates 199 sets of model coefficients and then computes $2\\sigma$ confidence intervals around them.\nThe following code took a few minutes on my machine. 
If you\u0026rsquo;re intent on running it yourself, you may want to get some coffee in the meantime.\n
using HypothesisTests

coefficient_data = [train(fake_data(fitted))[1] for _ in 1:10]
confidence_intervals = map(
    i -> confint(OneSampleTTest([c[i] for c in coefficient_data])),
    1:length(coefficients)
)
confidence_intervals

8-element Vector{Tuple{Float64, Float64}}:
 (6.055873073964057, 6.056558575993962)
 (-1.304766046882304e-5, -1.2820860816666347e-5)
 (-0.005375474015660127, -0.004919187929729383)
 (0.009063427594182353, 0.009549696262121937)
 (-0.01457854811479927, -0.014061680921618894)
 (-0.010625952275843275, -0.010121904880077722)
 (-0.011441598179220986, -0.010964483538449823)
 (-0.004290285203751687, -0.003833687701331523)
Results From the above output we can see that we appear to be trending slightly less happy over time, with a daily trend of -0.00001293 in Hedonometer units and a 95% confidence interval on that trend of approximately -0.00001305 to -0.00001282. Bummer.\nNow we take a quick look at our model output. First, we plot the fitted happiness values for the same time period as the observed data. We can see that this resembles the same general trend minus the outliers. The width of the curve is due to weekly variation.\nplot_happiness(fitted) Now we take a look at what a typical week looks like in terms of its effect on our happiness. As September 9, 2008 was a Wednesday, we index Sunday starting at 6.\n
daily(i) = coefficients[6]*cos(2pi*i/7) + coefficients[7]*sin(2pi*i/7)
plot(
    x = ["Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat"],
    y = map(daily, [6, 7, 1, 2, 3, 4, 5]),
    Guide.xlabel("Day of the Week"),
    Guide.ylabel("Happiness"),
    Geom.line,
    Geom.point
)
The resulting graph highlights what I think we all already know about the work week.\nThat\u0026rsquo;s it for this analysis. 
We\u0026rsquo;ve learned that, for the time being at least, we seem to be trending less happy. When we initially did this analysis, almost 11 years ago, the opposite was true. The fitted data shows pretty clearly when that trend took a stark turn down.\nExercises The particularly ambitious reader may find the following exercises interesting.\nThe code that reruns the model using randomly constructed fake data is eligible for parallelization. Rewrite the list comprehension that calls train so it runs concurrently. According to Google, the lunar cycle is approximately 29.53 days. Add parameters for this to the LAD model above. Does it make sense to include the lunar cycle in the model? In other words, are we lunatics? Some of the happier days in the Hedonometer data, such as Christmas, are recurring, and therefore not really outliers. How might one go about accounting for the effects of those days? Try the same analysis using a least-squares model. Which model is better for this data? Resources are-we-getting-happier.jl contains all the code in this post hedonometer.json contains the Hedonometer data as of May 16, 2025 ","permalink":"https://ryanjoneil.dev/posts/2014-07-18-are-we-getting-happier/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was originally written using Julia v0.2, GLPK, and Hedonometer data through 2014. It has been updated to use Julia v1.11, HiGHS, and data through May 26, 2025.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003ca href=\"https://hedonometer.org/\"\u003eHedonometer\u003c/a\u003e popped onto my radar a couple weeks ago. It\u0026rsquo;s a nifty project, attempting to convert samples of words found in the Twitter Gardenhose feed into a time series of happiness.\u003c/p\u003e\n\u003cp\u003eWhile I\u0026rsquo;m not a computational social scientist, I must say the data does have a nice intuitive quality to it. There are obvious trends in happiness associated with major holidays, days of the week, and seasons. 
It seems like the sort of data that could be decomposed into trends based on those various components. The Hedonometer group has, of course, done extensive analyses of their own data which you can find on \u003ca href=\"https://www.hedonometer.org/papers.html\"\u003etheir papers page\u003c/a\u003e.\u003c/p\u003e","title":"😁 Are We Getting Happier?"},{"content":"In the previous post, we considered preprocessing for the vehicle routing problem where the vehicles have different starting locations. Our goal was to create potentially overlapping regions for the entire US which we could later use for route construction. We defined these regions using all 5-digit zip codes in the continental US for which one of our regional headquarters is the closest, or one of $n$ closest, headquarters in terms of Euclidean distance. The resulting regions gave us some flexibility in terms of how much redundancy we allow in our coverage of the country.\nThis post refines those regions, replacing the Euclidean distance with a more realistic metric: estimated travel time. Doing this should give us a better sense of how much space a given driver can actually cover. It should also divide the country up more equitably among our drivers.\nOur approach here will be similar to that of the last post, but instead of ranking our headquarter-zip pairs by Euclidean distance, we\u0026rsquo;ll rank them by estimated travel time. The catch is that, while the former is easy to compute using the SpatialTools library, we have to request the latter from a third party. In this post, we\u0026rsquo;ll use the MapQuest Route Matrix, since it provides estimates based on OpenStreetMap data to us for free, and doesn\u0026rsquo;t cap the number of requests we can make.\nTo do this we\u0026rsquo;re going to need a lot of point estimates for location-to-location travel times. In fact, building a full data set for replacing our Euclidean distance ranks would require $37,341 \\times 25 = 933,525$ travel time estimates. 
That\u0026rsquo;s a bit prohibitive. The good news is we don\u0026rsquo;t need all the data points unless we generate 25 levels of redundancy. We can just request enough travel time estimates to make us reasonably certain that we\u0026rsquo;ve got all the necessary data. In the last post we generated regions for 1, 2, and 3 levels of redundancy, so here we\u0026rsquo;ll get travel times for the 10 closest headquarters to each zip code, and take the leap of faith that the closest 3 headquarters by travel time for each zip will be among the 10 closest by Euclidean distance.\nLet\u0026rsquo;s assume that you have just executed the code from the last post and have its variables in your current scope.1 First, we define some constants we\u0026rsquo;re going to need in order to make MapQuest requests.\n# Define some constants for making requests to MapQuest and determining # when to save and what to request. library(RCurl) library(rjson) library(utils) MAPQUEST_API_KEY \u0026lt;- \u0026#39;YOUR KEY HERE\u0026#39; MAPQUEST_API_URL \u0026lt;- \u0026#39;http://www.mapquestapi.com/directions/v2/routematrix?key=%s\u0026amp;json=%s\u0026#39; ZIPS_BETWEEN_SAVE \u0026lt;- 250 HQ_RANK_MIN \u0026lt;- 1 # Min/max distance ranks for time estimates HQ_RANK_MAX \u0026lt;- 10 Now we create a data frame to hold our HQ-to-zip travel estimates. The rows correspond to zip codes and the columns correspond to our headquarter locations. We initialize the data frame to contain no estimates and write it to a CSV file. Since it will take on the order of days for us to fill this file in, we\u0026rsquo;re going to write it out and read it back in periodically. That way we can pick up where we left off by simply rerunning the code in case of an error or loss of network connectivity.\n# Write out a blank file containing our time estimates. TIME_CSV_PATH \u0026lt;- \u0026#39;hqs_to_zips_time.csv\u0026#39; if (!file.exists(TIME_CSV_PATH)) { # Clear out everything except row and column names. 
empty \u0026lt;- as.data.frame(matrix(nrow=nrow(zips_deduped), ncol=nrow(largest_cities)+1)) names(empty) \u0026lt;- c(\u0026#39;zip\u0026#39;, largest_cities$name) empty$zip \u0026lt;- zips_deduped$zip # This represents our current state. write.csv(empty, TIME_CSV_PATH, row.names=F) } # Read in our current state in case we are starting over. hqs_to_zips_time \u0026lt;- read.csv(TIME_CSV_PATH) hqs_to_zips_time$zip \u0026lt;- sprintf(\u0026#39;%05d\u0026#39;, hqs_to_zips_time$zip) # Sanity check: If we have any times = 0, set them to NA so that we re-request them. hqs_to_zips_time[hqs_to_zips_time \u0026lt;= 0] \u0026lt;- NA With that file created, we can start making requests to MapQuest\u0026rsquo;s Route Matrix. For each zip code, we are going to request travel times for its 10 closest HQs. We\u0026rsquo;ll save our time estimates data frame every 250 zip codes. Also, we\u0026rsquo;re going to randomize the order of the zip codes so we fill out our data set more evenly as we go. That way we can generate maps during the process or otherwise inspect the data as we go.\n# Now we start requesting travel times from MapQuest. requests_until_save \u0026lt;- ZIPS_BETWEEN_SAVE col_count \u0026lt;- ncol(hqs_to_zips_time) # Randomize the zip code order so we fill in the map uniformly as we get more data. # This will enable us to check on our data over time and make sure it looks right. for (zip_idx in sample(1:nrow(zips_deduped))) { z \u0026lt;- zips_deduped$zip[zip_idx] z_lat \u0026lt;- zips_deduped$latitude[zip_idx] z_lon \u0026lt;- zips_deduped$longitude[zip_idx] # Find PODs for this zip that are in the rank range. which_hqs \u0026lt;- which( hqs_to_zips_rank[,zip_idx] \u0026gt;= HQ_RANK_MIN \u0026amp; hqs_to_zips_rank[,zip_idx] \u0026lt;= HQ_RANK_MAX ) # We\u0026#39;re only interested in records that aren\u0026#39;t filled in yet. 
na_pods \u0026lt;- is.na(hqs_to_zips_time[zip_idx, which_hqs+1]) if (length(hqs_to_zips_time[zip_idx,2:col_count][na_pods]) \u0026lt; 1) { next } # Request this block of PODs and fill them all in. print(sprintf(\u0026#39;requesting: zip=%s rank=[%d-%d]\u0026#39;, z, HQ_RANK_MIN, HQ_RANK_MAX)) # Construct a comma-delimited string of lat/lons containing the locations of our # HQs. We will use this for our MapQuest requests below: for each zip code, we # make one request for its travel time to every HQ in our range. hq_locations \u0026lt;- paste( sprintf(\u0026#34;\u0026#39;%f,%f\u0026#39;\u0026#34;, largest_cities$lat[which_hqs], largest_cities$long[which_hqs]), collapse = \u0026#39;, \u0026#39; ) # TODO: make sure we are requesting from location 1 to 2:n only request_json \u0026lt;- URLencode(sprintf( \u0026#34;{allToAll: false, locations: [\u0026#39;%f,%f\u0026#39;, %s]}\u0026#34;, z_lat, z_lon, hq_locations )) url \u0026lt;- sprintf(MAPQUEST_API_URL, MAPQUEST_API_KEY, request_json) result \u0026lt;- fromJSON(getURL(url)) # If we get back 0s, they should be NA. Otherwise they\u0026#39;ll mess up our # rankings and region drawing later. result$time[result$time \u0026lt;= 0] \u0026lt;- NA hqs_to_zips_time[zip_idx, which_hqs+1] \u0026lt;- result$time[2:length(result$time)] # See if we should save our current state. requests_until_save \u0026lt;- requests_until_save - 1 if (requests_until_save \u0026lt; 1) { print(\u0026#39;saving current state\u0026#39;) write.csv(hqs_to_zips_time, TIME_CSV_PATH, row.names=F) requests_until_save \u0026lt;- ZIPS_BETWEEN_SAVE } } # Final save once we are done. write.csv(hqs_to_zips_time, TIME_CSV_PATH, row.names=F) Now we generate our ranks based on travel time instead of distance. We have to be a bit more careful this time, as we might have incomplete data. We don\u0026rsquo;t want pairs with travel time of NA showing up in the rankings.\n# Rank HQs by their travel time to each unique zip code location. 
hqs_to_zips_rank2 \u0026lt;- matrix(nrow=nrow(largest_cities), ncol=nrow(zips_deduped)) for (i in 1:nrow(zips_deduped)) { not_na \u0026lt;- !is.na(hqs_to_zips_time[i,2:ncol(hqs_to_zips_time)]) hqs_to_zips_rank2[not_na,i] \u0026lt;- rank(hqs_to_zips_time[i,2:ncol(hqs_to_zips_time)][not_na], ties.method=\u0026#39;first\u0026#39;) } We build our map for the Dallas, TX headquarters the same way as before.\n# Now we draw regions for which Dallas is one of the closest 3 HQs by time. hq_idx \u0026lt;- which(largest_cities$name == \u0026#39;Dallas TX\u0026#39;) redundancy_levels \u0026lt;- c(3, 2, 1) fill_alpha \u0026lt;- c(0.15, 0.30, 0.45) map(\u0026#39;state\u0026#39;) for (i in 1:length(redundancy_levels)) { # Find every zip for which this HQ is within n in time rank. within_n \u0026lt;- hqs_to_zips_rank2[hq_idx,] \u0026lt;= redundancy_levels[i] # Convex hull of zip code points. hull_order \u0026lt;- chull( zips_deduped$longitude[within_n], zips_deduped$latitude[within_n] ) hull_x \u0026lt;- zips_deduped$longitude[within_n][hull_order] hull_y \u0026lt;- zips_deduped$latitude[within_n][hull_order] polygon(hull_x, hull_y, border=\u0026#39;blue\u0026#39;, col=rgb(0, 0, 1, fill_alpha[i])) } # The other HQs. other_hqs = 1:nrow(largest_cities) != hq_idx points( largest_cities$long[other_hqs], largest_cities$lat[other_hqs], pch = 21, bg = rgb(0.4, 0.4, 0.4, 0.6), col = \u0026#39;black\u0026#39;, cex = 1.5 ) # This HQ. points( largest_cities$long[hq_idx], largest_cities$lat[hq_idx], pch = 21, bg = rgb(1, 0, 0, .85), col = \u0026#39;black\u0026#39;, cex = 1.5 ) This shows the regions for which Dallas is among the closest headquarters for 1, 2, and 3 levels of redundancy. Compare this map to the one from the previous post, and you\u0026rsquo;ll see that it conforms better to the highway system. 
For instance, it takes into account I-20 which moves east and west across Texas, instead of pushing up into the Dakotas.\nAnd now our map of the US, showing the regions for each HQ as the set of zip codes for which it is the closest.\n# Map of regions where every zip is served only by its closest HQ. map(\u0026#39;usa\u0026#39;) for (hq_idx in 1:nrow(largest_cities)) { # Find every zip for which this HQ is the closest. within_1 \u0026lt;- hqs_to_zips_rank2[hq_idx,] == 1 within_1[is.na(within_1)] \u0026lt;- F # Convex hull of zip code points. hull_order \u0026lt;- chull( zips_deduped$longitude[within_1], zips_deduped$latitude[within_1] ) hull_x \u0026lt;- zips_deduped$longitude[within_1][hull_order] hull_y \u0026lt;- zips_deduped$latitude[within_1][hull_order] polygon( hull_x, hull_y, border = \u0026#39;black\u0026#39;, col = rgb(0, 0, 1, 0.25) ) } # All HQs points( largest_cities$long, largest_cities$lat, pch = 21, bg = rgb(1, 0, 0, .75), col = \u0026#39;black\u0026#39;, cex = 1.5 ) This gives us our new map. If we compare this with the original, it should better reflect the topology of the highway system. It also looks a bit less jagged.\nExercises for the reader:\nSome of these regions overlap, even though they are supposed to be only composed of zip codes for which a given HQ is the closest. Why is that? Say we want to limit our driver to given maximum travel times. Based on our data from MapQuest, draw concentric circles representing approximate 3, 5, and 7 hour travel time regions. If you need it, you can find that code here.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://ryanjoneil.dev/posts/2014-06-27-preprocessing-for-routing-problems-part-2/","summary":"\u003cp\u003eIn the \u003ca href=\"../2014-05-28-preprocessing-for-routing-problems-part-1/\"\u003eprevious post\u003c/a\u003e, we considered preprocessing for the vehicle routing problem where the vehicles have different starting locations. 
Our goal was to create potentially overlapping regions for the entire US which we could later use for route construction. We defined these regions using all 5-digit zip codes in the continental US for which one of our regional headquarters is the closest, or one of $n$ closest, headquarters in terms of Euclidean distance. The resulting regions gave us some flexibility in terms of how much redundancy we allow in our coverage of the country.\u003c/p\u003e","title":"🗺️ Preprocessing for Routing Problems - Part 2"},{"content":"Consider an instance of the vehicle routing problem in which we have drivers that are geographically distributed, each in a unique location. Our goal is to deliver goods or services to a set of destinations at the lowest cost. It does not matter to our customers which driver goes to which destination, so long as the deliveries are made.\nOne can think of this problem as a collection of travelling salesman problems, where there are multiple salespeople in different locations and a shared set of destinations. We attempt to find the minimum cost schedule for all salespeople that visits all destinations, where each salesman can technically go anywhere.\nWe believe that sending a driver farther will result in increased cost. But, given a particularly good tour, we might do that anyway. On the other hand, there are plenty of assignments we would never consider. It would be madness to send a driver from Los Angeles to New York if we already have another person stationed near there. Thus there are a large number of scenarios that may be possible, but that we will never pursue.\nOur ultimate goal is to construct a model that finds an optimal (or near-optimal) schedule. Before we get to that, we have a bit of preprocessing to do. We would like to create regions for our drivers that make some bit of sense, balancing constraints on travel time with redundant coverage of our customers. 
Once we have these regions, we will know where we can allow our drivers to go in the final schedule.\nLet\u0026rsquo;s get started in R. We\u0026rsquo;ll assume that we have drivers stationed at our regional headquarters in the 25 largest US cities by population. We assume that every possible customer address will be in some five digit zip code in the continental US. We pull this information out of the standard R data sets and pare down to only unique locations, fixing a couple errors in the data along the way.\nlibrary(datasets) library(zipcode) data(zipcode) # Alexandria, VA is not in Normandy, France. zipcode[zipcode$zip==\u0026#39;22350\u0026#39;, c(\u0026#39;latitude\u0026#39;, \u0026#39;longitude\u0026#39;)] \u0026lt;- c(38.863930, -77.055547) # New York City, NY is not in Kyrgyzstan. zipcode$longitude[zipcode$zip==\u0026#39;10200\u0026#39;] \u0026lt;- -zipcode$longitude[zipcode$zip==\u0026#39;10200\u0026#39;] # Pare down to zip codes in the continental US. states_continental \u0026lt;- state.abb[!(state.abb %in% c(\u0026#39;AK\u0026#39;, \u0026#39;HI\u0026#39;))] zips_continental \u0026lt;- subset(zipcode, state %in% states_continental) zips_deduped \u0026lt;- zips_continental[!duplicated(zips_continental[, c(\u0026#39;latitude\u0026#39;, \u0026#39;longitude\u0026#39;)]), ] # Geographic information for top 25 cities in the country. library(maps) data(us.cities) largest_cities \u0026lt;- subset( us.cities[order(us.cities$pop, decreasing=T),][1:25,], select = c(\u0026#39;name\u0026#39;, \u0026#39;lat\u0026#39;, \u0026#39;long\u0026#39;) ) With this information we can get some sense of what we\u0026rsquo;re up against. We generate a map of all the zip code locations in blue and our driver locations in red.\n# Plot our corporate headquarters and every unique zip code location. 
map(\u0026#39;state\u0026#39;) points(zips_deduped$longitude, zips_deduped$latitude, pch=21, col=rgb(0, 0, 1, .5), cex=0.1) points(largest_cities$long, largest_cities$lat, pch=21, bg=rgb(1, 0, 0, .75), col=\u0026#39;black\u0026#39;, cex=1.5) So how do we go about assigning zip codes to our drivers? One option is to draw circles of a given radius around our drivers and increase that radius until we have the coverage we need.\nOn second thought, that doesn\u0026rsquo;t work so well. By the time we have a large enough radius, there will be so much overlap that the assignments won\u0026rsquo;t make much sense. It would be better if we started by assigning each zip code to the driver that is physically closest. We could then start introducing redundancy into our data by adding the second closest driver, and so on.\n# Euclidean distance from each HQ to each zip code. library(SpatialTools) zips_to_hqs_dist \u0026lt;- dist2( matrix(c(zips_deduped$longitude, zips_deduped$latitude), ncol=2), matrix(c(largest_cities$long, largest_cities$lat), ncol=2) ) # Rank HQs by their distance to each unique zip code location. hqs_to_zips_rank \u0026lt;- matrix(nrow=nrow(largest_cities), ncol=nrow(zips_deduped)) for (i in 1:nrow(zips_deduped)) { hqs_to_zips_rank[,i] \u0026lt;- rank(zips_to_hqs_dist[i,], ties.method=\u0026#39;first\u0026#39;) } Let\u0026rsquo;s see what this looks like on the map. The following shows what the region for the Dallas, TX driver would be if she were only allowed to visit zip codes for which she is the closest, second closest, and third closest. We map these as polygons using the convex hull of their respective zip code locations.\n# Now we draw regions for which Dallas is one of the closest 3 HQs. 
hq_idx \u0026lt;- which(largest_cities$name == \u0026#39;Dallas TX\u0026#39;) redundancy_levels \u0026lt;- c(3, 2, 1) fill_alpha \u0026lt;- c(0.15, 0.30, 0.45) map(\u0026#39;state\u0026#39;) for (i in 1:length(redundancy_levels)) { # Find every zip for which this HQ is within n in distance rank. within_n \u0026lt;- hqs_to_zips_rank[hq_idx,] \u0026lt;= redundancy_levels[i] # Convex hull of zip code points. hull_order \u0026lt;- chull( zips_deduped$longitude[within_n], zips_deduped$latitude[within_n] ) hull_x \u0026lt;- zips_deduped$longitude[within_n][hull_order] hull_y \u0026lt;- zips_deduped$latitude[within_n][hull_order] polygon(hull_x, hull_y, border=\u0026#39;blue\u0026#39;, col=rgb(0, 0, 1, fill_alpha[i])) } # The other HQs. other_hqs = 1:nrow(largest_cities) != hq_idx points( largest_cities$long[other_hqs], largest_cities$lat[other_hqs], pch = 21, bg = rgb(0.4, 0.4, 0.4, 0.6), col = \u0026#39;black\u0026#39;, cex = 1.5 ) # This HQ. points( largest_cities$long[hq_idx], largest_cities$lat[hq_idx], pch = 21, bg = rgb(1, 0, 0, .85), col = \u0026#39;black\u0026#39;, cex = 1.5 ) This makes a bit more sense. If we enforce a redundancy level of 1, then every zip code has exactly one person assigned to it. As we increase that redundancy level, we have more options in terms of driver assignment. And our optimization model will grow correspondingly in size.\nThe following produces a map of all our regions where each zip code is served only by its closest driver.\n# Map of regions where every zip is served only by its closest HQ. map(\u0026#39;usa\u0026#39;) for (hq_idx in 1:nrow(largest_cities)) { # Find every zip for which this HQ is the closest. within_1 \u0026lt;- hqs_to_zips_rank[hq_idx,] == 1 within_1[is.na(within_1)] \u0026lt;- F # Convex hull of zip code points. 
hull_order \u0026lt;- chull( zips_deduped$longitude[within_1], zips_deduped$latitude[within_1] ) hull_x \u0026lt;- zips_deduped$longitude[within_1][hull_order] hull_y \u0026lt;- zips_deduped$latitude[within_1][hull_order] polygon( hull_x, hull_y, border = \u0026#39;black\u0026#39;, col = rgb(0, 0, 1, 0.25) ) } # All HQs points( largest_cities$long, largest_cities$lat, pch = 21, bg = rgb(1, 0, 0, .75), col = \u0026#39;black\u0026#39;, cex = 1.5 ) This is a good start. Our preprocessing step gives us a reasonable level of control over the assignments of drivers before we begin optimizing. So what\u0026rsquo;s missing?\nOne immediately apparent failure is that these regions are based on Euclidean distance. Travel time is not a simple function of that. It would be much better if we could create regions using estimated time, drawing them based on topology of the highway system. We\u0026rsquo;ll explore techniques for doing so in the next post.\n","permalink":"https://ryanjoneil.dev/posts/2014-05-28-preprocessing-for-routing-problems-part-1/","summary":"\u003cp\u003eConsider an instance of the \u003ca href=\"https://en.wikipedia.org/wiki/Vehicle_routing_problem\"\u003evehicle routing problem\u003c/a\u003e in which we have drivers that are geographically distributed, each in a unique location. Our goal is to deliver goods or services to a set of destinations at the lowest cost. It does not matter to our customers which driver goes to which destination, so long as the deliveries are made.\u003c/p\u003e\n\u003cp\u003eOne can think of this problem as a collection of \u003ca href=\"https://en.wikipedia.org/wiki/Travelling_salesman_problem\"\u003etravelling salesman problems\u003c/a\u003e, where there are multiple salespeople in different locations and a shared set of destinations. 
We attempt to find the minimum cost schedule for all salespeople that visits all destinations, where each salesman can technically go anywhere.\u003c/p\u003e","title":"🗺️ Preprocessing for Routing Problems - Part 1"},{"content":"Note: This post was written before Gurobi supported nonlinear optimization. It has been updated to work with Python 3.\nA common problem in handling geometric data is determining the center of a given polygon. This is not quite so easy as it sounds as there is not a single definition of center that makes sense in all cases. For instance, sometimes computing the center of a polygon\u0026rsquo;s bounding box may be sufficient. In some instances this may give a point on an edge (consider a right triangle). If the given polygon is non-convex, that point may not even be inside or on its boundary.\nThis post looks at computing Chebyshev centers for arbitrary convex polygons. We employ essentially the same model as in Boyd \u0026amp; Vandenberghe\u0026rsquo;s Convex Optimization text, but using Gurobi instead of CVXOPT.\nConsider a polygon defined by the intersection of a finite number of half-spaces, $Au \\le b$. We assume we are given the set of vertices, $V$, in clockwise order around the polygon. $E$ is the set of edges connecting these vertices. Each edge in $E$ defines a boundary of the half-space $a_i^\\intercal u \\le b_i$:\n$$ V = \\{(1,1), (2,5), (5,4), (6,2), (4,1)\\}\\\\ E = \\{((1,1),(2,5)), ((2,5),(5,4)), ((5,4),(6,2)), ((6,2),(4,1)), ((4,1),(1,1))\\} $$\nThe Chebyshev center of this polygon is the center point $(x, y)$ of the maximum radius inscribed circle. That is, if we can find the largest circle that will fit inside our polygon without going outside its boundary, its center is the point we are looking for. Our decision variables are the center $(x, y)$ and the maximum inscribed radius, $r$.\nIn order to do this, we consider the edges independently. The long line segment below shows an arbitrary edge, $a_i^\\intercal u \\le b_i$. 
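The derivation that follows rests on one geometric fact: for a point $u$ satisfying $a_i^\intercal u \le b_i$, its distance to the hyperplane $a_i^\intercal u = b_i$ is $(b_i - a_i^\intercal u)/\lVert a_i \rVert_2$. A quick numerical sanity check with made-up values (not from the post):

```python
from math import hypot

# Line x + y = 2, i.e. a = (1, 1), b = 2, and the interior point (0, 0).
a, b = (1.0, 1.0), 2.0
p = (0.0, 0.0)

slack = b - (a[0] * p[0] + a[1] * p[1])  # b - a.p
dist = slack / hypot(a[0], a[1])         # (b - a.p) / ||a||

# The closest point on the line to the origin is (1, 1), at distance sqrt(2).
print(round(dist, 6))
```

This is exactly the quantity the LP below bounds from below by $r$ for every edge.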
The short line connected to it is orthogonal in the direction $a_i$. $(x, y)$ satisfies the inequality.\nThe shortest distance from $(x, y)$ will be in the direction of $a_i$. We\u0026rsquo;ll call this distance $r$. If we were to move the edge so it had the same slope but went through $(x, y)$, its distance from $a_i^\\intercal u = b_i$ would be $r||a_i||_2$. Thus we can add a constraint of the form $a_i^\\intercal u + r||a_i||_2 \\le b_i$ for each edge and maximize the value of $r$ as our objective function.\n$$ \\begin{align*} \u0026amp; \\text{max} \u0026amp;\u0026amp; r \\\\ \u0026amp; \\text{s.t.} \u0026amp;\u0026amp; (y_i-y_j)x + (x_j-x_i)y + r\\sqrt{(x_j-x_i)^2 + (y_j-y_i)^2} \\le (y_i-y_j)x_i + (x_j-x_i)y_i \\\\ \u0026amp; \u0026amp;\u0026amp; \\quad \\forall \\quad ((x_i,y_i), (x_j,y_j)) \\in E \\\\ \\end{align*} $$\nAs this is linear, we can solve it using any LP solver. The following code does so with Gurobi.\n#!/usr/bin/env python3 from gurobipy import Model, GRB from math import sqrt vertices = [(1,1), (2,5), (5,4), (6,2), (4,1)] edges = zip(vertices, vertices[1:] + [vertices[0]]) m = Model() r = m.addVar() x = m.addVar(lb=-GRB.INFINITY) y = m.addVar(lb=-GRB.INFINITY) m.update() for (x1, y1), (x2, y2) in edges: dx = x2 - x1 dy = y2 - y1 m.addConstr((dx*y - dy*x) + (r * sqrt(dx**2 + dy**2)) \u0026lt;= dx*y1 - dy*x1) m.setObjective(r, GRB.MAXIMIZE) m.optimize() print(\u0026#39;r = %.04f\u0026#39; % r.x) print(\u0026#39;(x, y) = (%.04f, %.04f)\u0026#39; % (x.x, y.x)) The model output shows our center and its maximum inscribed radius.\n$$ r = 1.7466\\\\ (x, y) = (3.2370, 2.7466) $$\nQuestion for the reader: in certain circumstances, such as rectangles, the Chebyshev center is ambiguous. How might one get around this ambiguity?\n","permalink":"https://ryanjoneil.dev/posts/2014-02-03-chebyshev-centers-of-polygons-with-gurobi/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was written before Gurobi supported nonlinear optimization. 
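As a no-solver sanity check on the Chebyshev result above, plain Python can verify that the reported $(x, y, r)$ is feasible for every edge constraint and that the inscribed circle actually touches the boundary (near-zero slack on some edges, otherwise $r$ could grow). The reported values are rounded, so small tolerances are needed:

```python
from math import sqrt

vertices = [(1, 1), (2, 5), (5, 4), (6, 2), (4, 1)]
edges = zip(vertices, vertices[1:] + [vertices[0]])

x, y, r = 3.2370, 2.7466, 1.7466  # solution reported by the LP

# Slack of each edge constraint: rhs - lhs, using the same rows as the model.
slacks = []
for (x1, y1), (x2, y2) in edges:
    dx, dy = x2 - x1, y2 - y1
    lhs = (dx * y - dy * x) + r * sqrt(dx**2 + dy**2)
    rhs = dx * y1 - dy * x1
    slacks.append(rhs - lhs)

print([round(s, 3) for s in slacks])
```

Three of the five slacks come out essentially zero, meaning the maximum inscribed circle is pinned by three edges, which is what uniquely determines the center for this polygon.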
It has been updated to work with Python 3.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eA common problem in handling geometric data is determining the center of a given polygon. This is not quite so easy as it sounds as there is not a single definition of center that makes sense in all cases. For instance, sometimes computing the center of a polygon\u0026rsquo;s bounding box may be sufficient. In some instances this may give a point on an edge (consider a right triangle). If the given polygon is non-convex, that point may not even be inside or on its boundary.\u003c/p\u003e","title":"⭕ Chebyshev Centers of Polygons with Gurobi"},{"content":"Note: A reader pointed out that Union-Find is a very efficient way to accomplish this task. Start there if you have the same problem!\nLast week, Paul Rubin wrote an excellent post on Extracting a Connected Graph from an existing graph. Lately I\u0026rsquo;ve been performing related functions on data from OpenStreetMap, though without access to a solver. In my case I\u0026rsquo;m taking in arbitrary network data and splitting it into disconnected sub-networks. I thought it might be a good case study to show an algorithmic way of doing this and some of the performance issues I ran into.\nA small example can be seen below. This shows a road network around the Las Vegas strip. There is one main (weakly) connected network in black. The roads highlighted in red are disconnected from the main network. We want code that will split these into connected sub-networks.\nSay we have data that looks like the following. Instead of nodes, the numbers in quotes represent edges. Think of these as streets.\n{ \u0026#34;0\u0026#34;: [1, 2, 3], \u0026#34;1\u0026#34;: [9248, 9249, 9250], \u0026#34;2\u0026#34;: [589, 9665, 9667], \u0026#34;3\u0026#34;: [0, 5, 6], \u0026#34;4\u0026#34;: [0, 5, 6], \u0026#34;5\u0026#34;: [588], \u0026#34;6\u0026#34;: [4, 8, 9], ... } Our basic strategy is the following:\nStart with every edge alone in its own subnetwork. 
For each connection, merge the networks of the source and destination edges. #!/usr/bin/env python3 import json import sys import time class hset(set): \u0026#39;\u0026#39;\u0026#39;A hashable set. Note that it only hashes by the pointer, and not by the elements.\u0026#39;\u0026#39;\u0026#39; def __hash__(self): return hash(id(self)) def __eq__(self, other): return self is other if __name__ == \u0026#39;__main__\u0026#39;: try: inputfile = sys.argv[1] except IndexError: print(\u0026#39;usage: %s network.json\u0026#39; % sys.argv[0]) sys.exit() print(time.asctime(), \u0026#39;parsing json input\u0026#39;) connections = json.load(open(inputfile)) edge_to_net = {} # Edge ID -\u0026gt; set([edges that are in the same network]) nets = set() # Set of known networks print(time.asctime(), \u0026#39;detecting disconnected subgraphs\u0026#39;) for i, (from_edge, to_set) in enumerate(connections.items()): from_edge = int(from_edge) try: from_net = edge_to_net[from_edge] except KeyError: from_net = edge_to_net[from_edge] = hset([from_edge]) nets.add(from_net) if not (i+1) % (25 * 1000): print(time.asctime(), \u0026#39;%d edges processed / %d current subnets\u0026#39; % (i+1, len(nets))) for to in to_set: try: to_net = edge_to_net[to] # If we get here, merge the from_net into the to_net. 
if to_net is not from_net: to_net.update(from_net) for e in from_net: edge_to_net[e] = to_net nets.remove(from_net) from_net = to_net except KeyError: from_net.add(to) edge_to_net[to] = from_net print(time.asctime(), len(nets), \u0026#39;subnets found\u0026#39;) We run this against the network pictured above and it works reasonably quickly, finishing in about 7 seconds:\nMon Jul 29 12:22:38 2013 parsing json input Mon Jul 29 12:22:38 2013 detecting disconnected subgraphs Mon Jul 29 12:22:38 2013 25000 edges processed / 1970 current subnets Mon Jul 29 12:22:44 2013 50000 edges processed / 124 current subnets Mon Jul 29 12:22:45 2013 60 subnets found However, when run against a road network for an entire city, the process continues for several hours. What is the issue?\nThe inefficiency is in the merge step, where we frequently remove references to every element in a large set. Instead, it would be better to remove as few references as possible. Therefore, instead of always merging from_net into to_net, we will determine which network is the smaller of the two and merge that one into the larger one. Note that this does not necessarily change the worst case time complexity of the algorithm, but it should make the code fast enough to be useful. The new version appears below.\n#!/usr/bin/env python3 import json import sys import time class hset(set): \u0026#39;\u0026#39;\u0026#39;A hashable set. 
Note that it only hashes by the pointer, and not by the elements.\u0026#39;\u0026#39;\u0026#39; def __hash__(self): return hash(id(self)) def __eq__(self, other): return self is other if __name__ == \u0026#39;__main__\u0026#39;: try: inputfile = sys.argv[1] except IndexError: print(\u0026#39;usage: %s network.json\u0026#39; % sys.argv[0]) sys.exit() print(time.asctime(), \u0026#39;parsing json input\u0026#39;) connections = json.load(open(inputfile)) edge_to_net = {} # Edge ID -\u0026gt; set([edges that are in the same network]) nets = set() # Set of known networks print(time.asctime(), \u0026#39;detecting disconnected subgraphs\u0026#39;) for i, (from_edge, to_set) in enumerate(connections.items()): from_edge = int(from_edge) try: from_net = edge_to_net[from_edge] except KeyError: from_net = edge_to_net[from_edge] = hset([from_edge]) nets.add(from_net) if not (i+1) % (25 * 1000): print(time.asctime(), \u0026#39;%d edges processed / %d current subnets\u0026#39; % (i+1, len(nets))) for to in to_set: try: to_net = edge_to_net[to] # If we get here, merge the two networks. if to_net is not from_net: # Update references to and remove the smaller set for speed. if len(to_net) \u0026lt; len(from_net): smaller, larger = to_net, from_net else: smaller, larger = from_net, to_net larger.update(smaller) for e in smaller: edge_to_net[e] = larger nets.remove(smaller) edge_to_net[to] = larger from_net = larger except KeyError: from_net.add(to) edge_to_net[to] = from_net print(time.asctime(), len(nets), \u0026#39;subnets found\u0026#39;) Indeed, this is significantly faster. And on very large networks it runs in minutes instead of hours or days. On the small test case used for this post, it runs in under a second. 
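The merge-smaller-into-larger trick above is precisely union-by-size from the Union-Find structure mentioned in the opening note. A hedged sketch of that approach (my own helper, not the post's code), assuming the same edge-adjacency dict format as the JSON input:

```python
def count_subnets(connections):
    '''Count weakly connected components with union-find (union by size).'''
    parent, size = {}, {}

    def find(e):
        # Register unseen edges lazily, then walk to the root with path halving.
        parent.setdefault(e, e)
        size.setdefault(e, 1)
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:  # always merge the smaller root into the larger
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for from_edge, to_set in connections.items():
        f = int(from_edge)
        find(f)  # register even isolated edges as their own subnet
        for to in to_set:
            union(f, to)
    return len({find(e) for e in parent})

# Two components: {0, 1, 2} and {3, 4}.
print(count_subnets({'0': [1, 2], '1': [2], '3': [4]}))  # 2
```

With path halving plus union by size, each merge touches only a root pointer instead of every member of a set, which is what makes this near-linear in practice.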
While this could probably be done faster, that\u0026rsquo;s actually good enough for right now.\nMon Jul 29 12:39:55 2013 parsing json input Mon Jul 29 12:39:55 2013 detecting disconnected subgraphs Mon Jul 29 12:39:55 2013 25000 edges processed / 1970 current subnets Mon Jul 29 12:39:55 2013 50000 edges processed / 124 current subnets Mon Jul 29 12:39:55 2013 60 subnets found ","permalink":"https://ryanjoneil.dev/posts/2013-07-29-network-splitting/","summary":"\u003cp\u003e\u003cem\u003eNote: A reader pointed out that \u003ca href=\"https://en.wikipedia.org/wiki/Disjoint-set_data_structure\"\u003eUnion-Find\u003c/a\u003e is a very efficient way to accomplish this task. Start there if you have the same problem!\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eLast week, Paul Rubin wrote an excellent post on \u003ca href=\"https://orinanobworld.blogspot.com/2013/07/extracting-connected-graph.html\"\u003eExtracting a Connected Graph\u003c/a\u003e from an existing graph. Lately I\u0026rsquo;ve been performing related functions on data from \u003ca href=\"https://www.openstreetmap.org/\"\u003eOpenStreetMap\u003c/a\u003e, though without access to a solver. In my case I\u0026rsquo;m taking in arbitrary network data and splitting it into disconnected sub-networks. I thought it might be a good case study to show an algorithmic way of doing this and some of the performance issues I ran into.\u003c/p\u003e","title":"✂️ Network Splitting"},{"content":"Note: This post was updated to work with Python 3 and the 2nd edition of \u0026ldquo;Integer Programming\u0026rdquo; by Laurence Wolsey.\nWe\u0026rsquo;ve been studying Lagrangian Relaxation (LR) in the Advanced Topics in Combinatorial Optimization course I\u0026rsquo;m taking this term, and I had some difficulty finding a simple example covering its application. In case anyone else finds it useful, I\u0026rsquo;m posting a Python version for solving the Generalized Assignment Problem (GAP). 
This won\u0026rsquo;t discuss the theory of LR at all, just give example code using Gurobi.\nGeneralized assignment The GAP as defined by Wolsey consists of a maximization problem subject to a set of set packing constraints followed by a set of knapsack constraints.\n$$ \\begin{align*} \u0026amp; \\text{max} \u0026amp;\u0026amp; \\sum_i \\sum_j c_{ij} x_{ij} \\\\ \u0026amp; \\text{s.t.} \u0026amp;\u0026amp; \\sum_j x_{ij} \\leq 1 \u0026amp;\u0026amp; \\forall i \\\\ \u0026amp; \u0026amp;\u0026amp; \\sum_i a_{ij} x_{ij} \\leq b_j \u0026amp;\u0026amp; \\forall j \\\\ \u0026amp; \u0026amp;\u0026amp; x_{ij} \\in \\{0, 1\\} \\end{align*} $$\nNaive model A naive version of this model using Gurobi might look like the following.\n#!/usr/bin/env python # This is the GAP per Wolsey, pg 208. from gurobipy import Model, GRB, quicksum as qsum m = Model(\u0026#34;GAP per Wolsey\u0026#34;) m.modelSense = GRB.MAXIMIZE m.setParam(\u0026#34;OutputFlag\u0026#34;, False) # turns off solver chatter b = [15, 15, 15] c = [ [6, 10, 1], [12, 12, 5], [15, 4, 3], [10, 3, 9], [8, 9, 5], ] a = [ [5, 7, 2], [14, 8, 7], [10, 6, 12], [8, 4, 15], [6, 12, 5], ] # x[i][j] = 1 if i is assigned to j x = [[m.addVar(vtype=GRB.BINARY) for _ in row] for row in c] # sum j: x_ij \u0026lt;= 1 for all i for x_i in x: m.addConstr(sum(x_i) \u0026lt;= 1) # sum i: a_ij * x_ij \u0026lt;= b[j] for all j for j, b_j in enumerate(b): m.addConstr(qsum(a[i][j] * x_i[j] for i, x_i in enumerate(x)) \u0026lt;= b_j) # max sum i,j: c_ij * x_ij m.setObjective( qsum(qsum(c_ij * x_ij for c_ij, x_ij in zip(c_i, x_i)) for c_i, x_i in zip(c, x)) ) m.optimize() # Pull solution out of m. 
print(f\u0026#34;z = {m.objVal}\u0026#34;) print(\u0026#34;x = [\u0026#34;) for x_i in x: print(f\u0026#34; {[1 if x_ij.x \u0026gt;= 0.5 else 0 for x_ij in x_i]}\u0026#34;) print(\u0026#34;]\u0026#34;) The solver quickly finds the following optimal solution of this toy problem.\nz = 46.0 x = [ [0, 1, 0] [0, 1, 0] [1, 0, 0] [0, 0, 1] [0, 0, 0] ] Lagrangian model There are two sets of constraints we can dualize. It can be beneficial to apply Lagrangian Relaxation against problems composed of knapsack constraints, so we will dualize the set packing ones.\n# sum j: x_ij \u0026lt;= 1 for all i for x_i in x: m.addConstr(sum(x_i) \u0026lt;= 1) We replace these with a new set of variables, penalties, which take the values of the slacks on the set packing constraints. We then modify the objective function, adding Lagrangian multipliers times these penalties.\nInstead of optimizing once, we do so iteratively. An important consideration is that we may get nothing more than a dual bound from this process. Any integer solution is not guaranteed to be primal feasible unless it satisfies complementary slackness conditions \u0026ndash; for each dualized constraint either its multiplier or penalty must be zero.\nWe then set the initial multiplier values to 2 and use sub-gradient optimization with a step size of 1 / (iteration #) to adjust them.\n#!/usr/bin/env python # This is the GAP per Wolsey, pg 208, using Lagrangian Relaxation. from gurobipy import Model, GRB, quicksum as qsum m = Model(\u0026#34;GAP per Wolsey with Lagrangian Relaxation\u0026#34;) m.modelSense = GRB.MAXIMIZE m.setParam(\u0026#34;OutputFlag\u0026#34;, False) # turns off solver chatter b = [15, 15, 15] c = [ [6, 10, 1], [12, 12, 5], [15, 4, 3], [10, 3, 9], [8, 9, 5], ] a = [ [5, 7, 2], [14, 8, 7], [10, 6, 12], [8, 4, 15], [6, 12, 5], ] # x[i][j] = 1 if i is assigned to j x = [[m.addVar(vtype=GRB.BINARY) for _ in row] for row in c] # As stated, the GAP has the following constraints. 
We dualize these into # penalties instead, using variables so we can easily extract their values. penalties = [m.addVar() for _ in x] # Dualized constraints: sum j: x_ij \u0026lt;= 1 for all i for p, x_i in zip(penalties, x): m.addConstr(p == 1 - sum(x_i)) # sum i: a_ij * x_ij \u0026lt;= b[j] for all j for j, b_j in enumerate(b): m.addConstr(qsum(a[i][j] * x_i[j] for i, x_i in enumerate(x)) \u0026lt;= b_j) # u[i] = Lagrangian Multiplier for the set packing constraint i u = [2.0] * len(x) # Re-optimize until either we have run a certain number of iterations # or complementary slackness conditions apply. for k in range(1, 101): # max sum i,j: c_ij * x_ij m.setObjective( qsum( # Original objective function sum(c_ij * x_ij for c_ij, x_ij in zip(c_i, x_i)) for c_i, x_i in zip(c, x) ) + qsum( # Penalties for dualized constraints u_j * p_j for u_j, p_j in zip(u, penalties) ) ) m.optimize() print( f\u0026#34;iteration {k}: z = {m.objVal}, u = {u}, penalties = {[p.x for p in penalties]}\u0026#34; ) # Test for complementary slackness stop = True eps = 10e-6 for u_i, p_i in zip(u, penalties): if abs(u_i) \u0026gt; eps and abs(p_i.x) \u0026gt; eps: stop = False break if stop: print(\u0026#34;primal feasible \u0026amp; optimal\u0026#34;) break else: s = 1.0 / k for i in range(len(x)): u[i] = max(u[i] - s * (penalties[i].x), 0.0) # Pull solution out of m. 
print(f\u0026#34;z = {m.objVal}\u0026#34;) print(\u0026#34;x = [\u0026#34;) for x_i in x: print(f\u0026#34; {[1 if x_ij.x \u0026gt;= 0.5 else 0 for x_ij in x_i]}\u0026#34;) print(\u0026#34;]\u0026#34;) Again, the example converges very quickly to an optimal solution.\niteration 1: z = 48.0, u = [2.0, 2.0, 2.0, 2.0, 2.000], penalties = [0.0, 0.0, 0.0, 0.0, 1.0] iteration 2: z = 47.0, u = [2.0, 2.0, 2.0, 2.0, 1.000], penalties = [0.0, 0.0, 0.0, 0.0, 1.0] iteration 3: z = 46.5, u = [2.0, 2.0, 2.0, 2.0, 0.500], penalties = [0.0, 0.0, 0.0, 0.0, 1.0] iteration 4: z = 46.2, u = [2.0, 2.0, 2.0, 2.0, 0.167], penalties = [0.0, 0.0, 0.0, 0.0, 1.0] iteration 5: z = 46.0, u = [2.0, 2.0, 2.0, 2.0, 0.000], penalties = [0.0, 0.0, 0.0, 0.0, 1.0] primal feasible \u0026amp; optimal z = 46.0 x = [ [0, 1, 0] [0, 1, 0] [1, 0, 0] [0, 0, 1] [0, 0, 0] ] Exercise for the reader: change the script to dualize the knapsack constraints instead of the set packing constraints. What is the result of this change in terms of convergence?\nResources gap.py gap-lagrangian.py ","permalink":"https://ryanjoneil.dev/posts/2012-09-22-lagrangian-relaxation-with-gurobi/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was updated to work with Python 3 and the 2nd edition of \u0026ldquo;Integer Programming\u0026rdquo; by Laurence Wolsey.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eWe\u0026rsquo;ve been studying Lagrangian Relaxation (LR) in the Advanced Topics in Combinatorial Optimization course I\u0026rsquo;m taking this term, and I had some difficulty finding a simple example covering its application. In case anyone else finds it useful, I\u0026rsquo;m posting a Python version for solving the \u003ca href=\"https://en.wikipedia.org/wiki/Generalized_assignment_problem\"\u003eGeneralized Assignment Problem\u003c/a\u003e (GAP). 
This won\u0026rsquo;t discuss the theory of LR at all, just give example code using Gurobi.\u003c/p\u003e","title":"🏖️ Lagrangian Relaxation with Gurobi"},{"content":"Note: This post was updated to work with Python 3 and PySCIPOpt. The original version used Python 2 and python-zibopt. It has also been edited for clarity.\nAs a followup to the last post, I created another SCIP example for finding Normal Magic Squares. This is similar to solving a Sudoku problem, except that here the number of binary variables depends on the square size. In the case of Sudoku, each cell has 9 binary variables \u0026ndash; one for each potential value it might take. For a normal magic square, there are $n^2$ possible values for each cell, $n^2$ cells, and one variable representing the row, column, and diagonal sums. This makes a total of $n^4$ binary variables and one continuous variable in the model.\nHowever, there are no big-Ms.\nI think the neat part of this code is in this section:\n# Construct an expression for each cell that is the sum of # its binary variables with their associated coefficients. sums = [] for row in matrix: sums_row = [] for cell in row: sums_row.append(sum((i + 1) * x for i, x in enumerate(cell))) sums.append(sums_row) It creates sums of the $n^2$ variables for each cell with their appropriate coefficients ($1$ to $n^2$) and stores those expressions to make the subsequent constraint creation simpler.\nAnother interesting exercise for the reader: Change the code to minimize the sum of each column. How does that impact the solution time?\n","permalink":"https://ryanjoneil.dev/posts/2012-01-13-normal-magic-squares/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was updated to work with Python 3 and \u003ca href=\"https://github.com/scipopt/PySCIPOpt\"\u003ePySCIPOpt\u003c/a\u003e. The original version used Python 2 and \u003ca href=\"https://pythonhosted.org/python-zibopt/\"\u003epython-zibopt\u003c/a\u003e. 
It has also been edited for clarity.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eAs a followup to the \u003ca href=\"../2012-01-12-magic-squares-and-big-ms/\"\u003elast post\u003c/a\u003e, I created \u003ca href=\"/files/2012-01-13-normal-magic-squares/normal-magic-square.py\"\u003eanother SCIP example\u003c/a\u003e for finding Normal Magic Squares. This is similar to \u003ca href=\"https://github.com/CPMpy/cpmpy/blob/master/examples/quickstart_sudoku.ipynb\"\u003esolving a Sudoku problem\u003c/a\u003e, except that here the number of binary variables depends on the square size. In the case of Sudoku, each cell has 9 binary variables \u0026ndash; one for each potential value it might take. For a normal magic square, there are $n^2$ possible values for each cell, $n^2$ cells, and one variable representing the row, column, and diagonal sums. This makes a total of $n^4$ binary variables and one continuous variable in the model.\u003c/p\u003e","title":"🔲 Normal Magic Squares"},{"content":"Note: This post was updated to work with Python 3 and PySCIPOpt. The original version used Python 2 and python-zibopt. It has also been edited for clarity.\nBack in October of 2011, I started toying with a model for finding magic squares using SCIP. This is a fun modeling exercise and a challenging problem. First one constructs a square matrix of integer-valued variables.\nfrom pyscipopt import Model # [...snip...] m = Model() matrix = [] for i in range(size): row = [m.addVar(vtype=\u0026#34;I\u0026#34;, lb=1) for _ in range(size)] for x in row: m.addCons(x \u0026lt;= M) matrix.append(row) Then one adds the following constraints:\nAll variables ≥ 1. All rows, columns, and the diagonal sum to the same value. All variables take different values. The first two constraints are trivial to implement, and relatively easy for the solver. 
What I do is add a single extra variable then set it equal to the sums of each row, column, and the diagonal.\nsum_val = m.addVar(vtype=\u0026#34;M\u0026#34;) for i in range(size): m.addCons(sum(matrix[i]) == sum_val) m.addCons(sum(matrix[j][i] for j in range(size)) == sum_val) m.addCons(sum(matrix[i][i] for i in range(size)) == sum_val) It\u0026rsquo;s the third that messes things up. You can think of this as saying, for every possible pair of integer-valued variables $x$ and $y$:\n$$ x \\ge y + 1 \\quad \\text{or} \\quad x \\le y - 1 $$\nWhy is this hard? Because we can\u0026rsquo;t add both constraints to the model. That would make it infeasible. Instead, we write them in such a way that exactly one will be active for any given solution. This requires, for each pair of variables, an additional binary variable $z$ and a (possibly big) constant $M$. Thus we reformulate the above as:\n$$ x \\ge (y + 1) - M z \\\\ x \\le (y - 1) + M (1-z) \\\\ z \\in {0,1} $$\nIn code this looks like:\nfrom itertools import chain all_vars = list(chain(*matrix)) for i, x in enumerate(all_vars): for y in all_vars[i+1:]: z = m.addVar(vtype=\u0026#34;B\u0026#34;) m.addCons(x \u0026gt;= y + 1 - M*z) m.addCons(x \u0026lt;= y - 1 + M*(1-z)) However, here be dragons. We may not know how big (or small) to make $M$. Generally we want it as small as possible to make the LP relaxation of our integer programming model tighter. Different values of $M$ have unpredictable effects on solution time.\nWhich brings us to an interesting idea:\nSCIP now supports bilinear constraints. This means that I can make $M$ a variable in the above model.\nimport sys try: M = int(sys.argv[2]) except IndexError: M = m.addVar(vtype=\u0026#34;M\u0026#34;, lb=size * size) else: assert M \u0026gt;= size * size The magic square model linked to in this post provides both options. The first command line argument it requires is the matrix size. The second one, $M$, is optional. 
If not given, it leaves $M$ up to the solver.\nAn interesting exercise for the reader: Change the code to search for a minimal magic square, which minimizes either the value of $M$ or the sums of the columns, rows, and diagonal.\n","permalink":"https://ryanjoneil.dev/posts/2012-01-12-magic-squares-and-big-ms/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was updated to work with Python 3 and \u003ca href=\"https://github.com/scipopt/PySCIPOpt\"\u003ePySCIPOpt\u003c/a\u003e. The original version used Python 2 and \u003ca href=\"https://pythonhosted.org/python-zibopt/\"\u003epython-zibopt\u003c/a\u003e. It has also been edited for clarity.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eBack in October of 2011, I started toying with a model for finding \u003ca href=\"https://en.wikipedia.org/wiki/Magic_square\"\u003emagic squares\u003c/a\u003e using SCIP. This is a fun modeling exercise and a challenging problem. First one constructs a square matrix of integer-valued variables.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#ff7b72\"\u003efrom\u003c/span\u003e \u003cspan style=\"color:#ff7b72\"\u003epyscipopt\u003c/span\u003e \u003cspan style=\"color:#ff7b72\"\u003eimport\u003c/span\u003e Model\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#8b949e;font-style:italic\"\u003e# [...snip...]\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003em \u003cspan 
style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e Model()\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003ematrix \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e []\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#ff7b72\"\u003efor\u003c/span\u003e i \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003ein\u003c/span\u003e range(size):\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e row \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e [m\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e.\u003c/span\u003eaddVar(vtype\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e\u003cspan style=\"color:#a5d6ff\"\u003e\u0026#34;I\u0026#34;\u003c/span\u003e, lb\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e\u003cspan style=\"color:#a5d6ff\"\u003e1\u003c/span\u003e) \u003cspan style=\"color:#ff7b72\"\u003efor\u003c/span\u003e _ \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003ein\u003c/span\u003e range(size)]\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003efor\u003c/span\u003e x \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003ein\u003c/span\u003e row:\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e m\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e.\u003c/span\u003eaddCons(x \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e\u0026lt;=\u003c/span\u003e M)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e matrix\u003cspan 
style=\"color:#ff7b72;font-weight:bold\"\u003e.\u003c/span\u003eappend(row)\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThen one adds the following constraints:\u003c/p\u003e","title":"🔲 Magic Squares and Big-Ms"},{"content":"In response to this post, Ben Bitdiddle inquires:\nI understand the concept of using a companion set to remove duplicates from a list while preserving the order of its elements. But what should I do if these elements are composed of smaller pieces? For instance, say I am generating combinations of numbers in which order is unimportant. How do I make a set recognize that [1,2,3] is the same as [3,2,1] in this case?\nThere are a couple points that should help here.\nWhile lists are unhashable and therefore cannot be put into sets, tuples are perfectly capable of this. Therefore I cannot do this.\ns = set() s.add([1,2,3]) Traceback (most recent call last): File \u0026#34;\u0026lt;stdin\u0026gt;\u0026#34;, line 1, in \u0026lt;module\u0026gt; TypeError: unhashable type: \u0026#39;list\u0026#39; But this works just fine (extra space added for emphasis of tuple parentheses).\ns.add( (1,2,3) ) (3,2,1) and (1,2,3) may not hash to the same thing, but tuples are easily sortable. If I sort them before adding them to a set, they look the same.\ntuple(sorted( (3,2,1) )) (1, 2, 3) If I want to be a little fancier, I can user itertools.combinations. The following generates all unique 3-digit combinations of integers from 1 to 4:\nfrom itertools import combinations list(combinations(range(1,5), 3)) [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)] Now say I want to only find those that match some condition. 
I can add a filter to return, say, only those 3-digit combinations of integers from 1 to 6 that multiply to a number divisible by 10:\nlist(filter( lambda x: not (x[0]*x[1]*x[2]) % 10, combinations(range(1, 7), 3) )) [(1, 2, 5), (1, 4, 5), (1, 5, 6), (2, 3, 5), (2, 4, 5), (2, 5, 6), (3, 4, 5), (3, 5, 6), (4, 5, 6)] ","permalink":"https://ryanjoneil.dev/posts/2011-11-25-know-your-time-complexities-part-2/","summary":"\u003cp\u003eIn response to \u003ca href=\"../2011-10-25-know-your-time-complexities/\"\u003ethis\u003c/a\u003e post, \u003ca href=\"https://en.wikipedia.org/wiki/Structure_and_Interpretation_of_Computer_Programs\"\u003eBen Bitdiddle\u003c/a\u003e inquires:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eI understand the concept of using a companion set to remove duplicates from a list while preserving the order of its elements. But what should I do if these elements are composed of smaller pieces? For instance, say I am generating \u003ca href=\"https://en.wikipedia.org/wiki/Combination\"\u003ecombinations\u003c/a\u003e of numbers in which order is unimportant. How do I make a set recognize that \u003ccode\u003e[1,2,3]\u003c/code\u003e is the same as \u003ccode\u003e[3,2,1]\u003c/code\u003e in this case?\u003c/p\u003e","title":"⏳️ Know Your Time Complexities - Part 2"},{"content":"This is based on a lightning talk I gave at the LA PyLadies October Hackathon.\nI\u0026rsquo;m actually not going to go into anything much resembling algorithmic complexity here. What I\u0026rsquo;d like to do is present a common performance anti-pattern that I see from novice programmers about once every year or so. If I can prevent one person from committing this error, this post will have achieved its goal. I\u0026rsquo;d also like to show how an intuitive understanding of time required by operations in relation to the size of data they operate on can be helpful.\nSay you have a Big List of Things. It doesn\u0026rsquo;t particularly matter what these things are. 
Often they might be objects or dictionaries of denormalized data. In this example we\u0026rsquo;ll use numbers. Let\u0026rsquo;s generate a list of 1 million integers, each randomly chosen from the first 100 thousand natural numbers:\nimport random choices = range(100000) x = [random.choice(choices) for i in range(1000000)] Now say you want to remove (or aggregate, or structure) duplicate data while keeping them in order of appearance. Intuitively, this seems simple enough. A first solution might involve creating a new empty list, iterating over x, and only appending those items that are not already in the new list.\nThe Bad Way order = [] for i in x: if i not in order: order.append(i) Try running this. What\u0026rsquo;s wrong with it?\nThe issue is the i not in order membership test. In the worst case, it could look at every item in the order list for each item in x. If the list is big, as it is in our example, that wastes a lot of cycles. We can reason that we can improve the performance of our code by replacing this conditional with something faster.\nThe Good Way Given that sets have near constant time for membership tests, one solution is to create a companion data structure, which we\u0026rsquo;ll call seen. Being a set, it doesn\u0026rsquo;t care about the order of the items, but it will allow us to test for membership quickly.\norder = [] seen = set() for i in x: if i not in seen: seen.add(i) order.append(i) Now try running this. Better?\nNot that this is the best way to perform this particular action. If you aren\u0026rsquo;t familiar with it, take a look at the groupby function from itertools, which is what I will sometimes reach for in a case like this.\n","permalink":"https://ryanjoneil.dev/posts/2011-10-25-know-your-time-complexities/","summary":"\u003cp\u003eThis is based on a lightning talk I gave at the LA PyLadies October Hackathon.\u003c/p\u003e\n\u003cp\u003eI\u0026rsquo;m actually not going to go into anything much resembling algorithmic complexity here. 
What I\u0026rsquo;d like to do is present a common performance anti-pattern that I see from novice programmers about once every year or so. If I can prevent one person from committing this error, this post will have achieved its goal. I\u0026rsquo;d also like to show how an intuitive understanding of time required by operations in relation to the size of data they operate on can be helpful.\u003c/p\u003e","title":"⏳️ Know Your Time Complexities"},{"content":"I find I have to build simulations with increasing frequency in my work and life. Usually this indicates I\u0026rsquo;m faced with one of the following situations:\nThe need for a quick estimate regarding the quantitative behavior of some situation. The desire to verify the result of a computation or assumption. A situation which is too complex or random to effectively model or understand. Anyone familiar at all with simulation will recognize the last item as the motivating force of the entire field. Simulation models tend to take over when systems become so complex that understanding them is prohibitive in cost and time or entirely infeasible. In a simulation, the modeler can focus on individual interactions between entities while still hoping for useful output in the form of descriptive statistics.\nAs such, simulations are nearly always stochastic. The output of a simulation, whether it be the mean time to service upon entering a queue or the number of fish alive in a pond, is determined by a number of random inputs. It is estimated by looking at a sample of the entire, often infinite, problem space and therefore must be described in terms of mean and variance.\nFor me, simulation building usually follows a process roughly like this:\nWork with a domain expert to understand the process under study. Convert this process into a deterministic simulation (no randomness). Verify the output of the deterministic simulation. Analyze the inputs of the simulation to determine their probability distributions. 
Convert the deterministic simulation to a stochastic simulation. The reason for creating a simulation without randomness first is that it can be difficult or impossible to verify its correctness otherwise. Thus one may focus on the simulation logic first before analyzing and adding sources of randomness.\nWhere the procedure breaks down is after the third step. Domain experts are often happy to share their knowledge about systems to aid in designing simulations, and typically can understand the resulting abstractions. They are also invaluable in verifying simulation output. However, they are unlikely to understand why it is necessary to add randomness to a system that they already perceive as functional. Further, doing so can be just as difficult and time consuming as the initial model development and therefore requires justification.\nThis can be a quandary for the model builder. How does one communicate the need to incorporate randomness to decision makers who lack understanding of probability? It is trivially easy to construct simulations that use the same input parameters but yield drastically different outputs. Consider the code below, which simulates two events occurring and counts the number of times event b happens before event a.\nimport random def sim_stochastic(event_a_lambda, event_b_lambda): # Returns 0 if event A arrives first, 1 if event B arrives first # Calculate next arrival time for each event randomly. event_a_arrival = random.expovariate(event_a_lambda) event_b_arrival = random.expovariate(event_b_lambda) return 0.0 if event_a_arrival \u0026lt;= event_b_arrival else 1.0 def sim_deterministic(event_a_lambda, event_b_lambda): # Returns 0 if event A arrives first, 1 if event B arrives first # Calculate next arrival time for each event deterministically. 
event_a_arrival = 1.0 / event_a_lambda event_b_arrival = 1.0 / event_b_lambda return 0.0 if event_a_arrival \u0026lt;= event_b_arrival else 1.0 if __name__ == \u0026#39;__main__\u0026#39;: event_a_lambda = 0.3 event_b_lambda = 0.5 repetitions = 10000 for sim in (sim_stochastic, sim_deterministic): output = [ sim(event_a_lambda, event_b_lambda) for _ in range(repetitions) ] event_b_first = 100.0 * (sum(output) / len(output)) print(\u0026#39;event b is first %0.1f%% of the time\u0026#39; % event_b_first) Both simulations use the same input parameter, but the second one is essentially wrong as b will always happen first. In the stochastic version, we use exponential distributions for the inputs and obtain an output that verifies our basic understanding of these distributions.\nevent b is first 63.0% of the time event b is first 100.0% of the time How about you? How do you discuss the need to model a random world with decision makers?\n","permalink":"https://ryanjoneil.dev/posts/2011-06-11-deterministic-vs-stochastic-simulation/","summary":"\u003cp\u003eI find I have to build simulations with increasing frequency in my work and life. Usually this indicates I\u0026rsquo;m faced with one of the following situations:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eThe need for a quick estimate regarding the quantitative behavior of some situation.\u003c/li\u003e\n\u003cli\u003eThe desire to verify the result of a computation or assumption.\u003c/li\u003e\n\u003cli\u003eA situation which is too complex or random to effectively model or understand.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eAnyone familiar at all with simulation will recognize the last item as the motivating force of the entire field. Simulation models tend to take over when systems become so complex that understanding them is prohibitive in cost and time or entirely infeasible. 
In a simulation, the modeler can focus on individual interactions between entities while still hoping for useful output in the form of descriptive statistics.\u003c/p\u003e","title":"🎰 Deterministic vs. Stochastic Simulation"},{"content":"Note: This post was updated to work with NetworkX and for clarity.\nIt\u0026rsquo;s possible this will turn out like the day when Python 2.5 introduced coroutines. At the time I was very excited. I spent several hours trying to convince my coworkers we should immediately abandon all our existing Java infrastructure and port it to finite state machines implemented using Python coroutines. After a day of hand waving over a proof of concept, we put that idea aside and went about our lives.\nSoon after, I left for a Python shop, but in the next half decade I still never found a good place to use this interesting feature.\nBut it doesn\u0026rsquo;t feel like that.\nAs I come to terms more with switching to Python 3.2, the futures module seems similarly exciting. I wish I\u0026rsquo;d had it years ago, and it\u0026rsquo;s almost reason in itself to upgrade from Python 2.7. Who cares if none of your libraries have been ported yet?\nThis library lets you take any function and distribute it over a process pool. To test that out, we\u0026rsquo;ll generate a bunch of random graphs and iterate over all their cliques.\nCode First, let\u0026rsquo;s generate some test data using the dense_gnm_random_graph function. Our data includes 1000 random graphs, each with 100 nodes and 100 * 100 edges.\nimport networkx as nx n = 100 graphs = [nx.dense_gnm_random_graph(n, n*n) for _ in range(1000)] Now we write a function to iterate over all cliques in a given graph. NetworkX provides a find_cliques function which returns a generator. 
Iterating over them ensures we will run through the entire process of finding all cliques for a graph.\ndef iterate_cliques(g): for _ in nx.find_cliques(g): pass Now we just define two functions, one for running in serial and one for running in parallel using futures.\nfrom concurrent import futures def serial_test(graphs): for g in graphs: iterate_cliques(g) def parallel_test(graphs, max_workers): with futures.ProcessPoolExecutor(max_workers=max_workers) as executor: executor.map(iterate_cliques, graphs) Our __main__ simply generates the random graphs, samples from them, times both functions, and writes CSV data to standard output.\nfrom csv import writer import random import sys import time if __name__ == \u0026#39;__main__\u0026#39;: out = writer(sys.stdout) out.writerow([\u0026#39;num graphs\u0026#39;, \u0026#39;serial time\u0026#39;, \u0026#39;parallel time\u0026#39;]) n = 100 graphs = [nx.dense_gnm_random_graph(n, n*n) for _ in range(1000)] # Run with a number of different randomly generated graphs for num_graphs in range(50, 1001, 50): sample = random.choices(graphs, k = num_graphs) start = time.time() serial_test(sample) serial_time = time.time() - start start = time.time() parallel_test(sample, 16) parallel_time = time.time() - start out.writerow([num_graphs, serial_time, parallel_time]) The output of this script shows that we get a fairly linear speedup to this code with little effort.\nI ran this on a machine with 8 cores and hyperthreading. Eyeballing the chart, it looks like the speedup is roughly 5x. 
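One knob worth knowing about, though not needed for coarse-grained tasks like these clique enumerations: executor.map accepts a chunksize argument (for ProcessPoolExecutor since Python 3.5) that batches tasks sent to each worker, which reduces inter-process overhead when tasks are small and numerous. A minimal sketch of the same map pattern on a toy function (the names here are illustrative, not from the script above):

```python
import os
from concurrent import futures

def square(n):
    # Worker functions must be defined at module top level so they
    # can be pickled and shipped to worker processes.
    return n * n

def parallel_squares(values, chunksize=16):
    # chunksize batches several inputs per worker round trip;
    # results still come back in input order.
    with futures.ProcessPoolExecutor(max_workers=os.cpu_count()) as ex:
        return list(ex.map(square, values, chunksize=chunksize))

if __name__ == "__main__":
    print(parallel_squares(range(6)))  # [0, 1, 4, 9, 16, 25]
```

For many tiny tasks, raising chunksize can be the difference between a slowdown and a speedup; for a handful of large tasks, like the graphs here, it makes little difference.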
My system monitor shows spikes on CPU usage across cores whenever the parallel test runs.\nResources Output data Full source listing ","permalink":"https://ryanjoneil.dev/posts/2011-05-19-networkx-and-python-futures/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was updated to work with \u003ca href=\"https://networkx.org/\"\u003eNetworkX\u003c/a\u003e and for clarity.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eIt\u0026rsquo;s possible this will turn out like the day when Python 2.5 introduced \u003ca href=\"https://docs.python.org/release/2.5/whatsnew/pep-342.html\"\u003ecoroutines\u003c/a\u003e. At the time I was very excited. I spent several hours trying to convince my coworkers we should immediately abandon all our existing Java infrastructure and port it to finite state machines implemented using Python coroutines. After a day of hand waving over a proof of concept, we put that idea aside and went about our lives.\u003c/p\u003e","title":"🔮 NetworkX and Python Futures"},{"content":"I recently stumbled across an implementation of the affine scaling interior point method for solving linear programs that I\u0026rsquo;d coded up in R once upon a time. I\u0026rsquo;m posting it here in case anyone else finds it useful. There\u0026rsquo;s not a whole lot of thought given to efficiency or numerical stability, just a demonstration of the basic algorithm. Still, sometimes that\u0026rsquo;s exactly what one wants.\nsolve.affine \u0026lt;- function(A, rc, x, tolerance=10^-7, R=0.999) { # Affine scaling method while (T) { X_diag \u0026lt;- diag(x) # Compute (A * X_diag^2 * A^t)-1 using Cholesky factorization. # This is responsible for scaling the original problem matrix. 
q \u0026lt;- A %*% X_diag^2 %*% t(A) q_inv \u0026lt;- chol2inv(chol(q)) # lambda = q^-1 * A * X_diag^2 * c lambda \u0026lt;- q_inv %*% A %*% X_diag^2 %*% rc # c - A^t * lambda is used repeatedly foo \u0026lt;- rc - t(A) %*% lambda # We converge as s goes to zero s \u0026lt;- sqrt(sum((X_diag %*% foo)^2)) # Compute new x x \u0026lt;- (x + R * X_diag^2 %*% foo / s)[,] # If s is within our tolerance, stop. if (abs(s) \u0026lt; tolerance) break } x } This function accepts a matrix A which contains all technological coefficients for an LP, a vector rc containing its reduced costs, and an initial point x interior to the LP\u0026rsquo;s feasible region. Optional arguments to the function include a tolerance, for detecting when the method is within an acceptable distance from the optimal point, and a value for R, which must be strictly between 0 and 1 and controls scaling.\nThe method works by rescaling the matrix A around the current solution x. It then computes a new x such that it remains feasible and interior, which is why R cannot be 0 or 1. It requires a feasible interior point to start and only projects to other feasible interior points, so the right hand side of the LP is not required (it is implicit from the starting point). 
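For Python readers, the same iteration can be sketched with NumPy. This is an illustrative translation, not code from the original: np.linalg.solve stands in for the explicit Cholesky-based inverse, an iteration cap is added as a safeguard, and it reuses the 3x3 example LP data shown below.

```python
import numpy as np

def solve_affine(A, rc, x, tolerance=1e-7, R=0.999, max_iters=100_000):
    # Affine scaling: rescale the problem around the current interior
    # point x, step in the projected direction, and repeat until the
    # step magnitude s falls within tolerance.
    A = np.asarray(A, dtype=float)
    rc = np.asarray(rc, dtype=float)
    x = np.asarray(x, dtype=float)
    for _ in range(max_iters):
        X2 = np.diag(x * x)                               # X_diag^2
        lam = np.linalg.solve(A @ X2 @ A.T, A @ X2 @ rc)  # dual estimates
        d = rc - A.T @ lam                                # c - A^t * lambda
        s = np.sqrt(np.sum((x * d) ** 2))                 # goes to zero
        x = x + R * (X2 @ d) / s                          # new interior point
        if abs(s) < tolerance:
            break
    return x

# The example LP below: optimal z = 13 at (x1, x2, x3) = (2, 0, 1).
A = [[2, 3, 1, 1, 0, 0],
     [4, 1, 2, 0, 1, 0],
     [3, 4, 2, 0, 0, 1]]
rc = [5, 4, 3, 0, 0, 0]
x0 = [0.5, 0.5, 0.5, 2.0, 7.5, 3.5]
sol = solve_affine(A, rc, x0)
z = float(sol @ np.asarray(rc, dtype=float))
print(round(z, 4))
```

As with the R version, the objective converges only asymptotically, so the printed value is (extremely) close to, but not exactly, the optimum of 13.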
The shadow prices for each iteration are captured in the vector lambda, so the gap between primal and dual solutions is easy to compute.\nWe run this function against a 3x3 LP with a known solution:\nmax z = 5x1 + 4x2 + 3x3 st 2x1 + 3x2 + x3 \u0026lt;= 5 4x1 + x2 + 2x3 \u0026lt;= 11 3x1 + 4x2 + 2x3 \u0026lt;= 8 x1, x2, x3 \u0026gt;= 0 The optimal solution to this LP is:\nz = 13 x1 = 2 x2 = 0 x3 = 1 This problem can be run against the affine scaling function by defining A with all necessary slack variables, and using an arbitrary feasible interior point:\nA \u0026lt;- matrix(c( 2,3,1,1,0,0, 4,1,2,0,1,0, 3,4,2,0,0,1 ), nrow=3, byrow=T) rc \u0026lt;- c(5, 4, 3, 0, 0, 0) x \u0026lt;- c(0.5, 0.5, 0.5, 2, 7.5, 3.5) solution \u0026lt;- solve.affine(A, rc, x) print(solution) print(sum(solution * rc)) This provides an output vector that is very close to the optimal primal solution shown above. Since interior point methods converge asymptotically to optimal solutions, it is important to note that we can only ever get (extremely) close to our final optimal objective and decision variable values.\n\u0026gt; print(solution) [1] 1.999998e+00 4.268595e-07 1.000002e+00 1.280579e-06 1.000005e+00 [6] 1.280579e-06 \u0026gt; print(sum(solution * rc)) [1] 13.00000 ","permalink":"https://ryanjoneil.dev/posts/2011-04-27-affine-scaling-in-r/","summary":"\u003cp\u003eI recently stumbled across an implementation of the \u003ca href=\"https://demonstrations.wolfram.com/AffineScalingInteriorPointMethod/\"\u003eaffine scaling\u003c/a\u003e \u003ca href=\"https://en.wikipedia.org/wiki/Interior_point_method\"\u003einterior point method\u003c/a\u003e for solving linear programs that I\u0026rsquo;d coded up in R once upon a time. I\u0026rsquo;m posting it here in case anyone else finds it useful. There\u0026rsquo;s not a whole lot of thought given to efficiency or numerical stability, just a demonstration of the basic algorithm. 
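As a quick sanity check on the stated optimum, the candidate point can be substituted back into the original constraints directly. This plain-Python snippet uses the coefficients of the example LP above; the variable names are illustrative.

```python
# Check z = 13 at (x1, x2, x3) = (2, 0, 1) against the original constraints.
A = [[2, 3, 1], [4, 1, 2], [3, 4, 2]]
b = [5, 11, 8]
c = [5, 4, 3]
x = [2, 0, 1]

z = sum(ci * xi for ci, xi in zip(c, x))
slacks = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
print(z, slacks)  # 13 [0, 1, 0]: feasible, with constraints 1 and 3 tight
```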
Still, sometimes that\u0026rsquo;s exactly what one wants.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;\"\u003e\u003ccode class=\"language-r\" data-lang=\"r\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003esolve.affine \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e\u0026lt;-\u003c/span\u003e \u003cspan style=\"color:#ff7b72\"\u003efunction\u003c/span\u003e(A, rc, x, tolerance\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e\u003cspan style=\"color:#a5d6ff\"\u003e10\u003c/span\u003e^\u003cspan style=\"color:#a5d6ff\"\u003e-7\u003c/span\u003e, R\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e\u003cspan style=\"color:#a5d6ff\"\u003e0.999\u003c/span\u003e) {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#8b949e;font-style:italic\"\u003e# Affine scaling method\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003ewhile\u003c/span\u003e (T) {\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e X_diag \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e\u0026lt;-\u003c/span\u003e \u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003ediag\u003c/span\u003e(x)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#8b949e;font-style:italic\"\u003e# Compute (A * X_diag^2 * A^t)-1 using Cholesky factorization.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#8b949e;font-style:italic\"\u003e# This is responsible for scaling the original 
problem matrix.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e q \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e\u0026lt;-\u003c/span\u003e A \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e%*%\u003c/span\u003e X_diag\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e**\u003c/span\u003e\u003cspan style=\"color:#a5d6ff\"\u003e2\u003c/span\u003e \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e%*%\u003c/span\u003e \u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003et\u003c/span\u003e(A)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e q_inv \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e\u0026lt;-\u003c/span\u003e \u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003echol2inv\u003c/span\u003e(\u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003echol\u003c/span\u003e(q))\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#8b949e;font-style:italic\"\u003e# lambda = q * A * X_diag^2 * c\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e lambda \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e\u0026lt;-\u003c/span\u003e q_inv \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e%*%\u003c/span\u003e A \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e%*%\u003c/span\u003e X_diag^2 \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e%*%\u003c/span\u003e rc\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#8b949e;font-style:italic\"\u003e# c - A^t * lambda is used 
repeatedly\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e foo \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e\u0026lt;-\u003c/span\u003e rc \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e-\u003c/span\u003e \u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003et\u003c/span\u003e(A) \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e%*%\u003c/span\u003e lambda\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#8b949e;font-style:italic\"\u003e# We converge as s goes to zero\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e s \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e\u0026lt;-\u003c/span\u003e \u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003esqrt\u003c/span\u003e(\u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003esum\u003c/span\u003e((X_diag \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e%*%\u003c/span\u003e foo)^2))\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#8b949e;font-style:italic\"\u003e# Compute new x\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e x \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e\u0026lt;-\u003c/span\u003e (x \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e+\u003c/span\u003e R \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e*\u003c/span\u003e X_diag^2 \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e%*%\u003c/span\u003e foo \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e/\u003c/span\u003e 
s)[,]\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#8b949e;font-style:italic\"\u003e# If s is within our tolerance, stop.\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003eif\u003c/span\u003e (\u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003eabs\u003c/span\u003e(s) \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e\u0026lt;\u003c/span\u003e tolerance) \u003cspan style=\"color:#ff7b72\"\u003ebreak\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e }\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e x\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e}\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThis function accepts a matrix \u003ccode\u003eA\u003c/code\u003e which contains all technological coefficients for an LP, a vector \u003ccode\u003erc\u003c/code\u003e containing its reduced costs, and an initial point \u003ccode\u003ex\u003c/code\u003e interior to the LP\u0026rsquo;s feasible region. Optional arguments to the function include a tolerance, for detecting when the method is within an acceptable distance from the optimal point, and a value for \u003ccode\u003eR\u003c/code\u003e, which must be strictly between 0 and 1 and controls scaling.\u003c/p\u003e","title":"👉 Affine Scaling in R"},{"content":"Note: This post was edited for clarity.\nFor the final JAPH in this series, I implemented a simple transpiler that converts a small subset of Scheme programs to equivalent Python programs. It starts with a Scheme program that prints 'just another scheme hacker'.\n(define (output x) (if (null? 
x) \u0026#34;\u0026#34; (begin (display (car x)) (if (null? (cdr x)) (display \u0026#34;\\n\u0026#34;) (begin (display \u0026#34; \u0026#34;) (output (cdr x))))))) (output (list \u0026#34;just\u0026#34; \u0026#34;another\u0026#34; \u0026#34;scheme\u0026#34; \u0026#34;hacker\u0026#34;)) The program then tokenizes that Scheme source, parses the token stream, and converts that into Python 3.\ndef output(x): if not x: \u0026#34;\u0026#34; else: print(x[0], end=\u0026#39;\u0026#39;) if not x[1:]: print(\u0026#34;\\n\u0026#34;, end=\u0026#39;\u0026#39;) else: print(\u0026#34; \u0026#34;, end=\u0026#39;\u0026#39;) output(x[1:]) output([\u0026#34;just\u0026#34;, \u0026#34;another\u0026#34;, \u0026#34;python\u0026#34;, \u0026#34;hacker\u0026#34;]) Finally it executes the resulting Python string using exec. Obfuscation is left as an exercise for the reader.\nimport re def tokenize(input): \u0026#39;\u0026#39;\u0026#39;Tokenizes an input stream into a list of recognizable tokens\u0026#39;\u0026#39;\u0026#39; token_res = ( r\u0026#39;\\(\u0026#39;, # open paren -\u0026gt; starts expression r\u0026#39;\\)\u0026#39;, # close paren -\u0026gt; ends expression r\u0026#39;\u0026#34;[^\u0026#34;]*\u0026#34;\u0026#39;, # quoted string (don\u0026#39;t support \\\u0026#34; yet) r\u0026#39;[\\w?]+\u0026#39; # atom ) return re.findall(r\u0026#39;(\u0026#39; + \u0026#39;|\u0026#39;.join(token_res) + \u0026#39;)\u0026#39;, input) def parse(stream): \u0026#39;\u0026#39;\u0026#39;Parses a token stream into a syntax tree\u0026#39;\u0026#39;\u0026#39; if not stream: return [] else: # Build a list of arguments (possibly expressions) at this level args = [] while True: # Get the next token try: x = stream.pop(0) except IndexError: return args # ( and ) control the level of the tree we\u0026#39;re at if x == \u0026#39;(\u0026#39;: args.append(parse(stream)) elif x == \u0026#39;)\u0026#39;: return args else: args.append(x) def compile(tree): \u0026#39;\u0026#39;\u0026#39;Compiles an Scheme Abstract 
Syntax Tree into near-Python\u0026#39;\u0026#39;\u0026#39; def compile_expr(indent, expr): indent += 1 lines = [] # these will have [(indent, statement), ...] structure while expr: # Two options: expr is a string like \u0026#34;\u0026#39;\u0026#34; or it is a list if isinstance(expr, str): return [( indent, expr.replace(\u0026#39;scheme\u0026#39;, \u0026#39;python\u0026#39;).replace(\u0026#39;\\n\u0026#39;, \u0026#39;\\\\n\u0026#39;) )] else: start = expr.pop(0) if start == \u0026#39;define\u0026#39;: signature = expr.pop(0) lines.append((indent, \u0026#39;def %s(%s):\u0026#39; % ( signature[0], \u0026#39;, \u0026#39;.join(signature[1:]) ) )) while expr: lines.extend(compile_expr(indent, expr.pop(0))) elif start == \u0026#39;if\u0026#39;: # We don\u0026#39;t support multi-clause conditionals yet clause = compile_expr(indent, expr.pop(0))[0][1] lines.append((indent, \u0026#39;if %s:\u0026#39; % clause)) if_true_lines = compile_expr(indent, expr.pop(0)) if_false_lines = compile_expr(indent, expr.pop(0)) lines.extend(if_true_lines) lines.append((indent, \u0026#39;else:\u0026#39;)) lines.extend(if_false_lines) elif start == \u0026#39;null?\u0026#39;: # Only supports conditionals of the form (null? 
foo) if isinstance(expr[0], str): condition = expr.pop(0) else: condition = compile_expr(indent, expr.pop(0))[0][1] return [(indent, \u0026#39;not %s\u0026#39; % condition)] elif start == \u0026#39;begin\u0026#39;: # This is just a series of statements, so don\u0026#39;t indent while expr: lines.extend(compile_expr(indent-1, expr.pop(0))) elif start == \u0026#39;display\u0026#39;: arguments = [] while expr: arguments.append( compile_expr(indent, expr.pop(0))[0][1] ) lines.append(( indent, \u0026#34;print(%s, end=\u0026#39;\u0026#39;)\u0026#34; % (\u0026#39;, \u0026#39;.join(arguments)) )) elif start == \u0026#39;car\u0026#39;: lines.append((indent, \u0026#39;%s[0]\u0026#39; % expr.pop(0))) elif start == \u0026#39;cdr\u0026#39;: lines.append((indent, \u0026#39;%s[1:]\u0026#39; % expr.pop(0))) elif start == \u0026#39;list\u0026#39;: arguments = [] while expr: arguments.append( compile_expr(indent, expr.pop(0))[0][1] ) lines.append((indent, \u0026#39;[%s]\u0026#39; % \u0026#39;, \u0026#39;.join(arguments))) else: # Assume this is a function call arguments = [] while expr: arguments.append( compile_expr(indent, expr.pop(0))[0][1] ) lines.append(( indent, \u0026#34;%s(%s)\u0026#34; % (start, \u0026#39;, \u0026#39;.join(arguments)) )) return lines return [compile_expr(-1, expr) for expr in tree] if __name__ == \u0026#39;__main__\u0026#39;: scheme = \u0026#39;\u0026#39;\u0026#39; (define (output x) (if (null? x) \u0026#34;\u0026#34; (begin (display (car x)) (if (null? 
(cdr x)) (display \u0026#34;\\n\u0026#34;) (begin (display \u0026#34; \u0026#34;) (output (cdr x))))))) (output (list \u0026#34;just\u0026#34; \u0026#34;another\u0026#34; \u0026#34;scheme\u0026#34; \u0026#34;hacker\u0026#34;)) \u0026#39;\u0026#39;\u0026#39; python = \u0026#39;\u0026#39; for expr in compile(parse(tokenize(scheme))): python += \u0026#39;\\n\u0026#39;.join([(\u0026#39; \u0026#39; * 4 * x[0]) + x[1] for x in expr]) + \u0026#39;\\n\\n\u0026#39; exec(python) ","permalink":"https://ryanjoneil.dev/posts/2011-04-18-reformed-japhs-transpiler/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was edited for clarity.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eFor the final JAPH in this series, I implemented a simple transpiler that converts a small subset of \u003ca href=\"https://www.scheme.org/\"\u003eScheme\u003c/a\u003e programs to equivalent Python programs. It starts with a Scheme program that prints \u003ccode\u003e'just another scheme hacker'\u003c/code\u003e.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;\"\u003e\u003ccode class=\"language-scheme\" data-lang=\"scheme\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e(\u003cspan style=\"color:#ff7b72\"\u003edefine \u003c/span\u003e(\u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003eoutput\u003c/span\u003e \u003cspan style=\"color:#79c0ff\"\u003ex\u003c/span\u003e)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e (\u003cspan style=\"color:#ff7b72\"\u003eif \u003c/span\u003e(null? 
\u003cspan style=\"color:#79c0ff\"\u003ex\u003c/span\u003e)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#34;\u0026#34;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e (\u003cspan style=\"color:#ff7b72\"\u003ebegin \u003c/span\u003e(display (car \u003cspan style=\"color:#79c0ff\"\u003ex\u003c/span\u003e))\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e (\u003cspan style=\"color:#ff7b72\"\u003eif \u003c/span\u003e(null? (cdr \u003cspan style=\"color:#79c0ff\"\u003ex\u003c/span\u003e))\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e (display \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#34;\\n\u0026#34;\u003c/span\u003e)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e (\u003cspan style=\"color:#ff7b72\"\u003ebegin \u003c/span\u003e(display \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#34; \u0026#34;\u003c/span\u003e)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e (\u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003eoutput\u003c/span\u003e (cdr \u003cspan style=\"color:#79c0ff\"\u003ex\u003c/span\u003e)))))))\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e(\u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003eoutput\u003c/span\u003e (list \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#34;just\u0026#34;\u003c/span\u003e \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#34;another\u0026#34;\u003c/span\u003e \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#34;scheme\u0026#34;\u003c/span\u003e \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#34;hacker\u0026#34;\u003c/span\u003e))\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe program then tokenizes 
that Scheme source, parses the token stream, and converts that into Python 3.\u003c/p\u003e","title":"🐪 Reformed JAPHs: Transpiler"},{"content":"Note: This post was edited for clarity.\nThis JAPH uses a Turing machine. The machine accepts any string that ends in '\\n' and allows side effects. This lets us print the value of the tape as it encounters each character. While the idea of using lambda functions as side effects in a Turing machine is a little bizarre on many levels, we work with what we have. And Python is multi-paradigmatic, so what the heck.\nimport re def turing(tape, transitions): # The tape input comes in as a string. We approximate an infinite # length tape via a hash, so we need to convert this to {index: value} tape_hash = {i: x for i, x in enumerate(tape)} # Start at 0 using our transition matrix index = 0 state = 0 while True: value = tape_hash.get(index, \u0026#39;\u0026#39;) # This is a modified Turing machine: it uses regexen # and has side effects. Oh well, I needed IO. for rule in transitions[state]: regex, next, direction, new_value, side_effect = rule if re.match(regex, value): # Terminal states if new_value in (\u0026#39;YES\u0026#39;, \u0026#39;NO\u0026#39;): return new_value tape_hash[index] = new_value side_effect(value) index += direction state = next break assert \u0026#39;YES\u0026#39; == turing(\u0026#39;just another python hacker\\n\u0026#39;, [ # This Turing machine recognizes the language of strings that end in \\n. # Regex rule, next state, left/right = -1/+1, new value, side effect. 
[ # State 0: [r\u0026#39;^[a-z ]$\u0026#39;, 0, +1, \u0026#39;\u0026#39;, lambda x: print(x, end=\u0026#39;\u0026#39;)], [r\u0026#39;^\\n$\u0026#39;, 1, +1, \u0026#39;\u0026#39;, lambda x: print(x, end=\u0026#39;\u0026#39;)], [r\u0026#39;^.*$\u0026#39;, 0, +1, \u0026#39;NO\u0026#39;, None], ], [ # State 1: [r\u0026#39;^$\u0026#39;, 1, -1, \u0026#39;YES\u0026#39;, None] ] ]) Obfuscation again consists of converting the above code into lambda functions using Y combinators. This is a nice programming exercise, so I\u0026rsquo;ve left it out of this post in case anyone wants to try. The Turing machine has to return 'YES' to indicate that it accepts the string, thus the assertion. Our final obfuscated JAPH is a single expression.\nassert\u0026#39;\u0026#39;\u0026#39;YES\u0026#39;\u0026#39;\u0026#39;==(lambda g:(lambda f:g(lambda arg:f(f)(arg)))(lambda f:g( lambda arg: f(f)(arg))))(lambda f: lambda q:[(lambda g:(lambda f:g(lambda arg:f(f)(arg)))(lambda f: g(lambda arg:f(f)(arg))))(lambda f: lambda x:(x [0][0]if x[0] and __import__(\u0026#39;re\u0026#39;).match(x[0][0][0],x[1])else f([x[0][1:] ,x[1]]))) ([q[3][q[1]],q[2].get(q[0],\u0026#39;\u0026#39;)])[4](q[2].get(q[0],\u0026#39;\u0026#39;)), (lambda g:(lambda f:g(lambda arg:f(f)(arg))) (lambda f:g(lambda arg:f(f)(arg))))( lambda f:lambda x:(x[0][0]if x[0] and __import__(\u0026#39;re\u0026#39;).match(x[0][0][0],x [1])else f([x[0][1:],x[1]])))([q[3][q[1]],q[2].get(q[0],\u0026#39;\u0026#39;)])[3]if(lambda g:(lambda f:g(lambda arg:f(f)(arg))) (lambda f:g(lambda arg:f(f)(arg))))( lambda f:lambda x:(x[0][0]if x[0]and __import__(\u0026#39;re\u0026#39;).match(x[0][0][0],x[ 1]) else f([x[0][1:],x[1]])))([q[3][q[1]],q[2].get(q[0],\u0026#39;\u0026#39;)])[3]in(\u0026#39;YES\u0026#39;, \u0026#39;NO\u0026#39;)else f([q[0]+(lambda g:(lambda f:g(lambda arg:f(f)(arg)))(lambda f:g (lambda arg:f(f)(arg))))(lambda f:lambda x:(x[0][0]if x[0]and __import__( \u0026#39;re\u0026#39;).match(x[0][0][0],x[1])else f([x[0][1:], x[1]])))([q[3][q[1]], 
q[2]. get(q[0],\u0026#39;\u0026#39;)])[2],(lambda g:(lambda f:g(lambda arg: f(f)(arg)))(lambda f: g(lambda arg:f(f)(arg))))(lambda f:lambda x:(x[0][0]if x[0]and __import__ (\u0026#39;re\u0026#39;).match(x[0][0][0],x[1])else f([x[0][1:], x[1]])))([q[3][q[1]],q[2]. get(q[0],\u0026#39;\u0026#39;)])[1],q[2],q[3]])][1])([0,0,{i:x for i,x in enumerate(\u0026#39;just \u0026#39; \u0026#39;another python hacker\\n\u0026#39;)}, [[[r\u0026#39;^[a-z ]$\u0026#39;,0,+1,\u0026#39;\u0026#39;,lambda x:print(x,end= \u0026#39;\u0026#39;)], [r\u0026#39;^\\n$\u0026#39;,1,+1,\u0026#39;\u0026#39;,lambda x:print(x, end=\u0026#39;\u0026#39;)],[r\u0026#39;^.*$\u0026#39;,0,+1,\u0026#39;\u0026#39;\u0026#39;NO\u0026#39;\u0026#39;\u0026#39;, lambda x:None]], [[r\u0026#39;\u0026#39;\u0026#39;^$\u0026#39;\u0026#39;\u0026#39;,+1,-1,\u0026#39;\u0026#39;\u0026#39;YES\u0026#39;\u0026#39;\u0026#39;, lambda x: None or None]]]]) ","permalink":"https://ryanjoneil.dev/posts/2011-04-18-reformed-japhs-turing-machine/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was edited for clarity.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eThis JAPH uses a \u003ca href=\"https://en.wikipedia.org/wiki/Turing_machine\"\u003eTuring machine\u003c/a\u003e. The machine accepts any string that ends in \u003ccode\u003e'\\n'\u003c/code\u003e and allows side effects. This lets us print the value of the tape as it encounters each character. While the idea of using lambda functions as side effects in a Turing machine is a little bizarre on many levels, we work with what we have. And Python is multi-paradigmatic, so what the heck.\u003c/p\u003e","title":"🐪 Reformed JAPHs: Turing Machine"},{"content":"Note: This post was edited for clarity.\nAt this point, tricking python into printing strings via indirect means got a little boring. So I switched to obfuscating fundamental computer science algorithms. 
Here\u0026rsquo;s a JAPH that takes in a Huffman coded version of \u0026#39;just another python hacker\u0026#39;, decodes, and prints it.\n# Build coding tree def build_tree(scheme): if scheme.startswith(\u0026#39;*\u0026#39;): left, scheme = build_tree(scheme[1:]) right, scheme = build_tree(scheme) return (left, right), scheme else: return scheme[0], scheme[1:] def decode(tree, encoded): ret = \u0026#39;\u0026#39; node = tree for direction in encoded: if direction == \u0026#39;0\u0026#39;: node = node[0] else: node = node[1] if isinstance(node, str): ret += node node = tree return ret tree = build_tree(\u0026#39;*****ju*sp*er***yct* h**ka*no\u0026#39;)[0] print( decode(tree, bin(10627344201836243859174935587).lstrip(\u0026#39;0b\u0026#39;).zfill(103)) ) The decoding tree is like a LISP-style sequence of pairs. \u0026#39;*\u0026#39; represents a branch in the tree while other characters are leaf nodes. This looks like the following.\n( ( ( ( (\u0026#39;j\u0026#39;, \u0026#39;u\u0026#39;), (\u0026#39;s\u0026#39;, \u0026#39;p\u0026#39;) ), (\u0026#39;e\u0026#39;, \u0026#39;r\u0026#39;) ), ( ( (\u0026#39;y\u0026#39;, \u0026#39;c\u0026#39;), \u0026#39;t\u0026#39; ), (\u0026#39; \u0026#39;, \u0026#39;h\u0026#39;) ) ), ( (\u0026#39;k\u0026#39;, \u0026#39;a\u0026#39;), (\u0026#39;n\u0026#39;, \u0026#39;o\u0026#39;) ) ) The actual Huffman coded version of our favorite string gets about 50% smaller represented in base-2.\n0000000001000100101011010111011101010111001000110110000110100001010111111110011001111010100110000100011 There\u0026rsquo;s a catch here, which is that this is hard to obfuscate unless we turn it into a single expression. This means that we have to convert build_tree and decode into lambda functions. Unfortunately, they are recursive, and lambda functions can\u0026rsquo;t recurse naturally. Fortunately, we can use Y combinators to get around the problem. 
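In isolation, the trick looks like this. A minimal sketch of the same combinator; the factorial example is illustrative, not part of the original JAPH.

```python
# Strict-evaluation Y combinator (often called the Z combinator): it hands a
# function a callable reference to itself, so an anonymous lambda can recurse
# without ever being bound to a name.
Y = lambda g: (lambda f: g(lambda arg: f(f)(arg)))(lambda f: g(lambda arg: f(f)(arg)))

# Anonymous recursion: factorial with no def and no self-reference.
fact = Y(lambda f: lambda n: 1 if n == 0 else n * f(n - 1))
print(fact(5))  # 120
```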
These are worth some study since they will pop up again in future JAPHs.\nY = lambda g: ( lambda f: g(lambda arg: f(f)(arg))) (lambda f: g(lambda arg: f(f)(arg)) ) build_tree = Y( lambda f: lambda scheme: ( (f(scheme[1:])[0], f(f(scheme[1:])[1])[0]), f(f(scheme[1:])[1])[1 ] ) if scheme.startswith(\u0026#39;*\u0026#39;) else (scheme[0], scheme[1:]) ) decode = Y(lambda f: lambda x: x[3]+x[1] if not x[2] else ( f([x[0], x[0], x[2], x[3]+x[1]]) if isinstance(x[1], str) else ( f([x[0], x[1][0], x[2][1:], x[3]]) if x[2][0] == \u0026#39;0\u0026#39; else ( f([x[0], x[1][1], x[2][1:], x[3]]) ) ) )) tree = build_tree(\u0026#39;*****ju*sp*er***yct* h**ka*no\u0026#39;)[0] print( decode([ tree, tree, bin(10627344201836243859174935587).lstrip(\u0026#39;0b\u0026#39;).zfill(103), \u0026#39;\u0026#39; ]) ) The final version is a condensed (and expanded, oddly) version of the above.\nprint((lambda t,e,s:(lambda g:(lambda f:g(lambda arg:f(f)(arg)))(lambda f: g(lambda arg: f(f)(arg))))(lambda f:lambda x: x[3]+x[1]if not x[2]else f([ x[0],x[0],x[2],x[3]+x[1]])if isinstance(x[1],str)else f([x[0],x[1][0],x[2] [1:],x[3]])if x[2][0]==\u0026#39;0\u0026#39;else f([x[0],x[1][1],x[2][1:],x[3]]))([t,t,e,s]) )((lambda g:(lambda f:g(lambda arg:f(f)(arg)))(lambda f:g(lambda arg:f(f)( arg))))(lambda f:lambda p:((f(p[1:])[0],f(f(p[1:])[1])[0]),f(f(p[1:])[1])[ 1])if p.startswith(\u0026#39;*\u0026#39;)else(p[0],p[1:]))(\u0026#39;*****ju*sp*er***yct* h**ka*no\u0026#39;)[ 0],bin(10627344201836243859179756385-4820798).lstrip(\u0026#39;0b\u0026#39;).zfill(103),\u0026#39;\u0026#39;)) ","permalink":"https://ryanjoneil.dev/posts/2011-04-14-reformed-japhs-huffman-coding/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was edited for clarity.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eAt this point, tricking \u003ccode\u003epython\u003c/code\u003e into printing strings via indirect means got a little boring. So I switched to obfuscating fundamental computer science algorithms. 
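The prefix string that drives the tree construction is easiest to see on a smaller input. This sketch restates the same recursive parse, where \u0026#39;*\u0026#39; opens an internal node and any other character is a leaf; the sample input here is shortened for illustration.

```python
def build_tree(scheme):
    # '*' opens an internal node: read a left subtree, then a right subtree.
    if scheme.startswith('*'):
        left, rest = build_tree(scheme[1:])
        right, rest = build_tree(rest)
        return (left, right), rest
    # Any other character is a leaf; return it plus the unconsumed remainder.
    return scheme[0], scheme[1:]

tree, leftover = build_tree('**ju*sp')
print(tree)  # (('j', 'u'), ('s', 'p'))
```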
Here\u0026rsquo;s a JAPH that takes in a \u003ca href=\"https://en.wikipedia.org/wiki/Huffman_coding\"\u003eHuffman coded\u003c/a\u003e version of \u003ccode\u003e'just another python hacker'\u003c/code\u003e, decodes, and prints it.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#8b949e;font-style:italic\"\u003e# Build coding tree\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#ff7b72\"\u003edef\u003c/span\u003e \u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003ebuild_tree\u003c/span\u003e(scheme):\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003eif\u003c/span\u003e scheme\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e.\u003c/span\u003estartswith(\u003cspan style=\"color:#a5d6ff\"\u003e\u0026#39;*\u0026#39;\u003c/span\u003e):\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e left, scheme \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e build_tree(scheme[\u003cspan style=\"color:#a5d6ff\"\u003e1\u003c/span\u003e:])\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e right, scheme \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e build_tree(scheme)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003ereturn\u003c/span\u003e (left, right), scheme\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan 
style=\"color:#ff7b72\"\u003eelse\u003c/span\u003e:\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003ereturn\u003c/span\u003e scheme[\u003cspan style=\"color:#a5d6ff\"\u003e0\u003c/span\u003e], scheme[\u003cspan style=\"color:#a5d6ff\"\u003e1\u003c/span\u003e:]\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#ff7b72\"\u003edef\u003c/span\u003e \u003cspan style=\"color:#d2a8ff;font-weight:bold\"\u003edecode\u003c/span\u003e(tree, encoded):\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e ret \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#39;\u0026#39;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e node \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e tree\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003efor\u003c/span\u003e direction \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003ein\u003c/span\u003e encoded:\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003eif\u003c/span\u003e direction \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e==\u003c/span\u003e \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#39;0\u0026#39;\u003c/span\u003e:\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e node \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e node[\u003cspan style=\"color:#a5d6ff\"\u003e0\u003c/span\u003e]\n\u003c/span\u003e\u003c/span\u003e\u003cspan 
style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003eelse\u003c/span\u003e:\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e node \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e node[\u003cspan style=\"color:#a5d6ff\"\u003e1\u003c/span\u003e]\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003eif\u003c/span\u003e isinstance(node, str):\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e ret \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e+=\u003c/span\u003e node\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e node \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e tree\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e \u003cspan style=\"color:#ff7b72\"\u003ereturn\u003c/span\u003e ret\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003etree \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e build_tree(\u003cspan style=\"color:#a5d6ff\"\u003e\u0026#39;*****ju*sp*er***yct* h**ka*no\u0026#39;\u003c/span\u003e)[\u003cspan style=\"color:#a5d6ff\"\u003e0\u003c/span\u003e]\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003eprint(\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e decode(tree, bin(\u003cspan style=\"color:#a5d6ff\"\u003e10627344201836243859174935587\u003c/span\u003e)\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e.\u003c/span\u003elstrip(\u003cspan style=\"color:#a5d6ff\"\u003e\u0026#39;0b\u0026#39;\u003c/span\u003e)\u003cspan 
style=\"color:#ff7b72;font-weight:bold\"\u003e.\u003c/span\u003ezfill(\u003cspan style=\"color:#a5d6ff\"\u003e103\u003c/span\u003e))\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e)\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eThe decoding tree is like a LISP-style sequence of pairs. \u003ccode\u003e'*'\u003c/code\u003e represents a branch in the tree while other characters are leaf nodes. This looks like the following.\u003c/p\u003e","title":"🐪 Reformed JAPHs: Huffman Coding"},{"content":"Note: This post was updated to work with Python 3.12. It may not work with different versions.\nHere\u0026rsquo;s a JAPH composed solely for effect. For each letter in 'just another python hacker' it loops over each of the characters ' abcdefghijklmnopqrstuvwxyz', printing each. Between characters it pauses for 0.05 seconds, backing up and moving on to the next if it hasn\u0026rsquo;t reached the desired one yet. This achieves a sort of rolling effect by which the final string appears on our screen over time.\nimport string import sys import time letters = \u0026#39; \u0026#39; + string.ascii_lowercase for l in \u0026#39;just another python hacker\u0026#39;: for x in letters: print(x, end=\u0026#39;\u0026#39;) sys.stdout.flush() time.sleep(0.05) if x == l: break else: print(\u0026#39;\\b\u0026#39;, end=\u0026#39;\u0026#39;) print() We locate and print each letter in the string with a list comprehension. At the end we have an extra line of code (the eval statement) that gives us our newline.\n[[(lambda x,l:str(print(x,end=\u0026#39;\u0026#39;))+str(__import__(print. __doc__[print.__doc__.index(\u0026#39;stdout\u0026#39;) - 4:print.__doc__. index(\u0026#39;stdout\u0026#39;)-1]).stdout.flush()) + str(__import__(\u0026#39;\u0026#39;. 
join(reversed(\u0026#39;emit\u0026#39;))).sleep(0o5*1.01/0x64))+str(print( \u0026#39;\\b\u0026#39;,end=\u0026#39;\\x09\u0026#39;.strip())if x!=l else\u0026#39;*\u0026amp;#\u0026#39;))(x1,l1)for x1 in(\u0026#39;\\x20\u0026#39;+getattr(__import__(type(\u0026#39;phear\u0026#39;).__name__+\u0026#39;in\u0026#39; \u0026#39;g\u0026#39;),dir(__import__(type(\u0026#39;snarf\u0026#39;).__name__+\u0026#39;ing\u0026#39;))[15])) [:(\u0026#39;\\x20\u0026#39;+getattr(__import__(type(\u0026#39;smear\u0026#39;).__name__+\u0026#39;in\u0026#39; \u0026#39;g\u0026#39;),dir(__import__(type(\u0026#39;slurp\u0026#39;).__name__+\u0026#39;ing\u0026#39;))[15])) .index(l1)+1]]for l1 in\u0026#39;\u0026#39;\u0026#39;just another python hacker\u0026#39;\u0026#39;\u0026#39;] eval(\u0026#39;\u0026#39;\u0026#39;\\x20\\x09eval(\u0026#34;\\x20\\x09eval(\u0026#39;\\x20 print()\u0026#39;)\u0026#34;)\u0026#39;\u0026#39;\u0026#39;) ","permalink":"https://ryanjoneil.dev/posts/2011-04-11-reformed-japhs-rolling-effect/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was updated to work with Python 3.12. It may not work with different versions.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eHere\u0026rsquo;s a JAPH composed solely for effect. For each letter in \u003ccode\u003e'just another python hacker'\u003c/code\u003e it loops over each of the characters \u003ccode\u003e' abcdefghijklmnopqrstuvwxyz'\u003c/code\u003e, printing each. Between characters it pauses for 0.05 seconds, backing up and moving on to the next if it hasn\u0026rsquo;t reached the desired one yet. This achieves a sort of rolling effect by which the final string appears on our screen over time.\u003c/p\u003e","title":"🐪 Reformed JAPHs: Rolling Effect"},{"content":"Note: This post was updated to work with Python 3.12. It may not work with different versions.\nNo series of JAPHs would be complete without ROT13. This is the example through which aspiring Perl programmers learn to use tr and its synonym y. 
In Perl the basic ROT13 JAPH starts as:\n$foo = \u0026#39;whfg nabgure crey unpxre\u0026#39;; $foo =~ y/a-z/n-za-m/; print $foo; Python has nothing quite so elegant in its default namespace. However, this does give us the opportunity to explore a little used aspect of strings: the translate method. If we construct a dictionary of ordinals we can accomplish the same thing with a touch more effort.\nimport string table = { ord(x): ord(y) for x, y in zip( string.ascii_lowercase, string.ascii_lowercase[13:] + string.ascii_lowercase ) } print(\u0026#39;whfg nabgure clguba unpxre\u0026#39;.translate(table)) We obfuscate the construction of this translation dictionary and, for added measure, use getattr to find the print function off of __builtins__. This will likely only work in Python 3.2, since the order of attributes on __builtins__ matters.\ngetattr(vars()[list(filter(lambda _:\u0026#39;\\x5f\\x62\u0026#39;in _,dir ()))[0]], dir(vars()[list(filter(lambda _:\u0026#39;\\x5f\\x62\u0026#39;in _, dir()))[0]])[list(filter(lambda _:_ [1].startswith( \u0026#39;\\x70\\x72\u0026#39;),enumerate(dir(vars()[list(filter(lambda _: \u0026#39;\\x5f\\x62\u0026#39;in _,dir()))[0]]))))[0][0]])(getattr(\u0026#39;whfg \u0026#39; +\u0026#39;\u0026#39;\u0026#39;nabgure clguba unpxre\u0026#39;\u0026#39;\u0026#39;, dir(\u0026#39;0o52\u0026#39;)[0o116])({ _: (_-0o124) %0o32 +0o141 for _ in range(0o141, 0o173)})) ","permalink":"https://ryanjoneil.dev/posts/2011-04-06-reformed-japhs-rot13/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was updated to work with Python 3.12. It may not work with different versions.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eNo series of JAPHs would be complete without \u003ca href=\"https://en.wikipedia.org/wiki/ROT13\"\u003eROT13\u003c/a\u003e. This is the example through which aspiring Perl programmers learn to use \u003ccode\u003etr\u003c/code\u003e and its synonym \u003ccode\u003ey\u003c/code\u003e. 
In Perl the basic ROT13 JAPH starts as:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;\"\u003e\u003ccode class=\"language-perl\" data-lang=\"perl\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#79c0ff\"\u003e$foo\u003c/span\u003e \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#39;whfg nabgure crey unpxre\u0026#39;\u003c/span\u003e;\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#79c0ff\"\u003e$foo\u003c/span\u003e \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=~\u003c/span\u003e y\u003cspan style=\"color:#79c0ff\"\u003e/a-z/\u003c/span\u003en\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e-\u003c/span\u003eza\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e-\u003c/span\u003e\u003cspan style=\"color:#79c0ff\"\u003em\u003c/span\u003e\u003cspan style=\"color:#f85149\"\u003e/;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#ff7b72\"\u003eprint\u003c/span\u003e \u003cspan style=\"color:#79c0ff\"\u003e$foo\u003c/span\u003e;\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003ePython has nothing quite so elegant in its default namespace. However, this does give us the opportunity to explore a little used aspect of strings: the translate method. If we construct a dictionary of ordinals we can accomplish the same thing with a touch more effort.\u003c/p\u003e","title":"🐪 Reformed JAPHs: ROT13"},{"content":"Here\u0026rsquo;s the second in my reformed JAPH series. It takes an anagram of 'just another python hacker' and converts it prior to printing. 
It sorts the anagram by the indices of another string, in order of their associated characters. This is sort of like a pre-digested Schwartzian transform.\nx = \u0026#39;upjohn tehran hectors katy\u0026#39; y = \u0026#39;1D0HG6JFO9P5ICKAM87B24NL3E\u0026#39; print(\u0026#39;\u0026#39;.join(x[i] for i in sorted(range(len(x)), key=lambda p: y[p]))) Obfuscation consists mostly of using silly machinations to construct the string we use to sort the anagram.\nprint(\u0026#39;\u0026#39;.join(\u0026#39;\u0026#39;\u0026#39;upjohn tehran hectors katy\u0026#39;\u0026#39;\u0026#39;[_]for _ in sorted(range (26),key=lambda p:(hex(29)[2:].upper()+str(3*3*3*3-3**4)+\u0026#39;HG\u0026#39;+str(sum( range(4)))+\u0026#39;JFO\u0026#39;+str((1+2)**(1+1))+\u0026#39;P\u0026#39;+str(35/7)[:1]+\u0026#39;i.c.k.\u0026#39;.replace( \u0026#39;.\u0026#39;,\u0026#39;\u0026#39;).upper()+\u0026#39;AM\u0026#39;+str(3**2*sum(range(5))-3)+hex(0o5444)[2:].replace (*\u0026#39;\\x62|\\x42\u0026#39;.split(\u0026#39;|\u0026#39;))+\u0026#39;NL\u0026#39;+hex(0o076).split(\u0026#39;x\u0026#39;)[1].upper())[p]))) ","permalink":"https://ryanjoneil.dev/posts/2011-04-03-reformed-japhs-ridiculous-anagram/","summary":"\u003cp\u003eHere\u0026rsquo;s the second in my reformed JAPH series. It takes an anagram of \u003ccode\u003e'just another python hacker'\u003c/code\u003e and converts it prior to printing. It sorts the anagram by the indices of another string, in order of their associated characters. 
This is sort of like a pre-digested \u003ca href=\"https://en.wikipedia.org/wiki/Schwartzian_transform\"\u003eSchwartzian transform\u003c/a\u003e.\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003ex \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#39;upjohn tehran hectors katy\u0026#39;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003ey \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e \u003cspan style=\"color:#a5d6ff\"\u003e\u0026#39;1D0HG6JFO9P5ICKAM87B24NL3E\u0026#39;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003eprint(\u003cspan style=\"color:#a5d6ff\"\u003e\u0026#39;\u0026#39;\u003c/span\u003e\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e.\u003c/span\u003ejoin(x[i] \u003cspan style=\"color:#ff7b72\"\u003efor\u003c/span\u003e i \u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003ein\u003c/span\u003e sorted(range(len(x)), key\u003cspan style=\"color:#ff7b72;font-weight:bold\"\u003e=\u003c/span\u003e\u003cspan style=\"color:#ff7b72\"\u003elambda\u003c/span\u003e p: y[p])))\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003cp\u003eObfuscation consists mostly of using silly machinations to construct the string we use to sort the anagram.\u003c/p\u003e","title":"🐪 Reformed JAPHs: Ridiculous Anagram"},{"content":"Note: This post was edited for clarity.\nMany years ago, I was a Perl programmer. 
Then one day I became disillusioned at the progress of Perl 6 and decided to import this.\nThis seems to be a fairly common story for Perl to Python converts. While I haven\u0026rsquo;t looked back much, there are a number of things I really miss about perl (lower case intentional). I miss having value types in a dynamic language, magical and ill-advised use of cryptocontext, and sometimes even pseudohashes because they were inexcusably weird. A language that supports so many ideas out of the box enables an extended learning curve that lasts for many years. \u0026ldquo;Perl itself is the game.\u0026rdquo;\nMost of all I think I miss writing Perl poetry and JAPHs. Sadly, I didn\u0026rsquo;t keep any of those I wrote, and I\u0026rsquo;m not competent enough with the language anymore to write interesting ones. At the time I was intentionally distancing myself from a model that was largely implicit and based on archaic systems internals and moving to one that was (supposedly) explicit and simple.\nAfter switching to Python as my primary language, I used the following email signature in a nod to this change in orientation (intended for Python 2):\nprint \u0026#39;just another python hacker\u0026#39; Recently I\u0026rsquo;ve been experimenting with writing JAPHs in Python. I think of these as \u0026ldquo;reformed JAPHs.\u0026rdquo; They accomplish the same purpose as programming exercises but in a more restricted context. In some ways they are more challenging. Creativity can be difficult in a narrowly defined landscape.\nI have written a small series of reformed JAPHs which increase monotonically in complexity. Here is the first one, written in plain understandable Python 3.\nimport string letters = string.ascii_lowercase + \u0026#39; \u0026#39; indices = [ 9, 20, 18, 19, 26, 0, 13, 14, 19, 7, 4, 17, 26, 15, 24, 19, 7, 14, 13, 26, 7, 0, 2, 10, 4, 17 ] print(\u0026#39;\u0026#39;.join(letters[i] for i in indices)) This is fairly simple. 
Instead of explicitly embedding the string 'just another python hacker' in the program, we assemble it using the index of its letters in the string 'abcdefghijklmnopqrstuvwxyz '. We then obfuscate through a series of minor measures:\nInstead of calling the print function, we import sys and make a call to sys.stdout.write. We assemble string.ascii_lowercase + ' ' by joining together the character versions of its respective ordinal values (97 to 123 and 32). We join together the integer indices using 'l' and split that into a list. We apply ''' liberally and rely on the fact that python concatenates adjacent strings. Here\u0026rsquo;s the obfuscated version:\neval(\u0026#34;__import__(\u0026#39;\u0026#39;\u0026#39;\\x73\u0026#39;\u0026#39;\u0026#39;\u0026#39;\u0026#39;\u0026#39;\\x79\u0026#39;\u0026#39;\u0026#39;\u0026#39;\u0026#39;\u0026#39;\\x73\u0026#39;\u0026#39;\u0026#39;).sTdOuT\u0026#34;.lower() ).write(\u0026#39;\u0026#39;.join(map(lambda _:(list(map(chr,range(97,123)))+[chr( 32)])[int(_)],(\u0026#39;\u0026#39;\u0026#39;9l20l18l19\u0026#39;\u0026#39;\u0026#39;\u0026#39;\u0026#39;\u0026#39;l26l0l13l14l19l7l4l17l26l15\u0026#39;\u0026#39;\u0026#39; \u0026#39;\u0026#39;\u0026#39;l24l19l7l14l1\u0026#39;\u0026#39;\u0026#39;\u0026#39;\u0026#39;\u0026#39;3l26l7l0l2l10l4l17\u0026#39;\u0026#39;\u0026#39;).split(\u0026#39;l\u0026#39;)))+\u0026#39;\\n\u0026#39;,) We could certainly do more, but that\u0026rsquo;s where I left this one. Stay tuned for the next JAPH.\n","permalink":"https://ryanjoneil.dev/posts/2011-04-01-reformed-japhs-alphabetic-indexing/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was edited for clarity.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eMany years ago, I was a Perl programmer. Then one day I became disillusioned at the progress of Perl 6 and decided to \u003ca href=\"https://www.python.org/dev/peps/pep-0020/\"\u003eimport this\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eThis seems to be a fairly common story for Perl to Python converts. 
While I haven\u0026rsquo;t looked back much, there are a number of things I really miss about \u003ccode\u003eperl\u003c/code\u003e \u003cem\u003e(lower case intentional)\u003c/em\u003e. I miss having value types in a dynamic language, magical and ill-advised use of \u003ca href=\"https://www.foo.be/docs/tpj/issues/vol3_1/tpj0301-0003.html\"\u003ecryptocontext\u003c/a\u003e, and sometimes even \u003ca href=\"https://web.archive.org/web/20040712204117/https://perldesignpatterns.com/?PseudoHash\"\u003epseudohashes\u003c/a\u003e because they were inexcusably weird. A language that supports so many ideas out of the box enables an extended learning curve that lasts for \u003ca href=\"https://web.archive.org/web/20020607034341/https://silver.sucs.org/~manic/humour/languages/perlhacker.htm\"\u003emany years\u003c/a\u003e. \u0026ldquo;Perl itself is the game.\u0026rdquo;\u003c/p\u003e","title":"🐪 Reformed JAPHs: Alphabetic Indexing"},{"content":"I hope you saw \u0026ldquo;China’s way to the top\u0026rdquo; on the Post\u0026rsquo;s website recently. It\u0026rsquo;s a very clear presentation of their statement and is certainly worth a look.\nSo say you\u0026rsquo;re an economist and you actually do need to produce a realistic estimate of when China\u0026rsquo;s GDP surpasses that of the USA. Can you use such an approach? Not really. There are several simplifying assumptions the Post made that are perfectly reasonable. However, if the goal is an analytical output from a highly random system such as GDP growth, one should not assume the inputs are fixed. (I\u0026rsquo;m not saying I have any gripe with their interactive. This post has a different purpose.)\nWhy is this? The short answer is that randomness in any system can change its output drastically from one run to the next. Even if the mean from a deterministic analysis is correct, it tells us nothing about the variance of our output. 
We really need a confidence interval of years when China is likely to overtake the USA.\nWe\u0026rsquo;ll move in the great tradition of all simulation studies. First we prepare our input. A CSV of GDP in current US dollars for both countries from 1960 to 2009 is available from the World Bank data files. We read this into a data frame and calculate their growth rates year over year. Note that the first value for growth has to be NA.\ngdp \u0026lt;- read.csv(\u0026#39;gdp.csv\u0026#39;) gdp$USA.growth \u0026lt;- rep(NA, length(gdp$USA)) gdp$China.growth \u0026lt;- rep(NA, length(gdp$China)) for (i in 2:length(gdp$USA)) { gdp$USA.growth[i] \u0026lt;- 100 * (gdp$USA[i] - gdp$USA[i-1]) / gdp$USA[i-1] gdp$China.growth[i] \u0026lt;- 100 * (gdp$China[i] - gdp$China[i-1]) / gdp$China[i-1] } We now analyze our inputs and assign probability distributions to the annual growth rates. In a full study this would involve comparing a number of different distributions and choosing the one that fits the input data best, but that\u0026rsquo;s well beyond the scope of this post. Instead, we\u0026rsquo;ll use the poor man\u0026rsquo;s way out: plot histograms and visually verify what we hope to be true, that the distributions are normal.\nAnd they pretty much are. That\u0026rsquo;s good enough for our purposes. Now all we need are the distribution parameters, which are mean and standard deviation for normal distributions.\n\u0026gt; mean(gdp$USA.growth[!is.na(gdp$USA.growth)]) [1] 7.00594 \u0026gt; sd(gdp$USA.growth[!is.na(gdp$USA.growth)]) [1] 2.889808 \u0026gt; mean(gdp$China.growth[!is.na(gdp$China.growth)]) [1] 9.90896 \u0026gt; sd(gdp$China.growth[!is.na(gdp$China.growth)]) [1] 10.5712 Now our input analysis is done. 
These are the inputs:\n$$ \\begin{align*} \\text{USA Growth} \u0026amp;\\sim \\mathcal{N}(7.00594, 2.889808^2)\\\\ \\text{China Growth} \u0026amp;\\sim \\mathcal{N}(9.90896, 10.5712^2) \\end{align*} $$\nThis should make the advantage of such an approach much more obvious. Compare the standard deviations for the two countries. China is a lot more likely to have negative GDP growth in any given year. They\u0026rsquo;re also more likely to have astronomical growth.\nWe now build and run our simulation study. The more times we run the simulation the tighter we can make our confidence interval (to a point), so we\u0026rsquo;ll pick a pretty big number somewhat arbitrarily. If we want to, we can be fairly scientific about determining how many iterations are necessary after we\u0026rsquo;ve done some runs, but we have to start somewhere.\nrepetitions \u0026lt;- 10000 This is the code for our simulation. For each iteration, it starts both countries at their 2009 GDPs. It then iterates, changing GDP randomly until China\u0026rsquo;s GDP is at least the same value as the USA\u0026rsquo;s. When that happens, it records the current year.\nresults \u0026lt;- rep(NA, repetitions) for (i in 1:repetitions) { usa \u0026lt;- gdp$USA[length(gdp$USA)] china \u0026lt;- gdp$China[length(gdp$China)] year \u0026lt;- gdp$Year[length(gdp$Year)] while (TRUE) { year \u0026lt;- year + 1 usa.growth \u0026lt;- rnorm(1, 7.00594, 2.889808) china.growth \u0026lt;- rnorm(1, 9.90896, 10.5712) usa \u0026lt;- usa * (1 + (usa.growth / 100)) china \u0026lt;- china * (1 + (china.growth / 100)) if (china \u0026gt;= usa) { results[i] \u0026lt;- year break } } } From the results vector we see that, given the data and assumptions for this model, China should surpass the USA in 2058. We also see that we can be 95% confident that the mean year this will happen is between 2057 and 2059. This is not quite the same as saying we are confident this will actually happen between those years. 
The result of our simulation is a probability distribution and we are discovering information about it.\n\u0026gt; mean(results) [1] 2058.494 \u0026gt; mean(results) + (sd(results) / sqrt(length(results)) * qnorm(0.025)) [1] 2057.873 \u0026gt; mean(results) + (sd(results) / sqrt(length(results)) * qnorm(0.975)) [1] 2059.114 So what\u0026rsquo;s wrong with this model? Well, we had to make a number of assumptions:\nWe assume we actually used the right data set. This was more of a how-to than a proper analysis, so that wasn\u0026rsquo;t too much of a concern. We assume future growth for the next 40-50 years resembles past growth from 1960-2009. This is a bit ridiculous, of course, but that\u0026rsquo;s the problem with forecasting. We assume growth is normally distributed and that we don\u0026rsquo;t encounter heavy-tailed behaviors in our distributions. We assume each year\u0026rsquo;s growth is independent of the year before it. See the last exercise. Here are some good simulation exercises if you\u0026rsquo;re looking to do more:\nNote how the outputs are quite a bit different from the Post graphic. I expect that\u0026rsquo;s largely due to the inclusion of data back to 1960. Try running the simulation for yourself using just the past 10, 20, and 30 years and see how that changes the result. Write a simulation to determine the probability China\u0026rsquo;s GDP surpasses the USA\u0026rsquo;s in the next 25 years. Now plot the mean GDP and 95% confidence intervals for each country per year. Assume that there are actually two distributions for growth for each country: one when the previous year had positive growth and another when it was negative. How does that change the output? 
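For readers who would rather not run R, the whole procedure above (simulate forward with normally distributed growth, record the crossover year, report a confidence interval for its mean) condenses into a few lines of Python. This is a minimal sketch: only the fitted growth parameters come from the analysis above, while the starting GDP values and seed are placeholders, not the exact World Bank figures.

```python
import random
import statistics

def simulate_crossover(reps=10000, usa0=14.0, china0=5.0, start=2009, seed=7):
    # usa0 and china0 are placeholder 2009 GDPs in trillions of USD.
    random.seed(seed)
    results = []
    for _ in range(reps):
        usa, china, year = usa0, china0, start
        # Step each economy forward with normally distributed growth
        # (the fits from above) until China meets or exceeds the USA.
        while china < usa:
            year += 1
            usa *= 1 + random.gauss(7.00594, 2.889808) / 100
            china *= 1 + random.gauss(9.90896, 10.5712) / 100
        results.append(year)
    mean = statistics.fmean(results)
    # 95% confidence interval for the mean crossover year.
    half = 1.96 * statistics.stdev(results) / len(results) ** 0.5
    return mean, (mean - half, mean + half)
```

The exact mean and interval depend on the starting GDPs, the number of repetitions, and the seed; the shape of the procedure is what matters here.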
","permalink":"https://ryanjoneil.dev/posts/2011-02-23-simulating-gdp-growth/","summary":"\u003cp\u003eI hope you saw \u003ca href=\"https://www.washingtonpost.com/wp-srv/special/business/china-growth/\"\u003e\u0026ldquo;China’s way to the top\u0026rdquo;\u003c/a\u003e on the Post\u0026rsquo;s website recently. It\u0026rsquo;s a very clear presentation of their statement and is certainly worth a look.\u003c/p\u003e\n\u003cp\u003eSo say you\u0026rsquo;re an economist and you actually do need to produce a realistic estimate of when China\u0026rsquo;s GDP surpasses that of the USA. Can you use such an approach? Not really. There are several simplifying assumptions the Post made that are perfectly reasonable. However, if the goal is an analytical output from a highly random system such as GDP growth, one should not assume the inputs are fixed. \u003cem\u003e(I\u0026rsquo;m not saying I have any gripe with their interactive. This post has a different purpose.)\u003c/em\u003e\u003c/p\u003e","title":"📈 Simulating GDP Growth"},{"content":"Note: This post was updated to include an example data file.\nI thought it might be useful to follow up the last post with another one showing the same examples in R.\nR provides a function called lm, which is similar in spirit to NumPy\u0026rsquo;s linalg.lstsq. As you\u0026rsquo;ll see, lm\u0026rsquo;s interface is a bit more tuned to the concepts of modeling.\nWe begin by reading in the example CSV into a data frame:\nresponses \u0026lt;- read.csv(\u0026#39;example_data.csv\u0026#39;) responses respondent vanilla.love strawberry.love chocolate.love dog.love cat.love 1 Aylssa 9 4 9 9 9 2 Ben8 8 6 4 10 4 3 Cy 9 4 8 2 6 4 Eva 3 7 9 4 6 5 Lem 6 8 5 2 5 6 Louis 4 5 3 10 3 A data frame is sort of like a matrix, but with named columns. That is, we can refer to entire columns using the dollar sign. We are now ready to run least squares. 
We\u0026rsquo;ll create the model for predicting \u0026ldquo;dog love.\u0026rdquo; To create the \u0026ldquo;cat love\u0026rdquo; model, simply use that column name instead:\nfit1 \u0026lt;- lm( responses$dog.love ~ responses$vanilla.love + responses$strawberry.love + responses$chocolate.love ) The syntax for lm is a little off-putting at first. This call tells it to create a model for \u0026ldquo;dog love\u0026rdquo; with respect to (the ~) a function of the form offset + x1 * vanilla love + x2 * strawberry love + x3 * chocolate love. Note that the offset is conveniently implied when using lm, so this is the same as the second model we created in Python. Now that we\u0026rsquo;ve computed the coefficients for our \u0026ldquo;dog love\u0026rdquo; model, we can ask R about it:\nsummary(fit1) Call: lm(formula = responses$dog.love ~ responses$vanilla.love + responses$strawberry.love + responses$chocolate.love) Residuals: 1 2 3 4 5 6 3.1827 2.9436 -4.5820 0.8069 -1.9856 -0.3657 Coefficients: Estimate Std. Error t value Pr(\u0026gt;|t|) (Intercept) 20.9298 15.0654 1.389 0.299 responses$vanilla.love -0.2783 0.9934 -0.280 0.806 responses$strawberry.love -1.4314 1.5905 -0.900 0.463 responses$chocolate.love -0.7647 0.8214 -0.931 0.450 Residual standard error: 4.718 on 2 degrees of freedom Multiple R-squared: 0.4206, Adjusted R-squared: -0.4485 F-statistic: 0.484 on 3 and 2 DF, p-value: 0.7272 This gives us quite a bit of information, including the coefficients for our \u0026ldquo;dog love\u0026rdquo; model and various error metrics. You can find the offset and coefficients under the Estimate column above. We quickly verify this using R\u0026rsquo;s vectorized arithmetic:\n20.9298 - 0.2783 * responses$vanilla.love - 1.4314 * responses$strawberry.love - 0.7647 * responses$chocolate.love [1] 5.8172 7.0562 6.5819 3.1928 3.9853 10.3655 You\u0026rsquo;ll notice the model is essentially the same as the one we got from NumPy. Our next step is to add in the squared inputs. 
We do this by adding extra terms to the modeling formula. The I() function allows us to easily add additional operators to columns. That\u0026rsquo;s how we accomplish the squaring. We could alternatively add squared input values to the data frame, but using I() is more convenient and natural.\nfit2 \u0026lt;- lm(responses$dog.love ~ responses$vanilla.love + I(responses$vanilla.love^2) + responses$strawberry.love + I(responses$strawberry.love^2) + responses$chocolate.love + I(responses$chocolate.love^2)) summary(fit2) Call: lm(formula = responses$dog.love ~ responses$vanilla.love + I(responses$vanilla.love^2) + responses$strawberry.love + I(responses$strawberry.love^2) + responses$chocolate.love + I(responses$chocolate.love^2)) Residuals: ALL 6 residuals are 0: no residual degrees of freedom! Coefficients: (1 not defined because of singularities) Estimate Std. Error t value Pr(\u0026gt;|t|) (Intercept) -357.444 NaN NaN NaN responses$vanilla.love 72.444 NaN NaN NaN I(responses$vanilla.love^2) -6.111 NaN NaN NaN responses$strawberry.love 59.500 NaN NaN NaN I(responses$strawberry.love^2) -5.722 NaN NaN NaN responses$chocolate.love 7.000 NaN NaN NaN I(responses$chocolate.love^2) NA NA NA NA Residual standard error: NaN on 0 degrees of freedom Multiple R-squared: 1, Adjusted R-squared: NaN F-statistic: NaN on 5 and 0 DF, p-value: NA We can see that we get the same \u0026ldquo;dog love\u0026rdquo; model as produced by the third Python version of the last post. 
Again, we quickly verify that the output is the same (minus some rounding errors):\n-357.444 + 72.444 * responses$vanilla.love - 6.111 * responses$vanilla.love^2 + 59.5 * responses$strawberry.love - 5.722 * responses$strawberry.love^2 + 7 * responses$chocolate.love [1] 9.009 10.012 2.009 4.011 2.016 10.006 ","permalink":"https://ryanjoneil.dev/posts/2011-02-16-data-fitting-2a-very-very-simple-linear-regression-in-r/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was updated to include an example data file.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eI thought it might be useful to follow up the \u003ca href=\"../2011-02-15-data-fitting-2-very-very-simple-linear-regression-in-python/\"\u003elast post\u003c/a\u003e with another one showing the same examples in R.\u003c/p\u003e\n\u003cp\u003eR provides a function called \u003ccode\u003elm\u003c/code\u003e, which is similar in spirit to \u003ca href=\"https://numpy.org/\"\u003eNumPy\u003c/a\u003e\u0026rsquo;s \u003ccode\u003elinalg.lstsq\u003c/code\u003e. As you\u0026rsquo;ll see, \u003ccode\u003elm\u003c/code\u003e\u0026rsquo;s interface is a bit more tuned to the concepts of modeling.\u003c/p\u003e\n\u003cp\u003eWe begin by reading in the \u003ca href=\"/files/2011-02-16-data-fitting-2a-very-very-simple-linear-regression-in-r/example_data.csv\"\u003eexample CSV\u003c/a\u003e into a data frame:\u003c/p\u003e","title":"🧐 Data Fitting 2a - Very, Very Simple Linear Regression in R"},{"content":"This post is based on a memo I sent to some former colleagues at the Post. I\u0026rsquo;ve edited it for use here since it fits well as the second in a series on simple data fitting techniques. If you\u0026rsquo;re among the many enlightened individuals already using regression analysis, then this post is probably not for you. 
If you aren\u0026rsquo;t, then hopefully this provides everything you need to develop rudimentary predictive models that yield surprising levels of accuracy.\nData For purposes of a simple working example, we have collected six records of input data over three dimensions with the goal of predicting two outputs. The input data are:\n$$ \\begin{align*} x_1 \u0026amp;= \\text{How much a respondent likes vanilla [0-10]}\\\\ x_2 \u0026amp;= \\text{How much a respondent likes strawberry [0-10]}\\\\ x_3 \u0026amp;= \\text{How much a respondent likes chocolate [0-10]} \\end{align*} $$\nOutput data consist of:\n$$ \\begin{align*} b_1 \u0026amp;= \\text{How much a respondent likes dogs [0-10]}\\\\ b_2 \u0026amp;= \\text{How much a respondent likes cats [0-10]} \\end{align*} $$\nBelow are anonymous data collected from a random sample of people.\nrespondent vanilla ❤️ strawberry ❤️ chocolate ❤️ dog ❤️ cat ❤️ Alyssa P Hacker 9 4 9 9 9 Ben Bitdiddle 8 6 4 10 4 Cy D. Fect 9 4 8 2 6 Eva Lu Ator 3 7 9 4 6 Lem E. Tweakit 6 8 5 2 5 Louis Reasoner 4 5 3 10 3 Our input is in three dimensions. Each output requires its own model, so we\u0026rsquo;ll have one for dogs and one for cats. We\u0026rsquo;re looking for functions, dog(x) and cat(x), that can predict $b_1$ and $b_2$ based on given values of $x_1$, $x_2$, and $x_3$.\nModel 1 For both models we want to find parameters that minimize their squared residuals (read: errors). There are a number of names for this. Optimization folks like to think of it as unconstrained quadratic optimization, but it\u0026rsquo;s more common to call it least squares or linear regression. It\u0026rsquo;s not necessary to entirely understand why for our purposes, but the function that minimizes these errors is:\n$$\\beta = ({A^t}A)^{-1}{A^t}b$$\nThis is implemented for you in the numpy.linalg Python package, which we\u0026rsquo;ll use for examples. 
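It is easy to sanity-check that formula for yourself: computing the normal-equations solution directly gives the same coefficients as the library routine. A minimal sketch with a tiny made-up system (not the survey data):

```python
import numpy

# A tiny made-up overdetermined system: fit y = c0 + c1 * x
# through three points, with a column of ones for the offset.
A = numpy.array([[1.0, 2.0],
                 [1.0, 3.0],
                 [1.0, 5.0]])
b = numpy.array([1.0, 2.0, 4.0])

# beta = (A^t A)^-1 A^t b, computed directly from the formula above.
beta_direct = numpy.linalg.inv(A.T @ A) @ A.T @ b

# The same solution via the routine used throughout this post.
beta_lstsq = numpy.linalg.lstsq(A, b, rcond=None)[0]

assert numpy.allclose(beta_direct, beta_lstsq)
```

In practice you should prefer lstsq: forming A^t A explicitly squares the condition number of the problem, while lstsq solves it via a more stable factorization.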
Much more information than you probably want can be found here.\nBelow is a first stab at a Python version. It runs least squares against our input and output data exactly as they are. You can see the matrix $A$ and outputs $b_1$ and $b_2$ (dog and cat love, respectively) are represented just as they are in the table.\n# Version 1: No offset, no squared inputs import numpy A = numpy.vstack([ [9, 4, 9], [8, 6, 4], [9, 4, 8], [3, 7, 9], [6, 8, 5], [4, 5, 3] ]) b1 = numpy.array([9, 10, 2, 4, 2, 10]) b2 = numpy.array([9, 4, 6, 6, 5, 3]) print(\u0026#39;dog ❤️:\u0026#39;, numpy.linalg.lstsq(A, b1, rcond=None)[0]) print(\u0026#39;cat ❤️:\u0026#39;, numpy.linalg.lstsq(A, b2, rcond=None)[0]) # Output: # dog ❤️: [0.72548294 0.53045642 -0.29952361] # cat ❤️: [2.36110929e-01 2.61934385e-05 6.26892476e-01] The resulting model is:\ndog(x) = 0.72548294 * x1 + 0.53045642 * x2 - 0.29952361 * x3 cat(x) = 2.36110929e-01 * x1 + 2.61934385e-05 * x2 + 6.26892476e-01 * x3 The coefficients before our variables correspond to beta in the formula above. Errors between observed and predicted data, shown below, are calculated and summed. For these six records, dog(x) has a total error of 20.76 and cat(x) has 3.74. Not great.\nrespondent predicted b1 b1 error predicted b2 b2 error Alyssa P Hacker 5.96 3.04 7.77 1.23 Ben Bitdiddle 7.79 2.21 4.40 0.40 Cy D. Fect 6.25 4.25 7.14 1.14 Eva Lu Ator 3.19 0.81 6.35 0.35 Lem E. Tweakit 7.10 5.10 4.55 0.45 Louis Reasoner 4.66 5.34 2.83 0.17 Total error: 20.76 3.74 Model 2 One problem with this model is that dog(x) and cat(x) are forced to pass through the origin. (Why is that?) We can improve it somewhat if we add an offset. This amounts to prepending 1 to every row in $A$ and adding a constant to the resulting functions. 
You can see the very slight difference between the code for this model and that of the previous:\n# Version 2: Offset, no squared inputs import numpy A = numpy.vstack([ [1, 9, 4, 9], [1, 8, 6, 4], [1, 9, 4, 8], [1, 3, 7, 9], [1, 6, 8, 5], [1, 4, 5, 3] ]) print(\u0026#39;dog ❤️:\u0026#39;, numpy.linalg.lstsq(A, b1, rcond=None)[0]) print(\u0026#39;cat ❤️:\u0026#39;, numpy.linalg.lstsq(A, b2, rcond=None)[0]) # Output: # dog ❤️: [20.92975427 -0.27831197 -1.43135684 -0.76469017] # cat ❤️: [-0.31744124 0.25133547 0.02978098 0.63394765] This yields the second version of our models:\ndog(x) = 20.92975427 - 0.27831197 * x1 - 1.43135684 * x2 - 0.76469017 * x3 cat(x) = -0.31744124 + 0.25133547 * x1 + 0.02978098 * x2 + 0.63394765 * x3 These models provide errors of 13.87 and 3.79. A little better on the dog side, but still not quite usable.\nrespondent predicted b1 b1 error predicted b2 b2 error Alyssa P Hacker 5.82 3.18 7.77 1.23 Ben Bitdiddle 7.06 2.94 4.41 0.41 Cy D. Fect 6.58 4.58 7.14 1.14 Eva Lu Ator 3.19 0.81 6.35 0.35 Lem E. Tweakit 3.99 1.99 4.60 0.40 Louis Reasoner 10.37 0.37 2.74 0.26 Total error: 13.87 3.79 Model 3 The problem is that dog(x) and cat(x) are linear functions. Most observed data don\u0026rsquo;t conform to straight lines. Take a moment and draw the line $f(x) = x$ and the curve $f(x) = x^2$. The former makes a poor approximation of the latter.\nMost of the time, people just use squares of the input data to add curvature to their models. We do this in our next version of the code by just adding squares of the input row values to our $A$ matrix. Everything else is the same. 
(In reality, you can add any function of the input data you feel best models the data, if you understand it well enough.)\n# Version 3: Offset with squared inputs import numpy A = numpy.vstack([ [1, 9, 9**2, 4, 4**2, 9, 9**2], [1, 8, 8**2, 6, 6**2, 4, 4**2], [1, 9, 9**2, 4, 4**2, 8, 8**2], [1, 3, 3**2, 7, 7**2, 9, 9**2], [1, 6, 6**2, 8, 8**2, 5, 5**2], [1, 4, 4**2, 5, 5**2, 3, 3**2] ]) b1 = numpy.array([9, 10, 2, 4, 2, 10]) b2 = numpy.array([9, 4, 6, 6, 5, 3]) print(\u0026#39;dog ❤️:\u0026#39;, numpy.linalg.lstsq(A, b1, rcond=None)[0]) print(\u0026#39;cat ❤️:\u0026#39;, numpy.linalg.lstsq(A, b2, rcond=None)[0]) # dog ❤️: [1.29368307 7.03633306 -0.44795498 9.98093332 # -0.75689575 -19.00757486 1.52985734] # cat ❤️: [0.47945896 5.30866067 -0.39644128 -1.28704188 # 0.12634295 -4.32392606 0.43081918] This gives us our final version of the model:\ndog(x) = 1.29368307 + 7.03633306 * x1 - 0.44795498 * x1**2 + 9.98093332 * x2 - 0.75689575 * x2**2 - 19.00757486 * x3 + 1.52985734 * x3**2 cat(x) = 0.47945896 + 5.30866067 * x1 - 0.39644128 * x1**2 - 1.28704188 * x2 + 0.12634295 * x2**2 - 4.32392606 * x3 + 0.43081918 * x3**2 Adding curvature to our model eliminates all perceived error, at least within 1e-16. This may seem unbelievable, but when you consider that we only have six input records, it isn\u0026rsquo;t really.\nrespondent predicted b1 b1 error predicted b2 b2 error Alyssa P Hacker 9 0 9 0 Ben Bitdiddle 10 0 4 0 Cy D. Fect 2 0 6 0 Eva Lu Ator 4 0 6 0 Lem E. Tweakit 2 0 5 0 Louis Reasoner 10 0 3 0 Total error: 0 0 It should be fairly obvious how one can take this and extrapolate to much larger models. I hope this is useful and that least squares becomes an important part of your lives.\n","permalink":"https://ryanjoneil.dev/posts/2011-02-15-data-fitting-2-very-very-simple-linear-regression-in-python/","summary":"\u003cp\u003eThis post is based on a memo I sent to some former colleagues at the Post. 
I\u0026rsquo;ve edited it for use here since it fits well as the second in a series on simple data fitting techniques. If you\u0026rsquo;re among the many enlightened individuals already using regression analysis, then this post is probably not for you. If you aren\u0026rsquo;t, then hopefully this provides everything you need to develop rudimentary predictive models that yield surprising levels of accuracy.\u003c/p\u003e","title":"🧐 Data Fitting 2 - Very, Very Simple Linear Regression in Python"},{"content":"Consider this scenario: You run a contest that accepts votes from the general Internet population. In order to encourage user engagement, you record any and all votes into a database over several days, storing nothing more than the competitor voted for, when each vote is cast, and a cookie set on the voter\u0026rsquo;s computer along with their apparent IP addresses. If a voter already has a recorded cookie set they are denied subsequent votes. This way you can avoid requiring site registration, a huge turnoff for your users. Simple enough.\nUnfortunately, some of the competitors are wily and attached to the idea of winning. They go so far as programming or hiring bots to cast thousands of votes for them. Your manager wants to know which votes are real and which ones are fake Right Now. Given very limited time, and ignoring actions that you could have taken to avoid the problem, how can you tell apart sets of good votes from those that shouldn\u0026rsquo;t be counted?\nOne quick-and-dirty option involves comparing histograms of interarrival times for sets of votes. Say you\u0026rsquo;re concerned that all the votes during a particular period of time or from a given IP address might be fraudulent. 
Put all the vote times you\u0026rsquo;re concerned about into a list, sort them, and compute their differences:\n# times is a list of datetime instances from vote records times.sort() interarrivals = [y - x for x, y in zip(times, times[1:])] Now use matplotlib to display a histogram of these. Votes that occur naturally are likely to resemble an exponential distribution in their interarrival times. For instance, here are interarrival times for all votes received in a contest:\nThis subset of votes is clearly fraudulent, due to the near determinism of their interarrival times. This is most likely caused by the voting bot not taking random sleep intervals during voting. It casts a vote, receives a response, clears its cookies, and repeats:\nThese votes, on the other hand, are most likely legitimate. They exhibit a nice Erlang shape and appear to have natural interarrival times that one would expect:\nOf course this method is woefully inadequate for rigorous detection of voting fraud. Ideally one would find a method to compute the probability that a set of votes is generated by a bot. This is enough to inform quick, ad hoc decisions though.\n","permalink":"https://ryanjoneil.dev/posts/2010-11-30-off-the-cuff-voter-fraud-detection/","summary":"\u003cp\u003eConsider this scenario: You run a contest that accepts votes from the general Internet population. In order to encourage user engagement, you record any and all votes into a database over several days, storing nothing more than the competitor voted for, when each vote is cast, and a cookie set on the voter\u0026rsquo;s computer along with their apparent IP addresses. If a voter already has a recorded cookie set they are denied subsequent votes. This way you can avoid requiring site registration, a huge turnoff for your users. Simple enough.\u003c/p\u003e","title":"🗳 Off the Cuff Voter Fraud Detection"},{"content":"Note: This post was updated to work with Python 3 and PySCIPOpt. 
The original version used Python 2 and python-zibopt.\nData fitting is one of those tasks that everyone should have at least some exposure to. Certainly developers and analysts will benefit from a working knowledge of its fundamentals and their implementations. However, in my own reading I\u0026rsquo;ve found it difficult to locate good examples that are simple enough to pick up quickly and come with accompanying source code.\nThis article commences an ongoing series introducing basic data fitting techniques. With any luck they won\u0026rsquo;t be overly complex, while still being useful enough to get the point across with a real example and real data. We\u0026rsquo;ll start with a binary classification problem: presented with a series of records, each containing a set number of input values describing it, determine whether or not each record exhibits some property.\nModel We\u0026rsquo;ll use the cancer1.dt data from the proben1 set of test cases, which you can download here. Each record starts with 9 data points containing physical characteristics of a tumor. The second to last data point contains 1 if a tumor is benign and 0 if it is malignant. We seek to find a linear function we can run on an arbitrary record that will return a value greater than zero if that record\u0026rsquo;s tumor is predicted to be benign and less than zero if it is predicted to be malignant. We will train our linear model on the first 350 records, and test it for accuracy on the remaining rows.\nThis is similar to the data fitting problem found in Chvatal. Our inputs consist of a matrix of observed data, $A$, and a vector of classifications, $b$. In order to classify a record, we require another vector $x$ such that the dot product of $x$ and that record will be either greater or less than zero depending on its predicted classification.\nA couple points to note before we start:\nMost observed data are noisy. 
This means it may be impossible to locate a hyperplane that cleanly separates given records of one type from another. In this case, we must resort to finding a function that minimizes our predictive error. For the purposes of this example, we\u0026rsquo;ll minimize the sum of the absolute differences between the observed and predicted values. That is, we seek $x$ such that we find $\min \sum_i{|a_i^T x-b_i|}$.\nThe slope-intercept form of a line, $f(x)=m^T x+b$, contains an offset. It should be obvious that this is necessary in our model so that our function isn\u0026rsquo;t required to pass through the origin. Thus, we\u0026rsquo;ll be adding an extra variable with the coefficient of 1 to represent our offset value.\nIn order to model this, we use two linear constraints for each absolute value. We minimize the sum of these. Our Linear Programming model thus looks like:\n$$ \begin{align*} \min\quad \u0026amp; z = \sum_i{v_i}\\ \text{s.t.}\quad\u0026amp; v_i \geq x_0 + a_i^\intercal x - 1 \u0026amp;\quad\forall\u0026amp;\quad\text{benign tumors}\\ \u0026amp; v_i \geq 1 - x_0 - a_i^\intercal x \u0026amp;\quad\forall\u0026amp;\quad\text{benign tumors}\\ \u0026amp; v_i \geq x_0 + a_i^\intercal x - (-1) \u0026amp;\quad\forall\u0026amp;\quad\text{malignant tumors}\\ \u0026amp; v_i \geq -1 - x_0 - a_i^\intercal x \u0026amp;\quad\forall\u0026amp;\quad\text{malignant tumors} \end{align*} $$\nCode In order to do this in Python, we use SCIP and SoPlex. 
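Before the full program, the absolute-value trick can be seen in isolation on a toy instance. This sketch uses scipy.optimize.linprog rather than SCIP purely to stay short, and all data in it are made up; the two rows per point mirror the paired $v_i$ constraints above:

```python
from scipy.optimize import linprog

# Made-up toy data: fit b ~ x0 + m*a by minimizing sum |x0 + m*a_i - b_i|.
a = [0.0, 1.0, 2.0]
b = [1.0, 3.0, 5.0]  # lies exactly on b = 1 + 2a, so zero error is attainable
n = len(a)

# Variables: [x0, m, v_1, ..., v_n]; objective is the sum of residuals v_i.
c = [0.0, 0.0] + [1.0] * n

# Each absolute value becomes two <= rows:
#   x0 + m*a_i - v_i <= b_i      (v_i >= predicted - observed)
#  -x0 - m*a_i - v_i <= -b_i     (v_i >= observed - predicted)
A_ub, b_ub = [], []
for i in range(n):
    row = [1.0, a[i]] + [0.0] * n
    row[2 + i] = -1.0
    A_ub.append(row)
    b_ub.append(b[i])
    row = [-1.0, -a[i]] + [0.0] * n
    row[2 + i] = -1.0
    A_ub.append(row)
    b_ub.append(-b[i])

# x0 and m are free; residuals are nonnegative.
bounds = [(None, None), (None, None)] + [(0.0, None)] * n
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
```

Since the toy data are exactly linear, the optimal objective is zero with intercept 1 and slope 2.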
We start by setting constants for benign and malignant outputs and providing a function to read in the training and testing data sets.\n# Preferred output values for tumor categories BENIGN = 1 MALIGNANT = -1 def read_proben1_cancer_data(filename, train_size): \u0026#39;\u0026#39;\u0026#39;Loads a proben1 cancer file into train \u0026amp; test sets\u0026#39;\u0026#39;\u0026#39; # Number of input data points per record DATA_POINTS = 9 train_data = [] test_data = [] with open(filename) as infile: # Read in the first train_size lines to a training data list, and the # others to testing data. This allows us to test how general our model # is on something other than the input data. for line in infile.readlines()[7:]: # skip header line = line.split() # Records = offset (x0) + remaining data points input = [float(x) for x in line[:DATA_POINTS]] output = BENIGN if line[-2] == \u0026#39;1\u0026#39; else MALIGNANT record = {\u0026#39;input\u0026#39;: input, \u0026#39;output\u0026#39;: output} # Determine what data set to put this in if len(train_data) \u0026gt;= train_size: test_data.append(record) else: train_data.append(record) return train_data, test_data The next function implements the LP model described above using SoPlex and SCIP. It minimizes the sum of residuals for each training record. This amounts to summing the absolute value of the difference between predicted and observed output data. The following function takes in input and observed output data and returns a list of coefficients. Our resulting model consists of taking the dot product of an input record and these coefficients. If the result is greater than or equal to zero, that record is predicted to be a benign tumor, otherwise it is predicted to be malignant.\nfrom pyscipopt import Model def train_linear_model(train_data): \u0026#39;\u0026#39;\u0026#39; Accepts a set of input training data with known output values. 
Returns a list of coefficients to apply to arbitrary records for purposes of binary categorization. \u0026#39;\u0026#39;\u0026#39; # Make sure we have at least one training record. assert len(train_data) \u0026gt; 0 num_variables = len(train_data[0][\u0026#39;input\u0026#39;]) # Variables are coefficients in front of the data points. It is important # that these be unrestricted in sign so they can take negative values. m = Model() x = [m.addVar(f\u0026#39;x{i}\u0026#39;, lb=None) for i in range(num_variables)] # Residual for each data row residuals = [m.addVar(lb=None, ub=None) for _ in train_data] for r, d in zip(residuals, train_data): # r will be the absolute value of the difference between observed and # predicted values. We can model absolute values such as r \u0026gt;= |foo| as: # # r \u0026gt;= foo # r \u0026gt;= -foo m.addCons(sum(xi * ai for xi, ai in zip(x, d[\u0026#39;input\u0026#39;])) + r \u0026gt;= d[\u0026#39;output\u0026#39;]) m.addCons(sum(xi * ai for xi, ai in zip(x, d[\u0026#39;input\u0026#39;])) - r \u0026lt;= d[\u0026#39;output\u0026#39;]) # Find and return coefficients that min sum of residuals. m.setObjective(sum(residuals)) m.setMinimize() m.optimize() solution = m.getBestSol() return [solution[xi] for xi in x] We also provide a convenience function for counting the number of correct predictions by our resulting model against either the test or training data sets.\ndef count_correct(data_set, coefficients): \u0026#39;\u0026#39;\u0026#39;Returns the number of correct predictions.\u0026#39;\u0026#39;\u0026#39; correct = 0 for d in data_set: result = sum(x*y for x, y in zip(coefficients, d[\u0026#39;input\u0026#39;])) # Do we predict the same as the output? 
if (result \u0026gt;= 0) == (d[\u0026#39;output\u0026#39;] \u0026gt;= 0): correct += 1 return correct Finally we write a main method to read in the data, build our linear model, and test its efficacy.\nfrom pprint import pprint if __name__ == \u0026#39;__main__\u0026#39;: # Specs for this input file INPUT_FILE_NAME = \u0026#39;cancer1.dt\u0026#39; TRAIN_SIZE = 350 train_data, test_data = read_proben1_cancer_data( INPUT_FILE_NAME, TRAIN_SIZE ) # Add the offset variable to each of our data records for data_set in [train_data, test_data]: for row in data_set: row[\u0026#39;input\u0026#39;] = [1] + row[\u0026#39;input\u0026#39;] coefficients = train_linear_model(train_data) print(\u0026#39;coefficients:\u0026#39;) pprint(coefficients) # Print % of correct predictions for each data set correct = count_correct(train_data, coefficients) print( \u0026#39;%s / %s = %.02f%% correct on training set\u0026#39; % ( correct, len(train_data), 100 * float(correct) / len(train_data) ) ) correct = count_correct(test_data, coefficients) print( \u0026#39;%s / %s = %.02f%% correct on testing set\u0026#39; % ( correct, len(test_data), 100 * float(correct) / len(test_data) ) ) Results The result of running this model against the cancer1.dt data set is:\ncoefficients: [1.4072882449702786, -0.14014055927954652, -0.6239513714263405, -0.26727681774258882, 0.067107753841131157, -0.28300216102808429, -1.0355594670918404, -0.22774451038152174, -0.69871243677663608, -0.072575089848659444] 328 / 350 = 93.71% correct on training set 336 / 349 = 96.28% correct on testing set The accuracy is pretty good here against both the training and testing sets, so this particular model generalizes well. This is about the simplest model we can implement for data fitting, and we\u0026rsquo;ll get to more complicated ones later, but it\u0026rsquo;s nice to see we can do so well so quickly. 
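Applying the trained model to a new record is just a dot product against the printed coefficients followed by a sign check. The coefficient values below are copied from the output above; the feature values passed in are made up for illustration:

```python
# Coefficients copied from the training run's output above.
coefficients = [1.4072882449702786, -0.14014055927954652, -0.6239513714263405,
                -0.26727681774258882, 0.067107753841131157, -0.28300216102808429,
                -1.0355594670918404, -0.22774451038152174, -0.69871243677663608,
                -0.072575089848659444]

def classify(features):
    '''features: the 9 proben1 inputs; returns a predicted category.'''
    record = [1.0] + list(features)  # prepend the offset term
    score = sum(c * v for c, v in zip(coefficients, record))
    return 'benign' if score >= 0 else 'malignant'

label = classify([0.2] * 9)  # made-up feature values
```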
The coefficients correspond to using a function of this form, rounding off to three decimal places:\n$$ \begin{align*} f(x) =\ \u0026amp;1.407 - 0.140 x_1 - 0.624 x_2 - 0.267 x_3 + 0.067 x_4 - \\ \u0026amp;0.283 x_5 - 1.036 x_6 - 0.228 x_7 - 0.699 x_8 - 0.073 x_9 \end{align*} $$\nResources cancer1.dt data file from proben1 Full source listing ","permalink":"https://ryanjoneil.dev/posts/2010-11-23-data-fitting-1-linear-data-fitting/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was updated to work with Python 3 and \u003ca href=\"https://github.com/scipopt/PySCIPOpt\"\u003ePySCIPOpt\u003c/a\u003e. The original version used Python 2 and \u003ca href=\"https://pythonhosted.org/python-zibopt/\"\u003epython-zibopt\u003c/a\u003e.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eData fitting is one of those tasks that everyone should have at least some exposure to. Certainly developers and analysts will benefit from a working knowledge of its fundamentals and their implementations. However, in my own reading I\u0026rsquo;ve found it difficult to locate good examples that are simple enough to pick up quickly and come with accompanying source code.\u003c/p\u003e","title":"🧐 Data Fitting 1 - Linear Data Fitting"},{"content":"Note: This post was updated to work with Python 3.\nOne of the most useful tools one learns in an Operations Research curriculum is Monte Carlo Simulation. Its utility lies in its simplicity: one can learn vital information about nearly any process, be it deterministic or stochastic, without wading through the grunt work of finding an analytical solution. It can be used for off-the-cuff estimates or as a proper scientific tool. All one needs to know is how to simulate a given process and its appropriate probability distributions and parameters if that process is stochastic.\nHere\u0026rsquo;s how it works:\nConstruct a simulation that, given input values, returns a value of interest. 
This could be a pure quantity, like time spent waiting for a bus, or a boolean indicating whether or not a particular event occurs. Run the simulation a (usually large) number of times, each time with randomly generated input variables. Record its output values. Compute sample mean and variance of the output values. In the case of time spent waiting for a bus, the sample mean and variance are estimators of mean and variance for one\u0026rsquo;s wait time. In the boolean case, these represent the probability that the given event will occur.\nOne can think of Monte Carlo Simulation as throwing darts. Say you want to find the area under a curve without integrating. All you must do is draw the curve on a wall and throw darts at it randomly. After you\u0026rsquo;ve thrown enough darts, the area under the curve can be approximated using the percentage of darts that end up under the curve times the total area.\nThis technique is often performed using a spreadsheet, but that can be a bit clunky and may make more complex simulations difficult. I\u0026rsquo;d like to spend a minute showing how it can be done in Python. Consider the following scenario:\nPassengers for a train arrive according to a Poisson process with a mean of 100 per hour. The next train arrives exponentially with a rate of 5 per hour. How many passengers will be aboard the train?\nWe can simulate this using the fact that a Poisson process can be represented as a string of events occurring with exponential inter-arrival times. We use the sim() function below to generate the number of passengers for random instances of the problem. 
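As a sanity check on what the simulation should report, this particular example has a closed-form answer. Conditioning on the arrival time $T$ of the train, the passenger count is Poisson with mean $100T$, and $T$ is exponential with rate $5$ per hour, so:\n$$ E[N] = E[E[N \mid T]] = 100 \, E[T] = \frac{100}{5} = 20 $$\nIn fact $N$ is geometric on $\{0, 1, \dots\}$ with success probability $p = 5/105$, so $Var(N) = (1-p)/p^2 = 20 \cdot 21 = 420$. The sample mean and variance from the simulation should land near these values.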
We then compute sample mean and variance for these values.\nimport random PASSENGERS = 100.0 TRAINS = 5.0 ITERATIONS = 10000 def sim(): passengers = 0.0 # Determine when the train arrives train = random.expovariate(TRAINS) # Count the number of passenger arrivals before the train now = 0.0 while True: now += random.expovariate(PASSENGERS) if now \u0026gt;= train: break passengers += 1.0 return passengers if __name__ == \u0026#39;__main__\u0026#39;: output = [sim() for _ in range(ITERATIONS)] total = sum(output) mean = total / len(output) sum_sqrs = sum(x*x for x in output) variance = (sum_sqrs - total * mean) / (len(output) - 1) print(\u0026#39;E[X] = %.02f\u0026#39; % mean) print(\u0026#39;Var(X) = %.02f\u0026#39; % variance) ","permalink":"https://ryanjoneil.dev/posts/2009-10-08-monte-carlo-simulation-in-python/","summary":"\u003cp\u003e\u003cem\u003eNote: This post was updated to work with Python 3.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eOne of the most useful tools one learns in an Operations Research curriculum is \u003ca href=\"https://en.wikipedia.org/wiki/Monte_Carlo_method\"\u003eMonte Carlo Simulation\u003c/a\u003e. Its utility lies in its simplicity: one can learn vital information about nearly any process, be it deterministic or stochastic, without wading through the grunt work of finding an analytical solution. It can be used for off-the-cuff estimates or as a proper scientific tool. All one needs to know is how to simulate a given process and its appropriate probability distributions and parameters if that process is stochastic.\u003c/p\u003e","title":"🐍 Monte Carlo Simulation in Python"},{"content":"One of the difficulties we encounter in solving the Traveling Salesman Problem (TSP) is that, for even a small number of cities, a complete description of the problem requires a factorial number of constraints. This is apparent in the standard formulation used to teach the TSP to OR students. 
Consider a set of $n$ cities with the distance from city $i$ to city $j$ denoted $d_{ij}$. We attempt to minimize the total distance of a tour entering and leaving each city exactly once. $x_{ij} = 1$ if the edge from city $i$ to city $j$ is included in the tour, $0$ otherwise:\n$$ \small \begin{align*} \min\quad \u0026amp; z = \sum_i \sum_{j\ne i} d_{ij} x_{ij}\\ \text{s.t.}\quad\u0026amp; \sum_{j\ne i} x_{ij} = 1 \u0026amp;\quad\forall\u0026amp;\ i \u0026amp; \text{leave each city once}\\ \u0026amp; \sum_{i\ne j} x_{ij} = 1 \u0026amp;\quad\forall\u0026amp;\ j \u0026amp; \text{enter each city once}\\ \u0026amp; x_{ij} \in \{0,1\} \u0026amp;\quad\forall\u0026amp;\ i,j \end{align*} $$\nThis appears to be a reasonable formulation until we solve it and see that our solution contains disconnected subtours. Suppose we have four cities, labeled $A$ through $D$. Connecting $A$ to $B$, $B$ to $A$, $C$ to $D$ and $D$ to $C$ provides a feasible solution to our formulation, but does not constitute a cycle. Here is a more concrete example of two disconnected subtours $\{(1,5),(5,1)\}$ and $\{(2,3),(3,4),(4,2)\}$ over five cities:\nampl: display x; x [*,*] : 1 2 3 4 5 := 1 0 0 0 0 1 2 0 0 1 0 0 3 0 0 0 1 0 4 0 1 0 0 0 5 1 0 0 0 0 ; Realizing we just solved the Assignment Problem, we now add subtour elimination constraints. These require that any proper, non-null subset $S$ of our $n$ cities is connected by at most $|S|-1$ active edges:\n$$ \sum_{i \in S} \sum_{j \in S} x_{ij} \leq |S|-1 \quad\forall\ S \subset \{1, \dots, n\}, S \ne \emptyset $$\nIndexing subtour elimination constraints over a power set of the cities completes the formulation. However, this requires an additional $\sum_{k=2}^{n-1} \binom{n}{k}$ rows tacked on the end of our matrix and is clearly infeasible for large $n$. The most that current computers can handle using this approach is around 19 cities. 
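To make that count concrete: for $n = 4$ the sum gives $6 + 4 = 10$ subtour elimination constraints, matching the ten subtours rows AMPL expands below. A short Python sketch (separate from the AMPL discussion that follows) enumerates the same index sets:

```python
from itertools import combinations

cities = [1, 2, 3, 4]
n = len(cities)

# Proper subsets with at least 2 and at most n-1 cities each get a subtour
# elimination constraint: sum of x[i,j] over the subset is <= |S| - 1.
subsets = [set(s) for k in range(2, n) for s in combinations(cities, k)]

# For n = 4 this is C(4,2) + C(4,3) = 10 constraints.
assert len(subsets) == 10
```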
It remains an instructive tool for understanding the combinatorial explosion that occurs in problems like TSP and is worth translating into a modeling language. So how does one model it on a computer?\nUnfortunately, AMPL, the gold standard in mathematical modeling languages, is unable to index over sets of sets directly. Creating a power set in AMPL requires going through a few contortions. The following code demonstrates power and index sets over four cities:\nset cities := 1 .. 4 ordered; param n := card(cities); set indices := 0 .. (2^n - 1); set power {i in indices} := {c in cities: (i div 2^(ord(c) - 1)) mod 2 = 1}; display cities; display n; display indices; display power; This yields the following output:\nset cities := 1 2 3 4; n = 4 set indices := 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; set power[0] := ; # empty set power[1] := 1; set power[2] := 2; set power[3] := 1 2; set power[4] := 3; set power[5] := 1 3; set power[6] := 2 3; set power[7] := 1 2 3; set power[8] := 4; set power[9] := 1 4; set power[10] := 2 4; set power[11] := 1 2 4; set power[12] := 3 4; set power[13] := 1 3 4; set power[14] := 2 3 4; set power[15] := 1 2 3 4; Note how the index set contains an index for each row in our power set. We can now generate the subtour elimination constraints:\nvar x {cities cross cities} binary; s.t. 
subtours {i in indices: card(power[i]) \u0026gt; 1 and card(power[i]) \u0026lt; card(cities)}: sum {(c,k) in power[i] cross power[i]: k != c} x[c,k] \u0026lt;= card(power[i]) - 1; expand subtours; subject to subtours[3]: x[1,2] + x[2,1] \u0026lt;= 1; subject to subtours[5]: x[1,3] + x[3,1] \u0026lt;= 1; subject to subtours[6]: x[2,3] + x[3,2] \u0026lt;= 1; subject to subtours[7]: x[1,2] + x[1,3] + x[2,1] + x[2,3] + x[3,1] + x[3,2] \u0026lt;= 2; subject to subtours[9]: x[1,4] + x[4,1] \u0026lt;= 1; subject to subtours[10]: x[2,4] + x[4,2] \u0026lt;= 1; subject to subtours[11]: x[1,2] + x[1,4] + x[2,1] + x[2,4] + x[4,1] + x[4,2] \u0026lt;= 2; subject to subtours[12]: x[3,4] + x[4,3] \u0026lt;= 1; subject to subtours[13]: x[1,3] + x[1,4] + x[3,1] + x[3,4] + x[4,1] + x[4,3] \u0026lt;= 2; subject to subtours[14]: x[2,3] + x[2,4] + x[3,2] + x[3,4] + x[4,2] + x[4,3] \u0026lt;= 2; While this does work, the code for generating the power set looks like voodoo. Understanding it required piece-by-piece decomposition, an exercise I suggest you go through yourself if you have a copy of AMPL and 15 minutes to spare:\nset foo {c in cities} := {ord(c)}; set bar {c in cities} := {2^(ord(c) - 1)}; set baz {i in indices} := {c in cities: i div 2^(ord(c) - 1)}; set qux {i in indices} := {c in cities: (i div 2^(ord(c) - 1)) mod 2 = 1}; display foo; display bar; display baz; display qux; This may be an instance where open source leads commercial software. The good folks who produce the SCIP Optimization Suite provide an AMPL-like language called ZIMPL with a few additional useful features. One of these is power sets. 
Compared to the code above, doesn\u0026rsquo;t this look refreshing?\nset cities := {1 to 4}; set power[] := powerset(cities); set indices := indexset(power); ","permalink":"https://ryanjoneil.dev/posts/2009-02-27-on-the-beauty-of-power-sets/","summary":"\u003cp\u003eOne of the difficulties we encounter in solving the \u003ca href=\"https://www.math.uwaterloo.ca/tsp/\"\u003eTraveling Salesman Problem\u003c/a\u003e (TSP) is that, for even a small number of cities, a complete description of the problem requires a factorial number of constraints. This is apparent in the standard formulation used to teach the TSP to OR students. Consider a set of $n$ cities with the distance from city $i$ to city $j$ denoted $d_{ij}$. We attempt to minimize the total distance of a tour entering and leaving each city exactly once. $x_{ij} = 1$ if the edge from city $i$ to city $j$ is included in the tour, $0$ otherwise:\u003c/p\u003e","title":"⚡️ On the Beauty of Power Sets"},{"content":"Uncapacitated Lot Sizing (ULS) is a classic OR problem that seeks to minimize the cost of satisfying known demand for a product over time. Demand is subject to varying costs for production, set-up, and storage of the product. Technically, it is a mixed binary integer linear program \u0026ndash; the key point separating it from the world of linear optimization being that production cannot occur during any period without paying that period\u0026rsquo;s fixed costs for set-up. 
Thus it has linear nonnegative variables for production and storage amounts during each period, and a binary variable for each period that determines whether or not production can actually occur.\nFor $n$ periods with per-period fixed set-up cost $f_t$, unit production cost $p_t$, unit storage cost $h_t$, and demand $d_t$, we define decision variables related to production and storage quantities:\n$$ \small \begin{align*} x_t \u0026amp;= \text{units produced in period}\ t\\ s_t \u0026amp;= \text{stock at the end of period}\ t\\ y_t \u0026amp;= 1\ \text{if production occurs in period}\ t, 0\ \text{otherwise} \end{align*} $$\nOne can minimize overall cost for satisfying all demand on time using the following model per Wolsey (1998), defined slightly differently here:\n$$ \small \begin{align*} \min\quad \u0026amp; z = \sum_t{p_t x_t} + \sum_t{h_t s_t} + \sum_t{f_t y_t}\\ \text{s.t.}\quad\u0026amp; x_1 = d_1 + s_1\\ \u0026amp; s_{t-1} + x_t = d_t + s_t \u0026amp;\quad\forall\u0026amp;\ t \u0026gt; 1\\ \u0026amp; x_t \leq M y_t \u0026amp;\quad\forall\u0026amp;\ t\\ \u0026amp; s_t, x_t \geq 0 \u0026amp;\quad\forall\u0026amp;\ t\\ \u0026amp; y_t \in \{0,1\} \u0026amp;\quad\forall\u0026amp;\ t \end{align*} $$\nAccording to Wolsey, page 11, given that $s_t = \sum_{i=1}^t (x_i - d_i)$ and defining new constants $K = \sum_{t=1}^n h_t(\sum_{i=1}^t d_i)$ and $c_t = p_t + \sum_{i=t}^n h_i$, the objective function can be rewritten as $z = \sum_t c_t x_t + \sum_t f_t y_t - K$. The book lacks a proof of this and it seems a bit non-obvious, so I attempt an explanation in somewhat painstaking detail here.\n$$ \small \begin{align*} \u0026amp;\text{Proof}:\\ \u0026amp; \u0026amp; \sum_t p_t x_t + \sum_t h_t s_t + \sum_t f_t y_t \u0026amp;= \sum_t c_t x_t + \sum_t f_t y_t - K\\ \u0026amp;\text{1. 
Remove} \sum_t f_t y_t:\\ \u0026amp; \u0026amp; \sum_t p_t x_t + \sum_t h_t s_t \u0026amp;= \sum_t c_t x_t - K\\ \u0026amp;\text{2. Expand}\ K\ \text{and}\ c_t:\\ \u0026amp; \u0026amp; \sum_t p_t x_t + \sum_t h_t s_t \u0026amp;= \sum_t (p_t + \sum_{i=t}^n h_i) x_t - \sum_t h_t (\sum_{i=1}^t d_i)\\ \u0026amp;\text{3. Remove}\ \sum_t p_t x_t:\\ \u0026amp; \u0026amp; \sum_t h_t s_t \u0026amp;= \sum_t x_t (\sum_{i=t}^n h_i) - \sum_t h_t (\sum_{i=1}^t d_i)\\ \u0026amp;\text{4. Expand}\ s_t:\\ \u0026amp; \u0026amp; \sum_t h_t (\sum_{i=1}^t x_i) - \sum_t h_t (\sum_{i=1}^t d_i) \u0026amp;= \sum_t x_t (\sum_{i=t}^n h_i) - \sum_t h_t (\sum_{i=1}^t d_i)\\ \u0026amp;\text{5. Remove}\ \sum_t h_t (\sum_{i=1}^t d_i):\\ \u0026amp; \u0026amp; \sum_t h_t (\sum_{i=1}^t x_i) \u0026amp;= \sum_t x_t (\sum_{i=t}^n h_i) \end{align*} $$\nThe result from step 5 becomes obvious upon expanding its left and right-hand terms:\n$$ h_1 x_1 + h_2 (x_1 + x_2) + \cdots + h_n (x_1 + \cdots + x_n) =\\ x_1 (h_1 + \cdots + h_n) + x_2 (h_2 + \cdots + h_n) + \cdots + x_n h_n $$\nIn matrix notation with $h$ and $x$ as column vectors in $\bf R^n$ and $L$ and $U$ being $n \times n$ lower and upper triangular matrices of ones, respectively, this can be written as:\n$$ \small \begin{pmatrix} h_1 \cdots h_n \end{pmatrix} \begin{pmatrix} 1 \cdots 0 \\ \vdots \ddots \vdots \\ 1 \cdots 1 \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_1 \cdots x_n \end{pmatrix} \begin{pmatrix} 1 \cdots 1 \\ \vdots \ddots \vdots \\ 0 \cdots 1 \end{pmatrix} \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} $$\nor $h^T L x = x^T U h$.\n","permalink":"https://ryanjoneil.dev/posts/2009-02-20-uncapacitated-lot-sizing/","summary":"\u003cp\u003eUncapacitated Lot Sizing (ULS) is a classic \u003ca href=\"http://en.wikipedia.org/wiki/Operations_research\"\u003eOR\u003c/a\u003e problem that 
seeks to minimize the cost of satisfying known demand for a product over time. Demand is subject to varying costs for production, set-up, and storage of the product. Technically, it is a mixed binary integer linear program \u0026ndash; the key point separating it from the world of \u003ca href=\"http://en.wikipedia.org/wiki/Linear_programming\"\u003elinear optimization\u003c/a\u003e being that production cannot occur during any period without paying that period\u0026rsquo;s fixed costs for set-up. Thus it has linear nonnegative variables for production and storage amounts during each period, and a binary variable for each period that determines whether or not production can actually occur.\u003c/p\u003e","title":"📐 Uncapacitated Lot Sizing"}]