Chicago LoopBack

See It. Fix It. Loop It.


Inspiration

Chicago processes over 4.57 million 311 service requests every year. That number sounds impressive until you realize what it actually means: millions of residents picking up their phones, filing a report about a broken streetlight or a pothole that's been swallowing tires for six months, and then waiting. And waiting. And hearing nothing back.

We grew up in cities like this. You report something, it disappears into a system, and three weeks later the same pothole is still there. The frustration is not just inconvenience — it is a slow erosion of trust between people and the institutions that are supposed to serve them.

What struck us was not that cities lacked data. They had too much of it — fragmented, unstructured, and arriving from seven different intake channels with no shared intelligence connecting them. The problem was never the volume of reports. It was that no system was listening to the pattern beneath them.

That is what inspired LoopBack: the belief that if you could close the feedback loop between citizens and city operations — truly close it — you could turn civic frustration into civic momentum.


What We Built

LoopBack is a civic operations engine with two sides: one facing residents, one facing city departments.

The Citizen Side

Residents open the app and log an issue in under 10 seconds. Location is captured automatically. The report is timestamped, categorized, and immediately joined to any existing reports describing the same real-world problem nearby.

A daily streak system rewards consistent participation. This is deliberate. Civic reporting has historically been episodic — people report when something is bad enough to cross a frustration threshold. We wanted to shift that to a habit, where showing up every day feels meaningful and is reinforced.

The live map reflects real conditions. When 40 people report the same flooded underpass, the map pin for that issue grows. Severity is visible. The city is no longer a static backdrop — it becomes a feedback surface.

The Department Side

This is where the LLM pipeline runs. Every aggregated issue gets passed through our triage engine:

$$ S_{final} = \text{clamp}\left(S_{base} + \Delta_{LLM},\ S_{base} - 1,\ S_{base} + 1\right) $$

where $S_{base}$ is computed from category weights and $\Delta_{LLM}$ is the model's proposed adjustment, clamped to $\pm 1$ to prevent runaway severity inflation.
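The clamp is simple enough to express directly. A minimal sketch (function name is illustrative, not from the codebase):

```python
def final_severity(s_base: float, delta_llm: float) -> float:
    """Apply the LLM's proposed adjustment, clamped to +/-1 around the base score."""
    adjusted = s_base + delta_llm
    # clamp(adjusted, s_base - 1, s_base + 1)
    return max(s_base - 1.0, min(s_base + 1.0, adjusted))
```

Even if the model hallucinates a wildly inflated adjustment, the final score can never drift more than one point from the rule-based baseline.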

The crowd signal that feeds into triage is:

$$ P_{crowd} = \alpha \cdot \bar{u} + \beta \cdot \log(n_{unique} + 1) $$

where $\bar{u}$ is the average user-reported priority, $n_{unique}$ is the number of distinct reporters, and $\alpha, \beta$ are tuned weighting coefficients. Taking the log of unique users prevents a single coordinated group from artificially dominating the score.
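In code, the crowd signal is one line. A sketch with illustrative default weights (the actual $\alpha, \beta$ values were tuned and are not reproduced here):

```python
import math

def crowd_priority(avg_user_priority: float, n_unique: int,
                   alpha: float = 0.6, beta: float = 0.4) -> float:
    """Blend average user-reported priority with a log-damped unique-reporter count."""
    return alpha * avg_user_priority + beta * math.log(n_unique + 1)
```

The log damping is what matters: going from 1 reporter to 10 moves the score far more than going from 100 to 109, so a coordinated burst of duplicate accounts has diminishing returns.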

The LLM (Gemini 2.5 Flash via REST API) receives a JSON payload containing category, location text, aggregated crowd signal, and up to five sample report excerpts. It returns:

  • A severity score from 1 to 5
  • A department routing decision — one of CTA_OPS, CITY_311, SECURITY, or COMMUNITY
  • A professional complaint draft ready to send without editing
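The request/response contract can be sketched as follows. Field names (`severity`, `department`, `draft`, etc.) are illustrative, assuming a schema like the one described above:

```python
import json

ALLOWED_DEPTS = {"CTA_OPS", "CITY_311", "SECURITY", "COMMUNITY"}

def build_triage_payload(category: str, location_text: str,
                         crowd_signal: float, excerpts: list[str]) -> dict:
    """Assemble the JSON payload sent to the model; caps samples at five."""
    return {
        "category": category,
        "location": location_text,
        "crowd_signal": round(crowd_signal, 3),
        "samples": excerpts[:5],
    }

def parse_triage_response(raw: str) -> dict:
    """Validate the model's JSON reply against the expected schema."""
    data = json.loads(raw)
    severity = int(data["severity"])
    if not 1 <= severity <= 5:
        raise ValueError(f"severity out of range: {severity}")
    if data["department"] not in ALLOWED_DEPTS:
        raise ValueError(f"unknown department: {data['department']}")
    return {"severity": severity,
            "department": data["department"],
            "draft": data["draft"]}
```

Validating the routing decision against a closed enum means a hallucinated department name fails loudly instead of silently creating a queue nobody reads.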

Departments receive a ranked queue the night before field work begins. Instead of reacting to whatever arrives in the morning, crews know exactly where they are going and why.

Route Safety

We layered in a route recommendation system for commuters. Given a start and end point, the system queries nearby incident clusters along candidate corridors and asks the LLM to reason about relative risk:

$$ R_{corridor} = \sum_{i} w_i \cdot S_i \cdot e^{-\lambda d_i} $$

where $S_i$ is the severity of incident $i$, $d_i$ is its distance from the route centerline in meters, and $\lambda$ controls the decay rate of spatial influence. Incidents close to the corridor are weighted heavily; those far away decay toward zero.
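The corridor score reduces to a weighted sum with exponential distance decay. A minimal sketch (the decay constant `lam` shown here is illustrative):

```python
import math

def corridor_risk(incidents: list[tuple[float, float, float]],
                  lam: float = 0.01) -> float:
    """Sum severity contributions decayed by distance from the route centerline.

    `incidents` is a list of (weight, severity, distance_m) tuples.
    """
    return sum(w * s * math.exp(-lam * d) for w, s, d in incidents)
```

With `lam = 0.01`, an incident 100 m off the centerline contributes about 37% of what it would contribute sitting directly on the route, and at 500 m it is effectively noise.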


How We Built It

The stack is intentionally lean.

Backend — Python with FastAPI. The aggregation logic, severity computation, and Gemini API calls all live here. We store reports in a PostGIS-enabled PostgreSQL database so that spatial queries (which reports are within 150 meters of each other?) are fast and clean.
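A proximity query of that shape might look like the following. Table and column names (`reports`, `geom`) are assumptions for illustration; the geography cast is what makes `ST_DWithin` interpret the radius in meters:

```python
# Parameterized PostGIS query: find reports in the same category within
# a given radius (meters) of a target report. Placeholder style is psycopg's.
NEARBY_REPORTS_SQL = """
SELECT b.id
FROM reports a
JOIN reports b
  ON a.category = b.category
 AND b.id <> a.id
 AND ST_DWithin(a.geom::geography, b.geom::geography, %(radius_m)s)
WHERE a.id = %(report_id)s;
"""
```

Pushing the distance check into the database, with a spatial index on `geom`, keeps the aggregation path fast even as the report table grows.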

LLM Integration — We call Gemini 2.5 Flash directly over the REST API rather than through a framework. This gave us full control over the prompt, the generation config (temperature 0.2 for consistent triage), and the response parsing. We wrote a robust JSON extractor that handles markdown fences, unicode quote substitution, and trailing comma cleanup — because models do not always return pristine JSON even when asked to.

Frontend — React with a live map component. The map updates reflect the aggregated issue state, not raw report counts, so the interface never looks overwhelming.

Streak and Rewards — Simple but effective. A user's streak increments when they file at least one report in a calendar day. Streaks persist across sessions. We deliberately kept the reward mechanic lightweight — this is not a gamification layer on top of a civic tool, it is a gentle nudge toward consistency.
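The streak rule fits in a few lines. A sketch of the calendar-day logic (function and field names are illustrative):

```python
from datetime import date, timedelta
from typing import Optional

def update_streak(current_streak: int, last_report_day: Optional[date],
                  today: date) -> int:
    """Increment on the first report of a new calendar day; reset after a gap."""
    if last_report_day == today:
        return current_streak          # already counted today
    if last_report_day == today - timedelta(days=1):
        return current_streak + 1      # consecutive calendar day
    return 1                           # first report ever, or streak broken
```

Keying on calendar days rather than 24-hour windows keeps the rule legible: report once any time today and the streak survives.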


Challenges

Prompt Stability

Getting the LLM to return valid, schema-conforming JSON every single time was harder than expected. Early versions would occasionally wrap the output in markdown fences, use curly quotes instead of straight quotes, or insert a trailing comma before the closing brace. We built a multi-stage extraction pipeline that tries direct parsing first, then extracts balanced brace candidates sorted by length, then applies a cleanup pass before giving up. The failure rate dropped to under 0.3%.
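The staged fallback described above can be sketched roughly like this (a simplified reconstruction, not the production extractor):

```python
import json
import re

def extract_json(text: str) -> dict:
    """Stage 1: direct parse. Stage 2: balanced-brace candidates.
    Stage 3: cleanup pass (curly quotes, trailing commas) before giving up."""
    try:
        return json.loads(text)  # stage 1: the happy path
    except json.JSONDecodeError:
        pass
    # stage 2: strip markdown fences, then scan for balanced {...} spans
    text = re.sub(r"```(?:json)?", "", text)
    candidates = []
    for start in (i for i, c in enumerate(text) if c == "{"):
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    candidates.append(text[start:i + 1])
                    break
    for cand in sorted(candidates, key=len, reverse=True):
        # stage 3: normalize curly quotes, drop trailing commas
        cleaned = cand.replace("\u201c", '"').replace("\u201d", '"')
        cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
        try:
            return json.loads(cleaned)
        except json.JSONDecodeError:
            continue
    raise ValueError("no parseable JSON found in model output")
```

Trying the longest candidate first biases the extractor toward the full response object rather than a nested fragment.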

Severity Clamping Without Losing Signal

We wanted the LLM to have some ability to override the rule-based severity, because it genuinely reasons about context in ways a lookup table cannot. But we also could not let it assign severity 5 (emergency escalation) to a complaint about a dirty park bench. The $\pm 1$ clamp gives the model meaningful influence while keeping it anchored to the base signal:

$$ |\Delta_{LLM}| \leq \delta_{max} = 1 $$

This felt like the right tradeoff between flexibility and operational safety.

Deduplication at Scale

Two reports are "the same issue" if they describe the same real-world problem, not just the same category. We cluster by:

$$ \text{same issue} \iff \text{category matches} \land d(p_1, p_2) < r_{cluster} $$

where $d$ is the Haversine distance and $r_{cluster} = 150$ meters. We also apply a time window — reports more than $T = 7$ days apart are not merged even if they are in the same location, because a repaired pothole that breaks again is a new issue.
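In production the distance check runs in PostGIS, but the merge predicate is easy to state in pure Python. A self-contained sketch (dict keys are illustrative):

```python
import math
from datetime import timedelta

CLUSTER_RADIUS_M = 150
TIME_WINDOW = timedelta(days=7)

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in meters."""
    r = 6_371_000  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def same_issue(r1: dict, r2: dict) -> bool:
    """Merge only if category matches, distance < 150 m, and <= 7 days apart."""
    return (r1["category"] == r2["category"]
            and haversine_m(r1["lat"], r1["lon"],
                            r2["lat"], r2["lon"]) < CLUSTER_RADIUS_M
            and abs(r1["ts"] - r2["ts"]) <= TIME_WINDOW)
```

All three conditions are conjunctive on purpose: relaxing any one of them reintroduces the failure mode described below, where distinct issues collapse together.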

Getting the radius and time window right required iterating against real Chicago 311 data. Too tight and every report stood alone. Too loose and distinct issues collapsed into one, diluting the crowd signal.

Keeping Departments in the Loop

The hardest human problem was not technical. City departments are not used to receiving AI-generated complaint drafts. We made sure the draft is clearly labeled as AI-assisted, that the original crowd signal data is always visible alongside it, and that the severity score links back to the reports that produced it. Trust requires transparency, and transparency requires showing your work.


What We Learned

We learned that the bottleneck in civic tech is rarely the data. Chicago's 311 system is remarkably well documented. The bottleneck is the gap between raw signal and actionable decision — and that gap is exactly where an LLM earns its place.

We also learned that streak mechanics work on us too. Watching the engagement chart climb week over week, knowing that real people were forming a habit around reporting issues in their neighborhoods — that was the most motivating feedback loop of the entire build.

Most importantly: closing the loop is not a technical achievement. It is a trust achievement. The technology only works if residents believe their report will become a task, and if departments believe the queue they receive in the morning reflects what actually matters. Every design decision we made was in service of that belief.


Built With

Python · FastAPI · PostgreSQL · PostGIS · React · Gemini 2.5 Flash · Google Maps API · Docker


Chicago LoopBack — Civic intelligence for Chicago. Built to close the loop between people and city operations.
