Inspiration

One of the bigger issues in our country stems from the disconnect between constituents and the issues and politics their governments are working on. Today, most voters form their understanding of political issues either through their own lens or through large media outlets, neither of which paints the entire picture. Meanwhile, governments at the city, county, state, and national levels publish text documents and datasets about the issues they are addressing. Yet constituents rarely read these documents and datasets, because they are long and hard to access.

We wanted to make these documents and datasets easier to consume, so that constituents can be better informed during elections and contribute more to their governments.

What it does

  • Scrapes local government websites for legislation (ordinances and resolutions at the city level)
  • Scrapes upcoming agendas from government websites for key dates
  • Feeds key dates and legislation into a generative AI/ML transformation pipeline to extract:
    • A summary
    • Key metadata (publication date, document id)
    • Topics and Entities
    • Locations
  • Loads transformed information into a searchable faceted database
  • Loads transformed information into a vector database using textual embeddings
  • Provides a search interface for discovering approved and upcoming legislation
  • Provides a query interface for questions on related documents via vector database
  • Provides a 3D visualization of clusters of related concepts in the vector database
  • Provides a dashboard where citizens can see legislation that matters to them and that they may want to act on
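
As a rough illustration of the two retrieval paths listed above (faceted search and vector search), here is a minimal sketch in plain Python. It is not our production code: the record fields, the helper names, and the tiny 3-dimensional embeddings are all hypothetical stand-ins. In the real project the embeddings come from a model and are stored in a hosted vector database, and the facets live in a hosted search index.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class LegislationRecord:
    """Hypothetical shape of one transformed document (field names are assumptions)."""
    document_id: str
    publication_date: str  # ISO date string
    summary: str
    topics: list = field(default_factory=list)
    locations: list = field(default_factory=list)
    embedding: list = field(default_factory=list)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def facet_filter(records, topic):
    """Faceted search: keep only the records tagged with the given topic."""
    return [r for r in records if topic in r.topics]


def nearest(records, query_embedding, k=2):
    """Vector search: the k records whose embeddings are most similar to the query."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(r.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Made-up documents with made-up embeddings, for illustration only.
RECORDS = [
    LegislationRecord("ORD-001", "2023-01-10", "Adds bike lanes downtown.",
                      topics=["transportation"], embedding=[0.9, 0.1, 0.0]),
    LegislationRecord("RES-002", "2023-02-14", "Funds an affordable housing study.",
                      topics=["housing"], embedding=[0.1, 0.9, 0.0]),
    LegislationRecord("ORD-003", "2023-03-21", "Rezones land near a transit stop.",
                      topics=["transportation", "housing"], embedding=[0.6, 0.6, 0.1]),
]

housing_ids = [r.document_id for r in facet_filter(RECORDS, "housing")]
similar_ids = [r.document_id for r in nearest(RECORDS, [1.0, 0.0, 0.0], k=2)]
```

The facet filter answers "show me everything tagged housing", while the nearest-neighbor lookup answers "show me documents about something like this", even when they share no keywords.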

How we built it

We used the OpenAI API, LangChain, Pinecone, and Algolia to summarize the text and expose it on our front end. For our front end, we used React. For the visualization, we used Dash and Plotly.

Challenges we ran into

  • It’s very difficult to find legislation that is coming up for discussion
  • It’s very difficult to figure out what the legislation is about (legislative technical jargon)
  • There is no easily accessible store of data, so complicated scrapers are needed
  • Each incorporated city in San Diego County (or beyond) has, or could have, a different interface, so there is no generic way to scrape the data
  • Large language models are good at summarizing, but their results are not consistent
  • Some text is too large or complicated for LLMs to process directly
  • Geolocations are difficult to infer from natural language descriptions
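
One common way around the "text is too large" problem is to split a document into overlapping chunks, summarize each chunk, and then summarize the summaries. The sketch below shows that idea under stated assumptions: `chunk_text` and `summarize_long` are hypothetical helper names, sizes are counted in characters rather than model tokens, and `summarize` is any function you pass in (in practice it would wrap an LLM call). It is an illustration of the technique, not necessarily how our pipeline does it.

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into chunks of at most max_chars, each sharing `overlap` characters with the previous one."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    chunks = []
    step = max_chars - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks


def summarize_long(text, summarize, max_chars=2000, overlap=200):
    """Summarize each chunk, then summarize the joined partial summaries.

    `summarize` is a caller-supplied function from str to str
    (an assumption of this sketch; it stands in for an LLM call).
    """
    chunks = chunk_text(text, max_chars, overlap)
    if not chunks:
        return ""
    partials = [summarize(chunk) for chunk in chunks]
    if len(partials) == 1:
        return partials[0]
    # Note: for very long inputs the joined partials could themselves be
    # too large; a fuller version would repeat the process recursively.
    return summarize("\n".join(partials))
```

The overlap keeps a sentence that straddles a chunk boundary from being lost entirely, at the cost of summarizing a little text twice.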

Accomplishments that we're proud of

  • Improving our understanding of how local government works
  • Using AI/ML to summarize complicated legislative documents into language we can understand
  • Using AI/ML to extract features from legislative documents to better organize them and surface them
  • Using AI/ML & vector space to cluster legislative concepts beyond simple keyword clustering
  • Building a visualization to understand the clustering of legislative concepts
  • Building an interface to query and summarize across collections of legislative documents

What we learned

As a team, our biggest takeaway was a better understanding of the AI services and tools available. This project let us integrate and experiment with several of these services.

Our team also gained experience in working as a team, product ideation, and working under a time crunch.

We gained a better understanding of prompt engineering and how much it can affect the quality of our data pipeline.
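
To make the prompt engineering point concrete, here is a minimal sketch of one pattern that tends to help with inconsistent output: ask the model for a fixed JSON shape, then validate the reply before it enters the pipeline. The prompt wording, the key names, and the helper names are all hypothetical and are not our actual prompts. No model is called here; `parse_response` only checks a string that a model might return.

```python
import json

# Keys we ask the model to return, with the Python type each should decode to.
# These names are assumptions made for this sketch.
REQUIRED_KEYS = {
    "summary": str,
    "publication_date": str,
    "document_id": str,
    "topics": list,
    "locations": list,
}

PROMPT_TEMPLATE = (
    "You are summarizing local legislation for a general audience.\n"
    "Return ONLY a JSON object with these keys: summary (string), "
    "publication_date (string), document_id (string), "
    "topics (list of strings), locations (list of strings).\n"
    "Use an empty string or an empty list when a value is not stated.\n\n"
    "Document:\n{document}"
)


def build_prompt(document):
    """Fill the template with the document text."""
    return PROMPT_TEMPLATE.format(document=document)


def parse_response(raw):
    """Return the decoded dict if it matches the expected shape, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), expected_type):
            return None
    return data
```

A reply that fails validation can be retried or flagged for review instead of silently producing a malformed record downstream.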

What's next for Concerned Citizen

  • Add calls to action to easily submit comments on upcoming legislation
  • Add calendar sync for attending upcoming meetings (in person or virtually, e.g., on Zoom) and e-mail notifications of upcoming dates
  • Improve LLM prompts to better summarize text and surface important features
  • Add data from more cities in San Diego County
  • Add data from cities in other counties
  • Add data from regional sources (San Diego County Board of Supervisors)
  • Add data from state sources (California State Assembly and California State Senate)
