Inspiration
Information is power, especially when it comes to the stock market. If you can access intelligence on financial securities, you can get ahead of your competitors. In this project, we present a smart search engine for street IDs that returns results as fast as possible and gets smarter based on user interaction.
What it does
Our project takes a street ID string or substring and runs queries against the dataset. After applying several matching algorithms, the system returns the details of the most relevant entries to the user.
How we built it
The backbone of our system is Elasticsearch. First, we created a dedicated Elasticsearch index with fine-tuned labels based on the provided dataset. Then, we developed a Python web application that exposes APIs for interacting with Elasticsearch. The web application is also responsible for parts of the matching algorithms and for updating the priorities. For the frontend, we built a website with Wix to provide a simple yet efficient GUI for the end user. Finally, we Dockerized our web application and deployed it on Microsoft Azure using an Azure Education Kubernetes cluster. We then used a domain from Domain.com to set up DNS for the backend. Unfortunately, Wix does not allow custom domains for non-premium accounts, which is why the frontend is served from a Wix domain.
Elasticsearch Details
This section describes how we used Elasticsearch features to meet the different levels of requirements.
Exact Match
To find exact matches, we do the following:
We add an additional "keyword" to every field in the mapping. This keyword is not tokenized, and we run exact-match queries against it.
Instead of a single query_string, we run a boolean query over two matchers: the partial match and the exact match. Using query boosting, we make sure that exact matches score higher than partial matches.
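The two steps above can be sketched as plain Python dictionaries in the Elasticsearch query DSL. This is a minimal sketch, not our exact code: the field names ("bloomberg", "ticker"), the function names, and the boost value of 10 are assumptions made for illustration. Note that a boost makes exact matches rank higher in practice but is not a hard guarantee.

```python
def build_mapping(fields):
    """Map every field as text plus an untokenized "keyword" sub-field."""
    return {
        "mappings": {
            "properties": {
                name: {
                    "type": "text",
                    # The keyword sub-field stores the whole value as one term.
                    "fields": {"keyword": {"type": "keyword"}},
                }
                for name in fields
            }
        }
    }


def build_query(text, fields, exact_boost=10.0):
    """Combine one partial-match clause with boosted exact-match clauses."""
    partial = {"query_string": {"query": text, "fields": list(fields)}}
    exact = [
        {"term": {f"{name}.keyword": {"value": text, "boost": exact_boost}}}
        for name in fields
    ]
    return {
        "query": {
            "bool": {"should": [partial] + exact, "minimum_should_match": 1}
        }
    }


# Hypothetical field names, for illustration only.
FIELDS = ["bloomberg", "ticker"]
mapping = build_mapping(FIELDS)
query = build_query("BGG", FIELDS)
```

Both dictionaries can be passed as request bodies when creating the index and when searching.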
Partial Match
We run a query_string over n-grams of size 2 to 40 for each field. To score partial matches, we use Elasticsearch's Lucene scoring function, excluding the IDF component.
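As a rough illustration of the n-gram setup, the sketch below builds index settings for an n-gram tokenizer and mimics in pure Python which tokens it would produce. The tokenizer and analyzer names ("id_ngram", "id_ngram_analyzer"), the lowercase filter, and the token_chars choice are assumptions; the IDF change to scoring is not covered here. Elasticsearch limits the gap between min_gram and max_gram through index.max_ngram_diff, so that setting is raised to match.

```python
def build_ngram_settings(min_gram=2, max_gram=40):
    """Index settings for an n-gram analyzer (names are illustrative)."""
    return {
        "settings": {
            # Allow the wide gap between min_gram and max_gram.
            "index": {"max_ngram_diff": max_gram - min_gram},
            "analysis": {
                "tokenizer": {
                    "id_ngram": {
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "id_ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "id_ngram",
                        "filter": ["lowercase"],
                    }
                },
            },
        }
    }


def ngrams(text, min_gram=2, max_gram=40):
    """Pure-Python approximation of the tokens emitted for one word."""
    text = text.lower()
    return [
        text[start:start + size]
        for start in range(len(text))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(text)
    ]


settings = build_ngram_settings()
tokens = ngrams("BGG")
```

A three-letter value such as "BGG" yields only a handful of tokens, while long identifiers produce many more, which is the main index-size cost of this approach.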
Handling Street ID Priorities
We use boosting to prioritize more important fields. Each field has a boosting level ranging from 0 to 10.
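One way to express per-field boosts is the field^boost notation accepted in the fields list of a query_string query. The sketch below assumes a hypothetical priority table; the field names and levels are made up for illustration.

```python
def boosted_fields(priorities):
    """Turn {field: level} into ["field^level", ...], highest level first."""
    for name, level in priorities.items():
        if not 0 <= level <= 10:
            raise ValueError(f"boost for {name!r} must be between 0 and 10")
    ordered = sorted(priorities.items(), key=lambda item: -item[1])
    return [f"{name}^{level}" for name, level in ordered]


def build_boosted_query(text, priorities):
    """query_string query whose fields carry their boosting level."""
    return {
        "query": {
            "query_string": {"query": text, "fields": boosted_fields(priorities)}
        }
    }


# Hypothetical priorities, for illustration only.
PRIORITIES = {"isin": 4, "ticker": 10, "name": 7}
boosted_query = build_boosted_query("BGG", PRIORITIES)
```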
Dynamic Priorities
We track the user's selections to find out which fields yield the best search results. Our front end reports the matched fields of the selected result, and our back end increments the occurrence count of each of those fields. If a field's count exceeds that of a higher-priority field (say A), the field's priority increases and A's priority decreases, which prevents boosting overflow.
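The update rule can be sketched as follows. This is one possible reading of the rule, not our exact implementation: the class name, the in-memory counters, and the choice to swap priorities with the nearest higher-priority field are all assumptions. Swapping keeps the total boost constant, so no field can drift past the maximum level.

```python
class FieldPriorities:
    """Tracks per-field selection counts and adjusts boosting levels."""

    def __init__(self, priorities):
        self.priorities = dict(priorities)
        self.hits = {name: 0 for name in priorities}

    def record_selection(self, matched_fields):
        """Called when the user picks a result that matched these fields."""
        for name in matched_fields:
            if name not in self.priorities:
                continue  # ignore fields we do not track
            self.hits[name] += 1
            self._promote(name)

    def _promote(self, name):
        # Higher-priority fields that this field has now overtaken in hits.
        overtaken = [
            other
            for other in self.priorities
            if self.priorities[other] > self.priorities[name]
            and self.hits[name] > self.hits[other]
        ]
        if not overtaken:
            return
        # Swap with the closest one so the sum of boosts stays constant.
        target = min(overtaken, key=lambda other: self.priorities[other])
        self.priorities[name], self.priorities[target] = (
            self.priorities[target],
            self.priorities[name],
        )


# Hypothetical fields and levels, for illustration only.
tracker = FieldPriorities({"ticker": 10, "isin": 5})
tracker.record_selection(["isin"])
```

After the single selection above, "isin" has more hits than "ticker", so the two fields exchange boosting levels.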
Challenges we ran into
Since we rely heavily on Elasticsearch to run the queries, we spent a lot of time applying best practices for creating the index, inserting data, and running queries in order to avoid bottlenecks on that side. Before this event, we had no experience with Wix, so we struggled a bit and could not implement everything we had in mind with the tools available. We later found out that some of the most useful features are premium-only, and we tried to make the best of what we had.
Accomplishments that we're proud of
We applied advanced Elasticsearch practices to get the best possible results. We also got familiar with Wix and how to use it, which can be very helpful for backend developers like us.
Sample Queries
You can use Lucene query syntax to interact with the system.
BGG
h2 AND BGG
h2 OR bgg
Bloomberg:416339 # Search for exact and partial match in the field
H2\:N2 # Search for a string containing :
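When a search string contains reserved characters such as the colon in the last example, they need a leading backslash. A small helper along these lines could do the escaping before the string is sent to the back end. The function name is hypothetical, and it escapes every reserved character, so it is only suitable for literal values, not for queries that use operators on purpose. The characters < and > cannot be escaped in query_string, so they are removed.

```python
import re

# Characters with special meaning in Lucene query syntax.
_RESERVED = re.compile(r'([+\-=!(){}\[\]^"~*?:\\/|&])')


def escape_query_term(term):
    """Escape a literal value for use inside a query_string query."""
    # "<" and ">" cannot be escaped, so drop them entirely.
    term = term.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", term)


escaped = escape_query_term("H2:N2")
```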
Built With
- docker
- domain.com
- elasticsearch
- flask
- python
- wix

