Inspiration

Remember the first time you wrote a for loop? You felt proud, right? But what if you had to loop through 27,000 sets of data? Managing big data is a fascinating process, especially when you need to detect anomalies; at that scale, plain for loops become too heavy. Being able to compute that quantity of information inspired us to create this app, RDD (ReadyDoneData).

What it does

This project displays graphs based on the information the user needs. Depending on their input, the API sorts the relevant data using Apache Spark and returns it to the front end for display.

How we built it

Front End: NextJS, ChartJS (for charts), TailwindCSS (styling)

Back End: Python Flask (API), Apache Spark via PySpark (data management)

Challenges we ran into

To begin with, we are first-year university students, so our knowledge is largely limited to the existing documentation. We had never processed large quantities of data with PySpark, so making it work was our main challenge. To keep things real-time, we couldn't use our conventional loops; instead, we had to dig through Spark's built-in functions and other libraries.
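
To illustrate the shift away from loops, here is a small sketch in plain Python with hypothetical sample records (the real data lives in the exchange files). The loop version touches every row one by one, while the built-in version is a single call with the same shape as Spark's `df.groupBy("MessageType").count()`, which distributes the work across the cluster:

```python
from collections import Counter

# Hypothetical sample records standing in for the real exchange data.
records = [
    {"MessageType": "NewOrder", "Exchange": "A"},
    {"MessageType": "Cancel",   "Exchange": "A"},
    {"MessageType": "NewOrder", "Exchange": "B"},
    {"MessageType": "NewOrder", "Exchange": "A"},
]

# Loop version: fine for a handful of rows, too heavy for big data.
counts_loop = {}
for rec in records:
    key = rec["MessageType"]
    counts_loop[key] = counts_loop.get(key, 0) + 1

# Built-in version: one declarative call. In PySpark the equivalent is
#   df.groupBy("MessageType").count()
# which lets Spark parallelize the aggregation instead of iterating.
counts_builtin = Counter(rec["MessageType"] for rec in records)

assert counts_loop == dict(counts_builtin)
```

The point is not the Counter itself but the habit: express *what* you want aggregated and let the engine decide *how*, which is what Spark's built-ins make possible at scale.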

Accomplishments that we're proud of

With the given time, we are proud to have at least one working graph. This required PySpark to sort through all three files, count the occurrences of each "MessageType", and separate the counts by exchange. Afterward, the API had to convert all that data into JSON so that the front end only needs to take the points and draw the graph. And we made it!!!

On another note, we are also proud to be able to generate predictions from a symbol's past occurrences. However, this has not yet been wired into the API, so it is not presentable.

What we learned

We learned how to manage big data: specifically, how to sort it and perform the necessary financial calculations on the information we need.

What's next for ReadyDoneData

We will be implementing the Spark-ready functions in our API and in the front-end graphs. We will also try to detect new anomalies, specifically by looking at the relations between the quantity of requests, the type of requests, and prices, to find price drops or influence predictions.

We also plan to add the ability to stream live data through K-means clustering, as well as to handle late responses.
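
As a rough sketch of the clustering idea, here is a minimal K-means in plain Python (a real deployment would use Spark MLlib's streaming K-means or scikit-learn instead). The 2-D points and the choice of the first k points as initial centroids are simplifications for illustration:

```python
def kmeans(points, k, iters=20):
    """Minimal K-means on 2-D points; returns (centroids, labels)."""
    # Simplification: use the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels
```

Once requests are clustered (e.g. by quantity and price features), points far from every centroid are candidates for anomalies; a streaming variant would update the centroids incrementally as live data arrives.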

Built With

  • apachespark
  • bigdata
  • chartjs
  • flask
  • nextjs
  • tailwindcss