christophermccall/Beatstream

BeatStream: A Musically-Focused Data Pipeline

ABOUT:

Welcome to BeatStream! This project is a comprehensive data pipeline designed for predictive modeling, real-time analysis of music data, and interactive visualization.
Whether you're a music enthusiast, a data scientist, or someone intrigued by the intersection of music and technology, BeatStream offers an exciting platform to explore.
BeatStream combines predictive modeling with real-time data analysis to produce song recommendations tailored to each user's preferences.
It recommends the next song based on the user's current track and a set of predictive models selected for that user, with the aim of making recommendations more accurate and relevant.

TECH STACK:

BeatStream is built on the following technologies:

-Python 3

-Zookeeper

-Kafka

-MySQL

KEY FEATURES:

-Personalized Song Recommendations: BeatStream recommends the next song based on the user's current song and a selection of predictive models chosen specifically for each user.

-Model Performance Ranking: BeatStream uses a scoring system to rank the performance of its predictive models, so that models can be compared and improved as the training dataset is iterated on.

-Adaptability: BeatStream is designed to be integrated with a music service or a personal library to provide next-song recommendations.

-Million Song Database Integration: By drawing on the Million Song Database, BeatStream gives users a large and diverse selection of artists and tracks to choose from for analysis and exploration.

-Data Visualization: When combined with the BeatStream Dashboard front end, users gain access to visual and statistical comparisons of analytical and machine learning models, as well as insights into the characteristics of popular artists and songs.
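
To illustrate the idea behind Model Performance Ranking, here is a minimal sketch that ranks models by how often their predicted next song matched the song the user actually played. This README does not describe BeatStream's actual scoring system, so the hit-rate metric and the names `ModelScore` and `rank_models` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ModelScore:
    """Hypothetical score record for one predictive model."""
    name: str
    hits: int = 0
    total: int = 0

    @property
    def hit_rate(self) -> float:
        # Fraction of predictions that matched the actual next song.
        return self.hits / self.total if self.total else 0.0


def rank_models(predictions, actual_next):
    """Rank models from best to worst by hit rate.

    predictions: dict mapping a model name to its list of predicted song IDs.
    actual_next: list of the song IDs the user actually played next.
    """
    scores = []
    for name, predicted in predictions.items():
        hits = sum(p == a for p, a in zip(predicted, actual_next))
        scores.append(ModelScore(name, hits, len(actual_next)))
    # Highest hit rate first.
    return sorted(scores, key=lambda s: s.hit_rate, reverse=True)


if __name__ == "__main__":
    ranked = rank_models(
        {"model_a": ["s1", "s2", "s3"], "model_b": ["s1", "x", "y"]},
        ["s1", "s2", "z"],
    )
    for score in ranked:
        print(score.name, round(score.hit_rate, 2))
```

A real scoring system would likely weigh more than exact matches (for example, similarity between the predicted and actual song), but the same rank-by-score structure would apply.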

GETTING STARTED:

  1. Clone the Repository: Begin by cloning this repository to your local machine.
  2. Setup MySQL: Set up a local MySQL environment and request access to the Million Song Database from the project's authors. Configure MySQL environment variables and create a database named "beatstream".
  3. Install Kafka and Zookeeper: Install and set up Kafka and Zookeeper through Bootstrap.
  4. Virtual Environment: Create and activate a Python virtual environment for the project before installing dependencies.
  5. Install Dependencies: Navigate to the project directory and install the necessary dependencies by running:
pip install -r requirements.txt
  6. Start Kafka and Zookeeper: In two separate PyCharm terminal windows, first run start-zookeeper.sh, and then run start-kafka.sh. If Kafka fails to start, a computer reboot may help.
  7. Run Consumers: Execute the consumers.py script.
  8. Run Producers: Run the producers.py script.
  9. Real-time Analysis and Recommendations: As batches of user song choices are generated by producers, consumers will analyze the data, make predictions using various models, and update the BeatStream MySQL database with recommendations.
  10. Database Testing: Utilize database_test.py to confirm that database entries are being updated as expected.
  11. Explore and Enjoy: Once BeatStream is running, refer to Beatstream_dashboard to explore the available features, customize predictive models, and enjoy personalized music recommendations!
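
The producer and consumer flow described in steps 7 to 9 can be sketched in plain Python. This is not the code in producers.py or consumers.py: an in-memory `queue.Queue` stands in for the Kafka topic, a dictionary stands in for the MySQL recommendations table, and `most_played_model` is a hypothetical placeholder for the project's predictive models. All function names and the event format are assumptions.

```python
import queue
import random
from collections import Counter, defaultdict


def produce_batches(topic, catalog, users, batch_size=3, n_batches=2, seed=0):
    """Put batches of simulated user song choices on the topic."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = [
            {"user": rng.choice(users), "song": rng.choice(catalog)}
            for _ in range(batch_size)
        ]
        topic.put(batch)
    topic.put(None)  # Sentinel: tells the consumer there is no more data.


def most_played_model(history, current_song):
    """Hypothetical model: suggest the user's most played song
    other than the one currently playing (None if there is none)."""
    counts = Counter(s for s in history if s != current_song)
    return counts.most_common(1)[0][0] if counts else None


def consume(topic, recommendations):
    """Read batches, update listening history, and store one
    recommendation per user (a dict stands in for the database)."""
    history = defaultdict(list)
    while True:
        batch = topic.get()
        if batch is None:
            break
        for event in batch:
            user, song = event["user"], event["song"]
            history[user].append(song)
            recommendations[user] = most_played_model(history[user], song)


if __name__ == "__main__":
    topic = queue.Queue()
    recommendations = {}
    produce_batches(topic, ["song_a", "song_b", "song_c"], ["user_1", "user_2"])
    consume(topic, recommendations)
    print(recommendations)
```

In the real pipeline the consumers and producers run as separate processes connected through Kafka, which is why step 7 starts the consumers before step 8 starts the producers.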

LICENSE:

BeatStream is licensed under the MIT License. See the LICENSE file for details.

ACKNOWLEDGEMENTS:

We would like to express our gratitude to the creators of the Million Song Database for providing such a rich resource for music analysis. Additionally, we extend our thanks to the open-source community for their invaluable contributions to the tools and libraries used in BeatStream.

BeatStream is the creation of @christophermccall, @deepa-kakade-git, and @lydiastonekonstanski, Zip Code Wilmington Data Engineering 5.0, Spring 2024.

Thank you for choosing BeatStream! Happy listening and exploring!
