BeatStream: A Musically-Focused Data Pipeline

ABOUT:

Welcome to BeatStream! This project is a data pipeline for predictive modeling, real-time analysis of music data, and interactive visualization.
Whether you're a music enthusiast, a data scientist, or someone intrigued by the intersection of music and technology, BeatStream offers a platform to explore.
BeatStream recommends the next song based on the user's current track and a set of predictive models selected specifically for that user, so recommendations reflect individual listening preferences.

TECH STACK:

BeatStream is built on the following technologies:

-Python 3

-Zookeeper

-Kafka

-MySQL

KEY FEATURES:

-Personalized Song Recommendations: BeatStream recommends the next song based on the user's current song and a selection of predictive models chosen specifically for each user.

-Model Performance Ranking: BeatStream employs a detailed scoring system to rank the performance of predictive models, facilitating continuous improvement through training dataset iterations.

-Adaptability: BeatStream is highly adaptable and can be integrated seamlessly with any music service or personal library to provide next song recommendations.

-Million Song Database Integration: Drawing on the Million Song Database, BeatStream gives users a large catalog of artists and tracks to choose from for analysis and exploration.

-Data Visualization: When combined with the BeatStream Dashboard front end, users gain access to visual and statistical comparisons of analytical and machine learning models, as well as insights into the characteristics of popular artists and songs.
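
To make the Model Performance Ranking feature more concrete, here is a minimal sketch of how a set of models could be ranked by score. This README does not describe the actual scoring system, so everything here is an assumption for illustration: the `ModelScore` class, the model names, and the choice of a simple hit rate (accepted recommendations divided by recommendations served) as the score.

```python
from dataclasses import dataclass


@dataclass
class ModelScore:
    """Hypothetical per-model tally; not BeatStream's real schema."""
    name: str   # illustrative model identifier
    hits: int   # recommendations the user accepted
    total: int  # recommendations served

    @property
    def accuracy(self) -> float:
        # Guard against division by zero for models with no traffic yet.
        return self.hits / self.total if self.total else 0.0


def rank_models(scores):
    """Return scores sorted best first by accuracy, ties broken by volume."""
    return sorted(scores, key=lambda s: (s.accuracy, s.total), reverse=True)


scores = [
    ModelScore("knn", hits=30, total=50),
    ModelScore("popularity", hits=20, total=50),
    ModelScore("cosine", hits=45, total=50),
]
ranking = [s.name for s in rank_models(scores)]
print(ranking)
```

A ranking like this could be recomputed after each training dataset iteration, so that the models selected for a user follow the latest scores.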

GETTING STARTED:

Clone the Repository: Begin by cloning this repository to your local machine.
Setup MySQL: Set up a local MySQL environment and request access to the Million Song Database from the project's authors. Configure MySQL environment variables and create a database named "beatstream".
Install Kafka and Zookeeper: Install and set up Kafka and Zookeeper through Bootstrap.
Virtual Environment: Create and activate a Python virtual environment for the project before installing anything.
Install Dependencies: Navigate to the project directory and install the necessary dependencies by running: pip install -r requirements.txt
Start Kafka and Zookeeper: In two separate PyCharm terminal windows, first run start-zookeeper.sh, and then run start-kafka.sh. If Kafka fails to start, a computer reboot may help.
Run Consumers: Execute the consumers.py script.
Run Producers: Run the producers.py script.
Real-time Analysis and Recommendations: As batches of user song choices are generated by producers, consumers will analyze the data, make predictions using various models, and update the BeatStream MySQL database with recommendations.
Database Testing: Utilize database_test.py to confirm that database entries are being updated as expected.
Explore and Enjoy: Once BeatStream is running, refer to Beatstream_dashboard to explore the available features, customize predictive models, and enjoy personalized music recommendations!
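
The producer, consumer, and database-check steps above can be pictured with a small in-memory sketch. It is not the code in producers.py, consumers.py, or database_test.py: a `queue.Queue` stands in for a Kafka topic, an in-memory SQLite database stands in for the "beatstream" MySQL database, and the table layout, event fields, and `TOY_MODEL` lookup are all hypothetical.

```python
import json
import queue
import sqlite3

# Stand-ins (assumptions): a Queue plays the role of a Kafka topic and an
# in-memory SQLite database plays the role of the "beatstream" MySQL database.
topic = queue.Queue()
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE recommendations (user_id TEXT, current_song TEXT, next_song TEXT)"
)

# Toy lookup table standing in for a trained predictive model (assumption).
TOY_MODEL = {"song_a": "song_b", "song_b": "song_c"}


def produce(events):
    """Serialize each listening event to JSON bytes and publish it."""
    for event in events:
        topic.put(json.dumps(event).encode("utf-8"))


def consume():
    """Drain the topic, predict a next song, and store each recommendation."""
    processed = 0
    while not topic.empty():
        event = json.loads(topic.get().decode("utf-8"))
        next_song = TOY_MODEL.get(event["current_song"], "unknown")
        db.execute(
            "INSERT INTO recommendations VALUES (?, ?, ?)",
            (event["user_id"], event["current_song"], next_song),
        )
        processed += 1
    db.commit()
    return processed


# Producer side: a batch of user song choices.
produce([
    {"user_id": "u1", "current_song": "song_a"},
    {"user_id": "u2", "current_song": "song_b"},
])

# Consumer side: analyze the batch and write recommendations.
count = consume()

# Database check: confirm the rows were written as expected.
rows = db.execute(
    "SELECT user_id, next_song FROM recommendations ORDER BY user_id"
).fetchall()
print(count, rows)
```

In the real pipeline the queue would be replaced by Kafka producers and consumers and the SQLite connection by a MySQL connection, but the order of operations (publish, consume, predict, write, verify) follows the steps listed above.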

LICENSE:

BeatStream is licensed under the MIT License. See the LICENSE file for details.

ACKNOWLEDGEMENTS:

We would like to express our gratitude to the creators of the Million Song Database for providing such a rich resource for music analysis. Additionally, we extend our thanks to the open-source community for their invaluable contributions to the tools and libraries used in BeatStream. BeatStream is the creation of @christophermccall, @deepa-kakade-git, and @lydiastonekonstanski, Zip Code Wilmington Data Engineering 5.0, spring 2024.

Thank you for choosing BeatStream! Happy listening and exploring!
