What it does
The submitted model was trained specifically for Cyclica's challenge question. The submission contains our code for the deep learning algorithm we developed and our predictions for the unlabeled dataset.
How we built it
We tried multiple models, including an MLP, logistic regression, a linear model, random forest, and a voting ensemble. In the end we settled on random forest, which achieved an F1 score of 71.89.
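For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score for the positive class can be computed from predictions. The function name `f1_score_binary` and the toy labels are made up for illustration and are not taken from our submission. The function returns a value between 0 and 1, so a figure such as 71.89 would correspond to 0.7189, assuming it is reported as a percentage.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, not from the challenge data).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

Because F1 ignores true negatives, it is less flattering than accuracy when most examples are negative, which is why it is a common choice for comparing candidate models on data like this.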
Challenges we ran into
Imbalanced data: the dataset contains far more negative examples than positive ones.
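One common way to counter this kind of imbalance is to weight each class inversely to its frequency, so that errors on the rare positive class cost more during training. The sketch below is an illustration of that idea, not a description of what our submission does. It uses the same heuristic that scikit-learn documents for `class_weight="balanced"`, namely `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` and the 90/10 split are assumptions made for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical 90/10 split between negatives (0) and positives (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each class contributes the same total weight to the loss, so the minority class is no longer drowned out by the majority.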
Accomplishments that we're proud of
We learned how several types of models work, built them to tackle the problem, and then chose the best approach for the given dataset.
What we learned
We learned fundamental skills in PyTorch, sklearn, pandas, and other libraries. We also gained experience in developing and training deep learning models. Last but not least, we learned that not every model suits a given dataset and that pre-processing the data is essential.
What's next for Data-Hackthon-2023
We want to develop the model further, or train a new one, to make more accurate predictions. We also want to try the challenge questions we did not have time to complete.