martaalcalams/FINAL-PROJECT

FINAL-PROJECT

· Overview:

  • Downloaded 5 datasets from Kaggle (Footlocker, ASOS, H&M, Mango and Zara).

  • Web-scraped the "Massimo Dutti" website and downloaded the relevant data for both men and women.

  • Imported all the necessary libraries.

  • Created a dataframe for each dataset.

  • Cleaned and organized each dataset separately.

  • Calculated the mean price for each dataset.

  • Created a csv for each dataset after cleaning (for later use in Tableau).

  • Concatenated the 6 datasets into brands_df and dropped the unnecessary columns.

  • Combined data from various columns into new consolidated columns, then cleaned up redundant columns to refine the dataframe.

  • Renamed brands_df as df_cleaned and dropped unnecessary columns.

  • Managed all the null values in different ways (dropping columns, filling with mode, filling with strings).

  • Created a csv for df_cleaned.
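The cleaning and concatenation steps above can be sketched with pandas. The dataframes, column names (brand, price, section) and null-filling strategies below are illustrative assumptions; the real schemas come from the Kaggle and Massimo Dutti data.

```python
import pandas as pd

# Hypothetical per-brand dataframes with a shared schema after cleaning;
# the real columns come from each Kaggle dataset.
zara = pd.DataFrame({"brand": ["Zara"] * 3, "price": [19.9, None, 39.9],
                     "section": ["women", "men", None]})
mango = pd.DataFrame({"brand": ["Mango"] * 2, "price": [25.0, 30.0],
                      "section": ["women", "women"]})

# Mean price per dataset
print(zara["price"].mean())

# Concatenate the cleaned datasets into one dataframe
brands_df = pd.concat([zara, mango], ignore_index=True)

# Handle nulls in different ways: fill numeric columns with the mean,
# fill categorical columns with a placeholder string
brands_df["price"] = brands_df["price"].fillna(brands_df["price"].mean())
brands_df["section"] = brands_df["section"].fillna("unknown")

df_cleaned = brands_df.drop_duplicates().reset_index(drop=True)
df_cleaned.to_csv("df_cleaned.csv", index=False)
```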

· EDA:

  • Computed the mean price for each group per section and displayed it in a bar chart.

  • Computed the average price for each brand and displayed it in a bar chart.

  • Displayed Footlocker's top 10 brands in a bar chart and a dataframe.

  • Created a pie chart representing the relationship between sales volume and promotion in Zara.

  • Displayed H&M's top 10 colors in a bar chart.

  • Displayed ASOS's top 10 colors in a bar chart.

  • Displayed Mango's top 10 categories by product in a bar chart and a dataframe.

  • Displayed Massimo Dutti's top 10 categories by average price in a bar chart and a dataframe.

  • Created a histogram for the price of the whole dataset (df_cleaned).
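A minimal sketch of two of the EDA plots (average price per brand, and the price histogram), assuming a df_cleaned with brand and price columns; the data below is made up for illustration.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical cleaned data; the real df_cleaned comes from the notebook
df_cleaned = pd.DataFrame({
    "brand": ["Zara", "Zara", "Mango", "H&M", "ASOS", "Mango"],
    "price": [19.9, 39.9, 25.0, 9.9, 15.0, 30.0],
})

# Average price per brand, shown as a bar chart
avg_price = df_cleaned.groupby("brand")["price"].mean().sort_values(ascending=False)
avg_price.plot(kind="bar", title="Average price per brand")
plt.tight_layout()
plt.savefig("avg_price_per_brand.png")
plt.close()

# Histogram of prices across the whole dataset
df_cleaned["price"].plot(kind="hist", bins=5, title="Price distribution")
plt.savefig("price_histogram.png")
plt.close()
```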

· HYPOTHESIS TESTING:

  • Ran a one-sample t-test, with its hypotheses and significance level defined, and compared the resulting p-value with alpha to reject or fail to reject H0.

  • Ran two more tests, this time two-sample t-tests on independent samples.

  • Ran a proportion z-test as the last test, with both hypotheses and values defined, and compared the p-value with alpha to reject H0.
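The three tests can be sketched with scipy. The samples, the claimed mean, and the promotion counts below are made-up stand-ins for the real data, and the z-test is computed by hand from the normal approximation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05

# One-sample t-test: is the mean price different from a claimed value?
prices = rng.normal(loc=30, scale=8, size=200)  # hypothetical prices
t_stat, p_one = stats.ttest_1samp(prices, popmean=25)
print("one-sample:", "reject H0" if p_one < alpha else "fail to reject H0")

# Two-sample t-test on independent samples (e.g. men vs women sections)
men = rng.normal(loc=28, scale=8, size=150)
women = rng.normal(loc=32, scale=8, size=150)
t_stat, p_two = stats.ttest_ind(men, women, equal_var=False)
print("two-sample:", "reject H0" if p_two < alpha else "fail to reject H0")

# Proportion z-test by hand: is the share of promoted items 50%?
n, promoted, p0 = 400, 230, 0.5
p_hat = promoted / n
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
p_val = 2 * (1 - stats.norm.cdf(abs(z)))
print("z-test:", "reject H0" if p_val < alpha else "fail to reject H0")
```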

· MACHINE LEARNING: (different notebook)

  • Imported the necessary libraries.

  • Imported brands.csv and defined it as "brands".

  • Created a correlation matrix, dropping all other columns and taking into account only 'price', 'promotion' and 'section'.

  • Applied feature scaling to numerical columns (price).

  • Performed the train-test split, defining the features and the target.

  • Linear regression model and its corresponding evaluation (R², MAE and RMSE).

  • Decision Tree model and its corresponding evaluation (R², MAE and RMSE).

  • Random Forest model and its corresponding evaluation (R², MAE and RMSE).

  • Gradient Boosting and its corresponding evaluation (R², MAE and RMSE).

  • Hyperparameter Tuning for Random Forest to look for the best params and score.

  • Hyperparameter Tuning for Gradient Boosting to look for the best params and score.

  • Compared the two hyperparameter tuning results and created a csv from the best model (Gradient Boosting).

  • Displayed the actual prices alongside their predictions.
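The modelling pipeline above can be sketched with scikit-learn. The synthetic brands data, the numeric encoding of 'promotion' and 'section', and the parameter grid are assumptions, not the notebook's actual values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for brands.csv: 'promotion' and 'section' encoded as
# 0/1 features, 'price' as the target.
rng = np.random.default_rng(0)
n = 300
brands = pd.DataFrame({"promotion": rng.integers(0, 2, n),
                       "section": rng.integers(0, 2, n)})
brands["price"] = 30 - 8 * brands["promotion"] + 5 * brands["section"] + rng.normal(0, 2, n)

X, y = brands[["promotion", "section"]], brands["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling, fit on the training split only
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

def evaluate(model, name):
    """Report R², MAE and RMSE on the test split."""
    pred = model.predict(X_test_s)
    r2 = r2_score(y_test, pred)
    mae = mean_absolute_error(y_test, pred)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(f"{name}: R2={r2:.3f} MAE={mae:.2f} RMSE={rmse:.2f}")
    return r2

results = {}
for name, model in [("linear", LinearRegression()),
                    ("decision_tree", DecisionTreeRegressor(random_state=42)),
                    ("random_forest", RandomForestRegressor(random_state=42)),
                    ("gradient_boosting", GradientBoostingRegressor(random_state=42))]:
    model.fit(X_train_s, y_train)
    results[name] = evaluate(model, name)

# Hyperparameter tuning for the boosting model (the grid is an assumption)
grid = GridSearchCV(GradientBoostingRegressor(random_state=42),
                    {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
                    scoring="r2", cv=3)
grid.fit(X_train_s, y_train)

# Save actual vs predicted prices for the best model
pd.DataFrame({"actual": y_test.to_numpy(),
              "predicted": grid.best_estimator_.predict(X_test_s)}
             ).to_csv("gradient_boosting_predictions.csv", index=False)
```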

· SQL:

  • Imported both "brands.csv" and "gradient_boosting_predictions.csv" into SQL workbench.

  • Created a database for each csv.

  • Confirmed that each csv was imported correctly and displayed its columns and values.

  • Saved both SQL files.
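As a sketch of the import-and-verify step, using Python with sqlite as a stand-in for SQL workbench; the dataframes, table names and file names below are assumptions.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-ins for brands.csv and gradient_boosting_predictions.csv
brands = pd.DataFrame({"brand": ["Zara", "Mango"], "price": [19.9, 25.0]})
preds = pd.DataFrame({"actual": [19.9, 25.0], "predicted": [21.3, 24.1]})

# One database per csv, mirroring the setup above
with sqlite3.connect("brands.db") as con:
    brands.to_sql("brands", con, if_exists="replace", index=False)
    # Confirm the import by reading the table back
    check = pd.read_sql("SELECT * FROM brands", con)
    print(check.columns.tolist(), len(check))

with sqlite3.connect("predictions.db") as con:
    preds.to_sql("predictions", con, if_exists="replace", index=False)
    check = pd.read_sql("SELECT * FROM predictions", con)
    print(check.columns.tolist(), len(check))
```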

· TABLEAU:

  • Imported the 6 cleaned csv files into different Data Sources.

  • Analyzed each database in two different ways (changing some fields from dimension to measure when needed).

  • Created a dashboard for each database showing both sheets with their corresponding filters.

· APP CREATION:

  • Created a copy of the "brands_final_project.ipynb" file and converted it into a Python script for improved modularity and usability.

  • Defined a "load_data()" function within this Python file to handle loading the entire "brands.csv" dataset.

  • Developed an additional Python file dedicated to app creation using Streamlit.

  • Imported Streamlit to build an interactive user interface.

  • Imported the data-handling script to utilize the "load_data()" function.

  • Implemented filter options to enable users to filter brands by key attributes: Brand, Section, Price Range, and Category.

  • Executed the Streamlit script in the terminal, generating a link to a fully functional app with filtering capabilities for exploring the dataset.
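A minimal sketch of the app, assuming brand, section, price and category columns in brands.csv. The filtering logic is written as a plain function so it works independently of the UI; run_app shows hypothetical Streamlit widgets and would only run via `streamlit run`.

```python
import pandas as pd

def load_data(path="brands.csv"):
    """Load the full brands dataset (the path is an assumption)."""
    return pd.read_csv(path)

def filter_brands(df, brand=None, section=None, price_range=None, category=None):
    """Apply the app's four filters: Brand, Section, Price Range, Category."""
    if brand:
        df = df[df["brand"] == brand]
    if section:
        df = df[df["section"] == section]
    if price_range:
        lo, hi = price_range
        df = df[df["price"].between(lo, hi)]
    if category:
        df = df[df["category"] == category]
    return df

def run_app():
    """Streamlit entry point: call run_app() at module level in app.py and
    launch with `streamlit run app.py`."""
    import streamlit as st
    df = load_data()
    brand = st.sidebar.selectbox("Brand", sorted(df["brand"].unique()))
    section = st.sidebar.selectbox("Section", sorted(df["section"].unique()))
    lo, hi = st.sidebar.slider("Price Range", 0.0, float(df["price"].max()), (0.0, 50.0))
    category = st.sidebar.selectbox("Category", sorted(df["category"].unique()))
    st.dataframe(filter_brands(df, brand, section, (lo, hi), category))
```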
