Udacity project for Wrangle and Analyze Data
In the real world, data rarely comes clean. I will be using Python and its libraries to gather data from various sources and in different formats. My task will be to assess the quality and tidiness of the data and then proceed to clean it. This process is referred to as data wrangling. I will document all my efforts in a Jupyter Notebook and showcase them through analyses and visualizations using Python (and its libraries) and/or SQL.
The dataset that I will be wrangling, analyzing, and visualizing is the tweet archive of my favorite Twitter user, @dog_rates, also known as WeRateDogs. This account rates people's dogs and adds a humorous comment about each one. What's interesting is that these ratings almost always have a denominator of 10, while the numerators are almost always greater than 10: 11/10, 12/10, 13/10, and so on. The reason? Because "they're good dogs Brent." WeRateDogs has a massive following of over 4 million and has received media coverage worldwide.
WeRateDogs kindly shared their Twitter archive with Udacity, who passed it along to me via email for exclusive use in this project. This archive provides basic tweet data (tweet ID, timestamp, text, etc.) for all 5000+ of their tweets as they appeared on August 1, 2017. I'll be delving into this archive soon to extract valuable insights.
Project Steps Overview
The tasks in this project are as follows:
Step 1: Gathering data
Step 2: Assessing data
Step 3: Cleaning data
Step 4: Storing data
Step 5: Analyzing and visualizing data
Step 6: Reporting on the data wrangling efforts and on the data analyses and visualizations
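As a rough illustration of how Steps 1 through 5 fit together, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code: the sample rows, the function names (gather, assess, clean, store, analyze), the table name tweets, and the simple rating pattern are all assumptions made for this example. The real project may well use pandas and a different storage format.

```python
import csv
import io
import re
import sqlite3

# Hypothetical sample standing in for the real archive. The columns follow
# the fields mentioned above (tweet ID, timestamp, text); the rows are made up.
SAMPLE_CSV = """tweet_id,timestamp,text
1,2017-07-30 15:58:51,This is Archie. 12/10 would pet
2,2017-07-31 00:18:03,Meet Darla. She is a very good dog. 13/10
2,2017-07-31 00:18:03,Meet Darla. She is a very good dog. 13/10
3,2017-08-01 16:23:56,No rating in this one
"""

# Assumed rating format: integer numerator, slash, integer denominator.
RATING = re.compile(r"(\d+)/(\d+)")


def gather(source):
    """Step 1: read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def assess(rows):
    """Step 2: count duplicated tweet IDs and tweets without a rating."""
    ids = [row["tweet_id"] for row in rows]
    return {
        "duplicate_ids": len(ids) - len(set(ids)),
        "missing_rating": sum(1 for row in rows if not RATING.search(row["text"])),
    }


def clean(rows):
    """Step 3: drop duplicates and unrated tweets, split out the rating."""
    seen, cleaned = set(), []
    for row in rows:
        match = RATING.search(row["text"])
        if row["tweet_id"] in seen or match is None:
            continue
        seen.add(row["tweet_id"])
        cleaned.append({
            "tweet_id": int(row["tweet_id"]),
            "timestamp": row["timestamp"],
            "numerator": int(match.group(1)),
            "denominator": int(match.group(2)),
        })
    return cleaned


def store(rows, connection):
    """Step 4: write the cleaned rows to a SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "tweet_id INTEGER PRIMARY KEY, timestamp TEXT, "
        "numerator INTEGER, denominator INTEGER)"
    )
    connection.executemany(
        "INSERT INTO tweets VALUES "
        "(:tweet_id, :timestamp, :numerator, :denominator)",
        rows,
    )
    connection.commit()


def analyze(connection):
    """Step 5: compute the mean rating numerator with SQL."""
    return connection.execute("SELECT AVG(numerator) FROM tweets").fetchone()[0]


raw = gather(SAMPLE_CSV)
report = assess(raw)
tidy = clean(raw)
conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
store(tidy, conn)
mean_numerator = analyze(conn)
print(report, mean_numerator)
```

Keeping each step in its own function mirrors the list above and makes it easy to re-run assessment after cleaning. The rating pattern here is deliberately naive and would need refinement for real tweet text.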