Presenters: Mahdi Belcaid (HDSI), Sean Cleveland (UH), Ron Merrill (UH), David Schanzenbach (UH), and Jennifer Geis (UH)
This workshop focuses specifically on the Python skills necessary for data analysis -- as opposed to software development -- and introduces some of the libraries that have made Python a popular alternative for working with data at any scale.
- Work with the Pandas library to conduct essential data analysis tasks such as reading, exploring, filtering, and summarizing data.
- Slice, shape and pivot tables.
- Implement calculations on rows, columns, and tables.
- Use split-apply-combine to summarize data.
- Merge, concatenate and filter data from multiple sources.
- Visualize data using Matplotlib.
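As a rough preview of the objectives listed above, the sketch below filters, summarizes, pivots, and merges two small tables with pandas. The data, column names, and variable names are invented for illustration and are not taken from the workshop notebooks.

```python
import pandas as pd

# Hypothetical toy data; the workshop's own data files are not used here.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
    "revenue": [100, 120, 80, 95, 60],
})
stores = pd.DataFrame({
    "store": ["A", "B", "C"],
    "island": ["Oahu", "Maui", "Oahu"],
})

# Filter: keep rows with revenue of at least 80.
large = sales[sales["revenue"] >= 80]

# Split-apply-combine: split by store, sum revenue, combine into one Series.
totals = sales.groupby("store")["revenue"].sum()

# Pivot: one row per store, one column per month.
by_month = sales.pivot_table(
    index="store", columns="month", values="revenue", aggfunc="sum"
)

# Merge: attach each store's island, then summarize by island.
merged = sales.merge(stores, on="store", how="left")
island_totals = merged.groupby("island")["revenue"].sum()

print(totals)
print(by_month)
print(island_totals)
```

Store C has no February row, so the pivoted table holds a missing value (NaN) in that cell.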
Participants should bring their laptops and plan to participate actively. Laptops will need a web browser for accessing the Jupyter notebook resources.
Python is a popular language for research computing, and great for general-purpose programming as well. Installing all of its research packages individually can be a bit difficult, so we recommend Anaconda, an all-in-one installer. Regardless of how you choose to install it, make sure you install Python version 3.6. We will make extensive use of the Jupyter programming environment, which runs in a web browser. For this to work you will need a reasonably up-to-date browser. The current versions of the Chrome, Safari and Firefox browsers are all supported (some older browsers, including Internet Explorer version 9 and below, are not).
Browse to http://continuum.io/downloads. Download the Python installer for Windows. Install Python 3.6 using all of the defaults for installation, except make sure to check “Make Anaconda the default Python”.
Browse to http://continuum.io/downloads. Download the Python 3.6 installer for OS X. Install using all of the defaults for installation.
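After installing, one way to confirm which Python version is active is a short check like the one below. This is a minimal sketch: the helper name `meets_minimum` is hypothetical, and treating 3.6 as a minimum rather than an exact requirement is an assumption.

```python
import sys


def meets_minimum(version_info, minimum=(3, 6)):
    """Return True if the (major, minor) version is at least `minimum`.

    Hypothetical helper; reading 3.6 as a lower bound is an assumption.
    """
    return tuple(version_info[:2]) >= minimum


# Show the full version string of the interpreter running this code.
print(sys.version)
print("OK" if meets_minimum(sys.version_info) else "Python 3.6 or newer is recommended")
```

Running this in a Jupyter notebook cell reports the version of the kernel's interpreter, which may differ from the system default Python.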
- 9AM BEGIN WORKSHOP
- 10:30AM BREAK
- 10:45AM RESUME
- NOON LUNCH
- 1PM RESUME
- 2:30PM BREAK
- 2:45PM RESUME
- 4PM STOP FOR THE DAY
- 9AM BEGIN WORKSHOP
- 10:30AM BREAK
- 10:45AM RESUME
- NOON LUNCH
- 1PM RESUME
- 2:30PM BREAK
- 2:45PM RESUME
- 4PM STOP FOR THE DAY
Preliminaries.ipynb https://bit.ly/2H5N9Xl
Introduction_to_Python.ipynb https://bit.ly/2vuWAP1
Intro_to_pandas.ipynb https://bit.ly/2J3Y3xp
Plotting_and_visualization.ipynb https://bit.ly/2EW4Erk
Exploring_data.ipynb https://bit.ly/2J3Y4RZ
Missing_values.ipynb https://bit.ly/2voIHl6
Data files: all data files
Zip of files: once unzipped, all the files are in the "data" folder
Grouping Dataframes https://bit.ly/2Ha9lE3
Merging Joining Data https://bit.ly/2JXCgZN
Plotting with Seaborn https://bit.ly/2HyqjLC
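To confirm that the zip of data files was extracted where the notebooks can find it, a short listing of the folder can help. This is a sketch that assumes the "data" folder sits in the notebook's working directory; the helper name `list_data_files` is hypothetical.

```python
from pathlib import Path


def list_data_files(folder="data"):
    """Return the sorted file names in `folder`, or an empty list if it is missing.

    Hypothetical helper; the default location "data" is an assumption about
    where the zip was extracted.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file())


files = list_data_files()
if files:
    print("\n".join(files))
else:
    print('No "data" folder found in', Path.cwd())
```

An empty result usually means the notebook was started from a different directory than the one holding the unzipped folder.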
Please fill out the demographic survey.


