The following exercise is designed to help us get a sense of how you write and communicate data insights with both stakeholders and other data team members. It is split into two parts: one that tests your SQL skills and the other focused on your ability to analyze data and present findings in a clear, engaging, and actionable way.
- We don’t expect you to spend more than 90 minutes on these two exercises, so just work through the questions within the time limit and see what you can complete.
- We think the provided data is sufficient to answer some really interesting questions and do not expect you to incorporate any additional data sources.
- We don’t expect you to be familiar with our filing data - please make assumptions, but be sure to document them in your code so that we can follow your thought process.
- When you’re ready, please share the link to your Github repository. It should include the following:
Exercise Part 1: SQL Refactor
- A .sql or .txt file containing your refactored SQL query
- Comments to explain your thought process, reasoning behind changes, what assumption did you make
Exercise Part 2: Storytelling with Data
- If using a data visualization tool: a link to data visualizations and analysis conducted or screenshots of visualization and analysis included in repo
- If using Python or R include notebook file for your analysis, OR an R or Python file with your code + comments to help us follow your work
- In your repository's README file provide any additional context on the analysis you performed and any insights you found. Also provide a brief description of your analysis method of choice and how this may have helped you answer questions in this exercise.
Objective: Refactor this SQL Query so that it is easily readable by your teammates and fix any syntax errors you come across, clean up formatting, and add context where needed.
Background:
- We store our data in a redshift database, and while we’ve done a lot of good work modeling this raw resource to help ourselves and teams across the organization quickly answer basic questions about fundraising patterns on our site, many of the deeper explorations our team does for the organization starts with manipulating the data stored in our redshift cluster using SQL.
- This query is a messy version of the query we used to pull the data used in exercise two of this take home. All of the where clauses, and table sources are correct and don’t need to be updated, but this is obviously not a great work product to share with colleagues - it’s hard to scan for quick understanding of the work, has inconsistent syntax, and there are some syntactical errors that would cause this query run to fail.
Suggested Time: 30 minutes
What we're looking for:
- Clear and readable formatting. If you are not familiar with Common Table Expressions (CTEs) check out this resource
- Everyone has different SQL styles and we are intentionally not sharing our internal style guide, so prioritize consistency over a specific style
- Highly suggested that you leave comments throughout the file to explain your work, express uncertainty, suggestions on how to improve the query, or ask questions
Objective: Draw insights from real world campaign finance data like ActBlue’s (publicly available) FEC filing using a data visualization tool of your choice e.g Tableau Public, Looker Data Studio, etc.
Materials:
Dataset #1: A sample of contribution data from ActBlue’s FEC filings:
Data_Science_Technical_FEC_Filing_Sample.csvis a sample of records derived from 3 of ActBlue's monthly filings in the 2020 cycle: February Monthly 2020, March Monthly 2020, April Monthly 2020.
Dataset #2: FEC committee data from the 2020 cycle:
Data Science_Technical_FEC_Committee_Data_2020.csvis derived from the Committee Master file for 2019-2020 provided on the FEC Bulk Data page.
Tools: Python or R and data visualization tool
Instructions:
- Begin with data hygiene: describe the data integrity checks you'd typically use before analyzing a new dataset
- Join the datasets: merge contribution and committee data
- Analyze the data and tell a story: Using a tool of your choice visualize the FEC datasets. Show any statistical or exploratory analysis you performed (are there any trends you see, any takeaways)
- Ask 1-2 questions based on this data to highlight anything that stands out to you or something people may have missed with these datasets
Suggested Time: 60 minutes