Columbia University, Lede Program
Tuesdays and Thursdays, May 24th 2016 through July 7th 2016, 10am
Allison Parrish, instructor.
Office hours: By appointment only.
For FERPA reasons, I ask that you e-mail me at my Columbia address when discussing any matters related to this class or your grade. Personal or professional inquiries can go to my personal address.
Consideration of both the scientific and social implications of counting, of turning the world into bits. Through the process of gaining fluency in the use of Python, students will spend some time thinking through representations of core "data types" like time, location, text, image, sound and relationships (or networks), and the computational "affordances" associated with each. Students will study several common metaphors for organizing and storing data – from structureless key-value stores, to a single table or spreadsheet, to the "multiple tables" of a relational database. We will also discuss ideas behind publishing or sharing data, moving from HTML documents and Web 1.0 to data services and APIs in Web 2.0, to semantics in Web 3.0. Student work and discussion will underscore the reality that data are plentiful and circulate and interact in a kind of informational ecosystem. As researchers, our students will be called on both to access and to publish data products.
Notes for previous versions of the course:
There will be six homework assignments in this class, each assigned on Thursday and due the following Tuesday before the beginning of class. The homework assignments are designed to test and expand your knowledge of the technical concepts introduced in class. Each homework assignment is worth 10% of your grade.
With the exception of the first assignment, every homework assignment will take the form of an IPython Notebook that you fill in and send to a TA for grading. (We'll discuss the specifics of this in class.)
- 40% Attendance and participation
- 60% Homework assignments (10% each)
- Orientation
- Student introductions
- SQL basics
Homework #1 (due May 31): Read and respond to the following.
- Relational and Non-Relational Models in the Entextualization of Bureaucracy by Michael Castelle
- Literature is not Data: Against the Digital Humanities by Stephen Marche
- Machine Bias by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner
These essays each address the limits and consequences of data-driven analysis and public policy. Your response should take the form of a brief e-mail (three to five paragraphs) sent to me. In your response, describe the critique made in one or more of the essays and discuss how (if at all) you might incorporate the critique(s) into your practice as a journalist. Also include, and comment on, a link to an essay or article that you feel "speaks to" the points raised in one or more of the essays (e.g., one that agrees with, provides a counterexample to, expands upon, or responds to them).
- Installing Python Libraries
- Using SQL in Python
- IPython/Jupyter Notebooks
- SQL and CSVs
Homework #2 (due June 7): TBD.
Homework #3 (due June 14): TBD.
- Scraping and SQL (Notes TK)
- Using 3rd party libraries for API access (Twitter) (Notes TK)
Homework #4 (due June 21): TBD.
- Working with unstructured data
- Regular expressions (Notes TK)
Homework #5 (due June 28): TBD.
- Making a Flask app (Notes TK)
- Designing a web API (Notes TK)
Homework #6 (due July 5): TBD.
- Twitter bots (Notes TK)
- Selected topics