Data and Databases

Columbia University, Lede Program

Tuesdays and Thursdays, May 24th 2016 through July 7th 2016, 10am

Office hours: By appointment only.

For FERPA reasons, I ask that you e-mail me at my Columbia address when discussing any matters related to this class or your grade. Personal or professional inquiries can go to my personal address.

Description

Consideration of both the scientific and social implications of counting, of turning the world into bits. Through the process of gaining fluency in the use of Python, students will spend some time thinking through representations of core "data types" like time, location, text, image, sound and relationships (or networks), and the computational "affordances" associated with each. Students will study several common metaphors for organizing and storing data – from structureless key-value stores, to a single table or spreadsheet, to the "multiple tables" of a relational database. We will also discuss ideas behind publishing or sharing data, moving from HTML documents and Web 1.0 to data services and APIs in Web 2.0, to semantics in Web 3.0. Student work and discussion will underscore the reality that data are plentiful and circulate and interact in a kind of informational ecosystem. As researchers, our students will be called on both to access and to publish data products.
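The three storage metaphors named above can be sketched in a few lines of Python. This is a minimal illustration, not course material: the reporter and story records, the table names, and the column names are all hypothetical, and the standard-library `sqlite3` module stands in for a relational database.

```python
import sqlite3

# All sample data below is hypothetical and invented for illustration.

# 1. Key-value store: a value looked up by key, with no fixed schema.
kv_store = {
    "reporter:1": {"name": "Ada", "beat": "courts"},
    "reporter:2": {"name": "Ben"},  # fields may differ from record to record
}

# 2. Single table: every row has the same columns, like a spreadsheet.
single_table = [
    ("Ada", "courts", "Bail hearing delayed"),
    ("Ada", "courts", "Jury selection begins"),
    ("Ben", "transit", "Subway fares rise"),
]

# 3. Relational database: separate tables linked by a shared key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reporters (id INTEGER PRIMARY KEY, name TEXT, beat TEXT);
    CREATE TABLE stories (
        id INTEGER PRIMARY KEY,
        reporter_id INTEGER REFERENCES reporters(id),
        headline TEXT
    );
""")
conn.executemany(
    "INSERT INTO reporters VALUES (?, ?, ?)",
    [(1, "Ada", "courts"), (2, "Ben", "transit")],
)
conn.executemany(
    "INSERT INTO stories (reporter_id, headline) VALUES (?, ?)",
    [(1, "Bail hearing delayed"), (1, "Jury selection begins"), (2, "Subway fares rise")],
)

# A join rebuilds the single-table view from the two related tables.
joined = conn.execute("""
    SELECT r.name, r.beat, s.headline
    FROM stories AS s
    JOIN reporters AS r ON r.id = s.reporter_id
    ORDER BY s.id
""").fetchall()
```

In the single table, each reporter's name and beat are repeated on every story row. The relational version stores them once and recovers the same rows with a join, which is the trade-off the "multiple tables" metaphor refers to.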

Notes for previous versions of the course:

2014
2015

Homework assignments

There will be six homework assignments in this class, each assigned on Thursday and due the following Tuesday before the beginning of class. The homework assignments are designed to test and expand your knowledge of the technical concepts introduced in class. Each homework assignment is worth 10% of your grade.

With the exception of the first assignment, all homework assignments will take the form of an IPython Notebook that you fill in and send to a TA for grading. (We'll discuss the specifics of this in class.)

Grading

40% Attendance and participation
60% Homework assignments (10% each)

Schedule and notes

Week 1 (May 24 and 26)

Orientation
Student introductions
SQL basics

Homework #1 (due May 31): Read and respond to the following.

Relational and Non-Relational Models in the Entextualization of Bureaucracy by Michael Castelle
Literature is not Data: Against the Digital Humanities by Stephen Marche
Machine Bias by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner

These essays each address the limits and consequences of data-driven analysis and public policy. Your response should take the form of a brief e-mail (three to five paragraphs) sent to me. In your response, describe the critique made in one or more of the essays and discuss how (if at all) you might incorporate it into your practice as a journalist. Also include and comment on a link to an essay or article that you feel "speaks to" the points raised in one or more of the essays (e.g., by agreeing with them, providing a counterexample, expanding upon them, or responding to them).

Week 2 (May 31 and June 2)

Homework #2 (due June 7): TBD.

Week 3 (June 7 and 9)

Scraping HTML with Beautiful Soup

Homework #3 (due June 14): TBD.

Week 4 (June 14 and 16)

Scraping and SQL (Notes TK)
Using 3rd party libraries for API access (Twitter) (Notes TK)

Homework #4 (due June 21): TBD.

Week 5 (June 21 and 23)

Working with unstructured data
Regular expressions (Notes TK)

Homework #5 (due June 28): TBD.

Week 6 (June 28 and 30)

Making a Flask app (Notes TK)
Designing a web API (Notes TK)

Homework #6 (due July 5): TBD.

Week 7 (July 5 and 7)

Twitter bots (Notes TK)
Selected topics
