#Algorithms, Summer 2016
##LEDE Program, Columbia University, Graduate School of Journalism
###Instructor:
Richard Dunks: rad2184 [at] columbia [dot] edu
####Room Number: Pulitzer Hall 601B
####Course Dates: 12 July - 25 August 2016
###Navigation
- Course Overview
- Learning Objectives
- Course Requirements
- Course Readings
- Assignments
- Course Policies
- Resources
- Course Outline
###Course Overview
This course presents an overview of algorithms as they relate to journalistic tradecraft, with particular emphasis on algorithms for the discovery, cleaning, and analysis of data. It aims to build literacy in the common types of data algorithms and to give you practice designing, developing, and testing algorithms that support news reporting and analysis, including the basic concepts of reverse engineering algorithms for investigative reporting. The emphasis will be on practical applications and critical awareness of the impact algorithms have on modern life.
######back to top
###Learning Objectives
- You will understand the basic structure and operation of algorithms
- You will be familiar with basic descriptive statistics
- You will understand the primary types of data science algorithms, including techniques of supervised and unsupervised machine learning
- You will be practiced in implementing basic algorithms in Python
- You will be able to meaningfully explain and critique the use and operation of algorithms as tools of public policy and business
- You will understand how algorithms are applied in the newsroom
######back to top
###Course Requirements
All students will be expected to have a laptop during both lectures and lab time. Time will be set aside to help install, configure, and run the programs necessary for all assignments, projects, and exercises. Where possible, all programs will be free and open-source. All assigned work using services hosted online can be run using free accounts.
######back to top
###Course Readings
The required readings for this course consist of book chapters, newspaper articles, and short blog posts. They are intended to give you a foundation in the critical skills ahead of class lectures. All required readings are available online or will be made available to you electronically. Recommended readings are optional suggestions for further study of the topics covered in class, and additional suggested readings will be provided as appropriate for those interested in a more in-depth discussion. Readings assigned in class are to be completed before the next class.
######back to top
###Assignments
This course consists of programming and critical response assignments intended to reinforce learning and provide you with practical applications of the material covered in class. Completion of these assignments is critical to achieving the learning objectives of this course. Assignments are intended to be completed during lab time or for homework. Generally, assignments will be due before the start of the next class, unless otherwise stated. For example, assignments given on Tuesday will be due before class on Thursday. Time will be set aside in class to review assignments and provide feedback to you on your work.
- Programming assignments will be submitted via GitHub. Please follow the tutorial for submitting assignments on GitHub. Each assignment's exercises should be standalone, not combined with those of other assignments, so that they can be tested and scored separately.
- Programming assignments not following the naming convention `<lastname>_<firstname>_<class_num>_<assignment_num>.ipynb` will not be counted as completed.
- Response questions should be answered clearly, concisely, and with good grammar. This is an opportunity to develop your ability to explain algorithms to your audience. You will receive further direction on how to submit these assignments.
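
The naming convention for programming assignments above can be checked before submitting. The following is a minimal sketch, not an official course tool: the function name `follows_convention` is hypothetical, and it assumes that names contain only letters or hyphens and that the class and assignment numbers are plain digits, which the syllabus does not specify.

```python
import re

# Assumed shape: <lastname>_<firstname>_<class_num>_<assignment_num>.ipynb
# Assumption (not stated in the syllabus): names are letters/hyphens,
# and both numbers are written as digits.
NAME_PATTERN = re.compile(r"[A-Za-z-]+_[A-Za-z-]+_\d+_\d+\.ipynb")


def follows_convention(filename: str) -> bool:
    """Return True if the whole filename matches the assumed pattern."""
    return NAME_PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    # Illustrative filenames only.
    for name in ["doe_jane_1_2.ipynb", "homework2.ipynb"]:
        print(name, follows_convention(name))
```

If the assumed pattern turns out to be stricter or looser than the graders expect, only `NAME_PATTERN` needs to change.
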
######back to top
###Class Format
Class runs from 10am to 1pm Tuesday and Thursday. Lab time will be from 2pm to 5pm Tuesday and Thursday. The class will be broken up into two blocks of approximately 85 minutes each, with a 10-minute break between them. Class will be a mix of lecture and practical exercise work, emphasizing the application of skills covered in the lecture portion of the class. Lab time is intended for the completion of exercises, but may also include guided learning sessions as necessary to ensure comprehension of the course material.
######back to top
###Course Policies
- Attendance and Tardiness: I expect you to attend every class, arriving on time and staying for the entire duration of class. Absences will be excused only for circumstances coordinated in advance, and you are responsible for making up any missed work.
- Participation: I expect you to be fully engaged while you’re in class. This means asking questions when necessary, engaging in class discussions, participating in class exercises, and completing all assigned work. Learning will occur in this class only when you actively use the tools, techniques, and skills described in the lectures. I will provide you ample time and resources to accomplish the goals of this course and expect you to take full advantage of what’s offered.
- Late Assignments: All assignments are to be submitted before the start of class. Assignments posted by the end of the day following class will be marked down 10%, and assignments posted by the end of the day after that will be marked down 20%. No assignments will be accepted for a grade more than three days after class.
- Office Hours: I won’t be holding regular office hours, but I am available via email and Slack to answer whatever questions you may have about the material and to arrange a time to meet. Please also feel free to reach out to the Teaching Assistants as necessary for support and guidance with the exercises, particularly during lab time.
######back to top
###Resources
- Stack Overflow - Q&A community of technology pros
####(Some) Open Data Sources
- New York City Open Data Portal
- New York State Open Data Portal
- Hilary Mason’s Research Quality Data Sets
####Visualizations
####Data Journalism and Critiques
####Suggested Reading
Conway, Drew and John Myles White. Machine Learning for Hackers. O'Reilly Media, Inc., 2012.
Knuth, Donald E. The Art of Computer Programming. Addison-Wesley Professional, 2011.
MacCormick, John. Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today's Computers. Princeton University Press, 2011.
McCallum, Q Ethan. Bad Data Handbook. O'Reilly Media, Inc., 2012.
McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc., 2012.
O'Neil, Cathy and Rachel Schutt. Doing Data Science: Straight Talk from the Front Line. O'Reilly Media, Inc., 2013.
Russell, Matthew A. Mining the Social Web. O'Reilly Media, Inc., 2013.
Sedgewick, Robert and Kevin Wayne. Algorithms. Addison-Wesley Professional, 2011.
Steiner, Christopher. Automate This: How Algorithms Came to Rule Our World. Penguin Group, 2012.
######back to top
###Course Outline (Subject to change)
####Week 1: Introduction to Algorithms
####Week 2: Statistics/Introduction to Machine Learning
####Week 3: Supervised Learning - Feature Engineering and Decision Trees
####Week 4: Supervised Learning - Random Forest and Naive Bayes
####Week 5: Unsupervised Learning - Clustering and k-NN
####Week 6: Unsupervised Learning - Neural Networks, Natural Language Processing and Algorithms in Everyday Life
####Week 7: Open Lab/Final Projects
######back to top