DemystData/code-kata

Data Engineering Coding Challenges

Judgment Criteria

  • Beauty of the code (beauty lies in the eye of the beholder)
  • Testing strategies
  • Basic engineering principles

Problem 1

Parse fixed width file

  • Generate a fixed-width file using the provided spec (the offsets provided in the spec file represent the length of each field).
  • Implement a parser that can parse the fixed-width file and generate a delimited file, such as a CSV file.
  • DO NOT use Python libraries such as pandas for parsing. You may use the standard library to write out the CSV file if you like.
  • Language choice: Python or Scala
  • Deliver the source via GitHub or Bitbucket
  • Bonus points if you deliver a Docker container (Dockerfile) that can be used to run the code (we are too lazy to install the tools you might use)
  • Pay attention to encoding
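A minimal Python sketch of the Problem 1 round trip, using only the standard library. The spec file itself is not reproduced here, so `SPEC` is a hypothetical list of `(column name, width)` pairs; widths are assumed to count characters rather than bytes, and the encodings (`cp1252` for the fixed-width file, `utf-8` for the CSV) are assumptions chosen to make the encoding requirement explicit. All function names are illustrative.

```python
import csv
from typing import Iterable, Sequence

# Hypothetical spec (assumption): (column name, field width in characters).
SPEC: list[tuple[str, int]] = [("f1", 5), ("f2", 12), ("f3", 3)]


def format_record(values: Sequence[str], spec: Sequence[tuple[str, int]]) -> str:
    """Left-justify each value to its field width, truncating values that are too long."""
    return "".join(str(v)[:w].ljust(w) for v, (_, w) in zip(values, spec))


def write_fixed_width(rows: Iterable[Sequence[str]], spec, path: str,
                      encoding: str = "cp1252") -> None:
    """Generate a fixed-width file; the encoding is explicit rather than the platform default."""
    with open(path, "w", encoding=encoding, newline="") as f:
        for row in rows:
            f.write(format_record(row, spec) + "\n")


def split_record(line: str, spec) -> list[str]:
    """Slice one line into fields using cumulative offsets, then strip the padding."""
    fields, start = [], 0
    for _, width in spec:
        fields.append(line[start:start + width].rstrip())
        start += width
    return fields


def fixed_width_to_csv(in_path: str, out_path: str, spec,
                       in_encoding: str = "cp1252", out_encoding: str = "utf-8") -> None:
    """Stream the fixed-width file line by line and write a CSV with a header row."""
    with open(in_path, encoding=in_encoding) as src, \
         open(out_path, "w", encoding=out_encoding, newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow([name for name, _ in spec])
        for line in src:
            writer.writerow(split_record(line.rstrip("\n"), spec))
```

Because both files are opened with an explicit `encoding`, a value such as `café` is stored as a single cp1252 byte in the fixed-width file and comes back intact in the UTF-8 CSV; relying on the platform default would make the output machine-dependent. Note that `split_record` strips trailing padding, so trailing spaces that belong to real values are not preserved in this sketch.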

Problem 2

Data processing

  • Generate a CSV file containing the columns first_name, last_name, address and date_of_birth
  • Process the CSV file to anonymise the data
  • The columns to anonymise are first_name, last_name and address
  • You might be thinking that this is silly
  • Now make this work on a 2 GB CSV file (it should be doable on a laptop)
  • Demonstrate that the same approach can work on a bigger dataset
  • Hint: you will need some distributed computing platform
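A minimal Python sketch for Problem 2, assuming keyed hashing (HMAC-SHA256, a pseudonymisation technique) is an acceptable form of anonymisation; the challenge does not prescribe a method. `generate_csv`, `pseudonymise`, `anonymise_csv`, the sample names and the key are all hypothetical. The anonymiser streams the file one row at a time with the standard `csv` module, so memory use does not grow with file size, which is what makes a 2 GB file feasible on a laptop.

```python
import csv
import hashlib
import hmac
import random
from datetime import date, timedelta

COLUMNS = ["first_name", "last_name", "address", "date_of_birth"]
PII_COLUMNS = {"first_name", "last_name", "address"}


def generate_csv(path: str, n_rows: int, seed: int = 0) -> None:
    """Write n_rows of synthetic people (made-up sample values, for illustration only)."""
    rng = random.Random(seed)
    firsts = ["Ana", "Ben", "Chloé", "Dev"]
    lasts = ["Smith", "Nguyen", "García", "Okafor"]
    streets = ["High St", "Park Rd", "Mill Lane"]
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for _ in range(n_rows):
            dob = date(1950, 1, 1) + timedelta(days=rng.randrange(25000))
            writer.writerow([
                rng.choice(firsts),
                rng.choice(lasts),
                f"{rng.randint(1, 999)} {rng.choice(streets)}",
                dob.isoformat(),
            ])


def pseudonymise(value: str, key: bytes) -> str:
    """Keyed hash: the same value and key always give the same 16-hex-char token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def anonymise_csv(in_path: str, out_path: str, key: bytes) -> None:
    """Stream row by row so memory stays flat; only the PII columns are replaced."""
    with open(in_path, encoding="utf-8", newline="") as src, \
         open(out_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for col in PII_COLUMNS:
                row[col] = pseudonymise(row[col], key)
            writer.writerow(row)
```

Because each token depends only on the value and the key, independent workers produce identical tokens for the same name. That property is what would let the same per-value function run on separate partitions of a distributed platform such as Apache Spark without coordination; the distributed step itself is not shown here. Keep the key secret, since anyone holding it can hash candidate names and match them against the tokens.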

Choices

  • Any language, any platform
  • Solve one of the above problems, or both if you feel like it.

About

Coding exercise
