Regular expressions
Length: 2 days
Description
Regular expressions (“regexps”) make it possible to find patterns in text. Whether you’re trying to find all of the URLs in a document, the IP addresses in a logfile, or the telephone numbers in an address book, regular expressions can be invaluable, and thus are included in most modern programming languages, such as Python, Ruby, and JavaScript, as well as .NET, Java, and C++. Many utilities, such as Unix’s famous “grep” command, are popular because they use regular expressions. However powerful regular expressions might be, they are also famously difficult to write, and even more difficult to read. Many experienced programmers find themselves frustrated by the syntax of regular expressions, and either avoid them entirely or use pre-packaged recipes they find on the Internet.
This course introduces regular expressions and provides numerous insights into their many features and uses. It will also point out differences between dialects of regular expressions, ways in which they should (and shouldn’t) be used, and advanced techniques such as named groups and lookahead/lookbehind.
By the end of the course, participants should feel comfortable using regular expressions to find and analyze text.
A large part of the course will be hands-on exercises, which will help participants to learn and understand the regular expression syntax.
Like all of my courses, this is taught without slides. Instead, I live-code into a Jupyter notebook that is available in real time and which I distribute to participants at the end of the course.
Let’s talk about how to customize this course for your team! Set a meeting at https://savvycal.com/reuven/corp-training.
Audience
This course is aimed at experienced programmers who wish to unlock the power of regular expressions in their day-to-day work. While nearly all exercises will be in the Python language, little or no knowledge of Python is necessary; the course will begin with a very brief introduction to the Python features needed to do the exercises.
Syllabus
• Minimal Python for text processing: Strings, lists, loops, and files
• Overview of regular expressions: re.match, re.search, re.findall, and match objects
• Characters and metacharacters
• Multiple runs of a character with +, *, ?, and {min,max}
• Character classes: [], ^, $, and -
• Built-in character classes: \w, \W, \s, \S, \d, \D
• Special cases: Greediness, anchors, start/end of line, start/end of string, start/end of word
• Options: Case, line endings, and extended with comments
• Alternation
• Parentheses for combining. Capturing. Non-capturing parentheses. Named groups. Backreferences. Lookahead and lookbehind.
• Compiling regexps. Replacing regexps. Match context. Strategies.
• Regexps in Unix vs. Python
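As a taste of several syllabus topics (named groups, built-in character classes, the extended option with comments, compiling, and match objects), here is a minimal sketch in Python. The log lines, the group names ip and method, and the deliberately loose IP pattern are assumptions made up for illustration; they are not taken from the course materials.

```python
import re

# Hypothetical log lines, invented for this illustration only.
log = """\
192.168.0.1 GET /index.html
10.0.0.25 POST /login
not-an-ip GET /about
"""

# re.VERBOSE ("extended" mode) lets the pattern contain whitespace and comments.
# re.MULTILINE makes ^ match at the start of every line, not just the string.
# Caveat: this loose pattern accepts any 1-3 digit groups (such as 999.1.1.1);
# it does not check that each number is in the 0-255 range.
pattern = re.compile(r"""
    ^(?P<ip>\d{1,3}(?:\.\d{1,3}){3})   # named group "ip", with a non-capturing group inside
    \s+                                # one or more whitespace characters
    (?P<method>[A-Z]+)                 # named group "method", built from a character class
""", re.VERBOSE | re.MULTILINE)

# search() returns a match object for the first match, or None if there is none.
first = pattern.search(log)
first_ip = first.group("ip")

# finditer() yields one match object per match, so named groups stay available.
pairs = [(m.group("ip"), m.group("method")) for m in pattern.finditer(log)]

# findall() with no groups in the pattern returns the matched strings directly.
numbers = re.findall(r"\d+", "10.0.0.25")

print(first_ip)
print(pairs)
print(numbers)
```

The third log line is skipped because it does not start with a digit, which shows how the ^ anchor and \d combine to filter input.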
