This section is scheduled for 25 minutes (see course_timetable.md: 11:50–12:15).
By the end of this section, participants should be able to:
- Explain what “strings are objects” means, and use string methods with dot notation.
- Recognise that strings are immutable and that methods return a new string.
- Choose between find(), index(), and the in operator depending on the task.
- Use common “checking” methods like startswith() and endswith().
- Clean and normalise text using strip(), lower(), upper(), title(), capitalize(), and replace().
- Split and join text using split() and join().
- Apply these tools in a small “text processing pipeline” exercise.
- 11:50–11:53 (3 min): Strings are objects + methods + immutability
- 11:53–11:59 (6 min): Finding/searching with find() vs index() + in
- 11:59–12:03 (4 min): Checking with startswith()/endswith() (+ case sensitivity)
- 12:03–12:08 (5 min): Transforming with replace(), case methods, strip()
- 12:08–12:11 (3 min): Splitting/joining with split() + join()
- 12:11–12:15 (4 min): Exercise (text processing pipeline) + recap
Open the first few cells of section 7.
Say:
- “In Python, a string isn’t just text — it’s an object with built-in behaviour.”
- “That behaviour shows up as methods: some_string.method(...).”
- “Important property: strings are immutable. String methods return a new string; they don’t modify the original.”
Run the .upper() example.
Ask:
- “After result = text.upper(), what do we expect text to be?”
Key message:
- “If you want to keep the modified value, assign it (e.g. text = text.strip()).”
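If the notebook cell is not to hand, a minimal sketch along these lines makes the same point. The variable names and sample text are illustrative assumptions, not taken from the notebook.

```python
# Illustrative sample text (an assumption, not the notebook's value).
text = "  Hello, World  "

# upper() returns a new string; the original is left untouched.
result = text.upper()
print(repr(result))  # '  HELLO, WORLD  '
print(repr(text))    # '  Hello, World  '

# To keep a change, rebind the name to the returned value.
text = text.strip()
print(repr(text))    # 'Hello, World'
```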
Run the find('fox') / index('fox') example.
Say:
- “Both tell us where a substring starts.”
- “The key difference is what happens when it’s missing.”
Run find('wombat').
Say:
- “find() returns -1 when it can’t find the substring.”
- “index() raises a ValueError if the substring isn’t present; that can be useful when ‘missing’ should be treated as an error.”
Tie-back to exceptions:
- “So, index() is a nice example of: ‘this situation is exceptional, stop and tell me’.”
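The contrast between the two methods can be sketched as follows. The sample sentence is an assumption (the notebook may use a different one); the behaviour of find() and index() is standard Python.

```python
# Assumed sample sentence; swap in the notebook's own text if it differs.
line = "The quick brown fox jumps over the lazy dog"

# Both report the index where the substring starts.
fox_find = line.find("fox")     # 16
fox_index = line.index("fox")   # 16

# find() signals "missing" with -1, which is easy to overlook.
missing = line.find("wombat")   # -1

# index() signals "missing" by raising ValueError.
raised = False
try:
    line.index("wombat")
except ValueError:
    raised = True
    print("index() raised ValueError for a missing substring")
```

A point worth making aloud: -1 is also a valid index (the last character), so using the result of find() without checking it can silently produce wrong slices.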
Run the startswith/endswith cell.
Say:
- “These return booleans — great for validation and quick filtering.”
- “They are case-sensitive, like most string operations.”
Run the “Case sensitivity matters” cell.
Optional note (if someone asks about case-insensitive matching):
- “A common pattern is to normalise first: line.lower().startswith('the').”
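A short sketch of the checking methods and the normalise-first pattern, using an assumed sample line rather than the notebook's own:

```python
# Assumed sample line for illustration.
line = "The quick brown fox"

starts_exact = line.startswith("The")   # True
starts_lower = line.startswith("the")   # False: the comparison is case-sensitive
ends_fox = line.endswith("fox")         # True

# Normalise first for a case-insensitive check.
starts_any_case = line.lower().startswith("the")  # True
```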
Run the if "fox" in line: example.
Say:
- “If you don’t need the index, this is usually the clearest way to ask: ‘is this substring present?’”
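For comparison, here is a small sketch of the membership test next to the equivalent find() check. The sample line is an assumption.

```python
# Assumed sample line for illustration.
line = "The quick brown fox"

# Membership test: reads like the question being asked.
if "fox" in line:
    print("found a fox")

# Equivalent, but less direct, check using find().
has_fox = line.find("fox") != -1   # True

# Like the other string operations, 'in' is case-sensitive.
has_upper_fox = "FOX" in line      # False
```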
Run the replace example.
Say:
- “replace(old, new) is the simplest way to do straightforward substitutions.”
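Two details that participants often ask about can be shown with a sketch like this one (sample text is an assumption): replace() changes every occurrence by default, and an optional third argument caps the number of replacements.

```python
# Assumed sample text for illustration.
text = "the cat sat on the mat"

# Every occurrence is replaced; the original string is unchanged.
swapped = text.replace("the", "a")        # 'a cat sat on a mat'

# The optional count argument limits how many replacements are made.
first_only = text.replace("the", "a", 1)  # 'a cat sat on the mat'

print(text)  # still 'the cat sat on the mat'
```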
Run the case-normalisation example (upper(), title(), capitalize()).
Say:
- “These are useful for normalising messy text.”
- “Be aware that title() can be a little opinionated (it capitalises every word, including the letter after an apostrophe).”
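A side-by-side sketch of the case methods on one deliberately messy string (the string is an assumption chosen to expose the title() quirk):

```python
# Assumed messy input, chosen to include an apostrophe.
messy = "hELLO wORLD, they're here"

as_upper = messy.upper()         # "HELLO WORLD, THEY'RE HERE"
as_lower = messy.lower()         # "hello world, they're here"
as_capital = messy.capitalize()  # "Hello world, they're here"
as_title = messy.title()         # "Hello World, They'Re Here"

print(as_title)
```

Note that capitalize() lowercases everything after the first character, and title() restarts capitalisation after the apostrophe.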
Run the strip() example.
Say:
- “Real-world text often comes with stray leading or trailing whitespace; strip() is a quick, safe cleanup step.”
- “This is very common before validation (emails, usernames, file names, etc.).”
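A minimal sketch of what strip() does and does not touch, with assumed sample values:

```python
# Assumed raw input, as it might arrive from a form or a file.
raw = "  alice@example.com \n"

# Leading and trailing whitespace (including the newline) is removed.
cleaned = raw.strip()                      # 'alice@example.com'

# Whitespace inside the string is left alone.
inner = "  hello   world  ".strip()        # 'hello   world'

# With an argument, strip() removes those characters from both ends instead.
trimmed = "--draft--".strip("-")           # 'draft'
```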
Run the split() example.
Say:
- “split() turns a string into a list. With no argument, it splits on whitespace and treats any run of whitespace as a single separator.”
- “With a separator, it splits on that exact substring.”
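The difference between the two modes is easiest to see on inputs with repeated separators. The sample strings below are assumptions for illustration.

```python
# No argument: any run of whitespace (spaces, tabs, newlines) is one separator.
words = "the   quick  brown\tfox".split()   # ['the', 'quick', 'brown', 'fox']

# Explicit separator: every occurrence splits, so empty strings can appear.
fields = "a,b,,c".split(",")                # ['a', 'b', '', 'c']
spaced = "a  b".split(" ")                  # ['a', '', 'b']

print(words, fields, spaced)
```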
Run the join() examples.
Say:
- “join() goes the other way: list → single string.”
- “Read it as: separator.join(list_of_strings).”
- “This is the canonical way to build text output (more efficient and cleaner than repeated +).”
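A short sketch of join() with assumed sample lists. It also shows the one common stumbling block: every item must already be a string.

```python
# Assumed sample data for illustration.
words = ["quick", "brown", "fox"]

sentence = " ".join(words)   # 'quick brown fox'
csv_line = ",".join(words)   # 'quick,brown,fox'

# join() only accepts strings, so convert other types first.
numbers = [1, 2, 3]
number_line = "-".join(str(n) for n in numbers)  # '1-2-3'

print(sentence, csv_line, number_line)
```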
Go to the “Text Processing Pipeline” cell.
Say:
- “This is a miniature version of what you’ll do in real data cleaning: strip, extract the part you care about, validate, normalise.”
Prompt participants (2–3 minutes):
- “Fill in the TODOs:
  - strip()
  - extract the email from Name <email> format
  - normalise to lowercase and store valid emails
  - bonus: extract unique domains”
Ask while they work:
- “Which string methods are you using where?”
- “Why do we normalise (lowercase) before comparing/storing?”
If time, recap:
- “These string methods are small building blocks — combined, they cover a large fraction of everyday text processing.”
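# user_data comes from the exercise cell; valid_emails starts out empty.
valid_emails = []
for i, entry in enumerate(user_data, 1):
    cleaned = entry.strip()
    # Pull the address out of "Name <email>" when that format is used.
    if "<" in cleaned and ">" in cleaned:
        email_part = cleaned.split("<")[1].split(">")[0]
    else:
        email_part = cleaned
    # Rough validity check, then normalise to lowercase before storing.
    if "@" in email_part and "." in email_part:
        normalized = email_part.lower()
        valid_emails.append(normalized)
# Bonus: unique domains via a set comprehension.
domains = {email.split("@")[1] for email in valid_emails}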
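To try the same logic outside the notebook, one option is to wrap it in a function and feed it a few test entries. The function name extract_emails and the sample entries below are hypothetical, and the validity check remains the same rough "contains @ and ." test, not real email validation.

```python
def extract_emails(entries):
    """Return (valid_emails, domains) from raw strings such as 'Name <email>'."""
    valid_emails = []
    for entry in entries:
        cleaned = entry.strip()
        # Pull the address out of "Name <email>" when that format is used.
        if "<" in cleaned and ">" in cleaned:
            email_part = cleaned.split("<")[1].split(">")[0]
        else:
            email_part = cleaned
        # Rough check only; lowercase so equal addresses compare equal.
        if "@" in email_part and "." in email_part:
            valid_emails.append(email_part.lower())
    domains = {email.split("@")[1] for email in valid_emails}
    return valid_emails, domains


# Hypothetical sample data, not the notebook's user_data.
sample = [
    "  Ada Lovelace <Ada@Example.com>  ",
    "grace@example.org\n",
    "not an email",
    "ADA@EXAMPLE.COM",
]
emails, domains = extract_emails(sample)
print(emails)           # ['ada@example.com', 'grace@example.org', 'ada@example.com']
print(sorted(domains))  # ['example.com', 'example.org']
```

Because both spellings of the first address end up identical after lower(), this sample also gives a concrete answer to the "why normalise before comparing or storing?" question.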