Speaker notes — 7. Advanced string manipulation (11:50–12:15)

This section is scheduled for 25 minutes (see course_timetable.md: 11:50–12:15).

Outcomes (say this upfront)

By the end of this section, participants should be able to:

  • Explain what “strings are objects” means, and use string methods with dot notation.
  • Recognise that strings are immutable and that methods return a new string.
  • Choose between find() / index() / the in operator depending on the task.
  • Use common “checking” methods like startswith() / endswith().
  • Clean and normalise text using strip(), lower(), upper(), title(), capitalize(), and replace().
  • Split and join text using split() and join().
  • Apply these tools in a small “text processing pipeline” exercise.

Timing plan (25 minutes)

  • 11:50–11:53 (3 min) — Strings are objects + methods + immutability
  • 11:53–11:59 (6 min) — Finding/searching: find() vs index() + in
  • 11:59–12:03 (4 min) — Checking: startswith() / endswith() (+ case sensitivity)
  • 12:03–12:08 (5 min) — Transforming: replace(), case methods, strip()
  • 12:08–12:11 (3 min) — Splitting/joining: split() + join()
  • 12:11–12:15 (4 min) — Exercise: text processing pipeline + recap

Talk track (what to say)

1) Strings are objects + methods (11:50–11:53)

Open the first few cells of section 7.

Say:

  • “In Python, a string isn’t just text — it’s an object with built-in behaviour.”
  • “That behaviour shows up as methods: some_string.method(...).”
  • “Important property: strings are immutable. String methods return a new string — they don’t modify the original.”

Run the .upper() example.

Ask:

  • “After result = text.upper(), what do we expect text to be?”

Key message:

  • “If you want to keep the modified value, assign it (e.g. text = text.strip()).”
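
If the notebook cell isn't to hand, the immutability point can be shown with a minimal sketch like the one below. The variable names and sample text are illustrative assumptions, not taken from the notebook.

```python
text = "  hello world  "  # sample text, assumed for illustration

# upper() returns a new string; the original is left untouched.
result = text.upper()
print(repr(text))    # '  hello world  '
print(repr(result))  # '  HELLO WORLD  '

# To keep a change, rebind the name to the returned value.
text = text.strip()
print(repr(text))    # 'hello world'
```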

2) Finding/searching: find() vs index() (11:53–11:59)

Run the find('fox') / index('fox') example.

Say:

  • “Both tell us where a substring starts.”
  • “The key difference is what happens when it’s missing.”

Run find('wombat').

Say:

  • “find() returns -1 when it can’t find the substring.”
  • “index() raises a ValueError if the substring isn’t present; that can be useful when ‘missing’ should be treated as an error.”

Tie-back to exceptions:

  • “So, index() is a nice example of: ‘this situation is exceptional, stop and tell me’.”
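
A short sketch of the contrast, useful as a fallback if the live cell misbehaves. The sample sentence is an assumption and may differ from the one in the notebook.

```python
line = "the quick brown fox"  # sample text, assumed for illustration

# Both methods report the starting position when the substring exists.
print(line.find("fox"))     # 16
print(line.index("fox"))    # 16

# They differ only when the substring is missing.
print(line.find("wombat"))  # -1

raised = False
try:
    line.index("wombat")
except ValueError as exc:
    raised = True
    print("index() raised ValueError:", exc)
```

A common slip worth mentioning: -1 is also a valid index (the last character), so the result of find() should be checked before it is used for slicing.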

3) Checking: startswith() / endswith() + case sensitivity (11:59–12:03)

Run the startswith/endswith cell.

Say:

  • “These return booleans — great for validation and quick filtering.”
  • “They are case-sensitive, like most string operations.”

Run the “Case sensitivity matters” cell.

Optional note (if someone asks about case-insensitive matching):

  • “A common pattern is normalise first: line.lower().startswith('the').”
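
A minimal sketch of the case-sensitivity point and the normalise-first pattern. The sample line is an assumption, not the notebook's own text.

```python
line = "The quick brown fox."  # sample text, assumed for illustration

starts_exact = line.startswith("The")   # True
starts_lower = line.startswith("the")   # False: the match is case-sensitive
ends_ok = line.endswith("fox.")         # True

# Normalise first when the check should ignore case.
starts_any_case = line.lower().startswith("the")  # True

print(starts_exact, starts_lower, ends_ok, starts_any_case)
```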

4) Canonical substring search: in (quick, readable)

Run the if "fox" in line: example.

Say:

  • “If you don’t need the index, this is usually the clearest way to ask: ‘is this substring present?’”
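
A quick sketch of the membership test, assuming the same kind of sample sentence as in the earlier cells:

```python
line = "the quick brown fox"  # sample text, assumed for illustration

has_fox = "fox" in line        # True
has_wombat = "wombat" in line  # False
has_Fox = "Fox" in line        # False: in is case-sensitive too

if "fox" in line:
    print("Found a fox")
```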

5) Transforming strings: replace/case/strip (12:03–12:08)

Run the replace example.

Say:

  • “replace(old, new) is the simplest way to do straightforward substitutions.”
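
A minimal sketch of replace(), including the optional count argument in case someone asks how to replace only the first match. The sample sentence is an assumption.

```python
text = "I like cats. cats are great."  # sample text, assumed for illustration

# By default every occurrence is replaced.
all_replaced = text.replace("cats", "dogs")
print(all_replaced)   # I like dogs. dogs are great.

# An optional third argument limits how many occurrences are replaced.
first_only = text.replace("cats", "dogs", 1)
print(first_only)     # I like dogs. cats are great.

# The original string is unchanged.
print(text)           # I like cats. cats are great.
```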

Run the case-normalisation example (upper(), title(), capitalize()).

Say:

  • “These are useful for normalising messy text.”
  • “Be aware title() can be a little opinionated: it capitalises the first letter of every run of letters, so an apostrophe starts a new ‘word’ (don't becomes Don'T).”
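
A side-by-side sketch of the three case methods on one input, chosen (as an assumption, not from the notebook) to make the differences visible:

```python
text = "don't PANIC now"  # sample text, assumed for illustration

upper = text.upper()              # every letter upper-cased
title = text.title()              # each run of letters gets a capital
capitalized = text.capitalize()   # first character upper, the rest lower

print(upper)        # DON'T PANIC NOW
print(title)        # Don'T Panic Now
print(capitalized)  # Don't panic now
```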

Run the strip() example.

Say:

  • “Real-world text often comes with whitespace — strip() is a quick, safe cleanup step.”
  • “This is very common before validation (emails, usernames, file names, etc.).”
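
A small sketch of strip() as a pre-validation step. The raw values are hypothetical examples of messy input.

```python
raw = "  alice@example.com \n"  # hypothetical messy input

# strip() removes leading and trailing whitespace, including newlines.
cleaned = raw.strip()
print(repr(cleaned))   # 'alice@example.com'

# Whitespace inside the string is left alone.
spaced = "  first   last  ".strip()
print(repr(spaced))    # 'first   last'
```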

6) Splitting and joining strings (12:08–12:11)

Run the split() example.

Say:

  • “split() turns a string into a list. With no argument, it splits on any run of whitespace, so multiple spaces don’t produce empty strings.”
  • “With a separator, it splits on that exact substring.”
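
A sketch contrasting the two modes of split(). The sample strings are assumptions chosen to show where empty strings appear.

```python
line = "the  quick   brown fox"  # note the repeated spaces

# No argument: split on any run of whitespace.
words = line.split()
print(words)      # ['the', 'quick', 'brown', 'fox']

# Explicit separator: split on that exact substring, so
# adjacent separators produce empty strings.
by_space = "the  quick".split(" ")
print(by_space)   # ['the', '', 'quick']

fields = "a,b,,c".split(",")
print(fields)     # ['a', 'b', '', 'c']
```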

Run the join() examples.

Say:

  • “join() goes the other way: list → single string.”
  • “Read it as: separator.join(list_of_strings).”
  • “This is the canonical way to build text output (more efficient and cleaner than repeated +).”
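
A minimal sketch of join(), including the usual stumbling block that every item must already be a string. The lists are illustrative assumptions.

```python
words = ["the", "quick", "brown", "fox"]  # sample list, assumed

sentence = " ".join(words)
print(sentence)   # the quick brown fox

csv_line = ",".join(words)
print(csv_line)   # the,quick,brown,fox

# join() only accepts strings, so convert other types first.
numbers = [1, 2, 3]
dashed = "-".join(str(n) for n in numbers)
print(dashed)     # 1-2-3
```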

7) Exercise: practical text processing pipeline (12:11–12:15)

Go to the “Text Processing Pipeline” cell.

Say:

  • “This is a miniature version of what you’ll do in real data cleaning: strip, extract the part you care about, validate, normalise.”

Prompt participants (2–3 minutes):

  • “Fill in the TODOs:
    1. strip()
    2. extract the email from Name <email> format
    3. normalise to lowercase and store valid emails
    4. bonus: extract unique domains”

Ask while they work:

  • “Which string methods are you using where?”
  • “Why do we normalise (lowercase) before comparing/storing?”

If time, recap:

  • “These string methods are small building blocks — combined, they cover a large fraction of everyday text processing.”

Reference solution (teacher-only)

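# Assumes user_data (list of raw strings) and valid_emails (an empty list)
# are already defined earlier in the notebook cell.
for i, entry in enumerate(user_data, 1):
    # 1. Remove leading/trailing whitespace.
    cleaned = entry.strip()

    # 2. Extract the email from "Name <email>"; otherwise use the whole entry.
    if "<" in cleaned and ">" in cleaned:
        email_part = cleaned.split("<")[1].split(">")[0]
    else:
        email_part = cleaned

    # 3. Minimal validity check, then normalise to lowercase and store.
    if "@" in email_part and "." in email_part:
        normalized = email_part.lower()
        valid_emails.append(normalized)

# 4. Bonus: a set comprehension keeps each domain only once.
domains = {email.split("@")[1] for email in valid_emails}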
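
For checking the solution outside the notebook, here is a self-contained sketch of the same logic. The contents of user_data below are hypothetical sample entries, not the notebook's data, and the "@" plus "." check is only the exercise's simple validity test, not real email validation.

```python
# Hypothetical sample data; the notebook's own user_data may differ.
user_data = [
    "  Alice Smith <Alice.Smith@Example.com>  ",
    "bob@example.org\n",
    "not an email",
    "Carol <CAROL@example.com>",
]
valid_emails = []

for entry in user_data:
    cleaned = entry.strip()

    # Pull the address out of "Name <email>" when that format is used.
    if "<" in cleaned and ">" in cleaned:
        email_part = cleaned.split("<")[1].split(">")[0]
    else:
        email_part = cleaned

    # Simple check from the exercise, then normalise and store.
    if "@" in email_part and "." in email_part:
        valid_emails.append(email_part.lower())

domains = {email.split("@")[1] for email in valid_emails}

print(valid_emails)     # ['alice.smith@example.com', 'bob@example.org', 'carol@example.com']
print(sorted(domains))  # ['example.com', 'example.org']
```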