8) Text and Strings Lesson

Introduction: Python String

8 min to complete · By Martin Breuss

Working with text is a productive and exciting use of your budding programming skills. There is an endless and ever-growing amount of text data out there.

Text Data Examples

Examples of text data include your personal emails, every novel that has been digitized, scientific publications, the countless tweets and other social media posts produced daily, and less conspicuous data such as readings from medical devices.

Old Typeface - Photo by Annie Spratt on http://unsplash.com/@anniespratt

Text is everywhere, and humans use text to communicate all sorts of information. It's probably the most widely used way of recording information. Being able to work with text programmatically can therefore be fun, challenging, and rewarding, all at once! You'll start by learning how Python knows that something is text.

Playground: Text is Wrapped in Quotes

The data type that Python uses to represent text is called a string (str). Anything wrapped in single ('') or double ("") quotation marks is a string in Python:

print("this is a string") 
print('this is also a string')
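
If you'd like to confirm that both spellings produce the same kind of value, here is a minimal sketch. The variable names double_quoted and single_quoted are made up for this illustration, and type() is a built-in function that this lesson hasn't introduced yet:

```python
# Two spellings of the same text: one with double quotes, one with single quotes.
double_quoted = "this is a string"
single_quoted = 'this is a string'

# Both values have the data type str.
print(type(double_quoted))  # <class 'str'>
print(type(single_quoted))  # <class 'str'>

# Python treats them as equal, because the quotes are not part of the text.
print(double_quoted == single_quoted)  # True
```

The quotation marks only tell Python where the text starts and ends. They aren't stored as part of the string itself.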

You can choose which type of quotation mark you want to use. Both work fine as long as the opening and closing marks match. Go ahead and add the following code to the playground and watch it respond with an error:

print("help!')  # SyntaxError
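
For comparison, here is a small sketch of strings that Python does accept. The variable names are only for illustration. The rule it assumes is that the opening and closing quotation marks must be of the same kind, while the other kind may appear inside the text:

```python
# The opening and closing quotation marks match, so this is valid.
greeting = "help!"

# A single quote inside a double-quoted string is just another character.
apostrophe = "it's a string"

# Double quotes inside a single-quoted string work the same way.
quotation = 'she said "hello"'

print(greeting)
print(apostrophe)
print(quotation)
```

This is one practical reason to have both quote styles available: you can pick whichever one doesn't clash with the text you want to write.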

Now that you know how to define a string in Python, take a moment to ponder what a string might look like from Python's perspective.

A String Is a Sequence of Characters

Python thinks of a string as a sequence of characters. Okay, but what does that mean? Put yourself into Python's shoes. Well, maybe that's a strange request, given that snakes aren't really known for their footwear... ¯\_(ツ)_/¯

When Python processes a word, for example "hello", it treats it differently from a number, such as 200. The number 200 is just that, a number, a single unit of information. However, "hello" is not a single piece of information for Python. Instead, Python sees it as a sequence of characters:

"h" + "e" + "l" + "l" + "o"

Programs always need to break things into their smallest units; that's just something they love doing. Computers understand numbers well, but they don't have a conceptual understanding of words. While for you, as a human, it is easy to understand that the word "hello" is correct and "hlelo" isn't quite correct, your computer has no way of knowing any of that.
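
You can make this idea visible with a short sketch. It uses a few constructs that this lesson doesn't cover (list(), a for loop, and sorted()), so treat it as a peek ahead rather than something you need to master now. The variable names are chosen only for this example:

```python
word = "hello"

# Turning the string into a list shows the individual characters.
characters = list(word)
print(characters)  # ['h', 'e', 'l', 'l', 'o']

# Looping over a string visits one character at a time, in order.
for character in word:
    print(character)

# Python can tell that two sequences differ, but not which one is a real word.
scrambled = "hlelo"
print(word == scrambled)  # False

# Both strings contain exactly the same characters, just in a different order.
print(sorted(word) == sorted(scrambled))  # True
```

From Python's perspective, "hello" and "hlelo" are simply two different arrangements of the same five characters.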

Info: There's an interesting field of Machine Learning that deals with the processing of human languages. It's called Natural Language Processing (NLP) and it has been an active research topic for many decades. More recently, there have been some big improvements made that can help computers to correctly interpret, process, and even produce an impressively large amount of human language.

But keep your phone's voice assistant switched off for another moment and hold off on that latest ChatGPT conversation. Continue to think about how text gets processed at the programming level.

As mentioned above, Python doesn't think of text in terms of words or sentences, but at the lower level of individual characters. And if a string is a sequence of characters, then Python should know which characters it contains and how many of them there are.

Playground: Python len() Function

If you're wondering how many characters your string consists of, then you can use the handy len() function to find out:

name = 'codingnomads'
print(len(name)) 

For 'codingnomads', len() returns 12: one for each character in the string.

Getting the length of your string is only one of the many things that you can do with strings in Python. It's also probably not the most exciting one. However, if you keep in mind that you can ask how long a string is using len(), this can be a good way to remember that Python thinks of strings as a sequence of characters.
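
Here is a short sketch with a few more strings to try in the playground. The example strings are made up for illustration. It shows that len() counts every character, including spaces and punctuation:

```python
name = "codingnomads"
print(len(name))  # 12

# Spaces and punctuation are characters too, so they count toward the length.
print(len("coding nomads"))  # 13
print(len("hi!"))  # 3

# An empty string contains no characters at all.
print(len(""))  # 0

# The length matches the number of individual characters in the string.
print(len(name) == len(list(name)))  # True
```

The last line ties both ideas together: the length of a string is the number of characters in the sequence.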

Learn by Doing

To get a better understanding of Python strings, complete the first exercise in your Python labs on strings.

Please be sure to push your work to GitHub when you're done.

When you get the feeling that you went over the topic of string length len("too many") times, then you're ready to move on to the next lesson and start learning about what you can do with strings.

Summary: Introduction to Python String

  • Python represents text data as the data type string (str)
  • Strings are wrapped in quotes
  • You can use single or double quotation marks, but the opening and closing marks must match
  • Strings are sequences of individual characters
  • len() will return the length of a str