After using your browser and your dev tools to get a good idea about the structure of the page you want to scrape, you're ready to start your web scraper by writing some code to fetch that HTML structure using Python's requests library.
Install Requests
First, you'll need to create a virtual environment and install the requests library since it's an external package:
python3 -m venv venv
source venv/bin/activate
python3 -m pip install requests
Executing these three commands in succession will create and activate a new virtual environment named venv and install requests into that virtual environment. On Windows, activate the environment with venv\Scripts\activate instead of the source command.
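If you'd like to confirm that the setup worked, a small check along these lines might help. It is only a sketch: the helper names in_virtualenv and installed_version are made up for illustration, and it relies on the standard library alone, so it runs even if the install failed.

```python
import sys
from importlib import metadata


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_version(package: str):
    # Return the installed version string, or None if the package is missing.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("requests version:", installed_version("requests"))
```

If the second line prints None, the package was probably installed into a different environment than the one running the script.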
What is the requests Library?
The requests library is a widely used external Python library that lets you send HTTP requests and work with the responses. It has a very user-friendly interface, which is also reflected in its tagline:
HTTP for Humans
You've used the requests library before, and you often won't need more than a few lines of code to get what you want. That holds true for this web scraping example as well:
import requests
BASE_URL = "https://codingnomads.github.io/recipes/"
page = requests.get(BASE_URL)
print(page.text)
This short code snippet will fetch the content of the main page of the CodingNomads recipe collection and print the HTML content to your console.
Info: When you see the output in your console, you'll understand why it's much more user-friendly to inspect the HTML structure using your dev tools in your browser :)
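The snippet above assumes that the request succeeds. A slightly more defensive version might check the response before using its text. Treat the following as a sketch rather than the lesson's official code: the function name fetch_html and the 10-second timeout are illustrative assumptions.

```python
import requests

BASE_URL = "https://codingnomads.github.io/recipes/"


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the HTML of `url` as a string, or raise if the request fails."""
    # A timeout keeps the script from hanging if the server never responds.
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    try:
        html = fetch_html(BASE_URL)
    except requests.RequestException as error:
        # Covers connection problems, timeouts, and HTTP error statuses.
        print(f"Could not fetch {BASE_URL}: {error}")
    else:
        print(f"Fetched {len(html)} characters of HTML")
```

Without the status check, an error page returned by the server would end up in page.text just like the real content, which can be confusing once you start parsing.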
There's your page content! With just a bit of code, you've got access to all the HTML of the page inside of your Python script. So, what type of content are you working with here?
print(type(page.text)) # OUTPUT: <class 'str'>
Looks like this is one big str! Well, that's kind of hard to work with! You could use Python's string methods that you've learned about at the beginning of this course and identify the interesting parts inside of this soup of HTML code, or you could learn to use regular expressions to pick information from this text.
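To get a feel for what that manual approach involves, here is a rough sketch of both ideas. The html string is a small, made-up stand-in for page.text, so the tags and links in it are assumptions and the real page's markup will differ.

```python
import re

# Hypothetical stand-in for page.text; the real page's markup may differ.
html = """
<html>
  <head><title>Recipes</title></head>
  <body>
    <a href="recipes/soup.html">Soup</a>
    <a href="recipes/salad.html">Salad</a>
  </body>
</html>
"""

# String methods: locate the <title> tag by hand and slice out its content.
start = html.find("<title>") + len("<title>")
end = html.find("</title>", start)
title = html[start:end]
print(title)  # Recipes

# Regular expression: collect every href value and link text in one pass.
links = re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)
print(links)  # [('recipes/soup.html', 'Soup'), ('recipes/salad.html', 'Salad')]
```

Both approaches depend on the exact spelling of the markup, so an extra attribute or a line break inside a tag is enough to make them miss a match.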
But there's an easier way! As is so often the case, someone else has already done the work for you, and you can rely on Python's extensive package ecosystem to provide a well-tested solution for your needs.
In the next lesson, you'll use the Beautiful Soup package to parse the HTML soup that you gathered using requests.
Additional Resources
- Requests Documentation: Requests: HTTP for Humans™
Summary: Python Web Scraping
- Python's requests library allows you to fetch the HTML content of a static website with a single line of code
- The requests library is one part of your Python web scraper
- The requests.models.Response() object gives you access to the HTML of the page
- The .text attribute of the response object provides the HTML of the page as a string