4) Web Scraping Lesson

Developer Tools: Inspect Element

11 min to complete · By Martin Breuss

Before you can start to write an application to scrape a particular website, it's important that you understand the HTML structure of the website you'd like to scrape. A very quick and easy way to start this process is to "inspect" the website using common browser-based developer tools.

Use Your Browser

Whenever you set out to scrape information from a website, start by inspecting the page in your browser. Most browsers let you do this. Head over to the CodingNomads Recipe Collection and look at the page, click on the links, and get a feeling for the underlying structure of the site:

CodingNomads recipes index page

You can see that there is an index page with a title and a list of links that take you to individual pages, each containing the information of a single recipe:

CodingNomads recipes detail page

That's a common structure of many websites, where you have an overview page as well as many detail pages.

Tasks

  • What implications does this structure have for scraping the content? You probably don't know much about web scraping yet, so don't think too much about the technical implications. But what impact could that have conceptually?
  • Do you think you can scrape everything in one go?
  • What programming logic that you've learned might come in handy for this process?
  • Get out your notebook and write down some thoughts and ideas on how you could approach scraping this page structure.
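
Once you've written down your own ideas, you can compare them with the following minimal sketch. It only illustrates the concept of an overview page that links to several detail pages, and it doesn't download anything. The `FAKE_SITE` dictionary, its paths, and the function names are all hypothetical stand-ins and aren't taken from the real recipe site:

```python
# Hypothetical stand-in for a website. A real scraper would download
# these pages from the Internet instead of reading them from a dictionary.
FAKE_SITE = {
    "/recipes/": ["/recipes/soup", "/recipes/salad"],  # overview page: a list of links
    "/recipes/soup": "Soup: boil water, add vegetables.",  # detail page
    "/recipes/salad": "Salad: chop and mix.",  # detail page
}


def get_detail_links(index_path):
    """Return the links found on the overview page."""
    return FAKE_SITE[index_path]


def get_recipe_text(detail_path):
    """Return the content of a single detail page."""
    return FAKE_SITE[detail_path]


def collect_recipes(index_path):
    """Visit the overview page first, then each detail page in turn."""
    recipes = {}
    for link in get_detail_links(index_path):
        recipes[link] = get_recipe_text(link)
    return recipes


recipes = collect_recipes("/recipes/")
for path, text in recipes.items():
    print(path, "->", text)
```

The sketch shows that you probably can't get everything in one go: you first read the overview page to collect the links, and then a loop visits each detail page.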

Why Inspect a Page

Inspecting a website as a normal user gives you an idea of how the site was intended to be viewed. After all, the HTML on these pages was built for human eyes, and that is how you're currently consuming the content.

But that's not what you ultimately want to do. You want to build a bridge from the structure made for human eyes to something that you can handle programmatically using Python code. So, it's now time to look a level deeper into the site's structure while still staying within your browser.

Open Browser Developer Tools

Modern browsers all come with developer tools built in, whether you're using Firefox, a Chromium-based browser, or Safari. Developer tools are a powerful way to learn more about any website you're viewing without leaving the browser.

How to Open Developer Tools

You can open your browser's developer tools through the browser's context menu or through keyboard shortcuts:

Context menu to open the browser developer tools in Brave

For example, in the Chromium-based Brave Browser, you can open them by clicking on View / Developer / Developer Tools, or through the keyboard shortcut Option + Command + I on macOS. Your browser and operating system might use slightly different commands to open the developer tools, so do a quick web search if you can't find them.

Once you've opened your browser's developer tools, you'll see something similar to this:

Chromium developer tools panel showing HTML and CSS

If you already have some experience with HTML and CSS, you'll recognize that you're viewing the code that makes up the website in the Elements tab of your developer tools.

Tasks

  • Explore the dev tools in your browser.
  • Click on the different tabs that they offer. Which of them can you already understand? Which of them look like complete gibberish?
  • Focus on the Elements tab and explore the HTML of the page you're viewing. Use the small arrow symbols next to the code to expand and contract HTML sections.

This tool gives you insight into the structure of the website you're viewing. You can hover over lines of HTML code in the Elements tab and see them get highlighted in the normal view of your page:

Devtools showing a list item selected and highlighted on the page

This makes it possible to connect the HTML code that makes up the page with what it'll eventually display to a user of the page.
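
To get a feeling for the same nested structure outside of the browser, here's a minimal sketch that prints an indented outline of a small piece of HTML, similar to the expandable tree in the Elements tab. It uses `html.parser` from Python's standard library. The HTML snippet is made up for illustration and isn't copied from the recipe site, and `OutlinePrinter` is a hypothetical helper name:

```python
from html.parser import HTMLParser

# Hypothetical snippet, not the actual HTML of the recipe site.
SAMPLE_HTML = """
<body>
  <h1>Recipes</h1>
  <ul>
    <li><a href="/recipes/soup">Soup</a></li>
    <li><a href="/recipes/salad">Salad</a></li>
  </ul>
</body>
"""


class OutlinePrinter(HTMLParser):
    """Collect an indented outline of tags and text.

    Assumes every opening tag has a closing tag. Void elements such as
    <br> or <img> would throw off the indentation in this simple sketch.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + f"<{tag}>")
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace between tags
            self.lines.append("  " * self.depth + text)


printer = OutlinePrinter()
printer.feed(SAMPLE_HTML)
print("\n".join(printer.lines))
```

Each level of indentation in the output corresponds to one level of nesting, which is the same relationship you see when you expand and collapse elements in your dev tools.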

Note: You can also toggle a mode that allows you to hover over and click elements on the page, and the dev tools will show you which line of HTML code the element corresponds to. Try to find the button that allows you to do that in your own tools.

Focus on this functionality for now. You won't need the other tabs in your dev tools to scrape content from a static website.

Gather Structural Information

Get familiar with the Elements tab and how it allows you to pick out specific areas of the page and understand the HTML code that builds it.

Tasks

  • How are the links to the different detail pages constructed?
  • Also, navigate to the detail page of each recipe.
  • Where can you find the name of the person who wrote the recipe?
  • Where is the actual text of the recipe located?
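
When you look at how links are constructed, pay attention to the `href` attribute of each `<a>` element. It may hold a full URL, a path starting with `/`, or a path relative to the current page. The following sketch shows how the three forms resolve against the URL of the page they appear on. The base URL and the HTML are placeholders chosen for illustration, so check in your dev tools which form the real site uses:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Placeholder values, not the real address or markup of the recipe site.
BASE_URL = "https://example.com/recipes/"
INDEX_HTML = """
<ul>
  <li><a href="soup/">Soup</a></li>
  <li><a href="/recipes/salad/">Salad</a></li>
  <li><a href="https://example.com/recipes/stew/">Stew</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin resolves relative and root-relative paths against
            # the page URL and leaves full URLs unchanged.
            self.links.append(urljoin(self.base_url, href))


collector = LinkCollector(BASE_URL)
collector.feed(INDEX_HTML)
for link in collector.links:
    print(link)
```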

Use your browser's developer tools to understand the website you want to scrape in more detail. You'll want to find the elements from the page that you're interested in scraping and understand the HTML that makes them up.

Take notes about how they can be identified:

  • On what page URL can you find what information?
  • What HTML element is the relevant information wrapped in?
  • Does it have any HTML classes associated with it? Maybe an HTML id?

The better you understand the underlying structure of the page that you want to scrape, and the clearer you are about what information you want to get from it, the better you'll be able to locate and scrape that information using your Python script.
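
If you like, you can also keep these notes in a structured form. The sketch below records one note as a small data class and then checks it against a piece of HTML. Everything specific in it is an assumption made for illustration: the `ScrapeNote` and `NoteMatcher` names, the `author` class, the `recipe` id, and the sample HTML aren't taken from the real recipe site, so replace them with what you find in your dev tools:

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class ScrapeNote:
    """One note about where a piece of information lives."""
    description: str
    page: str                        # on what page URL you found it
    tag: str                         # the HTML element it's wrapped in
    css_class: Optional[str] = None  # an HTML class, if there is one
    html_id: Optional[str] = None    # an HTML id, if there is one


class NoteMatcher(HTMLParser):
    """Collect the text of elements that match a note.

    Simplified: doesn't handle a matching tag nested inside itself.
    """

    def __init__(self, note):
        super().__init__()
        self.note = note
        self.inside = False
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        note = self.note
        if (tag == note.tag
                and (note.css_class is None or note.css_class in classes)
                and (note.html_id is None or attrs.get("id") == note.html_id)):
            self.inside = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == self.note.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.matches[-1] += data


# Hypothetical detail page markup.
DETAIL_HTML = """
<h1>Soup</h1>
<p class="author">by Jane Doe</p>
<div id="recipe">Boil water.</div>
"""

note = ScrapeNote("recipe author", "detail page", tag="p", css_class="author")
matcher = NoteMatcher(note)
matcher.feed(DETAIL_HTML)
print(matcher.matches)
```

Writing a note down this precisely quickly reveals whether it's specific enough: if the matcher finds several elements or none at all, the note needs another look in the dev tools.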

Next, you'll write Python code to gather the information from the page you've been inspecting.

Summary: How to Inspect Element

  • The first step is to understand the structure of the page you want to scrape
  • Using the browser is the best way to understand the structure
  • The browser allows you to view the site as a normal user and inspect its structure using the developer tools
  • The browser's developer tools can be opened through the context menu or through a keyboard shortcut