Scraping Quotes using Python

Last Updated : 18 Feb, 2026

Web scraping is the process of automatically extracting data from websites and converting it into a structured format such as tables or files. In this article, we will learn how to scrape quotes from a website using Python libraries like Requests and BeautifulSoup and store the extracted data in a DataFrame for analysis.

Prerequisites

Install the following Python libraries:

Python
pip install requests beautifulsoup4 pandas tqdm

Implementation

Step 1: Import Required Libraries and Connect to the Website

We first import the required libraries, requests and BeautifulSoup, and send a GET request to the website.

  • requests.get(): fetches the webpage.
  • BeautifulSoup(): parses the HTML content so we can extract data.

Website used in this article: https://quotes.toscrape.com/

Python
import requests
from bs4 import BeautifulSoup

link = 'https://quotes.toscrape.com/'
res = requests.get(link)
soup = BeautifulSoup(res.text, 'html.parser')

Step 2: Extract All Quotes Text

Now, we extract all the quote texts present on the page.

Python
quotes = []

for quote in soup.find_all('span', class_='text'):
    quotes.append(quote.text[1:-1])
    print(quote.text[1:-1], "\n")
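The slice `[1:-1]` works because each quote on this site is wrapped in a single opening and closing curly quotation mark. A minimal offline check of the same idea (the sample string is made up for illustration):

```python
# hypothetical quote text, wrapped in curly quotes as on the site
text = '“A day without sunshine is like, you know, night.”'

# slicing off the first and last character drops the quote marks
print(text[1:-1])

# str.strip with the quote characters does the same, more explicitly
print(text.strip('“”'))
```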

Step 3: Extract Author Names

This finds all author names on the page and stores them in a list.

Python
authors = []
for i in soup.find_all('small', class_='author'):
    authors.append(i.text)

Step 4: Extract All Quote Details

In this step, we extract all the information related to each quote: the quote text, the author name, the author profile link and all tags, and print everything for verification.

Python
for sp in soup.find_all('div', class_='quote'):
    quote = sp.find('span', class_='text').text[1:-1]
    author = sp.find('small', class_='author').text
    details = sp.find('a').get('href')
    tags = []
    for tag in sp.find_all('a', class_='tag'):
        tags.append(tag.text)
    tags = ','.join(tags)
    print(quote)
    print(author)
    print(details)
    print(tags)
    print("*" * 127)

Output:

(the quote text, author, profile link and tags are printed for each quote, separated by a line of asterisks)

Step 5: Store Extracted Data in a List

We now store all extracted values together so they can be converted into a table later.

Python
data = []
for sp in soup.find_all('div', class_='quote'):
    quote = sp.find('span', class_='text').text[1:-1]
    authors = sp.find('small', class_='author').text
    details = sp.find('a').get('href')
    tags = []
    for tag in sp.find_all('a', class_='tag'):
        tags.append(tag.text)
    tags = ','.join(tags)
    data.append([quote, authors, details, tags])
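The same per-quote extraction can be tried offline against a hand-written HTML fragment. The fragment below mimics the site's markup; the quote, author and tags are invented for illustration:

```python
from bs4 import BeautifulSoup

# made-up fragment in the same shape as the quotes.toscrape.com markup
html = '''
<div class="quote">
  <span class="text">“An example quote.”</span>
  <span>by <small class="author">Jane Doe</small>
    <a href="/author/Jane-Doe">(about)</a>
  </span>
  <div class="tags">
    <a class="tag" href="/tag/example/">example</a>
    <a class="tag" href="/tag/demo/">demo</a>
  </div>
</div>'''

sp = BeautifulSoup(html, 'html.parser').find('div', class_='quote')
row = [
    sp.find('span', class_='text').text[1:-1],   # quote text
    sp.find('small', class_='author').text,      # author name
    sp.find('a').get('href'),                    # first link = author page
    ','.join(t.text for t in sp.find_all('a', class_='tag')),
]
print(row)
```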

Step 6: Collect author elements

This step collects all author HTML elements from the page, which can be useful for further inspection or advanced data extraction.

Python
authors_1 = []
for i in soup.find_all('small', class_='author'):
    authors_1.append(i)
authors_1

Output:

(a list of the `<small class="author">` elements from the page)

Step 7: Extract tags

Here, we extract all tags associated with a single quote to understand the themes or categories linked to it. Note that this snippet reuses the `sp` variable left over from the earlier loop, so it collects the tags of the last quote on the page.

Python
tags = []
for tag in sp.find_all('a', class_='tag'):
    tags.append(tag.text)

tags = ','.join(tags)

Step 8: Extract only quote text

This step isolates just the quote text from the HTML structure. Because the loop reassigns `quote` on every iteration, the variable holds the last quote on the page once the loop finishes.

Python
for sp in soup.find_all('div', class_='quote'):
    quote = sp.find('span', class_='text').text[1:-1]
quote

Output:

A day without sunshine is like, you know, night.

Step 9: Convert data into a DataFrame

This converts the scraped data into a table, which makes it easier to analyze and store.

Python
import pandas as pd
df = pd.DataFrame(data, columns=['Quote', 'Author', 'details', 'Tags'])
df.head()
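Once the data is in a DataFrame it can be persisted with pandas' standard writers such as `to_csv`. A small self-contained sketch with a made-up row:

```python
import pandas as pd

# made-up row in the same shape as the scraped data
data = [['An example quote.', 'Jane Doe', '/author/Jane-Doe', 'example,demo']]
df = pd.DataFrame(data, columns=['Quote', 'Author', 'details', 'Tags'])

# write without the integer index, then read back to verify
df.to_csv('quotes.csv', index=False)
print(pd.read_csv('quotes.csv').shape)  # → (1, 4)
```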

Output:

(first five rows of the DataFrame)

Step 10: Scrape multiple pages

In this step, we automate the scraping process across multiple pages to gather a larger and more complete dataset.

Python
from tqdm import tqdm
Multiple_Pages = []
for page in tqdm(range(1, 11)):

    link = ('http://quotes.toscrape.com/page/' + str(page))
    res = requests.get(link)
    soup = BeautifulSoup(res.text, 'html.parser')

    for sp in soup.find_all('div', class_='quote'):

        quote = sp.find('span', class_='text').text[1:-1]
        authors = sp.find('small', class_='author').text
        details = sp.find('a').get('href')

        tags = []

        for tag in sp.find_all('a', class_='tag'):
            tags.append(tag.text)

        tags = ','.join(tags)
        Multiple_Pages.append([quote, authors, details, tags])

Step 11: Create final DataFrame

Converts all scraped pages into a DataFrame. Renames columns and builds full author profile URLs.

Python
Multiple_Pages_df = pd.DataFrame(data=Multiple_Pages)
Multiple_Pages_df = Multiple_Pages_df.rename(
    columns={0: 'Quote', 1: 'Author', 2: 'Author_id', 3: 'Tags'})
# Author_id already starts with '/', so no trailing slash on the base URL
Multiple_Pages_df['Author_Link'] = ('https://quotes.toscrape.com'
                                    + Multiple_Pages_df['Author_id'])
Multiple_Pages_df.head()
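Concatenating a plain string with a pandas Series broadcasts the prefix across every row, which is how the full author URLs are built. A minimal sketch with made-up author ids:

```python
import pandas as pd

# hypothetical Author_id values, as scraped from the href attributes
ids = pd.Series(['/author/Jane-Doe', '/author/John-Roe'])

# the string prefix is prepended to every element of the Series
links = 'https://quotes.toscrape.com' + ids
print(links.tolist())
```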

Output:

(first five rows of the combined DataFrame, including the new Author_Link column)

We have now created a DataFrame that can be used for further analysis and model building.
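For example, the comma-joined Tags column can be split apart and counted to find the most common themes (shown here on made-up rows):

```python
import pandas as pd

# hypothetical Tags values in the same comma-joined format
df = pd.DataFrame({'Tags': ['life,love', 'life', 'humor,life']})

# split each string into a list, flatten with explode, then count
tag_counts = df['Tags'].str.split(',').explode().value_counts()
print(tag_counts)
```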
