Website-FAQ-Generator/FetchWeb.py at main · TheCodingEnthusiast/Website-FAQ-Generator

15 lines (14 loc) · 786 Bytes

from bs4 import BeautifulSoup
import requests


def fetch_website_content(url):
    """Fetch and clean text content from a website to replicate Ctrl+A and copy behavior."""
    response = requests.get(url, timeout=10)  # Time out instead of hanging on an unresponsive server
    if response.status_code != 200:
        raise Exception(f"Failed to fetch the website. Status code: {response.status_code}")
    soup = BeautifulSoup(response.text, "html.parser")
    # Extract all visible text as if copying directly from the browser
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()  # Remove script, style, and noscript tags along with their contents
    text = soup.get_text(separator="\n")  # Get all text with line breaks
    text = "\n".join([line.strip() for line in text.splitlines() if line.strip()])  # Remove excess blank lines
    return text
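
The cleaning step in `fetch_website_content` (drop `script`, `style`, and `noscript` content, then strip and collapse blank lines) can be tried without a network call. The following is only a minimal sketch: it swaps BeautifulSoup for the standard library's `html.parser` so it runs with no third-party packages, and the names `VisibleTextParser`, `clean_html_text`, and `sample` are hypothetical, not part of this repository. Output may differ from BeautifulSoup's on malformed HTML.

```python
from html.parser import HTMLParser

# Tags whose contents are not visible text (same set as in FetchWeb.py).
SKIPPED_TAGS = {"script", "style", "noscript"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring anything inside SKIPPED_TAGS (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # Greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def clean_html_text(html):
    """Return visible text with one non-empty, stripped line per text fragment."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # Flush any buffered text
    text = "\n".join(parser.chunks)
    # Same blank-line cleanup as fetch_website_content
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())


sample = """<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><h1>FAQ</h1><script>var x = 1;</script><p>  First answer.  </p></body></html>"""

print(clean_html_text(sample))
```

For the sample above, the printed text has three lines: `Demo`, `FAQ`, and `First answer.`. The stylesheet rule and the JavaScript assignment do not appear.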
