AutomatePython/11-web/downloadXkcd.py at patch-1 · GKPython/AutomatePython

37 lines (30 loc) · 1.12 KB

# downloadXkcd.py - Downloads every single XKCD comic.
import requests, os, bs4
url = 'https://xkcd.com' # starting url
os.makedirs('xkcd', exist_ok=True) # store comics in ./xkcd
while not url.endswith('#'):
    # Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Find the URL of the comic image.
    comicElem = soup.select('#comic img')
    if comicElem == []:
        print('Could not find comic image.')
        comicUrl = comicElem[0].get('src')
        # Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(str('http:' + comicUrl))
        res.raise_for_status()
        # Save the image to ./xkcd
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()
    # Get the Prev button's url.
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')
print('Done.')

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

downloadXkcd.py

Latest commit

History

downloadXkcd.py

File metadata and controls