Downloading multiple PDF files with a Python script

If you want to download several PDFs from a website quickly, doing it by hand can keep you busy for a long time, depending on how many PDF documents there are. A Python script makes the task much easier 😃🐍

The Python library requests fetches the website content, and BeautifulSoup together with wget then downloads the PDF files within a few seconds. The script looks as follows:

import subprocess
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Define the website to download PDFs from
url = 'website to download pdfs'

# Get the website content and stop early on an HTTP error
r = requests.get(url)
r.raise_for_status()

# Create a soup object from the response text
soup = BeautifulSoup(r.text, 'html.parser')

# Loop through all elements of the website with the tag a
for link in soup.find_all('a'):
    href = link.get('href')
    # Download the file if the hyperlink exists (is not None)
    # and contains '.pdf'
    if href is not None and '.pdf' in href:
        # Resolve relative links against the page URL, then download with wget;
        # passing the arguments as a list keeps the URL away from the shell
        subprocess.run(['wget', urljoin(url, href)])
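
The link filtering step can also be tried out without a network connection. The following is a minimal sketch, not part of the original script: it swaps BeautifulSoup for the standard library's HTMLParser so that it runs without extra packages, and the names PdfLinkParser, extract_pdf_links, the sample HTML and the example.com URL are all made up for illustration. Unlike the check above, it matches only paths that end in .pdf (case-insensitive) and ignores query strings.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of all <a href> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if not href:
            # Anchors without an href (or with an empty one) are skipped
            return
        # Turn relative links into absolute ones
        absolute = urljoin(self.base_url, href)
        # Look only at the path, so a query string does not hide the extension
        if urlparse(absolute).path.lower().endswith('.pdf'):
            self.pdf_links.append(absolute)


def extract_pdf_links(html, base_url):
    """Return the absolute URLs of all PDF links found in the HTML text."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


# Hypothetical sample page, used only to demonstrate the filtering
sample_html = '''
<a href="docs/report.pdf">Report</a>
<a href="/files/Guide.PDF?v=2">Guide</a>
<a href="about.html">About</a>
<a name="top">No href</a>
'''
links = extract_pdf_links(sample_html, 'https://example.com/library/')
for pdf_url in links:
    print(pdf_url)
```

Each URL in the resulting list could then be handed to wget as in the script above, or fetched with requests.get and written to disk in binary mode.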

The script is also available on GitHub as a Gist.