This Python script performs web scraping on a website to extract links, emails, and WhatsApp links from the specified domain (stei.itb.ac.id). It uses the requests library to fetch web pages and BeautifulSoup for parsing HTML content.
-
Ensure you have the required libraries installed:
pip install requests beautifulsoup4
-
Modify the script to specify the target domain (
DOMAIN), home URL (HOME_URL), and other settings as needed. -
Run the script:
python script.py
-
The script will perform the following actions:
- Visit the home URL (
HOME_URL) and extract all links from the specified domain (DOMAIN). - Collect email addresses (
mailto:links) and WhatsApp links (api.whatsapp.com). - Save the extracted data to separate log files (
scrape-links-stei.log,scrape-email-stei.log,scrape-whatsapp-stei.log).
- Visit the home URL (
-
The script will recursively follow links within the specified domain to gather additional URLs.
-
The extracted links, emails, and WhatsApp links will be saved in their respective log files.
-
You can modify the
HOME_URL,DOMAIN,TIMEOUT, or other settings in the script to target different websites or adjust the scraping behavior. -
To specify a different starting URL, change the value of
HOME_URLin the script.
- Extracted links from the specified domain are saved in
scrape-links-stei.log. - Extracted email addresses are saved in
scrape-email-stei.log. - Extracted WhatsApp links are saved in
scrape-whatsapp-stei.log.
This script is provided under the MIT License.
Please adapt the script and README.md to your specific use case or requirements.