| page_type | sample | ||
|---|---|---|---|
| languages |
|
||
| products |
|
||
| name |
|
||
| description |
|
This sample shows how to crawl a website via a Python Azure Function using BeautifulSoup4 and extract specific information for manipulation/storage.
-
Download Python 3.10 or higher.
-
Download Azure Function Core Tools.
-
An Azure Subscription.
-
Visual Studio Code.
-
For Local Testing:
- Azurite Extension on VS Code.
- Postman.
- git clone https://github.com/Azure-Samples/functions-python-web-crawler.git
- Open the project folder from Visual Studio Code.
Local Testing:
-
If not created, create local.settings.json.
-
In local.settings.json, add the following:
{ "IsEncrypted": false, "Values": { "AzureWebJobsStorage": "UseDevelopmentStorage=true", "FUNCTIONS_WORKER_RUNTIME": "python", "AzureWebJobsFeatureFlags": "EnableWorkerIndexing" } } -
If not installed, download the "Azurite" extension on Visual Studio Code. Info: Azurite is an open-source emulator providing a free local environment for testing Azure storage applications.
-
At the top of the Visual Studio Code menu, select: View -> Command Palette... -> Azurite: Start
-
At the top of the Visual Studio Code menu, select: Run -> Start Debugging OR Start Without Debugging
-
If not installed, download Postman.
-
On Postman:
- Set the request type to "POST"
- Paste the following to the URL Request: http://localhost:7071/api/search_site
- Select "Body" -> "raw"
- Paste the following to the JSON body:
{ "url": "{ANY_URL_YOU_WANT_TO_TRY}" }Sample URLs to try:
Azure Testing:
-
If not installed, download the "Azure" extension on Visual Studio Code.
-
Click the "Azure" extension and authenticate using your Azure credentials.
-
Hover the mouse over "WORKSPACE Local", and click on the Azure Function icon (a yellow Lightning bolt between blue arrows) and select "Create Function App in Azure..."
-
Follow the prompts: Select your Azure Subscription, name your Function App, select a runtime (ex. Python 3.11).
-
When the Function App is provisioned, hover the mouse over "WORKSPACE Local", and click "Deploy to Function App..." -> Select your Subscription and Function App resource.
-
After your code is deployed, navigate to the Function App.
-
Select the function name "search_site" under "Functions" in the main Azure Function blade.
-
Select "Get Function URL" and copy the link.
-
On Postman:
- Set the request type to "POST"
- Paste the link you copied to the URL Request
- Select "Body" -> "raw"
- Paste the following to the JSON body:
{ "url": "{ANY_URL_YOU_WANT_TO_TRY}" }Sample URLs to try:
Additional information:
- Python Azure Functions:
- Azure Functions Python Developer Guide:
- BeautifulSoup4:
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.