Navigate into the folder where you've saved your web scraper script. What's the name of the file you used? In this lesson, I'll assume that it's named rescrape.py.
How to Write a Unittest
Create a new Python file in the same folder and name it appropriately, following the testing guidelines described in the previous lessons. For example, in this case, it could be test_rescrape.py.
1. Set Basic Structure
It's time to set up the basic unittest test structure in that file:
```python
import unittest
import rescrape


class TestRescrape(unittest.TestCase):
    pass


if __name__ == "__main__":
    unittest.main()
```
2. Check for Errors
Great, the basic structure is done. Make sure that the import statements work and that there's no SyntaxError anywhere by executing your tests:
```bash
python3 -m unittest test_rescrape.py
```
If you run the test suite with the basic structure set up but no test cases in your class yet, you should see this response:
```
----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK
```
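There are no dots, because each dot stands for a passing test and you don't have any tests yet. There are also no errors. Everything is following the plan so far :) On Python 3.12 and newer, the last line reads NO TESTS RAN instead of OK, and the command exits with status code 5. That's expected at this stage as well.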
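You can also run the same check from inside Python by loading and running the empty test class yourself. This sketch is optional. It defines its own empty TestRescrape class and leaves out the rescrape import, so it runs even before your scraper file exists. The helper name run_empty_suite() is made up for this example.

```python
import io
import unittest


class TestRescrape(unittest.TestCase):
    pass  # no test methods yet, just like the basic structure


def run_empty_suite():
    # Collect every test_* method on the class (there aren't any yet)
    suite = unittest.TestLoader().loadTestsFromTestCase(TestRescrape)
    # Write the runner's report into a string buffer instead of the terminal
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream).run(suite)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = run_empty_suite()
    print(report)
    print("tests run:", result.testsRun)
```

The report contains the same "Ran 0 tests" summary that the command line shows, and result.testsRun is 0.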
3. Write the Pseudocode
Now you can transfer your notes on what you might want to test in your web scraper into the testing file as pseudocode:
```python
import unittest
import rescrape


class TestRescrape(unittest.TestCase):
    # requests can establish a connection and receive a valid response
    # the response contains HTML code
    # the HTML can be successfully converted to a Beautiful Soup object
    # can identify all links from the index page
    # can identify the author of a recipe
    # can get the main recipe text
    pass


if __name__ == "__main__":
    unittest.main()
```
These are a couple of ideas for the functionality of your scraper that you might want to test. Your pseudocode doesn't have to match this list, and it doesn't have to be complete or in order. These notes are there to help you think through and structure the code you'll write.
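If you'd like the ideas to appear in the test report rather than only as comments, you can turn each note into a placeholder method that unittest skips. This lesson doesn't require it. The method names below are hypothetical, and only the ideas come from the pseudocode.

```python
import unittest


class TestRescrape(unittest.TestCase):
    # Each pseudocode note becomes a placeholder that unittest reports as skipped

    @unittest.skip("TODO: requests can establish a connection and receive a valid response")
    def test_get_valid_html_response(self):
        pass

    @unittest.skip("TODO: the response contains HTML code")
    def test_response_contains_html(self):
        pass

    @unittest.skip("TODO: the HTML can be converted to a Beautiful Soup object")
    def test_html_converts_to_soup(self):
        pass

    @unittest.skip("TODO: can identify all links from the index page")
    def test_finds_index_links(self):
        pass

    @unittest.skip("TODO: can identify the author of a recipe")
    def test_finds_recipe_author(self):
        pass

    @unittest.skip("TODO: can get the main recipe text")
    def test_gets_recipe_text(self):
        pass
```

If you run a file like this with python3 -m unittest, unittest prints an s for each skipped placeholder. When you implement one of the tests, remove its @unittest.skip decorator.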
Info: You'll start by focusing on the first idea, which is why the following code blocks won't show the rest of the pseudocode anymore.
With your code comments and ideas in place, you can start to tackle the first one.
4. Write a Test Case
You'll want to make sure that your web scraping code can connect to the server and receive a valid response. Now, you'll try to encode this task into a test:
```python
import unittest
import rescrape


class TestRescrape(unittest.TestCase):
    # requests can establish a connection and receive a valid response
    def test_get_valid_html_response(self):
        BASE_URL = "https://codingnomads.github.io/recipes/"
        index_page = rescrape.get_page_content(BASE_URL)
        self.assertEqual(index_page.status_code, 200)
```
You wrote your first test case and gave it the descriptive name test_get_valid_html_response(). Inside this method, you wrote three lines of code:
- You defined a BASE_URL variable that holds the URL of the recipe collection's index page.
- You called get_page_content(), a function that doesn't exist yet in your rescrape.py script, and passed it BASE_URL as an argument. You also saved the return value of this imaginary function to the variable index_page.
- Finally, you called self.assertEqual() to check whether the .status_code attribute of index_page is equal to the HTTP success code 200.
5. Run Your Test
You've made quite a few assumptions with this test case. Now it's time to check them by running your test suite:
```bash
python3 -m unittest test_rescrape.py
```
And because Python tries to be a good friend to you, it'll let you know right away that something's not quite right:
```
E
======================================================================
ERROR: test_get_valid_html_response (test_rescrape.TestRescrape)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_rescrape.py", line 9, in test_get_valid_html_response
    index_page = rescrape.get_page_content(BASE_URL)
AttributeError: module 'rescrape' has no attribute 'get_page_content'

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)
```
Debug Errors
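unittest reports an error (E) here, not a failure (F). A failure means the test reached an assertion that didn't hold. An error means the test raised an unexpected exception before it could get that far. In this case, Python looked for get_page_content() in the rescrape module you imported, but the function wasn't there: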
AttributeError: module 'rescrape' has no attribute 'get_page_content'
It's not there, because you haven't written it yet. From a test-driven development perspective, you've received exactly the message you expected.
You start with failing tests, then you build or refactor your production code until the test passes. By writing your test method test_get_valid_html_response(), you created a blueprint for the function you'll build in your scraper code in the next lesson.
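You can watch this red-to-green cycle happen in a single file. This sketch is an illustration under explicit assumptions, not the scraper you'll build. Here, rescrape is a hypothetical in-memory module rather than your rescrape.py. The placeholder get_page_content() also doesn't request the page; it returns a fake object whose status_code is 200.

```python
import types
import unittest

# Hypothetical in-memory stand-in for rescrape.py (no network access involved)
rescrape = types.ModuleType("rescrape")


class TestRescrape(unittest.TestCase):
    # requests can establish a connection and receive a valid response
    def test_get_valid_html_response(self):
        BASE_URL = "https://codingnomads.github.io/recipes/"
        index_page = rescrape.get_page_content(BASE_URL)
        self.assertEqual(index_page.status_code, 200)


def run_tests():
    suite = unittest.TestLoader().loadTestsFromTestCase(TestRescrape)
    result = unittest.TestResult()
    suite.run(result)
    return result


# Red: the function doesn't exist yet, so the test ends in an error
red = run_tests()


# Green: a placeholder that only satisfies the test's expectations.
# It returns a fake response object; it does NOT fetch anything.
def get_page_content(url):
    return types.SimpleNamespace(url=url, status_code=200)


rescrape.get_page_content = get_page_content
green = run_tests()

print("red:", len(red.errors), "error(s)")  # red: 1 error(s)
print("green:", green.wasSuccessful())      # green: True
```

Faking the return value like this is only a way to watch the test flip from an error to a pass. The real get_page_content() should actually fetch BASE_URL, for example with requests, so that the test checks a real server response.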
Summary: Write a Python Unit Test
- There are several steps to writing a unittest test
- The initial tests are supposed to fail
- From a failed test, you build or refactor your production code until the test passes
Steps to Write a Unit Test
- Set basic structure
- Check for errors
- Write the pseudocode
- Write a test case
- Run your test