2) Data Structures Lesson

Binary Search Tree in Python

15 min to complete · By Jon Fincher

It's time to dig into inserting, finding, and removing data in a BST in Python.

Inserting Data

Unlike adding data to a linked list, you don't need to pick where to insert data in a tree. Adding data always starts at the root of the tree:

def insert(self, data):
    """
    Inserts data into an existing BST

    :param data The data item to add to the tree
    """
    if not self.root:
        self.root = Node(data)

    else:
        __insert(data, self.root)

This public .insert() method first checks if there is a root node already created -- if not, it creates one and adds the data to it. Otherwise, it calls a second, overloaded, .__insert() method, which is recursive in nature. Why? You'll see in a moment:

def __insert(self, data, subtree):
    """
    Recursive Helper Function for insert()
    Inserts data into the subtree specified as root

    :param data The data item to add
    :param subtree The root of the current subtree to explore
    :return The updated subtree
    """
    if not subtree:
        subtree = Node(data)

    # Should you explore the left subtree?
    if data < subtree.data:
        # Recursively call __insert() on the left subtree
        subtree.left_child = self.__insert(data, subtree.left_child)

    # Should you explore the right subtree?
    elif data > subtree.data:
        # Recursively call __insert() on the right subtree
        subtree.right_child = self.__insert(data.subtree.left_child)

    # If you get here, you've either found duplicate data
    # Or the other recursive calls are done
    # In either case, you can return the current subtree

    return subtree

This recursive dunder method first checks if you are inserting data into an empty sub-tree. If so, it creates a new Node object with the data provided.

It then checks to see if data is less than or greater than the data in subtree.

  • If it's less, it tries to insert the data in the left child sub-tree.
  • If it's greater, it tries to insert the data in the right child sub-tree.
  • If it's equal (neither less nor greater), then there's no need to do anything -- it's a duplicate, or it's the node you just created. In either case, you can return the current node, which connects to the tree as the recursive calls unwind.

So why use a hidden recursive .__insert() method? There are three reasons:

  1. The nature of a binary search tree lends itself well to recursive methods. Remember that each node in a tree is also the root of another tree, called a sub-tree. This means you can treat each node the same because their structures are the same.

  2. The non-recursive method actually does some filtering for you. If the tree is empty, then there is no need to look at all the nodes in the tree at all. You create a new root node and populate it.

  3. The user of your tree doesn't need to know the specifics of how to work with the tree. They just want to add data to the tree and shouldn't need to specify the root of the tree. The non-recursive method enables this, leaving the details of the recursion and returning Node objects to the recursive dunder method.

Now that you've added data to the tree, you should know how to find it again.

Searching for Data

Searching a binary tree for data is similar to inserting data without the need to create new nodes. Here's what that looks like:

def search(self, data):
    """
    Searches a BST for a piece of data

    :param data The data to search for
    :return True if the data is found, False otherwise
    """
    if not self.root:
        return False

    return self.__search(data, self.root)

def __search(self, data, subtree):
    """
    Searches a given subtree for a piece of data

    :param data The data to find in the subtree
    :param subtree The root of a subtree to search
    :return True if the data is found, False otherwise
    """

    # If the subtree is empty, then you didn't find anything
    if not subtree:
        return False

    # Is the data here?
    if data == subtree.data:
        return True

    # Where to search next?
    if data < subtree.data:
        return self.__search(data, subtree.left_child)
    elif data > subtree.data:
        return self.__search(data, subtree.right_child)

    # If you're here, something's gone wrong, so return False

    return False

As with .insert(), you have two methods for searching -- one which the user calls with the data they want to find, and a second recursive dunder method which does the actual work. The recursive method ends when either the data item is found or there are no more nodes to search.

The technique used by both .insert() and .search() is called a tree traversal. Like traversing a linked list, traversing a tree starts at the root and then visits each node to locate a specific data item. Unlike a linked list, traversing a BST may not visit every node during a search or insert operation because of the criteria used to use either the left or right branches. However, there are tree traversals that will visit every node.

Deleting Data

By now, you may expect that deleting data from a BST requires two overloaded methods. Please consider your expectations met:

def delete(self, data):
    """
    Delete data from a BST

    :param data The data to delete, if it exists
    """

    if self.root:
        self.root = self.__delete(data, self.root)

def __delete(self, data, subtree):
    """
    Delete data from a given subtree, if it exists

    :param data The data to delete
    :param subtree The subtree to search for the data
    :return The subtree with the data removed
    """

    # If ths subtree is Nothing, you can just return Nothin
    if not subtree:
        return Nothing

    # Should you look to the left or right?
    if data < subtree.data:
        subtree.left_child = self.__delete(data, subtree.left_child)

    elif data > subtree.data:
        subtree.right_child = self.__delete(data, subtree.right_child)

    # This is the data you want to remove
    elif data == subtree.data:

        # How many children does this node have?
        # Zero children -- you can just remove the node
        if not subtree.left_child and not subtree.right_child:
            return Nothing

        # One child -- you can replace it with the other child
        elif not subtree.left_child:
            return subtree.right_child

        elif not subtree.right_child:
            return subtree.left_child

        # You have two children -- find the node to promote
        else:
            # Find the smallest node which is larger than the left child
            # This will be the leftmost child of the right subtree3
            smallest = subtree.right_child
            while smallest.left_child:
                smallest = smallest.left_child

            # Swap this value with the subtree root
            swap = smallest.data
            smallest.data = subtree.data
            subtree.data = swap

            # Now you can remove the node from the right subtree
            subtree.right_child = self.__delete(data, subtree.right_child)

    # If you're here, all the recursive calls are done. Return the subtree
    return subtree

Like deleting data from a linked list, deleting data from a BST involves reassigning references to remove nodes from the tree. This is more complicated in a tree because each node can have zero, one, or two references. Simply reassigning a left_child or right_child may remove more of the tree than expected.

When you have found the node you want to remove, you need to figure out how many children it has so you can do the right thing:

  • If the node has no children, it's called a leaf node. Since it has no children, you can simply remove it by returning Nothing.
  • If the node has a single child, you can remove it by returning the existing child. This is similar to the linked list case, where you skip over a node using node.next.next.
  • If the node has two children, you can't just bypass it. You need to find a node which can replace it.

You can see how this works with a diagram - consider the following tree:

Binary Search Tree

You want to delete the node labeled 31, which has two children. You must find another node to replace it that satisfies the requirements of the BST. Because you are working with the sub-tree starting at 31, your choices are limited to the children of that sub-tree. So, how do you find another node to replace 31?

In fact, there are two such nodes:

  • Find the largest node in the left sub-tree, which is the largest number smaller than 31 (in this case, 28).
  • Find the smallest node in the right sub-tree, which is the smallest number larger than 31 (in this case, 39).

You do this by either:

  • Starting with the left sub-tree, follow the right_child links until you hit Nothing.
  • Starting with the right sub-tree, follow the left_child links until you hit Nothing.

The sample code follows the second path, starting with the right sub-tree and finding the smallest item in that tree. It swaps that value with the current node and then continues to search for and delete that node. Implementing the other path is left as a lab for the student.

Summary: Binary Search Tree Operations

This section covered how to insert, delete, search, and list all the data in a binary search tree in Python. You learned how to start from the root of a tree and, through recursion, add data to a tree, as well as find data in a tree. You learned some of the challenges of removing data from a tree and how to address them. You also learned how to traverse a tree to retrieve all the data in the tree.