Skip to content

Binary Search Trees

What is a binary search tree?

A Binary Search Tree (BST) is just a special type of Binary Tree that satisfies the following property:

The left subtree has smaller elements than the right subtree and it's parent. It's a sorted data structure.

  • Most library implementations don't allow duplicates in a BST.

Binary search trees are a 'workhorse' of many data structures and can be used to solve most problems fairly efficiently.

Complexity

BST node prototype

python
class BSTNode:
	def __init__(self, val=None, left=None, right=None):
		self.val, self.left, self.right = val, left, right

# Then build up the tree, like so...
root = BSTNode(1)
root.left = BSTNode(2)
root.right = BSTNode(3)
root.left.left = BSTNode(4)
root.right.left = BSTNode(5)
class BSTNode:
	def __init__(self, val=None, left=None, right=None):
		self.val, self.left, self.right = val, left, right

# Then build up the tree, like so...
root = BSTNode(1)
root.left = BSTNode(2)
root.right = BSTNode(3)
root.left.left = BSTNode(4)
root.right.left = BSTNode(5)

Searching is the most fundamental application of BSTs.

Tip
  • Avoid using mutable objects in a BST, if required, when updating, first remove the item, update it, and add it back
  • BSTs are good for recursive implementations
  • Some problems require both a hash table and a BST, where the hash table is used as a lookup to avoid traversing the whole tree
  • Use the sortedcontainers module, methods are shown below
    • SortedDict: pop(key), update(pairs), setdefault(key, default), get(key), keys(), items(), values()
    • SortedList: add(val), update(iterable), discard(value)

Problems

14.1 Test if a binary tree satisfies BST property

A few main alternatives work well here. This was a fairly normal problem. but the issue is that I don't fully understand the usage of a BST, I suppose this was mostly to determine whether a BST is valid, and consolidate the BST property, which itself was good since I was a little confused.

  • This problem can be solved using upper and lower bounds that you manage manually to resolve, in each recursive call, adding those values to the parameters of the recursive function.

After redo (a couple hours later), my solution to this problem is good and well-formed.

review this problem
Lessons Learned

In order traversal of a binary search tree visits keys in sorted order

  • Recall, Breadth first search BFS is done iteratively with a queue. Review the BT chapter for different methods of achieving this.
  • Don't be afraid to use additional entries in the BST node to encode additional information, such as upper and lower bounds.
  • Understanding recursion, especially for binary trees is about understanding the call stack, and how those interact. Don't forget that you need a stopping condition when using the function itself as the recursion candidate.

14.2 Find the first key greater than a given value in a BST

This was a tough problem, I seem to be having difficulties with the binary search trees. Though the in-order traversal does work, it's clunky and not optimzed.

review this problem
Lessons Learned
  • You can rely on the Noneness of leaves to terminate iterative searches through the tree
  • Remember that subtrees that you don't have to visit can be discarded, so you can actually move your subtree tracker through the tree-chain with calls like subtree = subtree.right so that it iteratively breaks down the search space.

14.3 Find the k largest elements in a BST

Again, a solid brute force solution was obtained. A small tweak that would have improved is to use a capacity-queue such that only k-elements were ever present in the queue, evicting the lowest ones as new ones are added. The better solution is clear and I was getting close, the key really came with the hint, which made me realize I needed to do reverse in-order traversal.

Lessons Learned (some good ones here)
  1. I learned a new, better way to write tree-traversals that allow one to terminate before the end of the recursion. The key is to set an if statement at the top of the function that wil just return None is it isn't satisfied, e.g., len(maxs) < k
python
def inorder_traverse1(tree):
    if len(maxs) < k:
        if tree.right:
            traverse(tree.right)
        maxs.append(tree.val)
        if tree.left:
            traverse(tree.left)

def inorder_traversal2(tree):
    if tree and len(maxs) < k:
        traverse(tree.right)
        # Following if isn't needed for full traversal
        if len(maxs) < k: # need this because we have to traverse back up the call-stack
            maxs.append(tree.val)
            traverse(tree.left)
def inorder_traverse1(tree):
    if len(maxs) < k:
        if tree.right:
            traverse(tree.right)
        maxs.append(tree.val)
        if tree.left:
            traverse(tree.left)

def inorder_traversal2(tree):
    if tree and len(maxs) < k:
        traverse(tree.right)
        # Following if isn't needed for full traversal
        if len(maxs) < k: # need this because we have to traverse back up the call-stack
            maxs.append(tree.val)
            traverse(tree.left)
  1. You could potentially write traversals iteratively, you just have to write your own stack instead of letting the system manage the recursion stack itself.
  2. In the past, using traversals like this with a count didn't work, however, if you use something like a 1-element (or N-element) array or a hash table (something mutable, that is) then you can still write code like above and increase the counter by accessing the array entry.

EPI 14.4 OR LC 235 Lowest Common Ancestor (LCA) of a Binary Search Tree (BST)

Solution: Leverage the invariant implicit in the binary search tree property (e.g. that the elements of the left sub-tree are smaller than those of the right sub-tree) to evaluate whether both the target nodes are less than or greater than the current node, if neither, then the current node is the lowest common ancestor. Use a while True loop with simple conditions.

Tricks: Don't try and use recursion here. While it can be done fairly simply, the iterative approach is much easier since you don't have to manage the effects of returning values as you recurse back up the stack.