6)(Binary) Trees:

(Binary) Trees:

  • A tree is in part defined as a node (the root), which holds some value, together with references to other nodes (the children, themselves trees). Because this also defines a directed graph, these conditions are added:
    • at most 1 reference can point to any given node (i.e., a node has no more than 1 parent)
    • and no node in the tree points toward the root.
  • As such, a tree is really just a special case of a directed graph (digraph):
    • The graph definition: a connected (directed in this context, but undirected in, e.g., the context of a spanning tree) graph with no cycles.
      Formally, directed means that the tree is a rooted tree.
  • Binary trees (branching factor = 2) are recursively built from nodes that have pointers to a left and right child.
  • Binary trees aren't necessarily binary search trees (BSTs), which require that the left children are less than or equal to the current node which is less than all the right nodes.
  • Traversals:
    • Depth-first traversal (DFS) prefixes are in reference to when the root/current node is visited. So pre-order traversal means first root, then left children, then right; post-order means first left, then right, then root. In-order traversal will give a sorted list if it's a valid BST. Make sure to recurse fully when traversing.
      • DFS generally will hit leaves first/faster.
      • In-order traversal pseudocode is simple:
        • inorder(left)
        • visit(root)
        • inorder(right)
      • Same for the other traversals:
        • visit(root)
        • preorder(left)
        • preorder(right)
    • Only one kind of breadth-first traversal—the level order traversal. This traversal visits nodes by levels from top to bottom and from left to right. Uses a queue.
    • To choose between the two, we want to use DFS if we need to hit a leaf fast and BFS if we're more concerned with a "middle" level closer to the root. But if we need to print all leaf nodes, then by necessity they're the same.
  • Balanced trees are better. It also doesn't mean perfectly balanced.
  • Depth and height:
    • A node's depth = the number of edges from the node to the tree's root node. A root node will have a depth of 0.
    • A node's height = the number of edges on the longest path from the node to a leaf. A leaf (root?) node will have a height of 0.
    • The depth and height of a tree are equal.
  • Full and complete is a tree where all leaves are at the bottom and each non-leaf node has exactly 2 children (sometimes called a perfect tree as in http://www.cs.gettysburg.edu/~ilinkin/courses/Fall-2012/cs216/notes/bintree.pdf).
    Rare because then must have exactly (2^n) - 1 nodes where n is the number of levels, or (2^ (h+1)) - 1 where h is the height.
    • Full = every node other than the leaves has 2 children. I.e., every node either has 0 or 2 children.
    • Complete = every level (except maybe the last) is completely filled and all nodes are as far left as possible. I.e., the tree is as compact as possible.
    • In a perfect tree, h = O(log n) because h =  log2 (n + 1) − 1 and we drop the less significant terms.
  • Algo tips:
    • For DFS, use a stack as always.
    • To validate BST, in-order traversal and comparing to sorted version is O(N lg N)*, but the better O(N) way is to recurse down the tree and set floor and ceiling conditions.
      • *Actually we can check if a list is sorted in O(N) time (we don't need to resort) so same thing.
    • For lowest common ancestor (LCA), the LCA is just the first node that is in between the keys I'm looking for. E.g., if a root node is greater than both key nodes, then I know the LCA has to be on the left side of the tree.
  • In-order successor:
    • Important in a lot of binary tree operations.
    • The next node in an in-order traversal; i.e., the node with the smallest key that's still greater than the current node's key.
    • Pseudo-pseudocode:
      • If current node has a right child, go to it then as far left as possible. Must be smallest key in right subtree.
      • Otherwise, must travel up the tree (its left child / subtree is by definition smaller, i.e., can't be successor):
        • If current node is the left child of its parent, then the parent must be the successor.
        • Otherwise, keep going up until escape the right subtree and hit the middle. I.e., when the current node is a left child, its parent must be the in-order successor. We're looking for its ("upper-right") parent.
  • Heaps:
    • A natural implementation of the priority queue ADT (the other being a balanced binary tree).
      • The highest priority element (whether min or max) recursively sits at the top of the heap.
      • Works by maintaining a partial ordering, so less expensive than fully sorted, but more valuable than no ordering.
    • A complete tree where the highest (or lowest) priority element is always stored at the root—hence a heap. A max heap is one where a node is always greater than or equal to its children; a min heap is one where a node is always lesser than or equal to its children.
      • Are used to implement priority queues—queues where the ordering is based on priority, not on position in queue—and to do heap sort.
      • There is no implied ordering within a level (that is, no ordering between siblings) and so a heap is not a sorted structure.
    • Insertion and deletion (need to respect the heap ordering property) are both O(log N) where N is the number of nodes.
    • (Binary) heaps are implemented in arrays.
      • Note that any binary tree can be so implemented, but that it makes particular sense for heaps, because heaps are guaranteed to be complete trees and thus the array implementation will be compact.
      • If a non-complete tree is implemented in an array, then there will be empty slots because a node may have only 1 child and the tree may go on quite a bit deeper than that level.
      • Saves space because can use array position to implicitly maintain the ordering property, instead of storing pointers.
    • In the array implementation:
      • Let n be the number of elements in the heap and i be an arbitrary valid index of the array storing the heap. If the tree root is at index 0, with valid indices 0 through n − 1, then each element a at index i has:
        • children at indices 2i + 1 and 2i + 2
        • and its parent at floor((i − 1) ∕ 2).
      • Alternatively, if the tree root is at index 1, with valid indices 1 through n, then each element a at index i has:
        1-basing sacrifices a tiny amount of space to simplify index arithmetic.
        • children at indices 2i and 2i +1
        • and its parent at index floor(i ∕ 2).
    • Operations:
      • Both insert and remove — remove only removes the root since that's all we care about — are done to an end of the heap to maintain the compact shape. Then, to modify the ordering property, the heap is traversed (respectively, up-heap / sift-up and down-heap / sift-down). Both operations take O(log N) time.
      • Up-heap (or sift-up) is used, e.g., to restore the heap ordering when a new element is added to the end/bottom:
        • Compare the added element with its parent; if they are in the correct order, stop.
        • If not, swap the element with its parent and return to the previous step.
      • Down-heap (or sift-down) is used, e.g., to restore the heap ordering when the root is deleted/removed and replaced with the last element (on the last level):
        • Compare the new root with its children; if they are in the correct order, stop.
        • If not, swap the element with one of its children and return to the previous step (for the newly ordered subtree). (Swap with its smaller child in a min-heap and its larger child in a max-heap.)
          So basically, swap with higher priority child.
      • Heapify, given a complete binary tree, turns the data structure into a heap:
        • We want the heap property to obtain, so we need to ensure that each parent node is greater (or lesser, if min heap) than its children.
        • Starting at the last parent node (take the last node and use index arithmetic to figure out the last parent), call down-heap on it. This will make the subtree rooted at this parent node a heap.
        • Continue on for all the other parent nodes.
        • Where the heap is 1-based implemented:
          Mostly from visualgo.
          • for (i = arr.length / 2; i >= 1; i--)
            • DownHeap(arr[i])
    • Heapsort just entails copying elements to be sorted into an array, heapifying it, then taking the max (root) out each time and putting it at the end of the result array.
      • This can be done in-place (in the same array) where one section is for the heap and the other for the sorted list. Thus, each iteration of the algorithm will reduce the heap by 1 and increase the sorted list by 1.
      • Heapsort is similar to insertion sort in that each step of the algorithm takes the max element from the unsorted section and moves it to the sorted section.
    • Note that heaps are not binary search trees, so we can't efficiently find a given key using binary search.
      • This is because the order property only works between parent and children, not between siblings.
  • Self-balancing binary trees:
    • Binary trees have performance proportional to their height, so keeping the height small maximizes performance. In the worst case, a tree approximates a linked list; a maximally unbalanced tree is just a linked list — where the height is 1 less than the number of nodes.
      • Specifically, a balanced tree has performance O(log n) (think binary search), but an unbalanced tree has performance O(n) (think linked list).
    • Self-balancing trees thus guarantee balanced-ness and consequently efficient performance.
    • Red-black trees:
      http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_rbtree.aspx
      • Guarantees searching, insertion, and deletion in O(log n).
      • Each node has a color, red or black, associated with it. How the tree is "painted" ensures balance. When modified, the tree is re-arranged and re-painted to keep the necessary color properties.
      • Originally an abstraction of symmetric binary B-trees, also called 2-3-4 or 2-4 trees. Red-black trees are really isomorphic to 2-4 trees; i.e., they are equivalent data structures.
        https://www.cs.princeton.edu/~rs/AlgsDS07/09BalancedTrees.pdf https://www.wikiwand.com/en/2%E2%80%933%E2%80%934_tree
        • In a 2-4 tree, there are 3 kinds of nodes:
          • 2-node: 1 key and (if internal etc.) 2 children
          • 3-node: 2 keys and 3 children
          • 4-node: 3 keys and 4 children
        • All leaves are at the same depth, the bottom level. So, every path from root to leaf is the same length ensuring perfect balance.
      • Leaf nodes have no data.
      • Red-black trees have the following properties:
        https://www.topcoder.com/community/data-science/data-science-tutorials/an-introduction-to-binary-search-and-red-black-trees/
        • 1) every node is either red or black;
        • 2) all leaves are black;
          Leaves can either be explicit nodes with no data or just NIL nodes.
        • 3) if a node is red, both its children are black; and
        • 4) every path from a given node n to a descendant leaf has the same number of black nodes (not counting node n). We call this number the black height of n, which is denoted by bh(n).
      • These explicit properties give rise to this critical property: the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf, ensuring that the tree is roughly height-balanced.
      • This limits the height of any red-black tree, ensuring that even in the worst case, search, insert, and deletion are still efficient.
      • The balance property is guaranteed through the combination of properties 3 and 4:
        • For a red–black tree T, let B be the number of black nodes in property 4 (since every path has the same number of black nodes). Let the shortest possible path from the root of T to any leaf consist of B black nodes. Longer possible paths may be constructed by inserting red nodes. However, property 3 makes it impossible to insert more than 1 consecutive red node. Therefore, ignoring any black NIL leaves, the longest possible path consists of 2*B nodes, alternating black and red (this is the worst case).
        • The shortest possible path has all black nodes, and the longest possible path alternates between red and black nodes. Since all maximal paths have the same number of black nodes, by property 4, this shows that no path is more than twice as long as any other path.
      • A red-black tree can be converted to a 2-4 tree by remembering that red nodes are part of ("horizontally" linked within) a logical 2-4 B-tree node and black nodes separate the logical 2-4 nodes with normal vertical links:
        • So to convert a red-black tree to a 2-4 form, just move the red nodes up so that they become horizontally linked with their parent black node to form a logical 2-4 node.
        • All leaf nodes will be at the same depth.
        • See http://www.wikiwand.com/en/Red%E2%80%93black_tree for an example.
      • Search and other read-only operations are the same as those used to BST. However, insertion and deletion are much more complex, because they may violate red-black properties and then must be fixed with color changes and rotations.
        These fixes are quite quick in practice. E.g., the color changes are amortized O(1).
        • A tree rotation is a binary tree operation that changes the structure without affecting the order of the elements. That is, the in-order traversal is the same pre- and post-rotation.
        • http://www.wikiwand.com/en/Tree_rotation
        • Essentially, a right rotation (the root moves right) on a root node X has X become the right child of the new root, which will be X's former left child.
        • Order in the descendants is maintained through realizing that the right child of a left child of a root node can become the left child of the root without violating any binary tree constraints.
      • Insertion and deletion have a number of cases that must be handled.
    • AVL trees:
      http://www.wikiwand.com/en/AVL_tree
      • More rigidly balanced than red-black trees, leading to slower insertion and removal, but faster search. So better for read-heavy use cases.
      • Critical property: the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing (with tree rotations) is done to restore this property.
      • Retrieval, insertion, and deletion, as with red-black trees, are O(log n) in the worst case.
  • A trie (or radix tree or prefix tree) is a n-ary tree variant in which characters are stored at each node, such that each path down the tree may represent a word:
    • Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.
    • All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string.
    • https://www.wikiwand.com/en/Trie
    • Note that tries do not have to be keyed by character strings; other ordered lists can work too, e.g., permutations on a list of digits.
    • Can just implement as a dictionary of (nested) dictionaries . Or of course using classes and OO. E.g. for dicts:
      • root = {}
      • for word in strs:
        • node = root
        • for char in word:
          • if char not in node:
            • node[char] = {}
          • node = node[char]
        • # mark end of word
        • node[''] = 1
      • node = root
      • prefix = ''
      • # if there's more than 1 key, then divergence / not common ancestor
      • while len(node.keys()) == 1:
        • # if reached the end of any word, the LCA can't be longer
        • if '' in node: return prefix
        • char = list(node.keys())[0]
        • prefix += char
        • node = node[char]
      • return prefix
  • Links:
    • http://www.wikiwand.com/en/Tree_(data_structure)
    • this is a pretty good general reference: http://www.cs.cmu.edu/~adamchik/15-121/lectures/Trees/trees.html.
darkmode