Thursday, December 29, 2011

Saving a Binary Search Tree to a File

Source

Describe an algorithm to save a Binary Search Tree (BST) to a file in terms of run-time and disk space complexity. You must be able to restore to the exact original BST using the saved format.


Hint:
Remember the big three tree traversal algorithm? Yes, it’s pre-order, in-order, and post-order traversal. Only one of them is useful for storing a BST.
Assume we have the following BST:
    _30_
   /    \   
  20    40
 /     /  \
10    35  50
In-order traversal
If we do an in-order traversal, we will output 10 20 30 35 40 50. There is no way of telling how the original BST structure looks like, as the following unbalanced BST is one of the possible BST sets from the output 10 20 30 35 40 50.
          _50
         /      
        40 
       /   
      35
     /
    30
   /
  20
 /
10
Post-order traversal:
If we do a post-order traversal, we will output the leaf nodes before its parent node. Specifically, we will output 10 20 35 50 40 30 using post-order traversal. Imagine we’re reading the nodes to construct the tree back. See the problem that we are reading the leaf nodes before its parent? There’s too much trouble to create the child nodes before its parent. Remember that the BST insertion algorithm works by traversing the tree from the root to the correct leaf to insert.
Pre-order traversal:
Pre-order traversal is the perfect algorithm for making a copy of a BST. The output of a pre-order traversal of the BST above is 30 20 10 40 35 50. Please note the following important observation:
A node’s parent is always output before itself.
Therefore, when we read the BST back from the file, we are always able to create the parent node before creating its child nodes. The code for writing a BST to a file is exactly the same as pre-order traversal.

A Binary Search Tree (BST) is useful for storing phone book records in a memory limited device, such as a cell phone. The records are always maintained in sorted order, inserting and deleting a record takes O(lg n) time (slower than linked list, but much better than array).
Reading a BST from a file:
The question now is, how would you reconstruct a BST back from the file? Assume that we have the following pre-order traversal output of the BST earlier: 30 20 10 40 35 50.
Hint:
When we encounter a new node to insert, we should always place it into the first empty space which it will fit using a pre-order traversal. How do we determine the first empty space which a node will fit in? We can use the ordinary BST insertion algorithm, but the run-time complexity will be O(n lg n), since inserting a node takes O(lg n) time. Not efficient enough!
Solution:
Remember my post: Determine if a Binary Tree is a Binary Search Tree (BST)?
We use the same idea here. We pass the valid range of the values from the parent node to its child nodes. When we are about to insert a node, we will check if the insert value is in the valid range. If it is, this is the right space to insert. If it is not, we will try the next empty space. Reconstructing the whole BST from a file will take only O(n) time.
void readBSTHelper(int min, int max, int &insertVal,
                   BinaryTree *&p, ifstream &fin) {
  if (insertVal > min && insertVal < max) {
    int val = insertVal;
    p = new BinaryTree(val);
    if (fin >> insertVal) {
      readBSTHelper(min, val, insertVal, p->left, fin);
      readBSTHelper(val, max, insertVal, p->right, fin);
    }
  }
}
 
void readBST(BinaryTree *&root, ifstream &fin) {
  int val;
  fin >> val;
  readBSTHelper(INT_MIN, INT_MAX, val, root, fin);
}
Further thoughts:
How about storing a binary tree (not BST) to a file? Does the algorithm above works? Why and why not?

No comments:

Post a Comment