This is Part 2 of Binary Search Trees in Ruby.
Ruby’s native Hash implementation more or less follows the basic Hash table principle, wherein keys are hashed
(using Object#hash
) and stored in the appropriate ‘buckets’. In this exercise, we will look at an alternate implementation
of a hash-like key-value store using binary search trees.
The basic structure of the nodes will be as follows:
- We will use the existing
hash
method to generate integer values from objects. These will be the node values. - The key object itself will be stored in the node as a separate property.
- The value object will also be stored in the node.
The binary tree hash will be a simple drop-in replacement for Ruby’s Hash class. It won’t have enumerable methods, but
it will have square bracket notation for looking up and storing values, an implementation of fetch
, and a default_proc
feature.
1. Retrieving a value
We will use our implementation of the basic binary tree as a starting point (but renaming
the classes). The first major difference will be that the value
of the node will no longer be its address in the tree – instead, we use the
hashed key.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
Ideally, we would like to be able to use this as a drop-in replacement for Hash
, so next let’s implement the square-bracket
notation used to retrieve a value. Also, why not write tests for this?
1 2 3 4 5 6 |
|
1 2 3 4 5 6 7 |
|
Done! Almost. lookup
will be a protected method very similar to the old include?
that we used for the basic binary tree. The
square bracket notation will simply be an interface that accepts the raw key, but lookup
will work with the hashed key (so
we don’t need to run the hashing algorithm for every node we traverse).
1 2 3 4 5 6 7 8 9 10 11 12 |
|
…and the test passes. Let’s try looking up a key that hasn’t been set:
1
|
|
In order for this test to pass, we will need to traverse the entire tree until we end up at an empty node.
That node should respond to the lookup
method with nil
:
1 2 3 4 5 6 7 |
|
2. Storing a value
So now let’s try inserting a key-value pair using the []=
notation:
1 2 3 4 5 6 7 8 9 |
|
Similar to the []
method, we’ll implement []=
as an interface to the protected method store
, which will accept the
hashed key and raw value:
1 2 3 4 5 6 7 |
|
There are two possibilities here: inserting a new value and overwriting an existing value. Both covered by our two new tests. We
can handle both of these in one store
method, but we need to change value
from attr_reader
to attr_accessor
so we
can update its contents:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
But now we have a problem – we’ve been passing the hashed key down the tree looking for the correct place to store it,
but when we get there, we don’t have the original key to pass to the constructor of HashNode
. In fact, we don’t even really
need it, because all lookups and stores operate on the hashed value anyway, but if we want to be able to inspect the tree,
it would be nice to see the keys we’ve assumed we were using. It’s not the nicest looking solution, but since we’re
operating with protected methods, I’m not very concerned about simply tacking another argument onto the end of store
, store_left
, and store_right
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|
3. Fetch
Ok! We’re actually almost done. Let’s add a whole slew of tests and see how well this conforms to Ruby’s native Hash implementation:
1 2 3 4 5 6 7 8 9 10 11 |
|
Both of these tests pass without any further modifications to the HashNode
. However, these:
1 2 3 4 |
|
…use the powerful fetch
method, which we have not implemented. fetch
provides several ways of handling missing values in a hash:
specifying a default missing value, returning a value from a block, or raising a KeyError
. On its face, fetch
is a modified version
of the lookup
method, implemented as a series of guard clauses:
1 2 3 4 5 6 7 8 9 |
|
… and the tests pass.
4. default_proc
The last hash feature I use regularly is called the default_proc
, which stores a block that is called when a given key is
not present in the hash table. It is used like this:
1 2 |
|
Since our binary tree hash is a distributed recursive data structure, at first glance this presents a special challenge — each node will need to store the default proc and pass it on to new node instances as they are created. But that’s not actually true. The recursive nature of the structure means that lookups bubble up through the tree and are returned through the root node’s lookup function, meaning the proc only has to be stored at the root.
Let’s start with some tests. Our hash will automatically fire up some arrays for us when a new key is given:
1 2 3 4 5 6 7 |
|
In Ruby’s hash implementation, the default proc is only called for []
lookups — fetch
still works the same way regardless of the presence
of the constructor block. In addition to altering []
, We need to modify initialize
and add a new accessor for default_proc
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
|
The block passed to Hash#new
uses the hash and a key as formal arguments, so we can simply call the proc using the current
node and the given k. The recursive structure of the binary tree will take care of the rest:
my_hash = BinaryTree::HashNode.new(:test, 100){|hash, key| hash[key] = 100 }
my_hash[:my_array] << 1 << 2 << 3
# => {test => []:{my_array => [1, 2, 3]:{}|{}}|{}}
Appendix
Here is the complete source code for the binary tree hash:
And the tests: