UNC-CH COMP 410

Hash Map Implementation

Basic Hash Map

You will write code that implements and exercises a basic Hash Map. The HashMap is a MAP ADT built on a hash table. We will do collision management in the hash table by chaining (hashing into lists, or buckets).

There are plenty of implementations of "hash" stuff out there. Please do not copy those. You may read them for ideas and understanding, but I want you to write your own code and follow the structure I outline below.

The heart of a hash table (and anything built on one, or using one) is a good hash function for the data type of the hashed keys. For basic learning, I encourage you to use the JavaScript code I posted (and demonstrated in class) to experiment with various hash function ideas (on String data); but for this assignment, you will use the hash function I supply below in the code template. It is the "pretty good string" function I showed and that your book says is similar to that used in Java.

There is a built-in hash function in Java but do not use it here. I want you to build your hash table with a hash function in it -- the specific function given here. Note that the hash function provided takes two parameters (String for key, and int for table array size); you will find that this second parameter will allow you to easily use the hash function to re-hash keys into a new table array when you double the table size (in the "extend" function). Details are below.

In our program, we will have one object (class) as "the HashMap". We will call the various MAP interface functions such as put, get, hasKey, extend, etc. on that object. The operations in a hash table (and so a HashMap) are best done iteratively, not recursively, since we are manipulating an array.

The Map Interface

As before, the code templates below contains two top level interfaces: Map, and HashMap. The Map interface is exactly the one we used in Assignment 2 (TreeMap). The difference here in Assignment 3 is that we are implementing the Map ADT using a hash table (rather than a BST). Since the Map interface is unchanged over Assignment 2, the functions are defined (as before) without reference to a hash table, an array, or any other implementation structure.

Using a Hash Table to build a Map

As noted, we will use a hash table to provide the various behavior characteristics of a Map ADT, including the requirement that keys stored are unique. We will see that a hash table gives us O(1) expected times for access operations (for put, get, hasKey); this speed comes at a cost in other areas... specifically you cannot do one of the things a TreeMap allows -- getting keys out in sorted order. Our HashMap will not give us sorted keys without doing an actual sorts (and bearing that cost). We still will provide a method to get all the keys, but the specs say the order of those keys in the returned array is "unspecified", meaning unpredictable (or "random" in the popular vernacular).

The HashMap Interface

The HashMap interface extends the Map interface and adds a few operations that are possible because the implementation method is a hash table. These include minKey, maxKey, and getKeys (in unspecified order). They also include operations reelated to the table itself: lambda (the ratio of elements stored to array slots); extend (double the size of the table array and rehash all the keys stored); and of course, hash (public access to the hash function itself).

What to Hand-in

Hand in your code as per the instructions under Assignment 3 in Sakai. There will be zipping and naming instructions there.

Code Templates

For uniformity, and to aid timely grading, you must write your program with these specific structures.

Interface Map

package assignment3_f20;

/* Behavior: A Map will provide this collection of operations:

  put: 
    in: a string (the key) and a Value object (associated with the key)
    return: a Value object, or null
    effect: 
      if key is NOT in the structure
         add a new cell to the structure and put the key into it, the value into it
         and return null
      if key IS in the structure
         do not add a new cell
         instead, replace the Value object in the existing cell
                  with the new Value object
                  return the old Value object (the one being replaced)

  get: 
    in: a string (the key) 
    return: a Value object, or null
    effect: 
      if key is NOT in the structure
         return null (this includes when the structure is empty, size 0)
      if key IS in the structure
         return the Value object from the cell containing the key

  remove: 
    in: a string (the key of the pair to be removed)
    return: nothing in all cases (return void)
    effect: 
      if the key IS in the structure
        take the whole cell out of the structure... the (key,value) pair will not be in the
        structure after
      if the key is NOT in the structure
        make no changes to the state of the structure 
     
  hasKey: 
    in: a string (the key)
    return: boolean
    effect: 
      if key IS in the structure (meaning there is a (key,value) pair for the key), 
        return true
      if key is NOT in the structure, return false
      in both cases, no change to the structure tate 

  size:
    in: nothing
    return: int, the number of (key,value) pairs stored in the map structure
    effect: no change to state of tree nodes
*/

// ADT operations

public interface Map {
  public Value put ( String k, Value v ); 
  public Value get (String k);            
  public void remove (String k);          
  public boolean hasKey(String k);        
  public int size();
}

Interface HashMap


package assignment3_f20;

/* Behavior: A HashMap will provide Map operations, and also these:

  minKey: 
    in: none
    return: string, the key from the hash table that is smallest
    effect: no hash table state change
      if hash table size if 0, return null

  maxKey: 
    in: none
    return: string, the key from the hash table that is largest
    effect: no hash table state change
      if hash table size if 0, return null

  getKeys 
    in: nothing
    return: an array of strings, containing just the keys from the hash table cells
    effect: the array of strings is produced in unspecified, which means it is ok 
      to just go through the table subscript 0 up and pull keys from the cells

  hash
    in: the key (a String), and table size (an int)
    return: integer hash value for the given key and table size
    effect: no change to hash table state
  
  lambda
    in: nothing
    returns: the lambda value for the table (a double)
    effect: no change in the state of the hash table

  extend
    in: nothing
    returns: a new lambda value, after the table array has been extended
    effect: the array that is the hash table is doubled in size, and the elements 
      in the old table are rehashed (using the new array size) and stored 
      in the new table array
      number of elements stored is unchanged, array is doubled in size 
      (and so lambda gets cut in half)
  
  // leave this in... for grader
  // also specific to tree structure
  public HMCell[] getTable();


*/

// ADT operations

public interface HashMap extends Map {
  public String maxKey(); 
  public String minKey(); 
  public String[] getKeys();  // in random order
  
  public int hash(String key, int tabSize);
  
  public double lambda();  // compute lamda load factor
  public double extend();    // double table size, rehash, return new lambda
  
  // leave this in... for grader
  // also specific to hash table structure
  public HMCell[] getTable();
}

Hash Map Cell (Interface and Implementation)

//-------------------------------------------------------------------
// put this Interface and Implementation into your code as is
// the grader may need these methods to examine the structure
// that your code creates
//-------------------------------------------------------------------
// you may add methods and structure as you need to
// for example, you may want a toString to help with your debugging
// but to not remove what we have given here
//-------------------------------------------------------------------

package assignment3_f20;

public interface HMCell { // these will be used by the grader to 
                          // examine your data structure contents
  public void setKey(String newKey);
  public String getKey();
  public void setValue(Value newValue);
  public Value getValue();
  public void setNext(HMCell nextCell);
  public HMCell getNext();
}

public class HMCell_imp implements HMCell {
  public String key;
  public Value val;
  public HMCell next;

  HMCell_imp(String k, Value v) { key=k;  val=v; next=null; }

  @Override
  public void setKey(String newKey) { key = newKey; } 
  @Override
  public String getKey() { return key; }
  @Override
  public void setValue(Value newValue) {  val = newValue; }
  @Override
  public Value getValue() { return val; }
   @Override
  public void setNext(HMCell nextCell) { next = nextCell; }
  @Override
  public HMCell getNext() { return next; }
}

Value object (Interface and Implementation)

//-------------------------------------------------------------------
// put this Interface and Implementation into your code as is
// the grader may need these methods to examine the structure
// that your code creates
//-------------------------------------------------------------------
// you may add methods and structure as you need to
// for example, you may want a toString to help with your debugging
// but to not remove what we have given here
//-------------------------------------------------------------------

package assignment3_f20;

public interface Value {
  public void setIdNum(int n);
  public int getIdNum();
  public void setScore(double s);
  public double getScore();
  public void setAge(int n);
  public int getAge();
}

public class Value_imp implements Value {
  int idNum;
  double score;
  int age;

  public Value_imp(int n, double s, int a) {
    this.idNum=n; this.score=s; this.age=a;
  }
  
  @Override
  public void setIdNum (int n) { this.idNum = n; }
  @Override
  public int getIdNum() { return idNum; }
  @Override
  public void setScore (double s) { this.score = s; }
  @Override
  public double getScore() { return score; }
  @Override
  public void setAge (int n) { this.age = n; }
  @Override
  public int getAge() { return age; }
}

Class HashMap_imp


package assignment3_f20;

public class HashMap_imp implements HashMap { 
  HMCell[] tab;
  int nelts;
  
  //-------------------------------------------------------------

  HashMap_imp (int num) { 
    this.tab = new HMCell[num];
    // for (int i=0; i<num; i++) { tab[i] = null; }
    // we can rely on the Java compiler to fill the table array with nulls
    // another way would be Array.fill()
    this.nelts = 0; 
  }

  //-------------------------------------------------------------
  
  public int hash (String key, int tabSize) {
    int hval = 7;
    for (int i=0; i<key.length(); i++) {
      hval = (hval*31) + key.charAt(i);
    }
    hval = hval % tabSize;
    if (hval<0) { hval += tabSize; }
    return hval;
  }
  
  //-------------------------------------------------------------

  // dont change 
  @Override
  public HMCell[] getTable() { return this.tab; }
  
  //-------------------------------------------------------------

    
  //-------------------------------------------------------------
  // here down... you fill in the implementations for
  // the other interface methods
  //-------------------------------------------------------------
  
}

Class Playground

package assignment3_f20;

public class HashMap_Playground {
/*
 * you will test your own HashMap implementation in here
 *
 * we will replace this with our own when grading, and will
 * do what you should do in here... create TreeMap objects,
 * put data into them, take data out, look for values stored
 * in it, checking size, and looking at the HMCells to see if they 
 * are all linked up correctly chains in the hash table
 * 
*/
  
  public static void main(String[] args) {
    // sample manual tests are shown
    // it is up to you to test it thoroughly and make sure
    // the methods behave as requested above in the interface
  
    // do not simple depend on the oracle test we will give
    // use the oracle tests as a way of checking AFTER you have done
    // your own testing

    HashMap hm = new HashMap_imp(40);
    System.out.println(hm.getTable().length); // expect 40
    System.out.println(hm.size()); // expect 0

    Value v1 = new Value_imp(12345, 87.3, 21);
    Value v2 = new Value_imp(23456, 75.54, 25);
    Value v3 = new Value_imp(34567, 99.013, 19);
    Value v4 = new Value_imp(45678, 55.70, 35);
    Value v5 = new Value_imp(56789, 77.91, 32);
    Value v6 = new Value_imp(67890, 83.03, 24);
    
    hm.put("Jenny", v1);  hm.put("Bill",v2); 
    hm.put("Steve",v3);   hm.put("Alan",v4);
    hm.put("Gail",v5);    hm.put("Zed",v6);
    hm.put("Wilma",v6);   hm.put("Lauren",v6);
    hm.put("Xray",v6);

    System.out.println(hm.size()); // expect 9
    System.out.println(hm.hasKey("Bill")); // expect true
    System.out.println(hm.hasKey("Zorro")); // expect false
    hm.extend();
    System.out.println(hm.size()); // expect 9
    System.out.println(hm.hasKey("Bill")); // expect true
    System.out.println(hm.hasKey("Zorro")); // expect false

    // etc...

  }

}

Class RunTests

package assignment3_f20;
import gradingTools.comp410f20.assignment3.testcases.Assignment3Suite;

public class RunTests {
  public static void main(String[] args){ //runs Assignment 3 oracle tests
    Assignment3Suite.main(args);
  }
}

Implementation Notes

Remember that just because the Oracle gives you 100%, you are not guaranteed any specific grade on the assignment. Testing your programs thoroughly is your responsibility; try to think of all the many edge cases and extremes for the data values possible. Think of degenerate structures, small structures, large ones... The main goal of the Oracle is to provide some direction. However, it is not the end-all be-all determiner of correctness. Please do not use this tool as a crutch.

TIPS:

Extend:
Use normal, built-in, fixed length Java arrays for this implementation.

The extend function requires you to allocate a new array twice as long as the one currently installed in the HashMap object. You must then go through the old table array and find every cell (every key) and rehash the key using the new twice-as-large array size, and link the cell with that key into the new table array. If you do this by unlinking it from the old table and then "moving it" to the new table you will not create a lot of garbage. However, the method would also work by allocating new cells for the new table array (and copying in the keys and the pointers to the value objects). I think it is less complicated to unlink the cell from the old table and link into the new table.

When re-hashing into the new table array is complete, you must make the HashMap object reference the new extended array as the table array in the object.

Also recall that since we already know all the keys (cells) in the old table array are unique, there is no need to check for uniqueness when putting the cells into the new table. You can just link them in at the head of whatever list they need to go in.

Also note that the extend operation returns the new lambda value (which should be 1/2 of the lambda value before the extend).

Chaining:
Implement your hash table to handle collisions via chaining (not probing). In the put operation, put a check on lambda and if lambda becomes 1 or greater after any put, perform the extend operation automatically (before the put returns).

String keys
When comparing strings note that the String.compareTo method compares two strings lexically and should be used in this assignment.

Supporting Code Examples

Random strings

You may find this code useful. You can build larger trees for testing by generating a lot of (random) string inut rather than doing a ton of input typing. It is a class that generates random strings of characters from "a" to "z". It will generate strings that are 5 to 25 characters in length by default if you call MyRandom.nextString() with no arguments. If you wish to control the string lengths better, then you can call it with arguments. Calling MyRandom.nextString(3,11) for example will generate string that vary in length from 3 to 11 characters. The nextString methods are declared static so you can call them as shown with the class name qualifier and don't need to instantiate an object in your code.