Java's LinkedHashMap – demystified. Sort of.

A few days back I came across a problem : Design a system with following APIs :

  1. put(n) : Puts the value n
  2. get() : Gets the first added value

Obviously, this was a queue because we want get() operation to return the element added first. Here the order of adding elements is important. So far, things look good. We create a Queue using a doubly linked list with head and tail pointer so that get() and put(n) are both O(1) operations.

How about adding one more operation to this system? We want to add an API – removeElement(n) – that removes an element n from the queue in best possible time. Hmm, we have a problem. What options do we have ?

  1. Do nothing. Removal of the element from the linkedlist based queue would be O(n) in the worst case.
  2. Use a self-balancing BST instead of a linkedlist to implement this. In this case, removeElement(n) will be O(log n) but managing the insertion order would be very difficult and hence get() and put(n) would be increasingly difficult.

A bit more consideration would make it very clear that what we are looking for is actual an LRU Cache(a slightly modified) and want to make sure that all operations are as optimal as possible.

There is a way by which we can introduce a bit of hashing with the linked list. Lets have a hashmap where key is the number ‘n’ and the value associated with the key be the address of the node in the linked list. Now our operations look like:

  • put(n) : Normally add the node at the head of the doubly linked list and make the entry of ‘n’ as key in the hashmap and store the address of the newly added node as value. This operations is o(1).
  • get() : Simply remove the node at the tail in the doubly linked list and remove the node reference from the Hashmap. Again a o(1) operation.
  • removeElement(n) : Do a o(1) lookup from hashmap using the key ‘n’ – get the value which is the address of the node in the linked list. Now that you have the node to be removed, removing the node in the doubly linked list is very easy – infact its o(1).

 

Funny thing is that Java has an interesting collection called – LinkedHashMap that does exactly this.There is a very useful implementation which shows how LinkedHashMap’s logic can be coded using a HashMap and a doubly linked list. After wasting a lot of time I found this little article explaining how LinkedHashMap is implemented. Check the graphic below.

linkedhashmap

 

 

 

 

As per the original article – LinkedHashMap provides a convenient data structure that maintains the ordering of entries within the hash-table. This is accomplished by cross-cutting the hash-table with a doubly-linked list, so that entries can be removed or reordered in O(1) time. When operating in access-order an entry is retrieved, unlinked, and relinked at the tail of the list. The result is that the head element is the least recently used and the tail element is the most recently used. When bounded, this class becomes a convenient LRU cache.

 

Ved

 
  • http://curiosityhealsthecat.blogspot.com/ saurabh hirani

    Nice post. Thanks for sharing the LinkedHashMap article. However the first thing that came to my mind after reading this is:

    1. If you delete element 2 (assume that hash map has 4 elements – key 0, 1, 2, 3), and delete its mapping from the hash you are left with keys – 0,1,3 and 3 elements in your list

    2. Now the user will have do removeElement(3) – he can’t do a removeElement(2) as even though there is an element present at index 2 (the one present at 3 after deletion) – he cannot delete it as hash lookup for 2 has been removed.

    If you are going to make the user do a key based deletion – then why not use only a hash? Insert is O(1), removal is O(1) order is maintained internally as hash keys are added in increasing order.

    And as far as knowing which is the first element goes – keep a unique key in the hash – min_idx which is updated if the first element is removed i.e. if someone removes key 0 then min_idx is updated to 1 and so on.

    I may have misunderstood something. Please correct me if I am wrong.

    • Ved

      Saurabh, removeElement(2) is not “remove element from 2nd index” it means “find 2 and remove it from list” – so finding from the linkedlist is O(n) operation. In this case, if I removeElement(3), I will use the Hashmap to do a lookup of key 3 and using the value (link to the node) will remove the entry. If I use ONLY hash, hash structures inherently do not keep any insertion order. Your idea of just keeping a hash and a min_index will be able to keep only the first insertion and lose other insertions (hence the queue).
      e.g.: put(100) – min_index = 0
      put(4) – min_index = 0
      put(5) – min_index = 0
      get() – remove 100, now as hashes are inherently unordered so how will you determine which element is the next to be removed ?

  • http://crashpoint-zeros-brain.blogspot.com Saurabh Hirani

    If removeElement(2) is not “remove element from 2nd index” – then it all makes sense – my bad. I was thinking of removal by providing the index part. If that was the case, then doing a put and a get won’t be destructive. But doing a removeElement will lead to making the indices beyond the index that was deleted obsolete – as they need to be reduced by 1 or pushed up.

  • https://dhruvbird.com/ Dhruv

    In fact, you don’t even need to remove the element from the linked list when you remove it from the hash map. The next time you do a get(), just check if the head of the linked list is present in the hash map. If not, delete it and look at the next element. Repeat till you find an element in the hash map. This is amortized O(1) per operation.