Saturday, July 11, 2009

Synchronization issues related to Java collections when accessed concurrently

Being a java developer or an application server administrator interested in learning how the code works like me :) , you must have seen the javadoc http://java.sun.com/javase/6/docs/api/java/util/HashMap.html warning: Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map.

So here are some things that might happen depends on what kind of Map you are using like LRUMap or TreeMap etc. I have seen these behaviors in runtime where the Map is not synchronized but concurrently accessed, and though of publishing it in the blog which might help others to debug similar issues or correlate the symptoms to the failure they might have seen but not having a clue.



1) LRUMap Memory leak : If you are using a LRUMap and multiple threads are putting and getting values concurrently, some kind of memory leak happens and the object grows beyond the specified maximum size. Every time you access the LRUMap either by get or put, the object you are getting or putting had to be moved to the MRU(Most recently used) place within the Map. Also when the LRUMap is full it has to remove the LRU object to provide room for the new object. I believe this is accomplished by some kind of a link list data structure within the map. Hence there is a lot of overhead during the process and there is quite a chance to get the link list corrupted when multiples threads are get/put’ing concurrently particularly when the map is full.

I noticed the “size()” as returned by the object sometimes gets higher than the maximum size and sometimes show less than the maximum size when it’s supposedly be full depending on how and where the corruption happens within the link list .Further accessing the object after it’s corrupted, the “size()” is starting to get reduced every time a further corruption happens and leads into negative size (-1…-20) and keep increasing in the negative side. Note the “size()” that I am referring is the size returned by the counter that the object is using to track and not the actual object size. so I took a heap dump and found that any object that is added after Is never discarded at all , I guess because of the logic that size never gets bigger than the maximum size after corruption.It looks like one the object is corrupted it’s starting to leak and then increasing the overall size(byte size) and the behavior in erroneous.
I am attaching the lru.jsp that I used to simulate so that you can try and see how it behaves. The below picture shows how the object java.math.BigDecimal grows in consecutive heapdumps even though the max size is set to 2.


Note that this behavior happened on WebSphere 5.1 Application Server on JDK 1.4.2 not sure what might be the behavior on other version of JDK or AppServer. Please post in my comment if you are seeing any different behaviors for any other types of Map.

2) TreeMap High Cpu Utilization: This behavior happened particularly when the interator iterating through the treemap collection is not synchronized, where one thread is changing (add or delete) the element and then other iterating through it concurrently. I didn't do much analysis but it looks like the underlying data structure of the TreeMap some how gets corrupted and causes a circular reference between two elements causing the interator to go into an infinite loop which caues the CPU to spike and eventually the ApplicationServer will become unresponsive. You probably want to kill the process to reocver from this state. Here is the thread stacktrace that you would see if take a thread dump, Attaching the tree.jsp for you to test and see.

at java.util.TreeMap$NavigableSubMap$SubMapIterator.nextEntry(TreeMap.java:1570)
at java.util.TreeMap$NavigableSubMap$SubMapKeyIterator.next(TreeMap.java:1629)
at org.apache.jsp.tree2_jsp._jspService(tree2_jsp.java:120)

Also note many programmers may forget to synchronize on the collection when iterating eventhough the collection is synchronized. Hence, it is imperative that the user manually synchronize on the returned collection when iterating over it to prevent any non-deterministic behavior.

Collection c = Collections.synchronizedCollection(myCollection);
...
synchronized(c) {
Iterator i = c.iterator(); // Must be in the synchronized block
while (i.hasNext())
foo(i.next());
}

No comments: