I’ve always thought that Java is in the wrong by not allowing generic collections of primitive types. I’ve experienced in the past some situations where boxing and unboxing was the reason of a bottleneck in a performance critical code and always felt when working with C# that for example Dictionary<int,int> is very useful and gives that extra edge with performance So today I’ve done some experiments:

I declared ArrayList<Integer> on which I performed 100 000 inserts followed by removing all elements one-by-one (always removing the first element):

  • 6 ms. for inserts (includes amortized array resizing)
  • 3790 ms. for removes

Oddly enough I found that the removals do not cause the shrinking of the array. Canonical implementations resize by a factor of two up whenever the space runs out, but also shrink the underlying array by a factor of two if it’s 3/4 empty
I checked the Java source code and indeed, there is no shrinking of the array with removals (also the increase is not by a factor of 2 but 3)

But, back to the original thought
Here’s the code i used for my resizable list of primitive integer values:

It uses the same System.arraycopy method to move elements around and Arrays.copyOf for growing as the ArrayList implementation

And the results?

  • 3 ms. for inserts
  • 3117 ms. for removes

Clearly the cost is dominated by remove and in particular by shifting all elements by one ‘left’ which each removal (no surprise there). The shrinking of the array doesn’t help since the move is made only for elements up to current size regardless of the actual array size (it actually adds a little)
So, to make the test more about arraylist of primitive types rather than about why I should never remove the first element from an arraylist and use some other structure instead, I modified the test to always remove the last element, thus making shifting of the elements unnecessary.I also increased the number of elements put to the list to 10 000 000 since it was running much faster now. The result startled me a little:

  • ArrayList add: 2096 ms.
  • ArrayList remove: 467 ms.
  • IntList add: 181 ms.
  • IntList remove: 85 ms.

Now that’s a bigger difference (especially for inserts). So I ran the profiler and here’s what I saw (I actually had to copy the ArrayList code to a separate class in order for the visualVM to show the stats):

  • ArrayList add: 461 ms.
  • IntList add: 296 ms.
  • ArrayList remove: 433 ms.
  • IntList remove: 189 ms.

Of course these numbers are ms calibrated to the system and are so-so accurate, but they show the scale. Without boxing taken into account the numbers are significantly different.I actually tested it again with a slightly modified version where the list of integers inserted was pre-generated and generation time was not added to insert time:

  • ArrayList add: 233 ms.
  • IntList add: 77 ms.
  • ArrayList remove: 205 ms.
  • IntList remove: 160 ms.

The differences here can be accounted for by the lack of error checking in the test application.

So again – back to original point – Is the boxing costly ? Yes, very.

This is an example that was shown to me back in the day when I was at the university. I took a great class ‘Anatomy of Java virtual machine’ taught absolutely brilliantly by prof. Grzegorz Czajkowski  (at the time working for SUN but soon after moving to Google) where we got to play with raw java bytecode and write our own (simple) garbage collectors. Quite challenging for a student but a lot of fun nevertheless.

Anyway, here’s a piece of code:

What will be the result of executing it ?

As expected – a lot of console output.

What if we change the value of the integers a bit ?

Will the output change ?

After running this the program terminates instantly with no output!  (By the way, the incrementation  is not necessary, we could simply start with 128)

It’s the only known to me example where the value has such impact on flow control (if you know any other interesting ones, drop me an email).

It’s all got to do with Boxing/Unboxing and a certain specific caching feature of Java. Let’s dissect it a bit. First of all, let’s look at the conditional statement:

i<j || i>j || i==j

remember, those are Integers, not ints i < j and i > j will force auto unboxing so the comparison will happen with unboxed ints. The == equality comparison however will happen for Integer types and will compare the references. That explains why we get no output for 128 – i and j are simply two different Integer objects, so the reference comparison produces false.

Why then if we set the value to 127 (or 10 for example) do we get an infinite loop ? JVM caches the Integers between -128 and 127 and when you use one of those, it uses a cached one instead of creating new Integer object every time. It’s actually a part of JLS (Java Language Specification) – 5.1.7 Boxing Conversion states “If the value p being boxed is truefalse, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.”

Java 1.6 from certain update allows for specifying the upper threshold of the cached range – by using -XX:AutoBoxCacheMax=<size> JVM option

Some use this example as an argument that you should always use Integer.valueOf() instead of autoboxing. I think you just need to be careful and aware of such behaviors.

EDIT: Check out this blog post: http://martykopka.blogspot.com/2010/07/all-about-java-integer-cache.html It’s explaining how the Integer caching has changed through different versions of Java

 

As mentioned last time, there is another behavior that may be a surprise (it certainly was to me when I discovered it). Consider the following minimal TCP server code:

It’s as simple as it gets and seems to be quite correct. The lock on the type is an overkill but should do the trick making sure our buffer is consistent.

Client code is just sending single bytes and flushing the stream every time (200 single byte messages).
What is the output? Here it is (a relevant part of it at least):

Notice lines 3 and 4 – the data is printed in reversed order. How come ?!

Let’s look again at the server code:

Console.Write is protected by the lock and it is guaranteed that only one thread will call it at the time, buffer is already copied, what can be wrong ?

The only thing that may rise an eyebrow is the call to BeginRead happening before the processing of the data read from buffer. It cannot interfere with the data though as there is a lock and our data is already in a copied byte array. “When you have eliminated the impossible, whatever remains, however improbable, must be the truth“. The only remaining option is that the thread that is doing the out-of-order write is the same one that is asynchronously called to process the incoming data, but instead of calling BeginRead and returning it actually synchronously calls the async callback. MSDN documentation fails to mention that, but actually that’s exactly what happens. We can verify that easily by adding the following piece of code:

and here is the relevant output:

Notice lines 1 and 8 – it’s the same thread that is calling the rcv method twice. Since in C# locks are reentrant, it doesn’t protect the EndReceive from executing (if they weren’t it would be a self-deadlock).

In the software where I observed it it didn’t happen a lot – the bigger the load the more likely was it to surface. Though when it did happen, the recursion depth was around 5-6.  The reason is that under the hood an OverlappedAsyncResult is used which can detect that there is another async IO in progress and piggyback on it.

Hope you’ll find it useful, I learned the hard way.