Java Boxing Benchmark


This article continues the dive into the Java Microbenchmark Harness (JMH) by comparing the performance of Java boxing in a for loop with using primitive types to achieve the same result.

Setting Up a Project

Clone the project: https://github.com/kimsaabyepedersen/jmh-boxing.

What Will Be Benchmarked?

Each of the two for loops below is benchmarked twice. Both benchmarks start with an array of length 128 and fill it with consecutive integers. In one run the values start from 0, and in the other they start from 200.

The array length is 128 because there are 128 numbers from 0 to 127, both inclusive. These numbers are interesting because Java caches Integer values between -128 and 127 (both inclusive). Boxing calls the valueOf method of the Integer class, and the Javadoc for that method gives the details. Starting from 200 instead gives the values 200 to 327, which lie outside the guaranteed cache range, so with default JVM settings each boxing creates a new Integer object.
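The cache can be observed directly by comparing references. The small program below is a sketch for illustration only (the class and method names are made up, not taken from the benchmark project). It assumes default JVM settings: two Integer.valueOf calls for a value in -128 to 127 return the same object, while a value such as 200 yields two distinct objects.

```java
public class IntegerCacheDemo {
    // Boxes the same int twice and reports whether both calls returned the same object.
    static boolean sameInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // deliberate reference comparison, not equals()
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127)); // true: 127 is in the guaranteed cache range
        System.out.println(sameInstance(200)); // false with default JVM settings
    }
}
```

Comparing with == is done on purpose here to expose object identity. In regular code, Integer values should be compared with equals().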

// Initialization code left out; it creates these two arrays, each of length 128.
// startValue is a JMH benchmark parameter (0 or 200, see the output below).
int[] ints;
Integer[] integers;

@Benchmark
public Integer[] boxing() {
  for (int i = 0; i < integers.length; i++) {
    // boxing happens here: the result of the addition is an int,
    // which is boxed (via Integer.valueOf) to match the element type of integers
    integers[i] = i + startValue;
  }
  // returning the array lets JMH consume the result, guarding against dead-code elimination
  return integers;
}

@Benchmark
public int[] nonBoxing() {
  for (int i = 0; i < ints.length; i++) {
    // no boxing: the ints array and the result of the addition are both of primitive type int
    ints[i] = i + startValue;
  }
  return ints;
}
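To see why the start value matters for the boxing loop, the plain-Java sketch below (not part of the benchmark project; all names are hypothetical) runs the same loop twice and counts how many array slots end up holding the identical Integer object. Assuming default JVM settings, starting from 0 every value comes from the cache, while starting from 200 every boxing allocates a fresh object. That extra allocation is the likely cost the non-cached boxing benchmark pays.

```java
public class BoxingAllocationSketch {
    // Mirrors the boxing benchmark loop, but fills a fresh array of length 128 on each call.
    static Integer[] fill(int startValue) {
        Integer[] integers = new Integer[128];
        for (int i = 0; i < integers.length; i++) {
            integers[i] = i + startValue; // the int result is autoboxed via Integer.valueOf
        }
        return integers;
    }

    // Fills two arrays independently and counts slots holding the very same Integer object.
    static int sharedInstances(int startValue) {
        Integer[] first = fill(startValue);
        Integer[] second = fill(startValue);
        int shared = 0;
        for (int i = 0; i < first.length; i++) {
            if (first[i] == second[i]) { // reference comparison, not equals()
                shared++;
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println("startValue 0:   " + sharedInstances(0));
        System.out.println("startValue 200: " + sharedInstances(200));
    }
}
```

With default settings this prints 128 for a start value of 0 (all values cached) and 0 for a start value of 200 (no values cached).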

So the question is: how much faster is an int than an Integer when the values are small (cached), and what if the values are larger (non-cached)? Speed is measured as throughput, the number of operations per second, where one operation is one call of a benchmark method; more operations per second means faster.

The Output

Human-readable output from running the benchmarks looks like this (the scores use a comma as the decimal separator):

Benchmark                  (startValue)   Mode  Cnt         Score        Error  Units
BoxingBenchmark.boxing                0  thrpt    9   2174940,666 ±   7201,642  ops/s
BoxingBenchmark.boxing              200  thrpt    9    689941,327 ± 107816,476  ops/s
BoxingBenchmark.nonBoxing             0  thrpt    9  19186481,525 ± 120700,617  ops/s
BoxingBenchmark.nonBoxing           200  thrpt    9  19207011,149 ±  58883,470  ops/s

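Looking at the boxing benchmark first, the cached version performs about 2.17 million operations per second, while the non-cached version performs about 0.69 million. The cached version therefore performs roughly 3.15 times as many operations per second as the non-cached one. Note that the non-cached result also has a much larger error margin (about ±16% of the score).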

The non-boxing benchmarks, which use the primitive type int, show practically no difference between starting from 0 and starting from 200. This is expected, as primitive values are never boxed, so no cache is involved.

Comparing the non-boxing and the boxing benchmarks, the non-boxing version is about 8.8 times faster than the boxing version when the Integers are cached, and about 27.8 times faster when they are not cached.
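The ratios quoted above are simple divisions of the Score column. The snippet below is a small sketch that reproduces them; the class and constant names are made up for illustration, and the scores are copied from the output with the decimal comma written as a point. Each ratio pairs runs with the same start value, except the cached vs non-cached comparison, which contrasts the two boxing runs.

```java
public class ThroughputRatios {
    // Scores in ops/s, copied from the JMH output above.
    static final double BOXING_CACHED = 2174940.666;     // boxing, startValue 0
    static final double BOXING_NON_CACHED = 689941.327;  // boxing, startValue 200
    static final double NON_BOXING_0 = 19186481.525;     // nonBoxing, startValue 0
    static final double NON_BOXING_200 = 19207011.149;   // nonBoxing, startValue 200

    // How many times more operations per second the faster run achieves.
    static double ratio(double faster, double slower) {
        return faster / slower;
    }

    public static void main(String[] args) {
        System.out.printf("cached vs non-cached boxing:     %.2f%n", ratio(BOXING_CACHED, BOXING_NON_CACHED));
        System.out.printf("non-boxing vs cached boxing:     %.2f%n", ratio(NON_BOXING_0, BOXING_CACHED));
        System.out.printf("non-boxing vs non-cached boxing: %.2f%n", ratio(NON_BOXING_200, BOXING_NON_CACHED));
    }
}
```

This gives roughly 3.15, 8.82 and 27.84. Given the error margins in the output, especially for the non-cached boxing run, the figures are best read as approximate.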

Note that microbenchmarks are difficult to get right: