Generics are now a must-have in any statically typed programming language. Yet, there is a strong tension between the uniform interface exposed to programmers and the low level implementation, which has to deal with data of different sizes and semantics: booleans, integers, floating-point numbers and heap objects. Different languages have taken in very different paths to implementing generics, each with its own advantages and drawbacks. In Java and other JVM languages the standard is boxing primitive values, meaning that heap objects are used to carry primitive values. This leads to significant slowdowns and increased heap consumption, especially when using generic collections for primitive types.
Scala proposes specialization as an alternative to boxing: the compiler can duplicate and adapt user code for each primitive type, thus using unboxed data. Specialization has been shown to reliably speed up code and is extensively used in established community libraries, such as spire (non/spire) and breeze (scalanlp.org). But statically duplicating code can result in significant jar sizes and long compilation times. For example, specializing a 3-element tuple, which takes three type parameters, yields 1000 almost-identical classes.
In the miniboxing project we set out to reduce the static bytecode size while maintaining optimality: we build on the idea of tagged union, thus offering a single variant of the code for all primitive value types. This means the 3-element tuple can now be specialized with just 8 classes. But matching the performance of specialization was a long and convoluted journey. In this presentation I will explain the basics of miniboxing and show what we had to do to match the speed of specialized code.
The project's website, scala-miniboxing.org, contains all you need to get started: the miniboxing compiler plugin, documentation, usage examples and benchmarks.