20 February 2013
This is the first in a series of posts about Garbage Collection (GC). I hope to be able to cover a bit of theory and all the major collectors in the hotspot virtual machine over the course of the series. This post just explains what garbage collection is and elements common to different collectors.
Your Java virtual machine manages memory for you - which is highly convenient - but it might not be optimally tuned by default. By understanding some of the theory behind garbage collection you can more easily tune your collector. A common concern is collector efficiency, that is to say how much time your program spends executing program code rather than collecting garbage. Another common concern is long that application pauses for.
There's also a lot of hearsay and folklore out there about garbage collection and so understanding the algorithms in a bit more detail really helps avoid falling into common pitfalls and traps. Besides - for anyone interested in how computer science principles are applied and used, JVM internals are a great thing to look at.
Your program (or mutator in GC-Speak) will be allocating objects as it runs. At some point your heap needs to be collected and all of the collectors in hotspot pause your application. The term 'stop-the-world' is used to mean that all of the mutator's threads are paused.
Its possible to implement a garbage collector that doesn't need to pause. Azul have implemented an effectively pauseless collector in their Zing virtual machine. I won't be covering how it works but there's a really interesting whitepaper if you want to know more.
Simply stated: Most allocated objects die young 1. This concept was demonstrated by empirically analysing the memory allocation and liveness patterns of a large group of programs during the 1980s. What researchers found was that not only do most objects die young but once they live past a certain age they tend to live for a long time. The graph below is taken from a SUN/Oracle study looking at the lifespan of objects as a histogram.
The young generational hypothesis has given rise to the idea of generational garbage collection in which heap is split up into several regions, and the placement of objects within each region corresponds to their age. One element that is common to the above these garbage collectors (other than G1) is the way that heap is organised into different spaces.
When objects are initially allocated, if they fit, they are stored in the Eden space. If the object survives a collection then it ends up in a survivor space. If it survives a few times (your tenuring threshold) then the object ends up in the tenured space. The specifics of the algorithms for collecting these spaces differs by collector, so I'll be covering them seperately in a future blog post.
This split is beneficial because it allows you to use different algorithms on different spaces. Some GC algorithms are more efficient if most of your objects are dead and some are more efficient if most of your objects are alive. Due to the generational hypothesis usually when it comes time to collect most objects in Eden and survivor spaces are dead, and most objects in tenured are alive.
There is also the permgen - or permanent generation. This is a special generation that holds objects that are related to the Java language itself. For example information about loaded classes is held here. Historically Strings that were interened or were constants were also held here. The permanent generation is being removed in favour of metaspace.
The hotspot virtual machine actually has a variety of different Garbage Collectors. Each has a different set of performance characteristics and is more (or less) suited for different tasks. The key Garbage Collectors that I'll be looking at are:
I've given a few introductory points of thought about garbage collection, in the next post I'll be covering the Parallel Scavenge collector - which is currently the default collector. I'd also like to provide a link to my employer who have a GC log analyser which we think is pretty useful.