Thursday, June 17, 2021 » CrateDB

The Circuit Breaker mechanism in CrateDB

In this article we’ll take a look at the circuit breaker mechanism in CrateDB. CrateDB uses circuit breakers to prevent a node from running out of memory or stalling due to high garbage collection load.

Query execution requires memory

CrateDB is written in Java and Java is a garbage collected language. That means memory management is done by the runtime and not manually by the programmer. Most of the time that’s convenient, but it can also be problematic.

As a SQL database, CrateDB allows to execute a wide range of queries. To process queries it is necessary to load data into memory. CrateDB tries to avoid loading more data than necessary into memory, but for certain types of queries it is mandatory to materialize the entire result set. There would be an option to spill over to disk, but CrateDB (as of 4.5) doesn’t implement that. There would be an option to operate on off-heap, and CrateDB supports that in a few places, but not everywhere.

So what happens if a query requires more memory than available? If memory is allocated on the HEAP, and there isn’t enough free memory available the JVM will trigger a garbage collection. Ideally this garbage collection run will be able to free up memory - but what if all live objects are still referenced? It won’t be able to free up memory.

Garbage collection

The JVM contains multiple garbage collector implementations. CrateDB 4.5 by default chooses the G1GC implementation. G1GC is a generational garbage collector. It is called generational because it splits the heap into different regions (generations) called Eden space, survivor space and old space. The premise is that most objects die young, and splitting the heap into generations allows the garbage collector to make the cleanup of objects that die young fairly cheap.

There are different events that trigger a garbage collection run, one of them is not having enough free memory for further allocations. I’m simplifying things a bit, but the gist is this: First try to run the cheaper young, mixed and concurrent garbage collections and if they don’t work out, move on to the nuclear option, an expensive full garbage collection.

By default the JVM is fairly stubborn and attempts to run GCs repeatedly until it finally gives up if successive runs don’t manage to free up memory. If it does give up, it raises a OutOfMemory error - which depending on application code and the ExitOnOutOfMemoryError option can lead to the termination of the application.

CrateDB nodes dying with OutOfMemory errors doesn’t sound appealing. Spending a lot of time on garbage collection also doesn’t sound nice. What does CrateDB do to prevent that?

Circuit breakers

CrateDB uses so called circuit breakers to short-circuit operations which may take up too much memory. The concept was inherited from Elasticsearch. The idea is to help out the garbage collector and prevent it from getting into a situation where it repeatedly tries to free up memory - only to fail.

Let us look at an example to elaborate how this works. Suppose we have a GROUP BY query with some aggregations. To compute the correct results, CrateDB needs to hold onto the groups and their aggregation state until it processed all records.

Suppose the full result does not fit into memory. What would happen without circuit breaker is that the node would experience a spike in load due to frequent garbage collections, only to eventually die. Why? Because the application code tries to add another entry to its result-set. This result-set occupies the majority of the memory and cannot be freed until the operation is complete. This is a situation a garbage collector cannot solve.

So how does the circuit breaker prevent this?

The circuit breaker mechanism adds its own memory accounting. CrateDB estimates the memory consumption of the records and if they end up consuming too much memory, it short-circuits the operations and returns an error to the user. The idea is that the circuit breaker kicks in before the garbage collectors runs into a situation it cannot solve.

Ideally this keeps a node alive and operational, without impacting the system performance too much - allowing other queries to finish successfully.

Memory Accounting

Unfortunately, Java doesn’t provide any primitives out of the box that would make it easy to get the size of objects. But there are libraries like Jol that provide methods to guesstimate the size and Lucene also contains logic to estimate the size of objects - that’s what CrateDB uses.

Unfortunately, there are cases where this gets tricky. Suppose you’ve an ArrayList and add one element, how much memory does it use? Do you account the memory of the element plus the required pointers? If you keep adding elements, the ArrayList will eventually have to resize the underlying array. Resizing results in array copy operations, and doing a copy operation each time you add a new element would be expensive, so internally the ArrayList grows the array by more than one. The details of the growth policy are not specified and could vary from one JDK implementation to another. The end effect is that CrateDB may end up accounting for one more element, but the real memory use suddenly doubled.

You could re-compute the size of the ArrayList by walking the object graph after each add operation, but doing that all the time is expensive. Another option would be to look at the concrete implementation and to predict and account for the internal resizes when estimating the memory usage. Depending on the concrete structures this can get complex. Because of these reasons CrateDB doesn’t aspire to do fully accurate memory accounting, but instead opts for a best-effort approach.

To mitigate these inaccuracies, CrateDB uses thresholds to have error tolerance and combines the memory accounting with the used heap memory information from the runtime.

There are some issues with that approach too: The used heap memory information isn’t reliable. It does tell you how much memory is currently in use, but it cannot tell you how much of it could be freed by a garbage collection. If CrateDB were to rely entirely on the used heap memory information, it would start short-circuiting all operations and preventing garbage collection triggers from happening.

Because of that, CrateDB combines the information with the manual memory accounting. If there is a big gap between the two numbers, CrateDB assumes that a garbage collection would be able to free up memory and allows operations to continue.

Wrap up

I hope this article gave you some insights into the circuit breaker mechanism in CrateDB and that it illustrated some of the implications when building an application like CrateDB with a runtime using garbage collection and automatic memory management.