Observe and Report: Kotlin Coroutines on Backend Services

osha1
Outbrain Engineering
6 min read, Nov 22, 2018


In this post, I would like to explore coroutines from the perspective of backend application use cases. Most of the posts about coroutines that I have seen were either platform-agnostic or Android-related.

Why are coroutines hard to grasp?

I have been working with coroutines for more than a year, and I still find myself puzzled sometimes. I think there are a few reasons for that.

Coroutines cover many use cases and scenarios, the API is rich and flexible. With that power comes the need for knowledge. Coroutines have a learning curve.

In addition, coroutines evolved fast, so learning them is like trying to hit a moving target. For example, structured concurrency was added to the public API only recently. I hope that now that coroutines are GA there will be fewer changes to the API, although some significant parts are still in an experimental state (channels, actors and selectors, for example).

The coroutines framework is based on keywords and conventions. It looks like magic sometimes: you write code, but it does something different. It reminds me of frameworks that use annotations for meta-programming (like Spring), because it obscures the flow of the code and makes learning and reasoning a bit harder.

On the other hand, the code is clean and easy to read (but not as easy to write). I think that, unlike in other frameworks, the magic in coroutines is required. The idea is to prevent the business logic from being interlaced with concurrency details. With other approaches, such as callbacks (callback hell), those details are all over the place. You can see an example in my previous blog post:

In order to implement a reactive application, we will need some building blocks.

Let’s try to learn some of the basic concepts and see what they are good for and how they can help us in the backend.

async/await

Maybe the most famous use case of coroutines. In the backend, it is used to run a computation in the background and get the result later.

coroutineScope {
    val userDataA = async { getUserDataFromServiceA() }
    val userDataB = async { getUserDataFromServiceB() }
    val userData = UserData(userDataA.await(), userDataB.await())
}

async starts the work, and await suspends execution until the result is ready. This lets the two calls run concurrently.
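To make the snippet above runnable, here is a minimal sketch. The two service functions and the UserData class are hypothetical stand-ins (delay simulates network latency), and it assumes kotlinx-coroutines-core is on the classpath.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Hypothetical stand-ins for two remote services; delay() simulates latency.
suspend fun getUserDataFromServiceA(): String { delay(300); return "profile" }
suspend fun getUserDataFromServiceB(): String { delay(300); return "preferences" }

// Hypothetical result type, only for this illustration.
data class UserData(val a: String, val b: String)

suspend fun loadUserData(): UserData = coroutineScope {
    // Both calls start right away and run concurrently.
    val userDataA = async { getUserDataFromServiceA() }
    val userDataB = async { getUserDataFromServiceB() }
    // await() suspends (without blocking a thread) until each result is ready.
    UserData(userDataA.await(), userDataB.await())
}

fun main() = runBlocking {
    val elapsed = measureTimeMillis { println(loadUserData()) }
    // Expected to be close to one service call, not the sum of both.
    println("took about $elapsed ms")
}
```

Because the two delays overlap, the total time should be roughly that of a single call rather than the sum of the two.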

Why do we need scope? I will get to that later.

launch

launch is similar to async, but is more like “fire and forget”.

launch { sendUserPageViewNotificationToService() }
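A small runnable sketch of the same idea follows. The notification function and handlePageView are hypothetical, and the counter exists only so the effect can be observed. Note that launch is an extension on a scope, so it needs one to run in.

```kotlin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicInteger

// Only here so we can observe that the notification was sent.
val notificationsSent = AtomicInteger(0)

// Hypothetical stand-in for a call to a notification service.
suspend fun sendUserPageViewNotificationToService() {
    delay(100)
    notificationsSent.incrementAndGet()
}

suspend fun handlePageView(): String = coroutineScope {
    // launch returns a Job, not a result; we never ask it for a value.
    launch { sendUserPageViewNotificationToService() }
    "page rendered"
}

fun main() = runBlocking {
    println(handlePageView())
    println("notifications sent: ${notificationsSent.get()}")
}
```

One caveat of this sketch: coroutineScope returns only after its children complete, so handlePageView still waits for the notification. A notification that should outlive the request would have to be launched in a longer-lived scope, which is a design decision outside this example.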

suspend

The only coroutine keyword in the Kotlin language itself. suspend is a marker indicating that a function might take a long time to return. Suspending is the coroutine-world equivalent of blocking. Roughly, it is like blocking but more efficient, because under the hood it uses a state machine with callbacks.
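To get a feeling for the difference between suspending and blocking, here is a minimal sketch. slowCall and runMany are hypothetical names; delay stands in for a slow, non-blocking operation.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical slow operation: it suspends for 200 ms instead of blocking.
suspend fun slowCall(completed: AtomicInteger) {
    delay(200)
    completed.incrementAndGet()
}

// Runs n coroutines on the single thread that runBlocking occupies.
fun runMany(n: Int): Int {
    val completed = AtomicInteger(0)
    runBlocking {
        repeat(n) { launch { slowCall(completed) } }
    } // runBlocking returns once all launched children are done
    return completed.get()
}

fun main() {
    // If slowCall used Thread.sleep(200), one thread would need over half an hour.
    println("completed: ${runMany(10_000)}")
}
```

All the coroutines share one thread, and while one of them is suspended in delay the thread is free to run the others.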

Cancellation and errors

This is an important point to know. The happy path is always easy, but what happens when an exception is thrown? Recall the coroutineScope call from the example above. It is the boundary for coroutines: it defines the hierarchy and the relationships used for cancellation and error handling.

Try to guess what will happen in the following example:

val userData = coroutineScope {
    try {
        val userDataA = async { getUserDataFromServiceA() }
        val userDataB = async { getUserDataFromServiceB() }
        UserData(userDataA.await(), userDataB.await())
    } catch (e: Exception) {
        logger.warn(e) { "something bad happened" }
        DEFAULT_USER_DATA
    }
}

vs

val userData = try {
    coroutineScope {
        val userDataA = async { getUserDataFromServiceA() }
        val userDataB = async { getUserDataFromServiceB() }
        UserData(userDataA.await(), userDataB.await())
    }
} catch (e: Exception) {
    logger.warn(e) { "something bad happened" }
    DEFAULT_USER_DATA
}

These two examples are similar; the only difference is the nesting order of coroutineScope and the exception handling.

If there are no exceptions we're all good, but what happens when one of our services throws an exception (pick one answer below)?

  • In both cases we catch the exception and get DEFAULT_USER_DATA.
  • The first example catches the exception, the second doesn’t.
  • The second example catches the exception, the first doesn’t.

Can you tell?

Actually, none of the answers is completely accurate. In the first example the exception is caught and reported, but the scope also rethrows it because one of its children failed. The second example works as you might expect, and userData gets the default value.

In general, you should never catch exceptions around await, because it doesn’t behave like regular sequential code.
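The two variants can be tried side by side with the following sketch. The service functions are hypothetical stand-ins (service A always fails), strings replace UserData, and println replaces the logger to keep it self-contained.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking

const val DEFAULT_USER_DATA = "default"

// Hypothetical services: A always fails, B succeeds.
suspend fun getUserDataFromServiceA(): String = throw IllegalStateException("service A is down")
suspend fun getUserDataFromServiceB(): String = "B"

// Variant 1: try/catch inside the scope.
suspend fun catchInsideScope(): String = coroutineScope {
    try {
        val a = async { getUserDataFromServiceA() }
        val b = async { getUserDataFromServiceB() }
        a.await() + b.await()
    } catch (e: Exception) {
        println("inside: caught ${e.javaClass.simpleName}")
        DEFAULT_USER_DATA // ignored: the failed child has already cancelled the scope
    }
}

// Variant 2: try/catch around the scope.
suspend fun catchAroundScope(): String = try {
    coroutineScope {
        val a = async { getUserDataFromServiceA() }
        val b = async { getUserDataFromServiceB() }
        a.await() + b.await()
    }
} catch (e: Exception) {
    println("around: caught ${e.javaClass.simpleName}")
    DEFAULT_USER_DATA
}

fun main() = runBlocking {
    // The scope rethrows the child's failure even though the catch block ran.
    println("variant 1 failed: ${runCatching { catchInsideScope() }.isFailure}")
    println("variant 2 result: ${catchAroundScope()}")
}
```

In the first variant the catch block runs, but the value it returns is discarded because coroutineScope itself fails. In the second variant the failure surfaces at the scope boundary, where a regular try/catch can handle it.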

Bottom line: this makes the code a bit harder to write, but the business logic itself remains easy to read, and that is what I like about coroutines. With callbacks, the code is both hard to write and hard to read.

EDIT: it turns out that there is a lot of confusion and an open issue about it: https://github.com/Kotlin/kotlinx.coroutines/issues/763

Understanding threads in a backend application with coroutines

With coroutines, we can implement an efficient reactive backend web server. The easiest analogy is a Node.js server, which has one thread and does everything with callbacks.

For our service, we would have a thread per core, so if our server has 30 cores it will be like 30 Node.js servers running in parallel.

In order to achieve that, we should keep to some principles.

Reactive backend building blocks

Use an async server like Vert.x or Ktor

Unlike Jetty or Tomcat, these servers do not create a thread for each user request. Instead, Ktor uses an event loop to handle incoming requests. When a request arrives and is ready for processing, an event is triggered. The request is processed until we have to wait for something (like an external service), and then it suspends: it stops executing without blocking a thread and registers a callback instead. When the result arrives from the external service, another event is triggered (as with callbacks). This continues until a response is ready to be written to the socket.

Don’t use blocking operations

We should take good care that none of the third-party libraries and APIs we use are blocking. Instead, we should use APIs that are either based on coroutines or based on callbacks with wrappers for coroutines.
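As an illustration of a callback wrapper, here is a minimal sketch. CallbackClient and its fetch method are hypothetical and only mimic the shape of a callback-based library; suspendCancellableCoroutine is the kotlinx.coroutines building block used for the bridge.

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.suspendCancellableCoroutine
import java.util.concurrent.CompletableFuture
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException

// Hypothetical callback-based client, standing in for a third-party library.
class CallbackClient {
    fun fetch(key: String, onSuccess: (String) -> Unit, onError: (Throwable) -> Unit) {
        // Completes on another thread, as a real async client would.
        CompletableFuture.runAsync {
            if (key.isEmpty()) onError(IllegalArgumentException("empty key"))
            else onSuccess("value-for-$key")
        }
    }
}

// Wrapper: suspends the caller until one of the callbacks fires.
suspend fun CallbackClient.fetchSuspending(key: String): String =
    suspendCancellableCoroutine { cont ->
        fetch(
            key,
            onSuccess = { cont.resume(it) },
            onError = { cont.resumeWithException(it) }
        )
    }

fun main() = runBlocking {
    println(CallbackClient().fetchSuspending("user-1"))
}
```

With such a wrapper, the calling code reads sequentially while no thread is blocked waiting for the callback. A production wrapper would usually also register cancellation handling, which this sketch omits.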

I have to do some self-promotion for jasync-sql, a non-blocking database driver that I contribute to. Avoid blocking libraries.

Use only one thread pool

Many frameworks come with their own thread pools. Ideally, we want our application to have only one thread pool. Since the operations are not blocking, the only thing those threads do is CPU-bound work. I think the best alternative is to use Java's ForkJoinPool.

See the discussion here for more references and details.
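One possible way to wire this up is sketched below. It assumes a single application-wide ForkJoinPool sized to the number of cores; the names appPool, appDispatcher and threadsUsed are made up for the example.

```kotlin
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import java.util.concurrent.ForkJoinPool

// One shared pool for the whole application, sized to the number of cores.
val appPool = ForkJoinPool(Runtime.getRuntime().availableProcessors())

// Expose the pool to coroutines as a dispatcher.
val appDispatcher = appPool.asCoroutineDispatcher()

// Runs many small tasks and reports which pool threads executed them.
fun threadsUsed(tasks: Int): Set<Thread> = runBlocking {
    withContext(appDispatcher) {
        (1..tasks).map { async { Thread.currentThread() } }.awaitAll().toSet()
    }
}

fun main() {
    val threads = threadsUsed(100)
    println("${threads.size} distinct threads, pool parallelism ${appPool.parallelism}")
    appPool.shutdown()
}
```

Every coroutine started under appDispatcher shares the same small set of threads, no matter how many coroutines there are. Whether a custom pool is preferable to the built-in Dispatchers.Default is a choice this sketch does not try to settle.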

Don’t use locks

According to the coroutines authors, we should communicate via channels instead of using locks and thread-synchronization constructs. I am afraid this makes the code harder to read and write. Thread locks can be subject to concurrency bugs such as race conditions. However, in some scenarios this kind of code is easy to reason about:

synchronized(this) {
    // do something
}
barrier(10) {
    // do something else
}

Mutex is implemented in kotlinx.coroutines, but its use is not recommended. And it is only the most basic primitive: at the time of writing there are no semaphores, read/write locks and so on.

I still wonder whether it would be a good idea to write suspending locks (which could use channels under the hood) to get the best of both worlds: code readability and efficiency. Channels increase the learning curve dramatically in my opinion, and I am not sure their usage is less error-prone in all cases.
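For comparison with synchronized, here is a minimal sketch of the suspending Mutex from kotlinx.coroutines protecting a shared counter. countWithMutex is a made-up helper for the example.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Increments a shared counter from many coroutines running on several threads.
fun countWithMutex(coroutines: Int, incrementsEach: Int): Int {
    val mutex = Mutex()
    var counter = 0
    runBlocking(Dispatchers.Default) {
        repeat(coroutines) {
            launch {
                repeat(incrementsEach) {
                    // withLock suspends (it does not block the thread) while waiting.
                    mutex.withLock { counter++ }
                }
            }
        }
    }
    return counter
}

fun main() {
    println(countWithMutex(coroutines = 100, incrementsEach = 1_000)) // prints 100000
}
```

The code reads much like a synchronized block, but a coroutine waiting for the lock suspends instead of parking a thread.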

Check yourself in this example:

Move ThreadLocal state between threads (manually)

Some frameworks that deal with cross-cutting concerns like logging and tracing use ThreadLocal under the hood. We need to keep an eye on that and manually carry the required state along when a coroutine moves between threads.

See an example with logging here.
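For a plain ThreadLocal, kotlinx.coroutines offers asContextElement, which can be sketched as follows. The requestId variable and the readAcrossSuspension helper are hypothetical names for this illustration.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.asContextElement
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical thread-local, e.g. a request id used by a logging framework.
val requestId = ThreadLocal<String?>()

// Reads the thread-local before and after a suspension point.
fun readAcrossSuspension(id: String): List<String?> = runBlocking {
    // The context element sets the value on whichever thread runs the coroutine.
    withContext(Dispatchers.Default + requestId.asContextElement(value = id)) {
        val before = requestId.get()
        delay(50) // the coroutine may resume on a different worker thread
        val after = requestId.get()
        listOf(before, after)
    }
}

fun main() {
    println(readAcrossSuspension("req-42")) // prints [req-42, req-42]
}
```

The value is set each time the coroutine resumes on a thread and restored when it suspends, so it travels with the coroutine rather than with the thread.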

Conclusion

I believe that if we adhere to these guidelines, we can implement very efficient web servers and backend services whose code is also maintainable and understandable.

Coroutines are still growing up, and I have the urge to work with them, to get all those goodies for our servers, and to see how all those parts add up to a full-fledged microservice.

I do feel that some of the abstractions need to be simplified by higher level libraries for common use cases.

Thanks for reading, I hope you liked it. I hope that in future posts I will be able to shed more light on some of the details of coroutines.

Photo by Tyler B on Unsplash
