Scaling Play 2.7.x (with Akka HTTP Server) for a large number of concurrent SSE connections

Hello,

We are facing issues in Play Framework 2.7.x with a use case where we try to serve a large number of concurrent clients with an SSE event stream.

We started to see the issue after upgrading from Play 2.4.x to Play 2.7.x, where the major changes were the introduction of Akka HTTP as the default HTTP server and the replacement of Iteratees with Akka Streams for feeding the Play EventSource.

As for the stream implementation (the replacement for our previous Iteratee-based implementation), we have already tried two different implementations:

  1. using Akka Streams BroadcastHub
  2. using plain actors (based on the reference implementations used by the LinkedIn team)

At the moment we believe that neither stream implementation is the issue, BUT that the problems are caused by default limits in the Akka HTTP server, which we do not seem to be able to tweak to our advantage.

We have noticed that by default the Akka HTTP server has the following values:

akka.http.server.max-connections = 1024
akka.http.server.backlog = 100

I mention these settings because a 2016 blog post by some LinkedIn engineers (https://engineering.linkedin.com/blog/2016/10/instant-messaging-at-linkedin--scaling-to-hundreds-of-thousands-) explicitly calls out the latter one (backlog) as a critical parameter for scaling up.

We are running 2 instances of Play in our test environment, so we conclude that with the default settings there is a limit of 2048 concurrent connections. Obviously, we would like to scale up dramatically on a single node to accommodate a much higher number of users, just as LinkedIn does.

The LinkedIn blog did not need to mention max-connections because the article was written at a time when Netty was still the default HTTP server underneath Play.

Today however, Akka HTTP has become the default HTTP server, and it appears to have a default value for max-connections of only 1024.

So we decided to increase this default value, but we have the impression that either our change is not being applied, or some other hidden limitation is still preventing us from scaling up.

In the Play documentation, we find that only a subset of the default Akka HTTP parameters can be altered (https://www.playframework.com/documentation/2.7.x/SettingsAkkaHttp). To our surprise, the max-connections parameter is not among them. We tried to set the parameters as outlined below:

play.server.akka.max-connections =
play.server.akka.backlog =

This does not appear to have any effect.

In parallel, we also added these parameters to the custom namespace we configure through play.akka.config:

play.akka.config = "play-akka"

play-akka {
http.server.max-connections =
http.server.backlog =
}

Unfortunately, our HTTP servers still come to a halt once our test program exceeds the limit implied by the default settings (2 × 1024 = 2048).

When we run the same load test on our old software (with Netty underneath), everything works fine.

We are scratching our heads right now over the following questions:

  1. Are we doing this correctly?
  2. Can someone tell us from their own experience whether there are limitations in the Akka HTTP server that we cannot control? Is Netty the only valid option for this kind of use case?
  3. Did we miss something else?

Thanks in advance for giving this a thought.


If I were in your shoes, I would probably start by looking at the Play/Akka HTTP source code to investigate how these settings are used. Thanks for including the LinkedIn article. It’s very interesting, and it’s pretty crazy how much tuning they had to do. According to the article, they forked Play! Other thoughts: you can still use the Netty back end (https://www.playframework.com/documentation/2.7.x/SettingsNetty), though I don’t know how that would affect your code. Also, did you try version 2.8.x? I wonder if it makes any difference. Please let us know if you find a resolution to your issues.

Hello,

Thanks for your reply.

Going back to Netty is really the last resort we would consider to get out of the woods.

I am still hoping the Akka HTTP team can help us get around this.

The Play Framework has been used by LinkedIn in the past and scaled to serve hundreds of thousands of users. I would expect that with a new version of the same system, we should be able to scale equally well.

I am at the moment still hoping some people from the Akka Team will be able to provide an answer to my questions.

If the Play Framework engineers decided to replace Netty with Akka HTTP, you would expect them to have required Akka HTTP to perform and scale equally well, and at the moment I still expect that to be the case.

I may well be lacking the expertise here, and I am hoping they will be able to shed some light on this observation.

Either the solution is simple and my lack of understanding is to blame, or there is something here for the Akka HTTP team to look at.

What about this: https://doc.akka.io/docs/akka-http/current/server-side/low-level-api.html#controlling-server-parallelism

Can you explain specifically what you did? Setting akka.http.server.max-connections should allow you to increase the number of connections, but keep in mind that in dev mode this will need to be put in devSettings, since it needs to be applied to the server before your application is created. In production it can be in application.conf.

It looks like @wsargent also has an older example project that tries to test a large number of connections using Play’s akka-http backend: https://github.com/wsargent/play-connection-test/

Hello Greg,

First of all, thanks for joining the discussion.

As far as I know, we are running Play in PROD mode, which is why we applied this in application.conf (well, not exactly: we are running inside Docker and pass -Dconfig.file pointing at our custom conf file).

Inside the Docker container, a script runs some specific preparation steps to include our web client resources and eventually uses the java command to launch the main class play.core.server.ProdServerStart.

So, as far as I know, we are running in PROD mode: we start the server with the java command, not with sbt run.

So at the moment I am assuming that we are applying these settings in the correct location.
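Since the settings live in a custom file passed with -Dconfig.file, note that such a file is loaded instead of application.conf, not on top of it. A minimal sketch of the layering pattern described in Play’s production configuration docs, assuming the custom file is called prod.conf (a hypothetical name) and using the akka.http.server keys and values that come up in this thread:

```hocon
# prod.conf (hypothetical name), passed as -Dconfig.file=/path/to/prod.conf
# -Dconfig.file replaces application.conf, so include it explicitly first
include "application.conf"

# Production overrides go after the include so they take precedence
akka.http.server {
  max-connections = 8192
  backlog = 1024
}
```

Keys defined after the include override the included values, so only the production-specific differences need to be listed in this file.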

Obviously, our setup is a bit more complicated, because we are running an additional actor system alongside the standard Play actor system that drives the Akka HTTP server. That’s why we are using the play.akka.config parameter to declare the namespace for Play’s own Akka configuration.

As mentioned in the initial post, we are now setting this in two places:

play.akka.config = "play-akka"
play-akka {
http.server.max-connections = 8192
http.server.backlog = 1024
}

play.server.akka {
max-connections = 8192
backlog = 1024
}

That is, in the play.server.akka namespace and in our custom Akka config namespace (http.server under play-akka). But the results of our load test do not change: as we approach 2048 connections, we start to see connection refused errors in our Gatling report.

Our Gatling script is simple. We inject a constant number of new clients every second for some time period, so the number of concurrent clients grows linearly. Each client simply opens the SSE connection, expects a keep-alive message to arrive within a certain time frame, and then pauses for a while (long enough for the number of concurrent sessions to keep growing during the injection). After that, the clients start to disconnect, reducing the load at a rate similar to the one at which they were injected. A very simple script.

Looking at the source code of the Akka HTTP server (https://github.com/playframework/playframework/blob/2.7.x/transport/server/play-akka-http-server/src/main/scala/play/core/server/AkkaHttpServer.scala), I get the impression that setting max-connections or backlog in the play.server.akka namespace has no effect. The server seems to pick up only the documented properties, and those do not include the ones that are critical to us. So it looks like the Akka HTTP server implementation indeed cherry-picks properties from that namespace and will ignore max-connections and backlog even if you set them.

That would mean we can only tweak these values through the standard Akka settings. Those seem to be applied with a fallback to the underlying system configuration, but even that does not appear to happen (at least not in that class) when you put them in a custom namespace as we are doing.

Hello costa,

I am not sure pipelining is relevant to our scenario.

In the end, this is basically a use case where we need enough connections for:

  1. The permanent SSE connections: this is one-way communication in which the server pushes events. No requests come from the clients over these connections.
  2. The regular request-response cycles: as the number of SSE connections grows, we must still have enough connections left so that regular requests complete without delay.

Point 2 is exactly the problem: as the number of permanent SSE connections grows, we run out of connections for the regular requests, and our system stops serving the regular HTTP requests coming from our web client.

In our setup we have a load balancer in front of 2 Play servers. The load balancer sends a liveness probe to each server every second. Once the liveness probe fails, the load balancer stops forwarding to the unresponsive server and kills all its existing connections, sending an even bigger load to the remaining server. End result: a ping-pong game between the 2 servers.

Hi Greg and costa,

Some good news.

It turns out that when you put the tweaks in the standard namespace

akka.http.server {
max-connections =
backlog =
}

… the test DOES succeed.

But this then raises the following questions:

  1. Why can’t we set these values in the play.server.akka namespace at the moment? That would be the obvious place for these tweaks, but, as the Play docs show, that namespace only takes into account a predefined set of properties, which does not include the ones we would like to tweak.
  2. Why doesn’t AkkaHttpServer take into account the settings in the custom namespace you define when running another actor system in the same JVM? Our settings in the namespace we defined through the play.akka.config parameter were also ignored.

So the only solution is to put these settings in the regular namespace, which I find rather confusing.

Anyway, we seem to have found a solution to the problem, but not the one we expected to need, given the existence of the two options above.

Have you tried setting the configuration exactly as given in the Akka HTTP documentation? In that case it would look something like this:

akka.http.server {
  max-connections = 8192
  backlog = 1024
}

Your attempt using play.akka.config doesn’t work due to this bug: https://github.com/playframework/playframework/issues/6183. In this case I suspect you don’t want to use the custom config prefix at all, since you shouldn’t have multiple Akka HTTP servers running at once. If you were already using it somewhere I’d be curious to know why.

For the play.server.akka configs, the only ones that work are the ones documented in https://www.playframework.com/documentation/2.7.x/SettingsAkkaHttp.

Thanks, Greg, for all your efforts. I think we are out of the woods now with the option described above: put the settings in the regular namespace, not in a custom one. In fact, you ended up suggesting what I was already doing, and we came to the same conclusion: don’t use the custom namespace, use the standard one.

We used play.akka.config because we are running another actor system that joins the Akka cluster where our business logic is located. That other actor system uses the regular akka namespace, but it does not run an Akka HTTP server or anything similar. So in the end it is safe for us to add the required parameters to the default namespace: they have no impact on the configuration of that other actor system.
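For reference, the combined layout described above might look roughly like this. It is a sketch assuming Play’s own actor system keeps its custom prefix while the clustered actor system uses the root akka namespace; the values are the ones used earlier in the thread, and the contents of the play-akka block are left out:

```hocon
# Play's own actor system loads its Akka settings from this prefix
play.akka.config = "play-akka"
play-akka {
  # Play actor system settings (layout as described in Play's Akka docs)
}

# Akka HTTP server limits: in this setup they only took effect in the root
# akka namespace (see playframework issue #6183 for the prefix problem)
akka.http.server {
  max-connections = 8192
  backlog = 1024
}

# The clustered business-logic actor system also reads the root akka namespace;
# it runs no Akka HTTP server, so the block above does not affect it.
```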

There is one additional observation I would like to share here regarding the implementation of the Play SSE EventSource using Akka Streams.

As mentioned, we tried out 2 implementations.

The first one used an Akka Streams source queue in combination with BroadcastHub. However, with this construction there appears to be a problem with properly closing the connections in Play: when browsers were closed, the connections at the Play level seemed to remain open indefinitely. We noticed this when running several Gatling sessions in a row, each adding load until the total exceeded our max-connections setting.

So we eventually chose an approach that the LinkedIn engineers also used.

We implemented it using a regular ActorRef Source, which is pre-materialized and then injected into a helper actor created for each connection. The helper actor watches the source actor and cleans itself up once the source actor is terminated.

In our opinion, this raises some questions about BroadcastHub: we seem to lose control over connection management, and we lack a proper way to handle clients closing their event source (for example, simply by shutting down their browser).