Play non-blocking vs Spring blocking - poor Play performance

Pasting my question from StackOverflow, since I did not get any answer there.

I made some benchmarks to figure out whether non-blocking IO really performs better than blocking IO.

For this, I wrote two services: one in Play, the other in Spring.
Each service makes several parallel remote requests:
Play uses the non-blocking WSClient, Spring the blocking RestTemplate.

But in all benchmarks, with various configurations (I’ve used Apache Bench and Artillery), Spring Boot outperforms Play.

In Play I often get a Connection reset by peer (54) error with Apache Bench (though Spring handles the same number of concurrent users fine).

It really confuses me. I’ve tried setting various values in the Play configuration (like parallelism-factor, etc.), but it does not help.

Both services are deployed to AWS on t2.xlarge instances running Ubuntu 18 (each service on its own instance).

Code for the Play service:

HomeController

@Singleton
class HomeController @Inject()(requestService: RequestService, cc: ControllerComponents)(implicit ec: ExecutionContext) extends AbstractController(cc) {

  
  def index() = Action { implicit request: Request[AnyContent] =>
    Ok(views.html.index())
  }

  val rand = new Random

  // Fire 10 remote requests in parallel and respond once all of them complete.
  def parallelRequests() = Action.async {
    val futures = (1 to 10).map { _ =>
      requestService.makeRequest(s"http://3.17.161.135:9000?p=${rand.nextInt(99999) + 1}")
    }

    Future.sequence(futures).map { _ =>
      Ok("Done")
    }
  }
}

RequestService

@Singleton
class RequestService @Inject()(wsClient: WSClient)(implicit ec: ExecutionContext){

  // Non-blocking GET: returns immediately with a Future of the response body.
  def makeRequest(url: String): Future[String] = {
    wsClient.url(url).get().map { r =>
      r.body
    }
  }

}

application.conf

# https://www.playframework.com/documentation/latest/Configuration
play.filters.enabled += play.filters.hosts.AllowedHostsFilter

play.filters.hosts {
  # "." matches all hosts, so requests with any Host header are allowed.
  allowed = ["."]
}

play.application.secret = "supersecret"

akka {
  http {
    server {
      max-connections = 1024
    }
  }
  actor {
    default-dispatcher {
      # This will be used if you have set "executor = "fork-join-executor""
      fork-join-executor {
        # Min number of threads to cap factor-based parallelism number to
        parallelism-min = 8

        # The parallelism factor is used to determine thread pool size using the
        # following formula: ceil(available processors * factor). Resulting size
        # is then bounded by the parallelism-min and parallelism-max values.
        parallelism-factor = 3.0

        # Max number of threads to cap factor-based parallelism number to
        parallelism-max = 64

        # Setting to "FIFO" to use queue like peeking mode which "poll" or "LIFO" to use stack
        # like peeking mode which "pop".
        task-peeking-mode = "FIFO"
      }
    }
  }
}

(I’ve tried different settings here, including Play’s default config.)
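
For reference, here is how the factor formula works out with the values above, assuming the t2.xlarge exposes 4 vCPUs (a sketch, not part of the service code):

// Minimal sketch of the fork-join pool sizing formula quoted in the config above.
// 4 vCPUs is an assumption; check Runtime.getRuntime.availableProcessors.
val processors = 4
val raw = math.ceil(processors * 3.0).toInt   // parallelism-factor => 12
val poolSize = math.min(64, math.max(8, raw)) // clamped to [parallelism-min, parallelism-max] => 12

So the default dispatcher ends up with roughly 12 threads, and the premise of the non-blocking design is that a pool this small is enough as long as nothing blocks on it.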

Full source code of the Play service: https://github.com/teimuraz/benchmark-play-async

Spring service

Controller

@RestController
public class Controller {

    private RequestService requestService;

    @Autowired
    public Controller(RequestService requestService) {
        this.requestService = requestService;
    }

    Random rand = new Random();

    @GetMapping
    public String home() {
        return "Hi";
    }

    @GetMapping("/parallel-requests")
    public String parallelRequests() throws InterruptedException {
        CompletableFuture<String> res1 = requestService.makeRequest("http://18.188.1.66:8080" + rand.nextInt(99999) + 1);
        CompletableFuture<String> res2 = requestService.makeRequest("http://18.188.1.66:8080" + rand.nextInt(99999) + 1);
        CompletableFuture<String> res4 = requestService.makeRequest("http://18.188.1.66:8080" + rand.nextInt(99999) + 1);
        CompletableFuture<String> res5 = requestService.makeRequest("http://18.188.1.66:8080" + rand.nextInt(99999) + 1);
        CompletableFuture<String> res6 = requestService.makeRequest("http://18.188.1.66:8080" + rand.nextInt(99999) + 1);
        CompletableFuture<String> res7 = requestService.makeRequest("http://18.188.1.66:8080" + rand.nextInt(99999) + 1);
//        CompletableFuture<String> res8 = requestService.makeRequest("https://amazon.com?p=" + rand.nextInt(99999) + 1);
        CompletableFuture<String> res9 = requestService.makeRequest("http://18.188.1.66:8080" + rand.nextInt(99999) + 1);
        CompletableFuture<String> res10 = requestService.makeRequest("http://18.188.1.66:8080" + rand.nextInt(99999) + 1);

        CompletableFuture.allOf(res1, res2, res4, res5, res6, res7,  res9, res10).join();
        return "Done";
    }
}

RequestService

@Service
public class RequestService {

    private static final Logger logger = LoggerFactory.getLogger(RequestService.class);

    private final RestTemplate restTemplate;

    @Autowired
    public RequestService(RestTemplateBuilder restTemplateBuilder) {
        this.restTemplate = restTemplateBuilder.build();
    }

    @Async
    public CompletableFuture<String> makeRequest(String url) throws InterruptedException {
        // Executes on the taskExecutor pool; the thread blocks until the HTTP call completes.
        String result = restTemplate.getForObject(url, String.class);
        return CompletableFuture.completedFuture(result);
    }
}

Application

@SpringBootApplication
@EnableAsync
public class BenchmarkApplication {

	public static void main(String[] args) {
		SpringApplication.run(BenchmarkApplication.class, args);
	}

	@Bean
	public Executor taskExecutor() {
		// Pool that backs the @Async request service; sized to keep many blocking calls in flight.
		return Executors.newFixedThreadPool(400);
	}
}

application.properties

server.tomcat.max-threads=500

Full source code: https://github.com/teimuraz/becnhmark-spring-blocking

Benchmark results with Artillery

artillery quick --count 500 -n 20 [url]

Play

Summary report @ 14:47:25(+0400) 2018-12-24
  Scenarios launched:  500
  Scenarios completed: 411
  Requests completed:  8614
  RPS sent: 104.45
  Request latency:
    min: 150.9
    max: 72097.2
    median: 195
    p95: 1793.2
    p99: 12623.9
  Scenario counts:
    0: 500 (100%)
  Codes:
    200: 8614
  Errors:
    ECONNRESET: 89

Spring

Summary report @ 14:49:14(+0400) 2018-12-24
  Scenarios launched:  500
  Scenarios completed: 500
  Requests completed:  10000
  RPS sent: 600.24
  Request latency:
    min: 155.2
    max: 3550.4
    median: 258.9
    p95: 500.8
    p99: 1370.7
  Scenario counts:
    0: 500 (100%)
  Codes:
    200: 10000    

~100 RPS in Play vs ~600 in Spring!!!

Benchmark results with ab

ab -n 5000 -c 200 -s 120 [url]

Play

apr_socket_recv: Connection reset by peer (54)

Spring

Concurrency Level:      200
Time taken for tests:   23.523 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      680000 bytes
HTML transferred:       20000 bytes
Requests per second:    212.56 [#/sec] (mean)
Time per request:       940.924 [ms] (mean)
Time per request:       4.705 [ms] (mean, across all concurrent requests)
Transfer rate:          28.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      153  352 198.5    303    1485
Processing:   163  364 242.4    319    6589
Waiting:      163  363 240.3    318    6589
Total:        324  716 313.2    642    7003

What am I doing wrong with Play?

Which values in the config should I tweak to make Play perform better?

How was Play deployed on AWS? Did you run it in production (prod) mode?

@MarkoD yes, it is in production mode (I do sbt dist on the server and run the executable).
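
Roughly, the steps are (the artifact name and version below are assumed from the repo and may differ):

sbt dist
unzip target/universal/benchmark-play-async-1.0.zip
benchmark-play-async-1.0/bin/benchmark-play-async -Dhttp.port=9000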

I get suspicious when I see the rendering execution context passed in. Normally you would have a request service operating in its own thread pool; see the thread pools page.

Try this:

package service

import javax.inject.{Inject, Singleton}
import play.api.libs.ws.WSClient

import scala.concurrent.Future

@Singleton
class RequestService @Inject()(wsClient: WSClient) {

  def makeRequest(url: String): Future[String] = {
    // Run the WS callback on Scala's global execution context
    // rather than Play's default dispatcher.
    import scala.concurrent.ExecutionContext.Implicits._
    wsClient.url(url).get().map { r =>
      r.body
    }
  }

}
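
If you do want the request service on its own pool, the thread pools page shows the CustomExecutionContext pattern. A sketch, where the dispatcher name (ws-dispatcher) and pool size are assumptions rather than values from your project:

# in application.conf
ws-dispatcher {
  executor = "thread-pool-executor"
  throughput = 1
  thread-pool-executor {
    fixed-pool-size = 32
  }
}

import javax.inject.{Inject, Singleton}
import akka.actor.ActorSystem
import play.api.libs.concurrent.CustomExecutionContext

// Wraps the dispatcher above as an injectable ExecutionContext.
@Singleton
class WsExecutionContext @Inject()(system: ActorSystem)
  extends CustomExecutionContext(system, "ws-dispatcher")

Then inject WsExecutionContext into RequestService in place of the default ExecutionContext.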

I installed nginx on my Raspberry Pi 3 and then pointed my Dell XPS 15 at it – I threw away the first three runs to give the JVM time to warm up. Note that the Raspberry Pi response time is the bottleneck here…

❱ ab -n 5000 -c 200 -s 120 http://localhost:9000/parallel-requests
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests


Server Software:        
Server Hostname:        localhost
Server Port:            9000

Document Path:          /parallel-requests
Document Length:        4 bytes

Concurrency Level:      200
Time taken for tests:   19.627 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      600000 bytes
HTML transferred:       20000 bytes
Requests per second:    254.75 [#/sec] (mean)
Time per request:       785.084 [ms] (mean)
Time per request:       3.925 [ms] (mean, across all concurrent requests)
Transfer rate:          29.85 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   15 120.2      0    1010
Processing:   383  756 597.8    623    6439
Waiting:      383  756 597.8    623    6439
Total:        390  771 661.1    623    6439

Percentage of the requests served within a certain time (ms)
  50%    623
  66%    637
  75%    648
  80%    659
  90%    687
  95%   1940
  98%   3920
  99%   4091
 100%   6439 (longest request)

I tried running the Spring example, but I’m getting lots of failures, and I’m not sure how you have it configured. Looking at the Spring logs I see

2018-12-26 16:27:01.350 ERROR 22556 --- [nio-8080-exec-9] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.util.concurrent.CompletionException: java.lang.NumberFormatException: For input string: "8080102131"] with root cause

java.lang.NumberFormatException: For input string: "8080102131"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[na:1.8.0_192]
        at java.lang.Integer.parseInt(Integer.java:583) ~[na:1.8.0_192]
        at java.lang.Integer.parseInt(Integer.java:615) ~[na:1.8.0_192]
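
That impossible port number is built by your controller: "http://18.188.1.66:8080" + rand.nextInt(99999) + 1 concatenates the random number (and then the literal 1) onto the port, producing URLs like http://18.188.1.66:8080102131. Presumably the random value was meant to go into a ?p= query parameter, as the Play service does, with the addition parenthesized: (rand.nextInt(99999) + 1).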

You’re also using WAY too much memory. Play can operate happily with a 512MB heap – you have -J-Xms5060M -J-Xmx5060m in your deploy.sh, and you don’t need that unless you’re running an in-memory database or HTTP cache of some kind.
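
For example, -J-Xms512M -J-Xmx512M would be plenty for this service.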

I had a similar issue trying to integrate a chatbot with a non-blocking WSClient, and I put my questions on several forums but did not get any proper response. I did a GET request on the service; the service fetched 10 records from the database and returned them as JSON. I have also been using REST/web service APIs through a 3rd-party client library (i.e., not using Play’s asynchronous WS API). Since I didn’t get any response, I asked a chatbot development services provider; they told me that in most situations the appropriate execution context to use is the Play default thread pool, accessible through @Inject()(implicit ec: ExecutionContext). That didn’t help much either, so I think I’ll have to hand this off to a professional development company.