Database Connections not being returned to pool [Slick][HikariCP]

Currently on play 2.6 with play-slick 3.0.1 I’ve been running into an issue during load testing where the app is consuming tons of db connections from the pool, but for seemingly no reason - the database is healthy with no slow-queries, the apps are not being overloaded.

As proof of this, when I increase the db connection pool to 340 (our max_connections is 350), I can hit up to 28k rpm on one server. But with a pool of only 100, it almost instantly crashes under the same load with the connections timing out waiting for a connection from the pool. I can confirm that the database transactions are happening rapidly. When I enable hikari connection leak detection, I can see that some connections are taking > 60s to be returned, yet the database is showing nothing over 5 seconds. From everything I’ve read, we should be able to configure this to run with a much smaller pool.

Here are some logs:
with pool of 100

com.zaxxer.hikari.pool.HikariPool - db - Timeout failure stats (total=100, active=100, idle=0, waiting=33)

with pool of 340

com.zaxxer.hikari.pool.HikariPool - db - After cleanup stats (total=340, active=244, idle=96, waiting=0)

Some errors with pool of 100:

java.sql.SQLTransientConnectionException: db - Connection is not available, request timed out after 10000ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83)
at slick.jdbc.hikaricp.HikariCPJdbcDataSource.createConnection(HikariCPJdbcDataSource.scala:14)
at slick.jdbc.JdbcBackend$BaseSession.(JdbcBackend.scala:438)
at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:46)
at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:37)
at slick.basic.BasicBackend$DatabaseDef.acquireSession(BasicBackend.scala:216)
at slick.basic.BasicBackend$DatabaseDef.acquireSession$(BasicBackend.scala:215)

com.zaxxer.hikari.pool.ProxyConnection - db - Connection com.mysql.jdbc.JDBC4Connection@6959f321 marked as broken because of SQLSTATE(08003), ErrorCode(0)
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Can’t call rollback when autocommit=true
at sun.reflect.GeneratedConstructorAccessor41.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.Util.getInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:918)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4530)

NewRelic is also showing these requests spending the majority of their time waiting on com.zaxxer.hikari.HikariDataSource.getConnection()

I’ve been digging into this for a while so any help would be appreciated.

Can you provide a project that reproduces the issue?

So I’ve narrowed down the issue. It seems that it has something to do with running queries with transactionally. Here is an example:

def _insertIfNotExists(partner: Partner, accountId: String): DBIO[Account] = {
    accounts
      .filter(_.account_id === accountId)
      .filter(_.partner === partner)
      .result.headOption.flatMap {
      case Some(account) => DBIO.successful(account) // no-op
      case _ => {
        accounts.returning(accounts.map(_.id)).into((account, id) => account.copy(id = Some(id))) += Account(None, partner, accountId)
      }
    }
  }

  def createWithDeviceIfNotExists(partner: Partner, accountId: String, deviceId: String): Future[Device] = {
    db.run(
      _insertIfNotExists(partner, accountId).flatMap { account =>
        devices._insertIfNotExists(deviceId, account.id.get) // this looks exactly like the _insertIfNotExists above
      }.transactionally
    )
  }

I am having trouble reproducing under other circumstances, but if I remove the transactionally here, I can hammer this endpoint at 28k rpm with a pool of 100 db connections and it seems to efficiently return connections to the pool. As soon as I add that line back in, it very quickly uses up all the connections in the pool at high load. I cannot reproduce at lower loads yet.

I am occasionally seeing the following error as well:

play.api.UnexpectedException: Unexpected exception[SQLTransientConnectionException: db - Connection is not available, request timed out after 10000ms.]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:251)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:182)
at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:320)
at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:318)
at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:412)
at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
Caused by: java.sql.SQLTransientConnectionException: db - Connection is not available, request timed out after 10000ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83)
at slick.jdbc.hikaricp.HikariCPJdbcDataSource.createConnection(HikariCPJdbcDataSource.scala:14)
at slick.jdbc.JdbcBackend$BaseSession.(JdbcBackend.scala:438)
at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:46)
at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:37)
at slick.basic.BasicBackend$DatabaseDef.acquireSession(BasicBackend.scala:216)
at slick.basic.BasicBackend$DatabaseDef.acquireSession$(BasicBackend.scala:215)

Let me know if you have any ideas, and thanks!

How have you configured Slick? There are some situations that can result in deadlocks if the number of threads in Slick’s thread pools are set too low. See the discussion in https://github.com/slick/slick/issues/1614 for details.

It might also be worth ensuring that you use the latest version of Slick (3.2.3) as there have been some bug fixes that might be related.

I was just using the default configuration for slick. Those discussions seem quite relevant but after reading through them and trying several of the suggestions in there I am still running into this issue.

My configuration before was:

slick.dbs {
  default {
    profile="slick.jdbc.MySQLProfile$"
    db {
      driver="com.mysql.jdbc.Driver"
      url=${PLAY_DB_URL}
      user=${PLAY_DB_USER}
      password=${PLAY_DB_PASSWORD}
      leakDetectionThreshold=60000
    }
  }
}

I have since upgraded play-slick from 3.0.1 -> 3.0.3 which is the latest version that supports play 2.6.x (I am using 2.6.12)

I also upgraded slick from the version bundled in play-slick from 3.2.1 -> 3.2.3. Unfortunately, it seems like the relevant bug fixes were present in 3.2.1 already, but I gave it a shot regardless. This did provide a nice warning message which basically echoes what is said in the discussion:

[warn] slick.util.AsyncExecutor - Having maxConnection > maxThreads can result in deadlocks if transactions or database locks are used.

I configured slick as follows which stifled the warning message

slick.dbs {
  default {
    profile="slick.jdbc.MySQLProfile$"
    db {
      driver="com.mysql.jdbc.Driver"
      url=${PLAY_DB_URL}
      user=${PLAY_DB_USER}
      password=${PLAY_DB_PASSWORD}
      leakDetectionThreshold=60000
      numThreads=20
      maxConnections=20
    }
  }
}

However I am still encountering the error. In the discussion it seemed that if this fix only applied to Sequences and AndThen while I was using flatMap. I am unsure if the fix didn’t apply because it didn’t need fixing or it wasn’t fixed.

So as a test, I switched to using an AndThen query - I still encountered the issue unfortunately. However, this did eventually show a new exception I had not seen yet, although I’m not sure how relevant it is:

Exception in thread “db-10” java.lang.IllegalArgumentException: requirement failed: count cannot be decreased
at scala.Predef$.require(Predef.scala:277)
at slick.util.ManagedArrayBlockingQueue.$anonfun$decreaseInUseCount$1(ManagedArrayBlockingQueue.scala:54)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at slick.util.ManagedArrayBlockingQueue.locked(ManagedArrayBlockingQueue.scala:201)
at slick.util.ManagedArrayBlockingQueue.decreaseInUseCount(ManagedArrayBlockingQueue.scala:53)
at slick.util.AsyncExecutor$$anon$2$$anon$1.afterExecute(AsyncExecutor.scala:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Thanks for your responses, it is much appreciated!

1 Like

that’s the exact issue we currently have on production with similar setup( number of pool connections, slick threads). the problem is that we use ‘transactionally’ for each query because we need to pre-execute “SET ROLE…” to handle our permissions on db level. have no idea how to get rid of transactionally in this case…

In my case this was caused by NewRelic javaagent
(using play 2.8.7 slick 3.3.3 (numThreads = 5) hikari 4.0.3 mysql connector 8.0.22)
with newrelic:
image

without newrelic: