Java Play 2.7 processes stuck at 100% CPU on Linux server

Play deployed on a Linux server goes into an unexplained state of 100% CPU usage several times a day, for 15 minutes or more. The system is not serving any traffic (site is completely idle). Play was deployed using sbt dist. The application logs show no errors.

Play Version 2.7

API (Java)

Operating System (Ubuntu 15.10 / MacOS 10.10 / Windows 10)

Linux api1 4.18.0-15-generic #16-Ubuntu SMP Thu Feb 7 10:56:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

JDK

openjdk version “11” 2018-09-25
OpenJDK Runtime Environment 18.9 (build 11+28)
OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)

Behavior

Running htop yields this line:

CPU[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%]   Tasks: 39, 47 thr; 1 running
Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||       419M/985M]   Load average: 1.03 1.02 0.70 
Swp[                                                                                  0K/0K]   Uptime: 6 days, 03:45:00

PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
24110 produser      20   0 2318M  288M 22312 S 99.3 29.2 15:38.37 java -Duser.dir=/home/produser/server-1.0-SNAPSHOT
25106 produser      20   0 2318M  288M 22312 R 97.4 29.2 14:26.12 java -Duser.dir=/home/produser/server-1.0-SNAPSHOT
24182 produser      20   0 2318M  288M 22312 S  2.6 29.2  0:21.78 java -Duser.dir=/home/produser/server-1.0-SNAPSHOT
24195 produser      20   0 2318M  288M 22312 S  0.0 29.2  0:07.89 java -Duser.dir=/home/produser/server-1.0-SNAPSHOT
24190 produser      20   0 2318M  288M 22312 S  0.0 29.2  0:01.56 java -Duser.dir=/home/produser/server-1.0-SNAPSHOT
24186 produser      20   0 2318M  288M 22312 S  0.0 29.2  0:16.72 java -Duser.dir=/home/produser/server-1.0-SNAPSHOT
24188 produser      20   0 2318M  288M 22312 S  0.0 29.2  0:01.40 java -Duser.dir=/home/produser/server-1.0-SNAPSHOT

As you can see, the two top Play processes are both running at nearly 100% CPU.

If I strace the first process, I get:

sudo strace -p 24110
strace: Process 24110 attached
futex(0x7fee273af9d0, FUTEX_WAIT, 24181, NULL...

and it hangs there.

If I strace the second process, I get this in a never-ending loop:

strace: Process 25106 attached
strace: [ Process PID=25106 runs in x32 mode. ]
strace: [ Process PID=25106 runs in 64 bit mode. ]
futex(0x7fee2009717c, FUTEX_WAKE_PRIVATE, 1) = 1
gettid()                                = 25106
futex(0x7fee20097178, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7fee2009717c, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7fee20097178, FUTEX_WAKE_PRIVATE, 1) = 1
gettid()                                = 25106
futex(0x7fee2009717c, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7fee20097178, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7fee2009717c, FUTEX_WAKE_PRIVATE, 1) = 1
(loops forever)

If I run jstack on the two process respectively, I get:

Full thread dump OpenJDK 64-Bit Server VM (11+28 mixed mode):

Threads class SMR info:
_java_thread_list=0x00007f312c011b60, length=22, elements={
0x00007f315809e800, 0x00007f31580a2800, 0x00007f31580b3800, 0x00007f31580b9800,
0x00007f31580bc000, 0x00007f31580be000, 0x00007f3158138000, 0x00007f3158148000,
0x00007f31598dd000, 0x00007f31588c1800, 0x00007f315b15c000, 0x00007f3158017000,
0x00007f312c01f800, 0x00007f31246f1000, 0x00007f3128445000, 0x00007f313c058800,
0x00007f3134086800, 0x00007f313c10d000, 0x00007f312c064000, 0x00007f3138093000,
0x00007f311c012800, 0x00007f312c009800
}

"Reference Handler" #2 daemon prio=10 os_prio=0 cpu=10.37ms elapsed=1338.27s tid=0x00007f315809e800 nid=0x6277 waiting on condition  [0x00007f315c2a3000]
   java.lang.Thread.State: RUNNABLE
	at java.lang.ref.Reference.waitForReferencePendingList(java.base@11/Native Method)
	at java.lang.ref.Reference.processPendingReferences(java.base@11/Reference.java:241)
	at java.lang.ref.Reference$ReferenceHandler.run(java.base@11/Reference.java:213)

"Finalizer" #3 daemon prio=8 os_prio=0 cpu=8.79ms elapsed=1338.27s tid=0x00007f31580a2800 nid=0x6278 in Object.wait()  [0x00007f315c1a2000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(java.base@11/Native Method)
	- waiting on <no object reference available>
	at java.lang.ref.ReferenceQueue.remove(java.base@11/ReferenceQueue.java:155)
	- waiting to re-lock in wait() <0x00000000f5bf62b0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(java.base@11/ReferenceQueue.java:176)
	at java.lang.ref.Finalizer$FinalizerThread.run(java.base@11/Finalizer.java:170)

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 cpu=0.26ms elapsed=1338.26s tid=0x00007f31580b3800 nid=0x6279 runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 cpu=29095.86ms elapsed=1338.26s tid=0x00007f31580b9800 nid=0x627a waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

"C1 CompilerThread0" #6 daemon prio=9 os_prio=0 cpu=6938.96ms elapsed=1338.26s tid=0x00007f31580bc000 nid=0x627b waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
   No compile task

"Sweeper thread" #7 daemon prio=9 os_prio=0 cpu=3791.80ms elapsed=1338.26s tid=0x00007f31580be000 nid=0x627c runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Service Thread" #8 daemon prio=9 os_prio=0 cpu=0.07ms elapsed=1338.17s tid=0x00007f3158138000 nid=0x627d runnable  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Common-Cleaner" #9 daemon prio=8 os_prio=0 cpu=2.96ms elapsed=1338.16s tid=0x00007f3158148000 nid=0x627f in Object.wait()  [0x00007f31339f7000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(java.base@11/Native Method)
	- waiting on <no object reference available>
	at java.lang.ref.ReferenceQueue.remove(java.base@11/ReferenceQueue.java:155)
	- waiting to re-lock in wait() <0x00000000f5bf6aa0> (a java.lang.ref.ReferenceQueue$Lock)
	at jdk.internal.ref.CleanerImpl.run(java.base@11/CleanerImpl.java:148)
	at java.lang.Thread.run(java.base@11/Thread.java:834)
	at jdk.internal.misc.InnocuousThread.run(java.base@11/InnocuousThread.java:134)

"application-scheduler-1" #11 prio=5 os_prio=0 cpu=3116.47ms elapsed=1332.65s tid=0x00007f31598dd000 nid=0x6283 waiting on condition  [0x00007f3131ac3000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(java.base@11/Native Method)
	at akka.actor.LightArrayRevolverScheduler.waitNanos(LightArrayRevolverScheduler.scala:85)
	at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:265)
	at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:235)
	at java.lang.Thread.run(java.base@11/Thread.java:834)

"com.google.common.base.internal.Finalizer" #17 daemon prio=5 os_prio=0 cpu=0.14ms elapsed=1331.94s tid=0x00007f31588c1800 nid=0x6289 in Object.wait()  [0x00007f31312bd000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(java.base@11/Native Method)
	- waiting on <0x00000000f6e18390> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(java.base@11/ReferenceQueue.java:155)
	- waiting to re-lock in wait() <0x00000000f6e18390> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(java.base@11/ReferenceQueue.java:176)
	at com.google.common.base.internal.Finalizer.run(Finalizer.java:145)
	at java.lang.Thread.run(java.base@11/Thread.java:834)

"application-akka.io.pinned-dispatcher-7" #18 prio=5 os_prio=0 cpu=349.57ms elapsed=1328.51s tid=0x00007f315b15c000 nid=0x628a runnable  [0x00007f3130173000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPoll.wait(java.base@11/Native Method)
	at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11/EPollSelectorImpl.java:120)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11/SelectorImpl.java:124)
	- locked <0x00000000f72d4310> (a sun.nio.ch.Util$2)
	- locked <0x00000000f72d42b8> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(java.base@11/SelectorImpl.java:136)
	at akka.io.SelectionHandler$ChannelRegistryImpl$$anon$2.tryRun(SelectionHandler.scala:131)
	at akka.io.SelectionHandler$ChannelRegistryImpl$Task.run(SelectionHandler.scala:269)
	at akka.io.SelectionHandler$ChannelRegistryImpl$$anon$2.run(SelectionHandler.scala:164)
	at akka.util.SerializedSuspendableExecutionContext.run$1(SerializedSuspendableExecutionContext.scala:68)
	at akka.util.SerializedSuspendableExecutionContext.run(SerializedSuspendableExecutionContext.scala:72)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11/Thread.java:834)

"DestroyJavaVM" #20 prio=5 os_prio=0 cpu=4431.96ms elapsed=1328.40s tid=0x00007f3158017000 nid=0x6275 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"HttpClient-2-Worker-0" #329 daemon prio=5 os_prio=0 cpu=680890.13ms elapsed=818.42s tid=0x00007f312c01f800 nid=0x63e1 runnable  [0x00007f31220f6000]
   java.lang.Thread.State: RUNNABLE
	at java.util.Collections$SynchronizedCollection.iterator(java.base@11/Collections.java:2034)
	at jdk.internal.net.http.common.SSLFlowDelegate.cleanList(java.net.http@11/SSLFlowDelegate.java:928)
	- locked <0x00000000f79a6070> (a java.util.Collections$SynchronizedList)
	at jdk.internal.net.http.common.SSLFlowDelegate$Writer.processData(java.net.http@11/SSLFlowDelegate.java:768)
	at jdk.internal.net.http.common.SSLFlowDelegate$Writer$WriterDownstreamPusher.run(java.net.http@11/SSLFlowDelegate.java:645)
	at jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(java.net.http@11/SequentialScheduler.java:147)
	at jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(java.net.http@11/SequentialScheduler.java:198)
	at jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(java.net.http@11/SequentialScheduler.java:271)
	at jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(java.net.http@11/SequentialScheduler.java:224)
	at jdk.internal.net.http.common.SSLFlowDelegate$Writer.triggerWrite(java.net.http@11/SSLFlowDelegate.java:722)
	at jdk.internal.net.http.common.SSLFlowDelegate.doHandshake(java.net.http@11/SSLFlowDelegate.java:1024)
	at jdk.internal.net.http.common.SSLFlowDelegate.doClosure(java.net.http@11/SSLFlowDelegate.java:1094)
	at jdk.internal.net.http.common.SSLFlowDelegate$Reader.unwrapBuffer(java.net.http@11/SSLFlowDelegate.java:500)
	at jdk.internal.net.http.common.SSLFlowDelegate$Reader.processData(java.net.http@11/SSLFlowDelegate.java:389)
	- locked <0x00000000f79a4bc8> (a java.lang.Object)
	at jdk.internal.net.http.common.SSLFlowDelegate$Reader$ReaderDownstreamPusher.run(java.net.http@11/SSLFlowDelegate.java:263)
	at jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(java.net.http@11/SequentialScheduler.java:175)
	- locked <0x00000000f79a5fe0> (a java.lang.Object)
	at jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(java.net.http@11/SequentialScheduler.java:147)
	at jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(java.net.http@11/SequentialScheduler.java:198)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11/Thread.java:834)

"application-akka.actor.default-dispatcher-261" #695 prio=5 os_prio=0 cpu=56.99ms elapsed=68.43s tid=0x00007f31246f1000 nid=0x66d7 waiting on condition  [0x00007f31233fc000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11/Native Method)
	- parking to wait for  <0x00000000f6e17768> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at sun.misc.Unsafe.park(jdk.unsupported@11/Unsafe.java:1067)
	at akka.dispatch.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"application-akka.actor.default-dispatcher-265" #702 prio=5 os_prio=0 cpu=9.63ms elapsed=53.23s tid=0x00007f3128445000 nid=0x66df waiting on condition  [0x00007f31315c0000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11/Native Method)
	- parking to wait for  <0x00000000f6e17768> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at sun.misc.Unsafe.park(jdk.unsupported@11/Unsafe.java:1067)
	at akka.dispatch.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"application-akka.actor.default-dispatcher-268" #706 prio=5 os_prio=0 cpu=9.64ms elapsed=43.22s tid=0x00007f313c058800 nid=0x6b6d waiting on condition  [0x00007f31316c1000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11/Native Method)
	- parking to wait for  <0x00000000f6e17768> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at sun.misc.Unsafe.park(jdk.unsupported@11/Unsafe.java:1067)
	at akka.dispatch.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"application-akka.actor.default-dispatcher-269" #707 prio=5 os_prio=0 cpu=8.60ms elapsed=35.38s tid=0x00007f3134086800 nid=0x6fb5 waiting on condition  [0x00007f31317c2000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11/Native Method)
	- parking to wait for  <0x00000000f6e17768> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at sun.misc.Unsafe.park(jdk.unsupported@11/Unsafe.java:1067)
	at akka.dispatch.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"application-akka.actor.default-dispatcher-273" #712 prio=5 os_prio=0 cpu=27.15ms elapsed=23.19s tid=0x00007f313c10d000 nid=0x6fba waiting on condition  [0x00007f31235fe000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11/Native Method)
	- parking to wait for  <0x00000000f6e17768> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at sun.misc.Unsafe.park(jdk.unsupported@11/Unsafe.java:1067)
	at akka.dispatch.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"application-akka.actor.default-dispatcher-276" #716 prio=5 os_prio=0 cpu=5.52ms elapsed=13.19s tid=0x00007f312c064000 nid=0x6fd7 waiting on condition  [0x00007f31313be000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11/Native Method)
	- parking to wait for  <0x00000000f6e17768> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at sun.misc.Unsafe.park(jdk.unsupported@11/Unsafe.java:1067)
	at akka.dispatch.forkjoin.ForkJoinPool.idleAwaitWork(ForkJoinPool.java:2135)
	at akka.dispatch.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2067)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"application-akka.actor.default-dispatcher-277" #717 prio=5 os_prio=0 cpu=1.41ms elapsed=5.35s tid=0x00007f3138093000 nid=0x6fe4 waiting on condition  [0x00007f31234fd000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11/Native Method)
	- parking to wait for  <0x00000000f6e17768> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at sun.misc.Unsafe.park(jdk.unsupported@11/Unsafe.java:1067)
	at akka.dispatch.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"application-akka.actor.default-dispatcher-279" #719 prio=5 os_prio=0 cpu=1.07ms elapsed=3.18s tid=0x00007f311c012800 nid=0x6fe6 waiting on condition  [0x00007f31314bf000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11/Native Method)
	- parking to wait for  <0x00000000f6e17768> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at sun.misc.Unsafe.park(jdk.unsupported@11/Unsafe.java:1067)
	at akka.dispatch.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

"Attach Listener" #721 daemon prio=9 os_prio=0 cpu=0.46ms elapsed=0.11s tid=0x00007f312c009800 nid=0x6ff5 waiting on condition  [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"VM Thread" os_prio=0 cpu=25978.89ms elapsed=1338.28s tid=0x00007f3158096000 nid=0x6276 runnable  

"VM Periodic Task Thread" os_prio=0 cpu=585.62ms elapsed=1338.16s tid=0x00007f315813a000 nid=0x627e waiting on condition  

JNI global refs: 16, weak refs: 0

and

25569: Unable to open socket file /proc/25569/root/tmp/.java_pid25569: target process 25569 doesn't respond within 10500ms or HotSpot VM not loaded

1 Like

Hey @nextdtinc,

Your application may not be receiving any traffic, but it does not mean it is doing nothing. Based on the thread-dump, there are some threads in RUNNABLE state. HttpClient-2-Worker-0 and application-akka.io.pinned-dispatcher-7 are good suspects here. C1 CompilerThread0 and C2 CompilerThread0 threads are also taking a lot of CPU time, so you may want to investigate those too.

Best.

1 Like

I have what seems to be the same issue.

I am using Ubuntu 18.04.02 LTS, and am running the Play app on OpenJDK 11.0.2:

$ java -version
java version "11.0.2" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.2+7-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.2+7-LTS, mixed mode)

I am terminating SSL on this instance, so I think the issue might be related to SSL.

I am using a custom SSLEngineProvider (with a Let’s Encrypt SSL certificate):

package appinit

import java.nio.file._
import java.security.KeyStore

import javax.net.ssl._
import play.api.Application
import play.core.ApplicationProvider
import play.server.api._
import utils.{Logging, ProfilingUtils}

import scala.collection.immutable.IndexedSeq
import scala.util.{Failure, Success}
import scala.util.control.NonFatal

/**
 * Custom application SSL engine provider.
 * Allows overriding the default SSL engine creation.
 */
class AppSSLEngineProvider(appProvider: ApplicationProvider) extends SSLEngineProvider with Logging {

  private lazy val app: Application = {
    appProvider.get match {
      case Failure(e) =>
        throw logError("Unable to get app from appProvider!", e)

      case Success(application) =>
        application
    }
  }

  private lazy val certificateDirectory: String = app.configuration.get[String]("server.security.certificateDirectory")

  private val priorityCipherSuites = IndexedSeq(
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA"
  )

  override def createSSLEngine: SSLEngine = {
    val startTime = System.currentTimeMillis()
    logDebug("createSSLEngine: Creating SSLEngine ...")
    try {

      val context = createSSLContext(appProvider)
      val engine = context.createSSLEngine

      // set custom cipher suites order
      val cipherSuites = engine.getEnabledCipherSuites.toIndexedSeq

      val orderedCipherSuites = {
        priorityCipherSuites.filter(cipherSuites.contains) ++
          cipherSuites.filterNot(priorityCipherSuites.contains)
      }
      engine.setEnabledCipherSuites(orderedCipherSuites.toArray)

      val params = engine.getSSLParameters

      // tells the server to ignore client's cipher suite preference.
      // http://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html#cipher_suite_preference
      params.setUseCipherSuitesOrder(true)

      engine.setSSLParameters(params)

      logInfo(s"createSSLEngine: SSLEngine successfully created - have ${orderedCipherSuites.length} CipherSuites. ${ProfilingUtils.processingTime(startTime)}")

      engine

    } catch {
      case NonFatal(e) =>
        throw logError("createSSLEngine: Exception while creating SSLEngine!", e)
    }
  }

  private def createSSLContext(applicationProvider: ApplicationProvider): SSLContext = {
    val keyManagers = readKeyManagers

    // Configure the SSL context to use TLS
    val sslContext = SSLContext.getInstance("TLS")
    sslContext.init(keyManagers, Array.empty, null)
    sslContext
  }

  private def readPassword(): Array[Char] = {
    app.configuration.get[String]("server.security.certificatePassword").toCharArray
  }

  private def readKeyInputStream(): java.io.InputStream = {
    val keyPath = FileSystems.getDefault.getPath(certificateDirectory, "keyStore.jks")
    Files.newInputStream(keyPath)
  }

  private lazy val readKeyManagers: Array[KeyManager] = {
    logInfo(s"createSSLEngine: reading key managers ...")
    val password = readPassword()
    val keyInputStream = readKeyInputStream()
    try {
      val keyStore = KeyStore.getInstance(KeyStore.getDefaultType)
      keyStore.load(keyInputStream, password)
      val kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm)
      kmf.init(keyStore, password)

      val keyManagers = kmf.getKeyManagers

      logInfo(s"createSSLEngine: reading key managers ... done.")
      keyManagers

    } finally {
      keyInputStream.close()
    }
  }

}

Although the server only has 700MB of RAM, memory usage is not the issue since profiling shows the tenured gen size of 33MB and approximately up to 85MB of eden space usage.

In the profiler, I can see two threads using 100% of CPU:

I’m not exactly sure what’s going on, as the highlighed range represents 3 min and 13 sec of real time, with the CPU usage at almost 100% the whole time, but the profiler indicating only about 29 sec of CPU time is actually used within the highlighted range.

During the highlighted range, 0 HTTP requests were served by the server.

When I make a request to the server using a web browser, I get the custom SSL provider log:

2019-03-25 23:27:56.465  INFO -   a.AppSSLEngineProvider - createSSLEngine: SSLEngine successfully created - have 44 CipherSuites. {   1 ms}

… but the HTTP request seems to be stuck and I get no response (and there are no other logs that should be coming from the application controller, applicable filters, service etc.).

When I restart the server, it works fine again.

Application startup parameters (systemd):

HTTP_PORT=80
HTTPS_PORT=443
MEMORY_MB=700
(other parameters omitted)

ExecStart=/opt/sandro/sf/current/bin/server \
        -J-agentpath:/opt/sandro/profiler/libyjpagent.so=delay=10000,dir=/opt/sandro/profiler/,logdir=/opt/sandro/profiler/logs/,sessionname=sf \
        -J-server -J-Xms${MEMORY_MB}m -J-Xmx${MEMORY_MB}m \
        -Dconfig.file=${CONFIG_FILE} -Dlogger.file=${LOGGER_FILE} \
        -Duser.timezone=${USER_TIMEZONE} -Duser.dir=${USER_DIR} -Dpidfile.path=/dev/null \
        -Dhttp.port=${HTTP_PORT} -Dhttps.port=${HTTPS_PORT} \
        -Djdk.tls.ephemeralDHKeySize=2048 -Djdk.tls.rejectClientInitiatedRenegotiation=true

Thanks for any help.

I’m not using SSL in my case, so I don’t think it’s related to SSL. In fact, I’m not running anything else that is active on that machine except for the Play process. So I’m quite sure it’s something related to Play or Java.

I am encountering this bug on both OpenJDK 11.0.2 as well as Oracle Java 11.0.2 on Ubuntu 18.04.2.

@SandroGrzicic when I Google for “C1 CompilerThread0” (which I see references to when running jstack PID) I found this: https://community.atlassian.com/t5/Jira-questions/c1-c2-compiler-thread-eats-a-lot-of-cpu/qaq-p/51641

That thread mentions this: https://confluence.atlassian.com/jirakb/jira-crashes-or-is-slow-due-to-codecache-is-full-compiler-has-been-disabled-314447083.html

I have installed Java 12 on my server and so far there have been no issues. Uptime 2 days so far with no reboots. I suggest you try upgrading your server to JDK 12 (e.g. OpenJDK 12) and see if the issue persists.

[edit as of 2019-09-03] I forgot that I had my hourly Play process restart cronjob still enabled, so it might not be a real solution - I just disabled the cronjob and will update this post if I see any further issues.

[edit as of 2019-09-04] The process has been running with no issues so far.

Bad news: This problem PERSISTS on all of these configurations:
Ubuntu 18.10 with OpenJDK
Ubuntu 18.04 with OracleJDK
Ubuntu 18.10 with OracleJDK

So this is very bad.

Have you found out what was the issue? Was the upgrade do Java 12 enaught?

Sorry, I haven’t. I just upgraded and the problem seemed to go away.