Graceful stop of service with docker stop

When I run docker-compose stop id, after about 10 seconds the docker log shows:
nightly_id_1 exited with code 137

According to the docker docs, docker-compose stop will send SIGTERM, wait 10 seconds, and then send SIGKILL. I’m guessing the service isn’t responding to SIGTERM. I’ve increased the timeout to 60 seconds with the same result, so it doesn’t seem like I’m simply waiting for Lagom to stop.
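For reference, exit code 137 is consistent with that guess: when a process is killed by a signal, the shell reports 128 plus the signal number, and SIGKILL is signal 9 on Linux. The following is only a minimal sketch run as an ordinary process outside Docker; `signal_exit_status` is a hypothetical helper name, not part of any tool mentioned here.

```shell
# Hypothetical helper: report the exit status seen when a child shell
# is terminated by the named signal.
signal_exit_status() {
  status=0
  # The child sends the signal to itself; "$$" is the child's own PID.
  sh -c 'kill -s "$1" "$$"' sh "$1" 2>/dev/null || status=$?
  echo "$status"
}

signal_exit_status KILL   # prints 137, i.e. 128 + 9 (SIGKILL)
echo $((137 - 128))       # prints 9, the signal number behind exit code 137
```

So a container reported as "exited with code 137" was most likely ended by SIGKILL rather than by its own shutdown logic.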

My docker packaging uses sbt-native-packager and is very close to the chirper example, except my image is "openjdk:8-jdk-slim-stretch".

Any ideas on how to get the service to shut down gracefully, and presumably faster?

It should shut down on SIGTERM with the default settings, though it’s possible that it could take more than ten seconds. Sixty seems like a lot. Are you seeing anything in the service logs after it receives the signal? There should be info-level logging as the shutdown process proceeds.

There were some known issues in older versions of Lagom where shutdown could deadlock. What version are you using?

Hi Tim, thanks for the help - I’m using 1.4.4

I logged into the container as root to send SIGTERM directly:

> docker-compose exec -u 0 user bash
root@9b62ded44b19:/opt/docker# kill -n 15 1

and nothing happened, so then I listed the processes:

root@9b62ded44b19:/opt/docker# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
daemon       1  0.0  0.0   4292   468 ?        Ss   19:57   0:00 /bin/sh -c bin/
daemon      14 12.6  2.1 9996128 343576 ?      Sl   19:57   0:45 /docker-java-ho
root      2322  0.0  0.0  19900  3520 pts/0    Ss   20:02   0:00 bash
root      2596  0.0  0.0  38384  3108 pts/0    R+   20:03   0:00 ps aux

and then sent SIGTERM to PID 14:
root@9b62ded44b19:/opt/docker# kill -n 15 14

It shut down almost instantly, and checking the logs showed a nice orderly shutdown.

I think docker stop sends the signal to PID 1, which doesn’t work for these images. I’m not super familiar with docker yet, so I’m not sure if this is an issue with sbt-native-packager, with how it’s configured in build.sbt, or something else. My docker settings are closely ripped from the chirper kubernetes example and you can see them in the TagWriter issue I posted at "TagWriter fails if cassandra isn’t started before lagom service".

I’m not at the stage of using kubernetes yet, but I don’t see why this would be any different for anyone using the chirper example as a guide to build docker images for any orchestration platform. Do you know if others are getting their containers to shut down gracefully, or are they all being killed after the timeout?

This sounds like the infamous docker issue where signals are not delivered. See:

Some background

(the following is taken from https://www.youtube.com/watch?v=3OdrmLerM3M&t=2209 which is a talk in Spanish using slides in English)

  • When a process receives a SIGTERM and hasn’t registered a handler for it, the kernel applies the default action and terminates the process. When the PID of the process is 1, this default doesn’t apply: the signal is ignored unless the process has registered a handler.
  • By default, docker stop sends a SIGTERM and, after 10 seconds, sends a SIGKILL.
  • bash doesn’t forward signals to its child processes
  • A process must take care of reaping its child processes
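The first bullet can be illustrated with a small sketch. It runs as an ordinary process, so it only shows the handler versus default-action difference and does not reproduce the PID 1 special case; `run_without_handler` and `run_with_handler` are hypothetical names chosen for this example.

```shell
# No SIGTERM handler: the default action terminates the child,
# so the parent sees 128 + 15 and "still running" is never printed.
run_without_handler() {
  status=0
  sh -c 'kill -s TERM "$$"; echo "still running"' 2>/dev/null || status=$?
  echo "exit status: $status"
}

# With a handler registered via trap: the child runs its own cleanup
# and chooses its exit status.
run_with_handler() {
  status=0
  sh -c 'trap "echo cleaning up; exit 0" TERM; kill -s TERM "$$"; echo "still running"' || status=$?
  echo "exit status: $status"
}

run_without_handler   # prints "exit status: 143"
run_with_handler      # prints "cleaning up" and then "exit status: 0"
```

Inside a container, a shell running as PID 1 with no such trap would simply ignore the SIGTERM, which matches the `kill -n 15 1` observation earlier in the thread.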

I think the containers produced are using a setup (not sure which) causing the SIGTERM generated by docker stop to not reach the Lagom process inside the container. Could you share the Dockerfile?

PS: just found this interesting link too: http://goinbigdata.com/docker-run-vs-cmd-vs-entrypoint/

Just found https://docs.docker.com/engine/reference/builder/#entrypoint too.

I found this file in target/docker/stage/opt/Dockerfile - it seems to be what sbt-native-packager generated and uses…

FROM openjdk:8-jdk-slim-stretch
WORKDIR /opt/docker
ADD --chown=daemon:daemon opt /opt
RUN ["mkdir", "-p", "/opt/docker/service-data"]
RUN ["chown", "-R", "daemon:daemon", "/opt/docker/service-data"]
VOLUME ["/opt/docker/service-data"]
USER daemon
ENTRYPOINT bin/id-impl  -Dconfig.file="$(eval "echo $APP_CONFIG_FILE")" -Dhttp.address="$(eval "echo $SERVICE_BIND_ADDRESS")" -Dhttp.port="$(eval "echo $SERVICE_BIND_PORT")" -Dakka.remote.netty.tcp.hostname="$(eval "echo $AKKA_REMOTING_HOST")" -Dakka.remote.netty.tcp.port="$(eval "echo $AKKA_REMOTING_PORT")" -Dakka.remote.netty.tcp.bind-hostname="$(eval "echo $AKKA_REMOTING_BIND_HOST")" -Dakka.remote.netty.tcp.bind-port="$(eval "echo $AKKA_REMOTING_BIND_PORT")"
CMD []

I’ll have a look at those links, but I’m not too sure about how the ENTRYPOINT or the run script is currently generated.

This seems to be using the shell syntax which is known to not propagate signals. Instead of:

ENTRYPOINT bin/id-impl -Dconfig. ...

it should be using the syntax:

ENTRYPOINT ["bin/id-impl", "-Dconfig. ...", "more", "arguments", "here"]
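One caveat worth keeping in mind: the exec (JSON array) form does no shell processing, so the `$(eval "echo $VAR")` substitutions in the generated ENTRYPOINT would not be expanded there. The Docker documentation also describes keeping the shell form and prefixing the command with `exec` (for example `ENTRYPOINT exec bin/id-impl ...`), so that the command replaces the shell and becomes PID 1 itself. The sketch below only illustrates that PID behaviour outside Docker; the inner `sh -c` is a stand-in for the service start script, and both function names are hypothetical.

```shell
# Without exec: the wrapper shell stays alive as the parent and the
# command runs as a child with a different PID. The trailing "true"
# keeps the shell from optimising the last command into an exec.
pids_without_exec() {
  sh -c 'echo "shell=$$"; sh -c "echo command=\$\$"; true'
}

# With exec: the command replaces the wrapper shell and keeps its PID,
# so a signal sent to that PID reaches the command directly.
pids_with_exec() {
  sh -c 'echo "shell=$$"; exec sh -c "echo command=\$\$"'
}

pids_without_exec   # the two PIDs differ
pids_with_exec      # the two PIDs are the same
```

In a container, the wrapper shell's PID is 1, so the second variant is the one in which SIGTERM from docker stop would arrive at the service process.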

Is there a repo and a set of instructions you can share to reproduce this? I’ve tried with chirper and I think chirper uses the correct syntax.

Sure, from the TagWriter issue I have a repro at https://github.com/jibbers42/lagom-write-tag-fail. It generates a similar Dockerfile:

FROM openjdk:8-jdk-slim-stretch
WORKDIR /opt/docker
ADD --chown=daemon:daemon opt /opt
USER daemon
ENTRYPOINT bin/hello-impl  -Dhttp.address="$(eval "echo $SERVICE_BIND_ADDRESS")" -Dhttp.port="$(eval "echo $SERVICE_BIND_PORT")" -Dakka.remote.netty.tcp.hostname="$(eval "echo $AKKA_REMOTING_HOST")" -Dakka.remote.netty.tcp.port="$(eval "echo $AKKA_REMOTING_PORT")" -Dakka.remote.netty.tcp.bind-hostname="$(eval "echo $AKKA_REMOTING_BIND_HOST")" -Dakka.remote.netty.tcp.bind-port="$(eval "echo $AKKA_REMOTING_BIND_PORT")"
CMD []

The docker config inspiration comes from https://github.com/lagom/lagom-java-chirper-example/blob/f422d7507b62ccabcd4e77c3f449269a7e664ef4/build.sbt#L70.

I see. That is wrong.

Also, I just realised you based your code on an obsolete chirper repo. A few weeks ago we realised the chirper repo was getting out of hand and decided to split it into a few smaller, more specific examples: https://github.com/lagom/lagom-java-sbt-chirper-example and https://github.com/lagom/lagom-java-maven-chirper-example

We have also been working on some improvements to the docker images (such as using a JRE instead of a JDK to reduce the size from 800MB to 140MB) and, finally, we’ve reviewed the tooling around k8s and Mesos DC/OS deployment. For all these reasons I suggest you have a look at build.sbt and KUBERNETES.md in https://github.com/lagom/lagom-java-sbt-chirper-example and upgrade.

If you can’t afford the change, consider fixing your current build.sbt to use exec syntax in your container by reviewing the code you copied from https://github.com/lagom/lagom-java-chirper-example/blob/f422d7507b62ccabcd4e77c3f449269a7e664ef4/build.sbt#L70 (which seems to be the source of the issue).

Awesome, I’ll have a look at the new stuff - sounds like some great work!

I got to that code via https://www.lagomframework.com/documentation/1.4.x/scala/ProductionOverview.html -> first paragraph links to -> https://developer.lightbend.com/guides/lagom-kubernetes-k8s-deploy-microservices/ -> The Setup section has 2 links to -> https://github.com/lagom/lagom-java-chirper-example

Not sure if that means the link updates were missed or if the new stuff has yet to find its way to the docs, but thought I’d mention it.

Thanks again to you both.

Thanks, @jibbers42. We’re working on an overhaul of the production documentation to refer to the new stuff. Forum members are getting a sneak peek! :smile: