Prod issue - Lagom producer stopped producing messages and the offsetstore column toTimestamp(timeuuidoffset) shows a stale date

Hi,

We are using Lagom framework 1.6.2 in production. The microservice produces to 3 Kafka topics and uses TopicProducer.taggedStreamWithOffset to publish from the persistent entities' eventStream.

We are experiencing an issue where messages are not produced for 1 of the topics. The other 2 topics, fed from the same event, are produced successfully. The timeuuidoffset in the offsetstore table for the eventprocessorid of the affected topic is stale, and no new Kafka messages are produced for it.

  1. What determines the timeuuidoffset value stored in the offsetstore table for partitions?

Please let us know how to proceed.

Hi @sarajagopal,

It's hard to say without more details, but it sounds like there's head-of-line blocking on that topic producer. Try to locate the events around the timeuuidoffset that is currently frozen. Maybe there's a deserialization error when reading the events from the journal, or maybe the code in your topic producer does not handle all possible inputs.
I wonder if the issue may even be on the Kafka side. Check that the topic you are writing to is healthy and that no other setting is causing a continued failure.
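
In case it helps with locating the events around the frozen offset, here is a minimal sketch of turning a timeuuid offset into a wall-clock time, which is effectively what toTimestamp(timeuuidoffset) gives you in CQL. It assumes the stored offset is a standard version-1 (time-based) UUID. The class name TimeUuidOffsets and the isStale helper are made up for illustration and are not part of Lagom; it only uses the JDK, so it can be called from either the Java or the Scala API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

// Hypothetical helper, not part of Lagom: decodes the time embedded in a
// version-1 UUID such as the timeuuidoffset column of the offsetstore table.
final class TimeUuidOffsets {

    // 100-ns intervals between the UUID epoch (1582-10-15T00:00:00Z)
    // and the Unix epoch (1970-01-01T00:00:00Z).
    private static final long UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000L;

    private TimeUuidOffsets() {}

    // Converts a time-based UUID to the instant encoded in it.
    static Instant toInstant(UUID timeUuid) {
        if (timeUuid.version() != 1) {
            throw new IllegalArgumentException("Not a time-based (version 1) UUID: " + timeUuid);
        }
        long hundredNanos = timeUuid.timestamp() - UUID_EPOCH_OFFSET_100NS;
        long seconds = Math.floorDiv(hundredNanos, 10_000_000L);
        long nanos = Math.floorMod(hundredNanos, 10_000_000L) * 100L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    // True when the offset is older than `now` by more than `maxLag`.
    // The threshold is an assumption; pick one that matches your event rate.
    static boolean isStale(UUID offset, Instant now, Duration maxLag) {
        return toInstant(offset).plus(maxLag).isBefore(now);
    }
}
```

With the decoded instant you can narrow the search in the journal and in the service logs to the minutes around the point where the producer stopped advancing.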

Cheers,

Hi Ignasi Marimon-Clos,

Thanks for the response. The error above occurred again, and this time we observed "Persistence failure when replaying events for persistenceId XXXXX . Last known sequence number " in the logs.

What is the reason for this error? Would “Persistence failure when replaying events for persistenceId” make the offsetstore table record stale?

2020-12-19T15:39:11.142Z LCS akkaAddress=akka://application@xx.xx.xx.xx:yyyy, sourceThread=application-akka.actor.default-dispatcher-10, akkaSource=akka://application@xx.xx.xx.xx:yyyy/system/sharding/WildfireEntity/78/xxxx, sourceActorSystem=application, akkaTimestamp=15:39:11.142UTC LCE [application-akka.actor.default-dispatcher-23] ERROR c.l.l.i.s.p.PersistentEntityActor.$anonfun$applyOrElse$1(76) - Persistence failure when replaying events for persistenceId XXXXX . Last known sequence number [0]
akka.pattern.CircuitBreakerOpenException: Circuit Breaker is open; calls are failing fast