Lengthy database calls in akka-peristence-r2dbc

cwardgar · August 10, 2023, 6:35am

I’m running an event-sourced system with ~300 million entities, and I’m persisting events using the R2DBC plugin. In the log, I’m getting frequent messages that indicate unexpectedly long DB operations, e.g.

insert [Variant|1_d91439d5261ba186ddc0ddf2e3351291_9502fb2b89cb53c3ef24aed51ca09994] - getConnection took [464194] µs
insert [Variant|1_cff35e4413aceb4de82fe113f755451c_b42d35106898db1d2ecd262649c69f23] - DB call completed [2023-08-07T22:02:15.866209Z] in [1213377] µs
select eventsByPersistenceId [Parent|1_35bd1a5323e49022a8cca07e8e21e1f2] - Selected [2] rows in [1310022] µs
select highest seqNr [Parent|1_b2b064e73a2b764b895b5718a93745d1] - Selected [1] rows in [780619] µs

The high latency doesn’t seem to be related to the DB server: p99 latency there is below 1ms.

So, this has to be client-side, right? What exactly is happening? Is it something like the (completed) Futures containing the results from the DB are not being scheduled to run on the CPU soon enough?

I’m running the Starvation Detector on all of the relevant dispatchers, and I do get the occasional warning, e.g.

Exceedingly long scheduling time on ExecutionContext Dispatcher[command-hub.dispatchers.persistence]. Starvation detector task was only executed after 499 ms which is longer than the warning threshold of 100 milliseconds. This might be a sign of thread, CPU, or dispatcher starvation. This can be caused by blocking calls, CPU overload, GC, or overlong work units.

Among those 4 causes, this seems like CPU overload, because I’ve been pretty careful to avoid the other 3. In situations like these, is the solution to either a) throw more compute resources at the problem, or b) decrease parallelism to a level that the current compute resources can handle?

I guess I’m using this post as a sounding board for my own conclusion. But also, maybe I’m missing something or am flat-out wrong. Any insight is appreciated.

johanandren · August 15, 2023, 6:49am

Sounds like it could be starvation, look at CPU (and maybe network) metrics to verify if you are saturating the machine.

Other things that could be worth looking into would be connection pool size and possibly also dispatcher sizing. Also, even if you think you have been careful to not block, give that an extra look through, third party libraries can block in surprising places.

cwardgar · August 16, 2023, 4:22pm

Hello Johan. CPU usage is pegged at 90+%, so I’m definitely saturating the machine. I guess I was just wanting confirmation that those akka-persistence warnings can manifest as a result of a saturated CPU.

johanandren · August 17, 2023, 6:52am

Sounds likely that is the cause.

Topic		Replies	Views
R2DBC offset store cleaning up Persistence / Event Sourcing	2	181	January 23, 2024
Akka persistence R2DBC pubsub event dropped Persistence / Event Sourcing	2	141	February 8, 2024
Akka Persistence Cassandra - High RU/s on tag_views table Persistence / Event Sourcing	1	754	February 16, 2022
Akka Projection Slick High Eventual Consistency Delay Akka scala	2	539	June 30, 2021
Akka persistence events projection with JDBC plugin and without tagging Persistence / Event Sourcing	6	308	October 10, 2023

Lengthy database calls in akka-peristence-r2dbc

Related Topics