12 CPUs and 64 GB RAM each. We already switched from G1GC to Shenandoah because of GC stop-the-world pauses.
In our use case we are constantly spawning (empty recovery) and passivating about 20% of these actors (sharded entities). The Cassandra write and read latencies seem to be fine.
Just for testing purposes we changed the AbstractPersistentActor to an AbstractActor and implemented recovery and persistence ourselves by loading and saving the state snapshot directly from and to Cassandra (blocking I/O *).
* I know this is evil magic. It was just for testing … We won’t do it again :)
The throughput doubled, which confuses me, because in this case we are saving the entire state to Cassandra every time instead of using AbstractPersistentActor.persist to save the commands only (CQRS), which should be much more efficient. (We already checked the setting
AbstractPersistentActor implementation itself seems to consume about +600 bytes of heap per actor instance (AbstractActor = 400 bytes).
Maybe the whole problem is caused by garbage collection. Right now it seems like the AbstractPersistentActor implementation is heap-expensive but unfortunately not very fast…
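To make clear why I expected the persist-only approach to be cheaper, here is a rough sketch of the write volume of the two styles compared above. It is plain Java without Akka or Cassandra, and everything in it is an assumption for illustration: the class and method names are made up, and it assumes that every update adds a fixed number of bytes to the entity state, which may not match our real entities.

```java
public class WriteVolumeSketch {

    /**
     * Bytes written when the whole state is saved after every update
     * (what our test implementation does).
     * Assumption: each update grows the state by bytesPerEvent.
     */
    public static long snapshotPerUpdate(int updates, int bytesPerEvent) {
        long stateSize = 0;
        long written = 0;
        for (int i = 0; i < updates; i++) {
            stateSize += bytesPerEvent; // state grows with every update
            written += stateSize;       // the full state is written again
        }
        return written;
    }

    /**
     * Bytes written when only the new event is appended per update
     * (the journal-style write that persist is expected to do).
     */
    public static long eventPerUpdate(int updates, int bytesPerEvent) {
        return (long) updates * bytesPerEvent;
    }

    public static void main(String[] args) {
        int updates = 1_000;      // hypothetical number of updates per entity
        int bytesPerEvent = 100;  // hypothetical serialized event size
        System.out.println("snapshot per update: "
                + snapshotPerUpdate(updates, bytesPerEvent) + " bytes");
        System.out.println("event per update:    "
                + eventPerUpdate(updates, bytesPerEvent) + " bytes");
    }
}
```

This only models bytes written. It says nothing about latency, batching, or per-actor overhead, and if the state stays small and roughly constant in size the gap between the two styles mostly disappears.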
Thanks for your support