Persistence without Clustering

Is it possible to use Akka Persistence without a Cluster? Our desired architecture is to have a system where nodes receive Commands via Kafka (thus accomplishing sharding), but the only way of finding Actors that I see in the docs is by using ClusterSharding. There would be thousands of Actor/Behavior instances running as FSMs with unique IDs (so any given Command can only be processed by a particular Actor/Behavior). The general design would be:

KafkaConsumer (partition A) has a Guardian Actor ref that can route Commands to the relevant Persisted EventSourcedBehavior

If the Actor has been passivated, or if there is a redeploy, the Persisted EventSourcedBehavior is rehydrated to its latest state.

Possible solutions I’m considering are:

  1. Guardian actor maintains a hashmap of ID/EventSourcedBehavior ActorRef
  2. Using Cluster but each node is its own cluster (single node clusters)

I am unclear whether Option 1 would enable recovery correctly.

Option 2 seems like it would ensure correct recovery, but seems like quite a hack and not even sure it would work. Also seems like bringing in a lot of dependencies that shouldn’t be needed.

Any guidance would be much appreciated.

Also, just to be clear, we are trying to avoid using Cluster Sharding as it seems to add a level of complexity that should not be needed and almost certainly won’t be approved.

Thanks

Have you looked at the Alpakka Kafka Connector? This would let you consume messages for a partition with Akka Streams, and then from there on you could send messages to an actor (see e.g. the ActorFlow).

Hi @sfosborn,

It is possible to use persistence without clustering, though it’s rare, since it can lead to data corruption when two incarnations of a single persistent entity produce diverging (or conflicting) events.

In your use case, where the input is a Kafka topic, it seems safe since messages for a given instance always arrive at the same node. The problem comes when Kafka reassigns partitions to a different node. Let’s say we were processing entity-1 on node-A and, after a partition reassignment, the partition that carries entity-1 is now consumed by node-B. The process on node-B will create a new incarnation and rehydrate the actor with all events for entity-1. So far this looks good, except there are now two instances of entity-1 in memory: the new one on node-B and the old one still alive on node-A. When Kafka rebalances the partitions again, if entity-1 lands back on node-A there will be no rehydration, and node-A will not see the latest events produced on node-B.

I think the missing bit to solve the problem I described above is to stop all persistent actors any time a partition rebalance happens. IIRC, there’s an API to subscribe callbacks for certain Kafka client events. Note that this will also help you use memory more efficiently; otherwise all nodes would end up with all entities in their memory (note that Cluster Sharding provides passivation features that handle that for you).

Summing up, in case of a partition rebalance, all surviving nodes must stop any persistent actors they have and rehydrate. This is potentially a costly operation but, again, it may be fine in your use case.
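To make the rebalance scenario above concrete, here is a minimal plain-Java model of it. It deliberately uses no Akka or Kafka classes: `Journal`, `Node`, `handle` and `stopAllEntities` are hypothetical names invented for this sketch, and the entity state is just the sum of its events. In a real consumer, the place to call something like `stopAllEntities` would presumably be the `onPartitionsRevoked` callback of Kafka's `ConsumerRebalanceListener`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a shared event journal (not an Akka API).
class Journal {
    private final Map<String, List<Integer>> events = new HashMap<>();

    void append(String entityId, int event) {
        events.computeIfAbsent(entityId, id -> new ArrayList<>()).add(event);
    }

    // Replays every event of an entity; the state here is simply their sum.
    int replay(String entityId) {
        int state = 0;
        for (int e : events.getOrDefault(entityId, Collections.emptyList())) {
            state += e;
        }
        return state;
    }
}

// Hypothetical stand-in for one JVM that keeps entities in memory.
class Node {
    private final Journal journal;
    private final Map<String, Integer> inMemory = new HashMap<>();

    Node(Journal journal) {
        this.journal = journal;
    }

    // Rehydrates the entity only if it is not already in memory,
    // then persists the event and updates the in-memory state.
    int handle(String entityId, int amount) {
        int state = inMemory.computeIfAbsent(entityId, journal::replay);
        journal.append(entityId, amount);
        state += amount;
        inMemory.put(entityId, state);
        return state;
    }

    // Assumed to be called whenever partitions are taken away from this node.
    void stopAllEntities() {
        inMemory.clear();
    }
}
```

With two nodes sharing one journal, if node A handles `+1`, node B then handles `+2`, and node A handles `+3` without having been stopped in between, node A reports a state of 4 while the journal replays to 6. Calling `stopAllEntities()` on node A before the next command forces a replay, so A starts again from 6.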

So, back to your question: you can run persistence without clustering.

Cheers,

@ignasi35 Thank you, that was super helpful. The part of the question that is still open is the addressability of the actors. Short of maintaining an in-memory map of actors, is there any way to discover them without using the ClusterSharding feature?

@manuelbernhardt Yeah, the Alpakka Kafka Connector looks great. The docs still seem to assume an Akka Cluster, so the rebalancing issue still seems to be a thing. And addressability is still unclear to me.

“And addressability is still unclear to me.”

Addressing is simply your responsibility if you don’t use something out of the box like sharding. For example, I built a small project with persistence and no sharding. I had a parent actor that spawned all of the persistent children and acted as a router (keeping all of its children in a map). It was effectively modeled after the way a ShardRegion works, except that it was all local to one JVM.

So, yes, in response to your other comment, there’s nothing wrong with keeping an in-memory map. Or, if you name them strictly, you could also look them up by name from the parent.
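The parent-as-router idea can be sketched in plain Java, without any Akka dependency. Everything here is an assumption made for illustration: `LocalEntityRouter`, `route`, `passivate` and `liveEntities` are hypothetical names, a `Consumer` stands in for an actor reference, and the `spawn` function stands in for whatever creates the persistent child (in Akka that would be the parent spawning the child from its actor context).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical local router, one per JVM; not an Akka class.
class LocalEntityRouter<C> {
    // Entity id -> handle of the live entity (a stand-in for an ActorRef).
    private final Map<String, Consumer<C>> entities = new HashMap<>();
    // Creates (and, in a real system, rehydrates) the entity for an id.
    private final Function<String, Consumer<C>> spawn;

    LocalEntityRouter(Function<String, Consumer<C>> spawn) {
        this.spawn = spawn;
    }

    // Delivers the command, creating the entity on first use.
    void route(String entityId, C command) {
        entities.computeIfAbsent(entityId, spawn).accept(command);
    }

    // Forgets the entity; the next command for this id spawns it again.
    void passivate(String entityId) {
        entities.remove(entityId);
    }

    int liveEntities() {
        return entities.size();
    }
}
```

The Kafka consumer would only ever talk to the router, passing the entity id taken from each record. Because the router spawns on demand, a passivated or never-seen entity is recreated by the first command that addresses it.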