Akka distributed data for large data set

envee · August 17, 2018, 10:54am

I was planning to use the Distributed Data feature of Akka to store data for more than 15 million subscribers of a digital service provider. Each subscriber record can contain purchased offerings, balances, etc., which would be looked up in real time by actors performing computations specific to that subscriber. These computations would update the subscriber data, and my hope was that if the data were stored as CRDTs, the updates could be synchronised across actors/nodes without coordination.

But I then read about the limitation of Akka Distributed Data: it is not recommended for more than 100k top-level objects. I guess my use case would need at least as many top-level objects as there are subscribers in the system. Is there a way around this? Are there any plans to raise this limit to allow large data sets? And is it just a recommendation, or a hard limit in the Akka Distributed Data module?

If I understand correctly, would 100k be the limit on the number of subscribers for my use case?

Thanks!!!
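
To make the idea of coordination-free updates concrete, here is a minimal sketch of a state-based CRDT for a balance that can be credited and debited on different nodes. It is a dependency-free illustration, not the Akka Distributed Data API; the class name `PNCounter`, its methods, and the node ids are assumptions chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical PN-Counter: a state-based CRDT for a value that can go up and down. */
final class PNCounter {
    // Per-node running totals; a node only ever increases its own entries.
    private final Map<String, Long> increments = new HashMap<>();
    private final Map<String, Long> decrements = new HashMap<>();

    /** Records a credit (delta >= 0) or a debit (delta < 0) made on the given node. */
    void add(String nodeId, long delta) {
        if (delta >= 0) {
            increments.merge(nodeId, delta, Long::sum);
        } else {
            decrements.merge(nodeId, -delta, Long::sum);
        }
    }

    /** Current value: all credits minus all debits seen by this replica. */
    long value() {
        return sum(increments) - sum(decrements);
    }

    /** Returns a new counter holding the per-node maximum of both replicas. */
    PNCounter merge(PNCounter other) {
        PNCounter merged = new PNCounter();
        mergeMax(merged.increments, this.increments, other.increments);
        mergeMax(merged.decrements, this.decrements, other.decrements);
        return merged;
    }

    private static void mergeMax(Map<String, Long> target,
                                 Map<String, Long> a,
                                 Map<String, Long> b) {
        target.putAll(a);
        b.forEach((node, total) -> target.merge(node, total, Long::max));
    }

    private static long sum(Map<String, Long> totals) {
        long result = 0;
        for (long total : totals.values()) {
            result += total;
        }
        return result;
    }
}
```

Because the merge takes a per-node maximum, it is commutative, associative, and idempotent, so replicas converge regardless of the order in which they exchange state. Note that the full per-node state travels with every entry, which is part of why entry count and entry size matter for this approach.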

patriknw · August 18, 2018, 7:28am

I think that is more than Distributed Data can handle. See this post about Scalability of Distributed Data, and note that each of your data entries is much larger than the ones discussed there.

envee · August 18, 2018, 8:30am

Thanks Patrik.
Yeah, it would be an excellent fit if this module could handle that size. How I wish!
What would you suggest as an alternative? Just Akka Persistence to a distributed database?

patriknw · August 19, 2018, 6:25am

Akka Persistence is based on the single-writer principle, so it requires coordination (via Cluster Sharding). If you are fine with that, then it is a good choice used by many.

Lightbend has the commercial module Multi-DC Persistence, which supports active-active across data centers.

Otherwise, do it yourself with a distributed database.
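
As a rough illustration of the single-writer principle mentioned above, the sketch below maps every subscriber id to exactly one shard, and every shard to exactly one node, so all updates for a subscriber pass through a single place. It is not the Cluster Sharding API: `ShardRouter`, its methods, and the static modulo allocation of shards to nodes are assumptions for this example, whereas real sharding allocates and rebalances shards through a coordinator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical router: each subscriber id resolves to one shard and one owning node. */
final class ShardRouter {
    private final int numberOfShards;
    private final List<String> nodes;

    ShardRouter(int numberOfShards, List<String> nodes) {
        if (numberOfShards <= 0 || nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard and one node");
        }
        this.numberOfShards = numberOfShards;
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    /** Stable shard for a subscriber id; floorMod keeps the result non-negative. */
    int shardFor(String subscriberId) {
        return Math.floorMod(subscriberId.hashCode(), numberOfShards);
    }

    /** Node hosting the shard. Static modulo allocation is an assumption of this sketch. */
    String nodeFor(String subscriberId) {
        return nodes.get(shardFor(subscriberId) % nodes.size());
    }
}
```

With this kind of routing, the state of a subscriber lives in one place at a time, so the number of subscribers is bounded by storage and memory across the cluster rather than by replicating every entry to every node.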
