Akka Distributed Data for large data sets


#1

I was planning to use the Distributed Data feature of Akka to store data for more than 15 million subscribers of a digital service provider. Each subscriber record can contain purchased offerings, balances, etc., which would be looked up in real time by actors performing computations specific to that subscriber. These computations would result in updates to the subscriber data, and my hope was that if the data were stored as CRDTs, the updates could be synchronised across actors/nodes without coordination.
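
To make the "without coordination" idea concrete, here is a minimal sketch of a counter-style CRDT for a single subscriber balance. It is plain Java and does not use Akka; the class name BalanceCounter, the node ids and the amounts are made up for illustration, and Akka's own CRDT types may behave differently in detail. Each node only adds to its own totals, and replicas converge by taking the per-node maximum when merging.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical PN-Counter style CRDT for one subscriber's balance.
// Illustration only: this is not an Akka class.
final class BalanceCounter {
    // Per-node running totals; a node only ever increases its own entry.
    private final Map<String, Long> credits = new HashMap<>();
    private final Map<String, Long> debits = new HashMap<>();

    void credit(String nodeId, long amount) {
        requireNonNegative(amount);
        credits.merge(nodeId, amount, Long::sum);
    }

    void debit(String nodeId, long amount) {
        requireNonNegative(amount);
        debits.merge(nodeId, amount, Long::sum);
    }

    long value() {
        return sum(credits) - sum(debits);
    }

    // Taking the per-node maximum makes merge commutative, associative
    // and idempotent, so replicas can exchange state in any order.
    BalanceCounter merge(BalanceCounter other) {
        BalanceCounter merged = new BalanceCounter();
        mergeMax(merged.credits, this.credits);
        mergeMax(merged.credits, other.credits);
        mergeMax(merged.debits, this.debits);
        mergeMax(merged.debits, other.debits);
        return merged;
    }

    private static void mergeMax(Map<String, Long> target, Map<String, Long> source) {
        source.forEach((node, total) -> target.merge(node, total, Long::max));
    }

    private static long sum(Map<String, Long> totals) {
        long result = 0;
        for (long total : totals.values()) {
            result += total;
        }
        return result;
    }

    private static void requireNonNegative(long amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("amount must be >= 0");
        }
    }
}
```

If one replica records a credit of 100 and another records a debit of 30, merging them in either order gives the same balance. The catch for my use case is that every such object is replicated to the nodes, which is where the number of top-level entries becomes a concern.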

But I then happened to read about the limitation of Akka Distributed Data: it is not recommended for more than 100k top-level objects. I guess my use case would need at least as many top-level objects as there are subscribers in the system. Is there a way around this? Are there any plans to raise this limit to allow large data sets? And is it just a recommendation, or a hard limit in the Akka Distributed Data module?

If I understand correctly, does this mean that for my use case 100k would be the limit on the number of subscribers?

Thanks!!!


(Patrik Nordwall) #2

I think that is more than can be handled by Distributed Data. See this post about Scalability of Distributed Data, and note that each of your data entries is much larger than what was discussed there.


#3

Thanks Patrik.
Yeah, it would have been an excellent fit if this module could handle that size. How I wish!
What would you suggest as an alternative? Just Akka Persistence with a distributed database?


(Patrik Nordwall) #4

Akka Persistence is based on the single-writer principle, so it requires coordination (via Cluster Sharding). If you are fine with that, then it is a good choice that is used by many.
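
As a rough illustration of the single-writer idea, the sketch below maps each subscriber id to a shard and each shard to one owning node, so that all updates for a given subscriber go to the same place. It is plain Java and not the Cluster Sharding API: the class ShardRouting, its methods and the modulo-based placement are simplifying assumptions (in Cluster Sharding, shard locations are decided by a coordinator rather than by a fixed formula).

```java
import java.util.List;

// Hypothetical routing helper; illustration only, not an Akka class.
final class ShardRouting {
    // The same subscriber id always yields the same shard number.
    // floorMod keeps the result in [0, numberOfShards) even for
    // negative hash codes.
    static int shardFor(String subscriberId, int numberOfShards) {
        return Math.floorMod(subscriberId.hashCode(), numberOfShards);
    }

    // Assumption for this sketch: shards are spread over nodes by modulo.
    // Exactly one node owns a given subscriber, so there is one writer.
    static String ownerOf(String subscriberId, int numberOfShards, List<String> nodes) {
        return nodes.get(shardFor(subscriberId, numberOfShards) % nodes.size());
    }
}
```

Because every update for a subscriber lands on one owner, there are no concurrent writers to reconcile, at the cost of having to route each message to that owner.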

Lightbend has the commercial module Multi-DC Persistence, which supports active-active across data centers.

Otherwise, do it yourself with a distributed database.