When working with Akka Persistence, how would I go about asserting the uniqueness of incoming form data? Uniqueness in this case is in terms of the customer's email and address, and would require searching all other customers for matches.
I am using Akka Persistence with the EventStore plugin.
I’d suggest searching this forum for “uniqueness” because it’s a complicated issue and it has been discussed multiple times here. For example: Unique constraint validation. And here: How to handle unique constraint in Entity (Aggregate Root)? - #3 by codingkapoor
In short, it’s a tricky problem to enforce arbitrary uniqueness in a distributed system. Uniqueness, pretty much by definition, requires you to know that something doesn’t exist not only on your local node, but also on every single other remote node. Imagine if you were running a cluster of 400 Akka nodes and at the exact same microsecond every single node received a message with a duplicate email. How do you coordinate all 400 nodes at the same time to determine which one wins?
In other words, an arbitrary uniqueness constraint in a distributed system has inherent scalability issues because it has to be a strongly consistent decision. It’s a classic difficulty with distributed systems, not just Akka.
Let me summarize the options that I’m aware of.
1. Don’t have arbitrary uniqueness constraints. Obviously this isn’t a solution to the actual problem, but I’ve found that in many cases uniqueness constraints exist just because they are easy in centralized systems. If you tell a user that enforcing email address uniqueness will slow the system down and/or cost more development time, then you may find that they don’t really care about duplicate email addresses. (And I think this is a great example: having two different users with the same email address usually isn’t the problem people think it is.)
2. Rethink your design. I suppose this is really the same as #1. But domain-driven design says we should implement our aggregate roots so that they can make independent decisions. By designing so that we have to rely on the uniqueness of something outside our actor’s state, we are violating that principle. Having these kinds of uniqueness issues often highlights that we are modeling the wrong entities as actors. Maybe we should have “organization” as the entity, for example, rather than “user”. If “example.com” is the persistent entity, then enforcing uniqueness of emails under that domain name is trivial.
3. If you only have one arbitrary uniqueness constraint, make it the hashing function of your sharding. In this example, if you really must have each user have a unique email address, then have each persistent entity map to one email. (The big drawback, however, is that this can then be your only uniqueness constraint.) This solves the problem by delegating responsibility for uniqueness to the sharding mechanism itself: because of the hashing, each shard only has to maintain uniqueness within itself.
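To illustrate the routing idea behind #3, here is a minimal plain-Java sketch (names are hypothetical, and this is a model of the idea rather than real Akka code): Akka Cluster Sharding's default extractor derives the shard from the entity id's hash, so if the normalized email *is* the entity id, two registrations of the same email always land on the same shard and can be checked locally.

```java
// Plain-Java model of hash-based shard routing (hypothetical names).
// In Akka this is what the default HashCodeMessageExtractor-style logic
// does with the entity id; here the email itself is the entity id.
public final class EmailSharding {
    public static final int NUMBER_OF_SHARDS = 100;

    public static int shardFor(String email) {
        // Normalize first so "Alice@Example.com" and "alice@example.com"
        // are the same entity and therefore route to the same shard.
        return Math.abs(email.trim().toLowerCase().hashCode() % NUMBER_OF_SHARDS);
    }
}
```

Because the routing is deterministic, the shard that owns an email never has to ask any other shard whether that email exists.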
4. Create a second set of persistent entities that exist entirely to maintain the uniqueness of the value, e.g. have an “email” persistent entity. Essentially an actor whose whole purpose is to respond with which user entity is associated with a given email (and to accept changes to that association). So the user entity, as part of its creation, sends a message to the email entity attempting to claim the email. This is probably what you need to do if you can’t take either of the simplifying assumptions above. It works because it is essentially sharding the responsibility for maintaining the uniqueness. But it means you are essentially going to have to create a saga pattern any time you want to assign/change the email address: it’s going to require three separate calls and three separate “transactions”: provisionally create the user, assign the email, and finalize the user. And it gets progressively slower and more complicated as you add additional constraints, such as addresses.
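The core of that “email entity” in #4 is a reserve/release protocol. A plain-Java sketch of that protocol (hypothetical names; in Akka the map below would instead be the persisted state of an entity keyed by the email, and the methods would be command handlers):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the email-reservation entity from option #4.
public final class EmailRegistry {
    private final Map<String, String> owners = new HashMap<>(); // email -> userId

    /** Reserve the email; succeeds only if unclaimed or already owned by this user. */
    public boolean reserve(String email, String userId) {
        String owner = owners.putIfAbsent(email, userId);
        return owner == null || owner.equals(userId); // idempotent retry is safe
    }

    /** Compensating action for the saga: give the email back if finalization fails. */
    public void release(String email, String userId) {
        owners.remove(email, userId); // only removes if this user is the owner
    }
}
```

The idempotent `reserve` matters because the saga may retry after a timeout, and `release` is the compensating step if the user can’t be finalized.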
5. Somewhat similar to the above, except you introduce some other centralized service rather than a second set of distributed actors. I’ve seen people use a singleton actor. I’ve seen people use a centralized database. I’m not super fond of this because it has most of the disadvantages of option #4 (e.g. you are building a saga pattern), but now you also have the scalability limitations of a centralized service.
6. Use eventual consistency to enforce these uniqueness rules. For example, imagine that we have a projection for emails. This is something that might exist naturally anyway, if we want to be able to look up users via email. While building that projection, the query system can be used to “enforce” uniqueness by triggering some kind of compensating action. E.g. if I try to register a duplicate email, the actor is allowed to use the duplicate email, but when the duplicate is detected in the projection, a new message is sent to the persistent actor invalidating the duplicate. This is pretty similar to #4 in that you are building a saga pattern. It’s going to be more scalable since it’s not trying to enforce the uniqueness semi-synchronously. But it’s going to allow some temporary violations of consistency. And it’s not very “pure” in terms of what projections should be doing.
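The detection-plus-compensation logic in #6 can be sketched in plain Java (hypothetical names; in Akka this fold would be the projection handler consuming the event stream, and each `InvalidateEmail` would be a command sent back to the offending persistent actor):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the projection side of option #6.
public final class EmailProjection {
    public record UserRegistered(String userId, String email) {}
    public record InvalidateEmail(String userId, String email) {}

    /** Fold registration events into an email index; emit a compensating
     *  command for every later registration that reuses a taken email. */
    public static List<InvalidateEmail> detectDuplicates(List<UserRegistered> events) {
        Map<String, String> index = new HashMap<>(); // email -> first owner
        List<InvalidateEmail> compensations = new ArrayList<>();
        for (UserRegistered e : events) {
            String owner = index.putIfAbsent(e.email(), e.userId());
            if (owner != null && !owner.equals(e.userId())) {
                // First writer wins; the later registration gets invalidated.
                compensations.add(new InvalidateEmail(e.userId(), e.email()));
            }
        }
        return compensations;
    }
}
```

Note the window between the duplicate registration and the projection catching up: during that window both users “have” the email, which is exactly the temporary consistency violation described above.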
Perhaps I’ll edit this later with more thoughts, but I have to jump into a meeting. If nothing else, I’ll try to fix the formatting issues.