Building an actor system for comparing large immutable objects concurrently

I would like to use the scatter/gather pattern to compare large objects as part of a system that uses neural networks.
I have an actor that represents a question.
I have a bunch of actors that each represent a possible answer.
I would like to find the answer that matches the question.
The main problem is that the question data and the answer data are large (several GB) and I don’t want to clone them.
Any ideas on the best way to implement this?
If I were not using actors, I would keep the answers in an immutable data structure and share it across worker threads to perform the compare operation.
All the answers are immutable by nature.
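
For reference, the non-actor baseline described above could look roughly like the sketch below. It is plain Java, not Akka code. `SharedCompare`, `Embedding`, and `bestMatch` are hypothetical names, and cosine similarity over a tiny float vector is only an assumed stand-in for the real comparison. The point is that every worker task receives references to the same immutable objects, so nothing is cloned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SharedCompare {

    /** Immutable wrapper: the array is never mutated or exposed after construction. */
    static final class Embedding {
        private final String id;
        private final float[] values;

        Embedding(String id, float[] values) {
            this.id = id;
            this.values = values; // takes ownership; the caller must not modify the array afterwards
        }

        String id() { return id; }

        /** Cosine similarity, reading both arrays in place (assumed stand-in for the real compare). */
        double similarity(Embedding other) {
            double dot = 0, na = 0, nb = 0;
            for (int i = 0; i < values.length; i++) {
                dot += values[i] * other.values[i];
                na += values[i] * values[i];
                nb += other.values[i] * other.values[i];
            }
            return dot / (Math.sqrt(na) * Math.sqrt(nb));
        }
    }

    /** Scatter one comparison per answer, then gather the id with the highest score. */
    static String bestMatch(Embedding question, List<Embedding> answers, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Double>> scores = new ArrayList<>();
            for (Embedding answer : answers) {
                // Each task captures references only; no vector data is copied.
                scores.add(CompletableFuture.supplyAsync(() -> question.similarity(answer), pool));
            }
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < answers.size(); i++) {
                double score = scores.get(i).join(); // gather step
                if (score > bestScore) {
                    bestScore = score;
                    best = answers.get(i).id();
                }
            }
            return best;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Embedding question = new Embedding("q", new float[] {1f, 0f, 1f});
        List<Embedding> answers = List.of(
            new Embedding("a1", new float[] {0f, 1f, 0f}),
            new Embedding("a2", new float[] {1f, 0f, 0.9f}),
            new Embedding("a3", new float[] {-1f, 0f, 1f}));
        System.out.println(bestMatch(question, answers, 4)); // prints "a2"
    }
}
```

What I am after is the same sharing behaviour, but with actors in place of the thread pool.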

Akka doesn’t usually clone messages. If you are sending messages within a local ActorSystem, it will share references, so you should be fine.

Large messages can be a problem when using Akka Cluster and sending messages between nodes, as it will require serializing the message at the sender and deserializing at the receiver. In that case, you are better off designing a way to colocate the question and answers on the same node.
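
To make the difference concrete, here is a small plain-Java sketch. It does not use Akka's API: the in-memory queue is only a stand-in for a local mailbox, Java serialization is only a stand-in for whatever serializer a cluster is configured with, and `ReferenceVsCopy` and `Compare` are hypothetical names. A local hand-off delivers the very same object, while a serialization round trip rebuilds a full copy of the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ReferenceVsCopy {

    /** Hypothetical message type carrying a large read-only payload. */
    static final class Compare implements Serializable {
        private static final long serialVersionUID = 1L;
        final float[] vector;

        Compare(float[] vector) { this.vector = vector; }
    }

    /** Local hand-off: the queue (stand-in for a mailbox) stores only the reference. */
    static Compare sendLocally(Compare message) {
        BlockingQueue<Compare> mailbox = new LinkedBlockingQueue<>();
        mailbox.add(message);
        return mailbox.poll();
    }

    /** Cross-node hand-off: the message is turned into bytes and rebuilt as a new object. */
    static Compare sendViaSerialization(Compare message) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(message);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Compare) in.readObject();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Compare message = new Compare(new float[1_000_000]);
        // Same array instance on the receiving side: nothing was copied.
        System.out.println(sendLocally(message).vector == message.vector);          // true
        // A different array instance: the whole payload was duplicated.
        System.out.println(sendViaSerialization(message).vector == message.vector); // false
    }
}
```

With several GB per message, the second path costs both the serialization time and a second copy in memory on the receiving node.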

Thanks Tim for the reply!

Here is what I am thinking of doing. Create an actor for each answer. Each answer actor will hold some kind of NLP vector that can be used for searching. When an answer actor receives a message to perform a comparison with some question text, it will spawn a searcher actor, and when the searcher actor is asked to search, it will have the answer passed to it “by reference”. This will work only because:
The answer data is completely immutable.
The searcher actors are always created on the same machine as their parent.
Doing it this way, my answer actor can run many compare tasks against different questions without becoming a bottleneck, and it can reply to other messages faster because it is not tied up in a time-consuming “compare” task. I can also shard my answers across as many nodes as needed. As long as the searcher actors are always created on the same node as their parent, I can share the model state of the answer actor.

I would be very happy to hear your thoughts on this idea. Is this good practice, or will something come back and bite me?

That sounds good to me.