There is a common pattern I’ve seen in Lagom for asynchronous communication between services. Say I have two services, A and B. When a change to a persistent entity in Service B needs to be communicated to Service A, there are two approaches:
- In Service B, subscribe to the event stream of the persistent entity and call the API of Service A to notify it of the change.
- In Service A, subscribe to a topic published by Service B and react to the events describing the change.
I’ve seen that the second approach is preferred because it more loosely couples the services, e.g., if Service A is down then Service B will continue without error and Service A will catch up when it comes back online.
My question has to do with how to handle errors that require notifying Service B about the failure so that it can take some compensating action.
In the first approach, a failure can be handled by the event processor in Service B. Since the processor lives in Service B, it can send a command to the relevant persistent entity, which can then initiate a compensating action.
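To make the first approach concrete, here is a minimal plain-Java sketch of the failure path. It uses no Lagom APIs: `ServiceAClient`, `EntityB`, `EventProcessor`, and the `Compensate:` command string are all hypothetical stand-ins, and a real processor would be asynchronous rather than a blocking try/catch.

```java
import java.util.ArrayList;
import java.util.List;

final class ApproachOneSketch {

    // Hypothetical client for Service A's API (not a Lagom type).
    interface ServiceAClient {
        void notifyChange(String entityId, String change) throws Exception;
    }

    // Hypothetical stand-in for the persistent entity in Service B.
    // It only records the commands it receives.
    static final class EntityB {
        final List<String> commands = new ArrayList<>();

        void handle(String command) {
            commands.add(command);
        }
    }

    // Event processor that lives in Service B.
    static final class EventProcessor {
        private final ServiceAClient serviceA;
        private final EntityB entity;

        EventProcessor(ServiceAClient serviceA, EntityB entity) {
            this.serviceA = serviceA;
            this.entity = entity;
        }

        void onEvent(String entityId, String change) {
            try {
                serviceA.notifyChange(entityId, change);
            } catch (Exception e) {
                // Processor and entity share a service, so the failure
                // becomes a local compensating command.
                entity.handle("Compensate:" + change);
            }
        }
    }

    public static void main(String[] args) {
        EntityB entity = new EntityB();
        // Simulate Service A rejecting the notification.
        ServiceAClient failing = (id, change) -> {
            throw new Exception("Service A rejected " + change);
        };
        new EventProcessor(failing, entity).onEvent("b-1", "PriceChanged");
        System.out.println(entity.commands);
    }
}
```

The point of the sketch is only that the failure never has to leave Service B: the processor that observes it can reach the entity directly.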
The second approach is trickier. The topic processor is in Service A and must communicate the failure back to Service B. My initial thought was to simply make an API call to Service B. However, that undoes the loose coupling we gained by using the second approach. To maintain loose coupling, we’d need to publish a failure notification event back to Service B through a topic. The only way I can see to accomplish that is to have the persistent entity in Service A persist a failure event. That seems inappropriate, both because failure events aren’t state changes and because it would mix Service B business logic into Service A.
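The message flow I have in mind for the second approach can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `InMemoryTopic` is a hypothetical in-memory stand-in for a broker topic (not Lagom's topic API), `ChangeEvent`, `ChangeRejected`, and `EntityB` are made-up names, and an empty change is used as an arbitrary stand-in for "Service A could not process the event". It deliberately leaves open the actual question of where Service A would source the rejection events it publishes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class FailureTopicSketch {

    // Hypothetical in-memory topic; a real broker would deliver asynchronously.
    static final class InMemoryTopic<T> {
        private final List<Consumer<T>> subscribers = new ArrayList<>();

        void subscribe(Consumer<T> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(T message) {
            subscribers.forEach(s -> s.accept(message));
        }
    }

    // Event published by Service B when its entity changes.
    static final class ChangeEvent {
        final String entityId;
        final String change;

        ChangeEvent(String entityId, String change) {
            this.entityId = entityId;
            this.change = change;
        }
    }

    // Failure notification published by Service A.
    static final class ChangeRejected {
        final String entityId;
        final String reason;

        ChangeRejected(String entityId, String reason) {
            this.entityId = entityId;
            this.reason = reason;
        }
    }

    // Hypothetical stand-in for the persistent entity in Service B.
    static final class EntityB {
        final List<String> commands = new ArrayList<>();

        void handle(String command) {
            commands.add(command);
        }
    }

    // Wires both services to the two topics and returns Service B's entity.
    static EntityB wire(InMemoryTopic<ChangeEvent> changesFromB,
                        InMemoryTopic<ChangeRejected> rejectionsFromA) {
        EntityB entity = new EntityB();

        // Service A side: consume B's changes, publish a rejection on failure.
        changesFromB.subscribe(event -> {
            if (event.change.isEmpty()) { // arbitrary stand-in for a failure
                rejectionsFromA.publish(
                        new ChangeRejected(event.entityId, "empty change"));
            }
        });

        // Service B side: consume A's rejections, send a compensating command.
        rejectionsFromA.subscribe(rejection ->
                entity.handle("Compensate:" + rejection.entityId));

        return entity;
    }

    public static void main(String[] args) {
        InMemoryTopic<ChangeEvent> changes = new InMemoryTopic<>();
        InMemoryTopic<ChangeRejected> rejections = new InMemoryTopic<>();
        EntityB entity = wire(changes, rejections);

        changes.publish(new ChangeEvent("b-1", "PriceChanged")); // accepted
        changes.publish(new ChangeEvent("b-2", ""));             // rejected
        System.out.println(entity.commands);
    }
}
```

Neither service calls the other's API here, so the loose coupling is preserved. The cost is that Service A now owns a topic whose only purpose is to report failures to Service B.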
I can see the advantages of the second approach and why it seems to be a common pattern. However, I’m trying to figure out how to handle failure notifications without negating those advantages.
I’d appreciate the community’s experience or insight on this issue.