Question regarding Singleton failure handling

I have a question regarding failure handling when using a singleton. I did a lot of web research but haven’t found a solution yet. Perhaps someone can point me in the right direction:

In my cluster I have a persistent singleton. When the persistence database is not available, the recovery phase fails after a short time and the singleton actor is stopped. On the user side of the singleton, I look up the singleton’s proxy using the Identify message. That works in all cases, since the proxy exists even if the singleton itself does not start up or terminates. A termination watch on the proxy does not provide any information either, since the singleton proxy itself does not terminate. How do I detect that failure situation on the user side?

I tried adding a kind of Ping message to the singleton itself. But that approach only fills the log files with dead-letter messages, since the singleton itself does not exist anymore and the message cannot be delivered.

Is there any way for a user to detect that the singleton’s proxy no longer has a connection to the singleton itself? I guess the singleton proxy must be aware of that situation internally …

Thanks for your help
Lay

A singleton should have some kind of restart supervision, because if it’s stopped it will not be started again until there is a node failure and it’s started on another node. For a persistent actor it’s good to use backoff supervision.
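
To illustrate what backoff supervision amounts to, here is a minimal sketch in plain Java of how the delay before each restart attempt could be computed: exponential growth from a minimum, capped at a maximum, with some random jitter. This is not Akka's implementation; the class and method names (BackoffDelay, baseDelay, delay) are hypothetical and only meant to show the idea.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of Akka: computes how long to wait
// before the n-th restart attempt of a stopped actor.
final class BackoffDelay {
    private BackoffDelay() {}

    // Deterministic part: minBackoff * 2^restartCount, capped at maxBackoff.
    static Duration baseDelay(int restartCount, Duration minBackoff, Duration maxBackoff) {
        if (restartCount < 0) {
            throw new IllegalArgumentException("restartCount must be >= 0");
        }
        // Computed in double so that large restart counts saturate at the cap
        // instead of overflowing a long.
        double exponential = minBackoff.toMillis() * Math.pow(2, restartCount);
        long cappedMillis = (long) Math.min(exponential, (double) maxBackoff.toMillis());
        return Duration.ofMillis(cappedMillis);
    }

    // Adds jitter: the base delay is stretched by a random factor in
    // [1.0, 1.0 + randomFactor). Jitter is applied after the cap, so the
    // result may exceed maxBackoff by up to randomFactor.
    static Duration delay(int restartCount, Duration minBackoff,
                          Duration maxBackoff, double randomFactor) {
        double jitter = 1.0 + ThreadLocalRandom.current().nextDouble() * randomFactor;
        long millis = (long) (baseDelay(restartCount, minBackoff, maxBackoff).toMillis() * jitter);
        return Duration.ofMillis(millis);
    }
}
```

With a minimum of 3 seconds and a maximum of 30 seconds, the base delays would be 3 s, 6 s, 12 s, 24 s and then 30 s for every further attempt, so a database outage does not cause a tight restart loop.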

There is a supervision strategy for my singleton and it will become active again if the persistence database is back online. That is not my problem.

I am more interested in whether it is possible to find out if the proxy can currently deliver messages to the singleton. In the documentation I read about the proxy’s internal queue with a configurable size. So internally the proxy must have the information about the connection state to the singleton. But it seems that this is not a public API that a user can access from outside.
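
For reference, the queue size mentioned here is, as far as I can tell, the buffer-size setting of the singleton proxy. A sketch of the relevant configuration section follows; the values shown are what I believe the defaults to be in classic Akka, so they should be checked against the reference.conf of the version in use.

```
akka.cluster.singleton-proxy {
  # How often the proxy tries to identify the singleton instance
  # while its location is unknown.
  singleton-identification-interval = 1s

  # Number of messages buffered while the singleton's location is unknown.
  # 0 disables buffering, so messages are dropped immediately.
  buffer-size = 1000
}
```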

That is not exposed outside the proxy. In any case, there would be a gap before the proxy learns that the singleton is unavailable, so messages in flight would be lost anyway.

Thanks for the clarification. I already assumed that this is the case.

As I said, I already have supervision for the singleton in place, and my protocol actor <=> singleton is secured by a timeout and retry handling.

The only small problem that remains for me is the “caching” phase of the proxy while the singleton is still trying to recover. During that phase all my retries are stored in the queue. When the singleton finally fails to recover, all these cached messages are logged as dead letters. That can be 100+ log messages, depending on the configured queue size and timeouts, which perfectly hide the single error message of the failed recovery :-). I know I can switch off dead-letter logging in general, but since this is normally valuable information for me, I don’t like that solution.
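
For reference, the general dead-letter logging settings look roughly like the fragment below. As far as I know, an integer value limits how many dead letters are logged instead of switching the logging off completely, which might be a middle ground. Note that these settings are global and not specific to the singleton proxy, and the exact behavior may differ between Akka versions.

```
akka {
  # on: log all dead letters, off: log none,
  # integer: maximum number of dead letters to log
  log-dead-letters = 10

  # Whether dead letters are logged while the actor system shuts down
  log-dead-letters-during-shutdown = off
}
```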

So can I somehow switch off the dead-letter logging only for the singleton proxy?

Additionally, I think the proxy’s failure handling is a little inconsistent: messages received during the initial caching phase are logged as dead letters, while messages received afterwards are silently dropped.