[Akka 2.6.0-M4] Cannot reproduce documented behaviour for "Remote Watch Disabled" changes

While migrating to 2.6.0-M4, a few of our tests started failing because of the remote-watch changes in the latest milestone.

While reproducing the issue with minimal code, we realized that the documented change does not even take effect as expected.

We first start RemoteWatcheeApp and then RemoteWatcherApp, as described in the README of minimized-issue-repo.

We expected the watcher not to receive the Terminated signal, but it does receive it!

What are we missing?

Hi, thanks for your question. First, the behavior you are checking applies specifically when Akka Cluster is not used, yet Akka Cluster is the first dependency you have added to your build. The restriction is, however, designed to be overridden when Akka Cluster is used, because in that case remote watch/unwatch is safe and Terminated messages for killed actors would indeed be received.

That said, you are not enabling Cluster, and I do see an error in the docs: in remoting we intercept the Watch/Unwatch messages for DeathWatch but let all other system messages through, such as Terminated, whereas the docs say:

Remote Watch: ignores the watch and unwatch request, and Terminated will not be delivered when the remote actor is stopped or if a remote node crashes

So you are correct there and thank you for identifying it for us to update!

@helena, thanks for the explanation!

If I understand correctly, remote actor-ref watch and Terminated signals will still work in 2.6 as long as both actor systems (watcher and watchee) use the remote provider.

With this understanding, we modified our example further so that a) the watcher uses the cluster provider and b) the watchee uses the remote provider with use-unsafe-remote-features-without-cluster = on.
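For reference, a watchee configured as in b) might look roughly like the sketch below. It mirrors the watcher config posted later in this thread; the port number 2552 is an assumption for illustration and is not taken from the example repo.

  akka {
    actor {
      # watchee uses plain remoting, no Akka Cluster
      provider = "remote"
    }

    remote {
      artery {
        # hypothetical port; the actual example may use a different one
        canonical.port = 2552
      }
      # explicit opt-in to remote features without Akka Cluster
      use-unsafe-remote-features-without-cluster = on
    }
  }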

Now we should receive the Terminated signal as expected, right? But we do not! This breaks a few of our tests when we try to migrate to M4.

What explains this behaviour?

Hi @skvithalani, in looking into this we did find a problem and I'm pushing a fix, so thank you very much for finding it! I will update you here today with a link to the change if you're curious. It will be in the next milestone :slightly_smiling_face:

Only if they enable use-unsafe-remote-features-without-cluster.

Yes, if use-unsafe-remote-features-without-cluster is enabled.

This didn’t work correctly in M4. You will be able to try a nightly snapshot when Helena’s fix has been merged.

Thanks a lot for trying out the milestones and reporting this issue.

@helena and @patriknw,

Thanks for the update.

We look forward to the next milestone getting released with the fix. :slight_smile:

Hi @skvithalani, it looks like the updated snapshots are now published. I believe what you want is in:
https://repo.akka.io/snapshots/com/typesafe/akka/akka-cluster-typed_2.12/2.6-20190712-192311/

Thanks again for bringing this to our attention.
Kind regards,
Helena

Thanks for the update, @helena.

@helena and @patriknw

We tested the snapshot 2.6-20190712-192311 in our experiment repo for
a) Watcher with cluster provider and
b) Watchee with remote provider.

Our observation:

  • The watcher (with cluster provider) requires akka.remote.use-unsafe-remote-features-without-cluster = on, and then it receives the Terminated signal from the watchee (with remote provider) as expected.

This solves the remote watching problem.

Our confusion:

  • The watcher already has a cluster provider, so why does it require the flag akka.remote.use-unsafe-remote-features-without-cluster?

Shouldn't the watchee, which has the remote provider, require the flag instead?

This is about ‘safe’ use of remoting. In your example you knowingly want to do something unsafe: watch across the cluster boundary, because your watchee is outside the cluster. Hence you need to declare it explicitly. Does that help? You should not need the flag if the watcher and watchee are inside the same cluster. You would need it if both used remoting only.
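As a sketch of the case where the flag should not be needed, assume the watcher and the watchee both join the same cluster. The system name ClusterSystem, the host and the ports below are hypothetical; each node would use its own canonical.port and the same seed-nodes list.

  akka {
    actor {
      provider = "cluster"
    }

    remote {
      artery {
        canonical.hostname = "127.0.0.1"
        # hypothetical port; the other node would use a different one
        canonical.port = 2551
      }
      # no use-unsafe-remote-features-without-cluster here:
      # watcher and watchee are members of the same cluster
    }

    cluster {
      # hypothetical system name and seed node
      seed-nodes = ["akka://ClusterSystem@127.0.0.1:2551"]
    }
  }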

Helena

Here is another attempt to explain. In the shared example, the watcher uses the cluster provider, as shown by the following config:

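  akka {
    actor {
      # the watcher runs with the cluster provider
      provider = "cluster"
    }

    remote {
      artery {
        canonical.port = 4567
      }
      # still required, because the watchee is not a member of this cluster
      use-unsafe-remote-features-without-cluster = on
    }
  }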

This configuration is a bit confusing for the user.

  • The use-unsafe-remote-features-without-cluster = on setting is required even though provider = cluster.
  • We say provider = cluster, but the setting ends with -without-cluster = on.

This shows that the new setting is required for every watcher, whether it uses the cluster provider or only the remote provider. Either the documentation needs to reflect this, or the current implementation has an unexpected effect.

Hope this clarifies the confusion.

The background of the feature/limitation is that we want to make users very aware that watching a node outside of the cluster may have unexpected consequences, such as quarantining and therefore a required restart as soon as the failure detector timeout triggers.

Failure detection between nodes that are members of the same cluster doesn’t have that shortcoming.

Typically this happens when using plain remoting without any cluster provider at all. As you mention, it can also happen when using the cluster provider but watching a node that is not a member, but I think such mixed usage is rarer.

We could consider renaming the config setting to -outside-cluster instead of -without-cluster.

That would be really helpful. There is another key point here: we should document that all watchers need this setting (cluster as well as remote).

I agree that a cluster needing to watch remote actors is rare. But in our case, that is at the core of our design for a general-purpose, Akka-CRDT-based service-discovery mechanism in which some of the registered services are remote actors.

Yes, even though you use provider = cluster, you are watching across the cluster boundary, which is where you should note:

if you understand the consequences

But I will create a ticket to describe the cross-boundary use for clarity. Thanks for bringing it up.

~Helena

You're welcome, @helena :slight_smile:, and thanks for letting us know about the issue you created.

Looking forward to the next 2.6 milestone.

@patriknw, since there is already an issue for improving the docs (Issue: Add clarification to doc on cluster cross boundary use of DeathWatch), should I go ahead and create another issue for renaming the flag use-unsafe-remote-features-without-cluster to use-unsafe-remote-features-outside-cluster?

I don’t think we need an additional issue but you can make a comment on that doc ticket. Thanks.

Okay, sure.