Resolve 1000+ DNS with async-dns without OOM error

An OOM after an hour sounds like a resource leak. It could be in your application.

It could be, but with DNS disabled everything works smoothly.
By "DNS disabled" I mean that I killed the DNS actor =)

We would like to track down the root cause of the OOME.

We need an example that reproduces it. You wrote:

(IO(Dns) ? DnsProtocol.Resolve(domain, ipRequestType(ipv4 = true, ipv6 = true)))(1.second)

We need a more complete example of usage. How is that called? How frequently? Is it only one domain or how many?

I’m running a gRPC server and feeding it 500 to 1500 URLs per second from the Alexa top 1M list,
with a concurrency of around 50.

I downloaded a top-1m.csv file and tried the following with async-dns enabled:

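      // Assumes an implicit ActorSystem, ExecutionContext and akka.util.Timeout are in scope.
      import java.nio.file.Paths
      import scala.concurrent.duration._
      import akka.io.{ Dns, IO }
      import akka.io.dns.DnsProtocol
      import akka.pattern.ask
      import akka.stream.scaladsl.{ FileIO, Framing }
      import akka.util.ByteString

      FileIO
        .fromPath(Paths.get("top-1m.csv"))
        // One CSV line per element, at most 1024 bytes each.
        .via(Framing.delimiter(ByteString("\n"), 1024, allowTruncation = false))
        // 1 element per 100 ms, i.e. about 10 lookups per second.
        .throttle(1, 100.millis)
        .mapAsync(10) { line =>
          // Each line is "rank,domain".
          val domain = line.utf8String.split(",")(1)
          (IO(Dns) ? DnsProtocol.Resolve(domain, DnsProtocol.ipRequestType(ipv4 = true, ipv6 = false)))
            .mapTo[DnsProtocol.Resolved]
            // Turn a failed lookup into a printable message so the stream keeps running.
            .recover { case exc => s"ERROR $domain: $exc" }
        }
        .runForeach(println)
        .failed
        .foreach(println)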

I used config:

akka.io.dns.resolver = "async-dns"
akka.io.dns.async-dns {
        resolve-timeout = 2s
        positive-ttl = 20s
        negative-ttl = 20s
        cache-cleanup-interval = 10s
}
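
As a rough sanity check on what these settings mean for memory, here is a back-of-the-envelope sketch of how many cache entries could be alive at once. It assumes every lookup is a distinct domain and that an expired entry survives until the next cleanup pass; the object and function names are made up for illustration and are not part of Akka.

```scala
object DnsCacheEstimate {
  // Upper bound on entries held at once, assuming every lookup is a distinct
  // domain and an expired entry lingers until the next cleanup pass.
  def maxCachedEntries(lookupsPerSecond: Int, ttlSeconds: Int, cleanupIntervalSeconds: Int): Long =
    lookupsPerSecond.toLong * (ttlSeconds + cleanupIntervalSeconds)

  def main(args: Array[String]): Unit = {
    // throttle(1, 100.millis) is about 10 lookups per second.
    println(maxCachedEntries(10, 20, 10))   // 300
    // The rate mentioned for the gRPC server is two orders of magnitude higher.
    println(maxCachedEntries(1000, 20, 10)) // 30000
  }
}
```

So under these assumptions the cache stays small at 10 lookups per second, and the bound grows linearly with the lookup rate.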

I ran it for 10 minutes with a profiler attached, using -Xms32m -Xmx32m so that it would trigger an OOME early. I see no sign of a memory leak.

Two differences between my run and what I think you are doing:

  • ipv6 = false, because I got errors otherwise
  • I didn’t dare to run at a high rate

I will run your code against our DNS server with some changes:

  1. up to 1000 domains per second
  2. make two separate lookups for IPv4 and IPv6
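
A minimal sketch of what the two changes above might look like, assuming Akka 2.6 or later (so the implicit ActorSystem also supplies the stream materializer). The object name DnsLoadTest, the helpers domainOf and resolve, the 3 second ask timeout and the mapAsync parallelism of 50 are illustrative choices, not taken from the original code.

```scala
import java.nio.file.Paths
import scala.concurrent.{ ExecutionContext, Future }
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.io.{ Dns, IO }
import akka.io.dns.DnsProtocol
import akka.pattern.ask
import akka.stream.scaladsl.{ FileIO, Framing }
import akka.util.{ ByteString, Timeout }

object DnsLoadTest {
  // Each line of top-1m.csv is "rank,domain".
  def domainOf(line: String): String = line.trim.split(",")(1)

  // One lookup for a single address family; failures become printable messages.
  def resolve(domain: String, ipv4: Boolean)(
      implicit system: ActorSystem, ec: ExecutionContext, timeout: Timeout): Future[String] =
    (IO(Dns) ? DnsProtocol.Resolve(domain, DnsProtocol.ipRequestType(ipv4 = ipv4, ipv6 = !ipv4)))
      .mapTo[DnsProtocol.Resolved]
      .map(_.toString)
      .recover { case exc => s"ERROR $domain: $exc" }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("dns-load-test")
    implicit val ec: ExecutionContext = system.dispatcher
    implicit val timeout: Timeout = 3.seconds // assumed value, not from the thread

    FileIO
      .fromPath(Paths.get("top-1m.csv"))
      .via(Framing.delimiter(ByteString("\n"), 1024, allowTruncation = false))
      // Change 1: up to 1000 domains per second.
      .throttle(1000, 1.second)
      .mapAsync(50) { line =>
        val domain = domainOf(line.utf8String)
        // Change 2: separate IPv4 and IPv6 lookups, run concurrently.
        resolve(domain, ipv4 = true).zip(resolve(domain, ipv4 = false))
      }
      .runForeach(println)
      .failed
      .foreach(println)
  }
}
```

Note that with two lookups per domain, 1000 domains per second means roughly 2000 Resolve messages per second sent to the DNS manager.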