Download large files from S3 with Alpakka

alpakka
streams
(K) #1

Hello,

Is there a chunking mechanism for S3 file downloads, similar to multipart upload to S3?

I’m trying to stream a large file from S3, and the underlying consumer appears to be timing out, resulting in an error like this:

[WARN] [04/15/2019 14:59:00.553] [default-akka.actor.default-dispatcher-14] [default/Pool(shared->https://s3-eu-west-1.amazonaws.com:443)] [0 (WaitingForEndOfResponseEntity)] Ongoing request [GET /bucket/file Empty] was dropped because pool is shutting down

and right after that:

[INFO] [04/15/2019 14:59:32.033] [default-akka.actor.default-dispatcher-3] [akka://default/system/IO-TCP/selectors/$a/0] Message [akka.io.TcpConnection$Unregistered$] from Actor[akka://default/system/IO-TCP/selectors/$a/0#-1268872956] to Actor[akka://default/system/IO-TCP/selectors/$a/0#-1268872956] was not delivered. [1] dead letters encountered. If this is not an expected behavior, then [Actor[akka://default/system/IO-TCP/selectors/$a/0#-1268872956]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

The stream terminates after that. I guess I could define the ByteRange manually and loop through the ranges, but that seems less stream-like, or a dirty way of achieving this.
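For reference, the manual approach I have in mind would look roughly like the sketch below. It only computes the inclusive byte ranges for a file of known size, using plain JDK code. The names `ByteRanges`, `Slice` and `split` are made up for illustration and are not part of Alpakka. My assumption is that each slice would then be turned into an Akka HTTP `ByteRange` and passed to the range parameter of the S3 download call, with the resulting sources concatenated in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an Alpakka API): splits an object of known size
// into consecutive, inclusive byte ranges suitable for HTTP Range requests.
public class ByteRanges {

    /** Inclusive range [first, last], matching HTTP Range semantics. */
    public static final class Slice {
        public final long first;
        public final long last;

        Slice(long first, long last) {
            this.first = first;
            this.last = last;
        }

        @Override
        public String toString() {
            return "bytes=" + first + "-" + last;
        }
    }

    /** Returns the ranges covering totalBytes in chunks of at most chunkBytes. */
    public static List<Slice> split(long totalBytes, long chunkBytes) {
        if (totalBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and chunkBytes > 0");
        }
        List<Slice> slices = new ArrayList<>();
        for (long first = 0; first < totalBytes; first += chunkBytes) {
            // The last chunk may be shorter than chunkBytes.
            long last = Math.min(first + chunkBytes, totalBytes) - 1;
            slices.add(new Slice(first, last));
        }
        return slices;
    }

    public static void main(String[] args) {
        // A 10-byte object in 4-byte chunks yields three ranges.
        for (Slice s : split(10, 4)) {
            System.out.println(s);
        }
    }
}
```

The object size would have to be known up front (for example from the object metadata), which is part of why this feels less stream-like to me.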

Any thoughts on this?

(Martynas Mickevičius) #2

Hi @psilos,

Even for large files, the download should not time out. Are you sure you are not inadvertently shutting down the pool before the download is complete? It looks that way from the log message you posted.

(K) #3

Hi @2m,

Thanks for the suggestion, but I'm not really sure what may be causing the pool to shut down. How can I investigate this further in this context?

Thanks

(Martynas Mickevičius) #4

Terminating the ActorSystem could be one of the causes, as could calling Http().shutdownAllConnectionPools().

(K) #5

Hi @2m,

You were right, it seems to be the Kafka Producer terminating, since I can now see:

[info] [INFO] [04/25/2019 12:22:41.806] [default-akka.actor.default-dispatcher-37] [akka.stream.Log(akka://default/system/StreamSupervisor-0)] [file] Downstream finished.
[info] 12:22:42.109 [default-akka.kafka.default-dispatcher-24] INFO  o.a.k.clients.producer.KafkaProducer - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 60000 ms.

I have been using Producer.plainSink, so I'm not really sure why the producer would time out and terminate the stream, since there is still data coming in from upstream.