XML parsing from entity ByteString

Hello,

While using akka-stream-alpakka-xml with Akka HTTP requests, I noticed some interesting behavior when parsing large XML files.

The detailed scenario is as follows:

  • a user posts an XML file to an Akka HTTP endpoint
  • the XML is then taken from the request entity as a ByteString source and parsed by XmlParsing.parser

The XmlParsing logic works with the same XML without Akka HTTP, or when I first materialize the entity with Unmarshal(httpRequest.entity).to[String] and pass it to XmlParsing.parser as Source.single(ByteString).

When I run source.via(XmlParsing.parser), where source comes from the Akka HTTP request (request.entity.dataBytes), I almost always see some keys from the XML split in two:

testuser/one/two/three/four/five/six/seven/eight/nine/ten/eleven/twelve/sub736/KM8DaDXEVNP4MByygsM8d5vK96NStFJC=87i and
zFQw0lL90.txt

I suspect this is related to the fact that parsing is faster than the source?

What do you think?

If it helps, I have working code in

If you run curl -XPOST http://localhost:8123 -d @large.xml two or more times, you should see the split.

Hi @arempter

Is it that you get several consecutive TextEvents?
That may happen when the source delivers the data in “chopped up” ByteStrings, as the source doesn’t know about the XML structure. The parser does not try to aggregate data internally until a text section ends.
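To illustrate the effect, here is a toy Scala model of a streaming parser that emits whatever text it has read when a chunk ends, instead of waiting for the closing tag. It is not Alpakka's implementation: the event names `Start`, `End` and `Text` are hypothetical stand-ins, and cutting the input into fixed-size `grouped` chunks is an assumption (the real `dataBytes` chunk sizes depend on the server and network buffers).

```scala
// Toy model only (NOT Alpakka's parser): handles a tiny tag/text subset of XML
// (no attributes, comments, CDATA, etc.) and flushes pending text at every chunk end.
object ChunkedTextDemo {
  sealed trait Event
  final case class Start(name: String) extends Event // hypothetical stand-in
  final case class End(name: String) extends Event   // hypothetical stand-in
  final case class Text(text: String) extends Event  // hypothetical stand-in

  def parse(chunks: Seq[String]): Vector[Event] = {
    val out = Vector.newBuilder[Event]
    val pending = new StringBuilder // text or partial tag name carried across chunks
    var inTag = false
    for (chunk <- chunks) {
      for (c <- chunk) {
        if (inTag) {
          if (c == '>') {
            // A complete tag: "/name" is a closing tag, anything else an opening tag
            val tag = pending.toString
            if (tag.startsWith("/")) out += End(tag.drop(1)) else out += Start(tag)
            pending.clear()
            inTag = false
          } else pending += c
        } else if (c == '<') {
          // A tag begins, so the text section has ended: emit what is left of it
          if (pending.nonEmpty) { out += Text(pending.toString); pending.clear() }
          inTag = true
        } else pending += c
      }
      // End of chunk: the parser cannot know whether more text follows, so it
      // emits the text read so far; a value crossing the boundary becomes 2+ events
      if (!inTag && pending.nonEmpty) { out += Text(pending.toString); pending.clear() }
    }
    out.result()
  }
}
```

Fed as a single chunk (the `Source.single` case), `<key>abc/def.txt</key>` yields one `Text("abc/def.txt")`; fed as `xml.grouped(10).toList`, the same value comes out as `Text("abc/d")` followed by `Text("ef.txt")`. With fixed-size chunks, where the split lands depends only on byte offsets, not on timing.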

Cheers,
Enno.

Hi @ennru,

Thanks for the response. Yes, indeed, I think I get two TextEvents…
It also looks like the source is not divided randomly, but rather at something like a buffer-size boundary.

If you need to get those into a single event, you could collapse consecutive TextEvents by adding a statefulMapConcat.
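A minimal sketch of that idea, under assumptions: the helper below (a hypothetical `CoalesceText.coalescer`, not part of Alpakka) builds a function of the shape `statefulMapConcat` expects, `() => E => immutable.Iterable[E]`. Each materialization gets its own buffer; text events are held back and appended, and the merged text is emitted just before the next non-text event. It is generic over the event type so it can be tried without Akka on the classpath.

```scala
import scala.collection.immutable

object CoalesceText {
  /** Hypothetical helper: merges runs of consecutive text events into one.
    * @param textOf returns Some(text) for a text event, None for any other event
    * @param mkText builds a single text event from the merged string
    */
  def coalescer[E](textOf: E => Option[String], mkText: String => E): () => E => immutable.Iterable[E] =
    () => {
      // Per-materialization state: text accumulated since the last non-text event
      val buffer = new StringBuilder
      var buffering = false
      (event: E) =>
        textOf(event) match {
          case Some(text) =>
            buffer.append(text)
            buffering = true
            Nil // hold the text back until the run of text events ends
          case None if buffering =>
            // Run ended: emit the merged text first, then the current event
            val merged = mkText(buffer.toString)
            buffer.clear()
            buffering = false
            List(merged, event)
          case None =>
            List(event)
        }
    }
}
```

With Alpakka XML's model (assuming its `TextEvent` trait exposes `text` and `Characters` is a case class taking the text), you would call `CoalesceText.coalescer[ParseEvent]({ case t: TextEvent => Some(t.text); case _ => None }, Characters(_))` and pass the result to `source.via(XmlParsing.parser).statefulMapConcat(...)`. Two caveats: a merged run of `CData` events comes out as `Characters`, and because `statefulMapConcat` has no completion callback, text still buffered when the stream ends is dropped (text inside the root element is always followed by an `EndElement`, so element content is not affected).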

Enno.

Cool, thanks for the help!