Processing a csv file with Akka Stream

chaitanya · July 23, 2018, 7:50am

Hi,
I am writing a sample program using akka streams where I am reading contents of csv file.
The csv file contains 5 lines of data. Now I want to process ( in this case display on console) 2 lines at time so I grouped the stream of lines by 2. It gave me 2 groups and skipped the last line ideally which should have been in 3rd group.
So how do i access elements in the last group is the last group has a size less that which was specified (in my case specified size is 2) ???

Sample code:

public class MyClass{
public static void main(String[] args)
{
final ActorSystem system = ActorSystem.create(“SplitAndPrint”);
final Materializer materializer = ActorMaterializer.create(system);

	Sink<ByteString, CompletionStage<Done>> splitAndPrintSink = Sink.<ByteString>foreach(chunk -> splitLine(chunk) );
	
	Sink<ByteString, CompletionStage<Done>> lineSink = Sink.<ByteString>foreach(s-> convert(s));
			
	
	Path filePath = Paths.get("C:\\Users\\admin\\oxy\\AkkaStreaming\\src\\main\\resources\\csvdata.csv");
	CompletionStage<Done> done= FileIO.fromPath(filePath)
			.via(Framing.delimiter(ByteString.fromString(System.lineSeparator()), Integer.MAX_VALUE,FramingTruncation.DISALLOW))
			.grouped(2)
			.
			.runForeach(lines->{
				System.out.println(lines.size());
				lines.forEach(l -> convert(l));
			}, materializer)
			
			;

}

public static void convert(ByteString xString)
{
System.out.println(" ----- “);
String converted = xString.utf8String();
System.out.println(converted.length()+” | "+converted);
}
}

csvdata.csv

Region,Country,Item Type,Sales Channel,Order Priority,Order Date,Order ID,Ship Date,Units Sold,Unit Price,Unit Cost,Total Revenue,Total Cost,Total Profit
Sub-Saharan Africa,South Africa,Fruits,Offline,M,7/27/2012,443368995,7/28/2012,1593,9.33,6.92,14862.69,11023.56,3839.13
Middle East and North Africa,Morocco,Clothes,Online,M,9/14/2013,667593514,10/19/2013,4611,109.28,35.84,503890.08,165258.24,338631.84
Australia and Oceania,Papua New Guinea,Meat,Offline,M,5/15/2015,940995585,6/4/2015,360,421.89,364.69,151880.40,131288.40,20592.00
Sub-Saharan Africa,Djibouti,Clothes,Offline,H,5/17/2017,880811536,7/2/2017,562,109.28,35.84,61415.36,20142.08,41273.28

2m · July 23, 2018, 12:53pm

As the documentation indicates, the grouped operator emits not only when it collects the required number of elements but also when the upstream completes. If that is not the case, then it could be a bug. Could you open a ticket for it in the Akka issue tracker with a small reproducer?

chaitanya · July 23, 2018, 1:53pm

Hi Martynas,
with given input file this program prints 4 lines whereas input file has 5 lines. This program will work and prints 5 lines (2+2+1) as the group size is 2)only when the input csv file has “\n” at the end of last line.
But there is no guaranty that input file always has “\n” character at the very end. The problem is with grouped() it does not consider last line if there is “\n”.

2m · July 24, 2018, 8:25am

Aha! Now I see. It is not the grouped operator behaviour that is unexpected, but Framing.delimeter truncation parameter.

From the docs:

if set to DISALLOW, then when the last frame being decoded contains no valid delimiter this Flow fails the stream instead of returning a truncated frame.

So you should either allow truncation, or, better yet, switch to Alpakka CSV connector which is designed to be Akka Streams API for CSV parsing and rendering.

chaitanya · July 24, 2018, 11:18am

Yes your are right I am going to allow truncation as we may have other characters as delimiters but will definitely check Alpakka CSV connector.

Thanks a lot

ktoso · July 24, 2018, 2:22pm

@chaitanya I have edited away a “f… you” emote ( : fu :) which you seem to have used. I think it was likely accidental. Please watch out for those typos, it could have been perceived as intentional/offensive.

Happy hakking!

chaitanya · July 24, 2018, 2:33pm

Oh ! Thanks a lot !!!

Topic		Replies	Views
Transform a CSV file into multiple CSV files using Akka Stream Akka Streams & Alpakka	5	2540	June 1, 2020
Keeping an object's context/metadata after passing through CSVParsing.lineScanner Akka Streams & Alpakka	5	1219	July 5, 2019
groupBy within akka stream Akka Streams & Alpakka	2	482	January 7, 2020
[Akka-Stream] How to let the groupBy operator without limited substreams? Akka Streams & Alpakka streams	2	553	June 3, 2020
Concat Stream operator Akka Streams & Alpakka streams	1	568	July 28, 2021

Processing a csv file with Akka Stream

Related Topics