Group a batch by key

streams

(Liel Shraga) #1

Hi,

I have a file in RDF format that I would like to group by key. The key should be the first part of each line (the subject).
I then want to batch the groups, 3 keys per batch.
For example, this is my file:
<http://data.com/1-22222222> c <http://data.com/Person> .
<http://data.com/1-22222222> g <http://data.com/Person> .
<http://data.com/1-22222222> q <http://data.com/Person> .
<http://data.com/1-22222222> e <http://data.com/Person> .
<http://data.com/1-22222222> d <http://data.com/Person> .
<http://data.com/1-22222222> v <http://data.com/Person> .
<http://data.com/1-33333333> a <http://data.com/Person> .
<http://data.com/1-77777777777777777> a <http://data.com/Person> .
<http://data.com/1-99999999999999999> a <http://data.com/Person> .
<http://data.com/1-4444444> a <http://data.com/Person> .
<http://data.com/1-11111111> a <http://data.com/Person> .

I expect it to be grouped as follows:

Group 1:
<http://data.com/1-22222222> c <http://data.com/Person> .
<http://data.com/1-22222222> g <http://data.com/Person> .
<http://data.com/1-22222222> q <http://data.com/Person> .
<http://data.com/1-22222222> e <http://data.com/Person> .
<http://data.com/1-22222222> d <http://data.com/Person> .
<http://data.com/1-22222222> v <http://data.com/Person> .
<http://data.com/1-33333333> a <http://data.com/Person> .
<http://data.com/1-77777777777777777> a <http://data.com/Person> .

Group 2:

<http://data.com/1-99999999999999999> a <http://data.com/Person> .
<http://data.com/1-4444444> a <http://data.com/Person> .
<http://data.com/1-11111111> a <http://data.com/Person> .

This means that all lines whose first part is
<http://data.com/1-22222222> should be considered one group.
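To make the intended result concrete, the grouping described above could be sketched with plain Scala collections. This is an in-memory illustration only, not a streaming solution; the names `GroupBySubjectSketch`, `subject`, and `batchBySubject` are made up for this example, and it assumes the key is the first whitespace-separated token of each line.

```scala
object GroupBySubjectSketch {
  // Key of a line: its first whitespace-separated token (the subject).
  def subject(line: String): String = line.trim.takeWhile(!_.isWhitespace)

  // Each batch holds every line that belongs to up to `keysPerBatch` distinct subjects.
  def batchBySubject(lines: Seq[String], keysPerBatch: Int): List[Seq[String]] = {
    val nonBlank  = lines.filter(_.trim.nonEmpty) // ignore empty lines
    val bySubject = nonBlank.groupBy(subject)     // subject -> its lines, in input order
    nonBlank.map(subject).distinct                // subjects in first-appearance order
      .grouped(keysPerBatch)                      // keysPerBatch subjects per batch
      .map(keys => keys.flatMap(k => bySubject(k)))
      .toList
  }
}
```

Calling `GroupBySubjectSketch.batchBySubject(lines, 3)` on the 11 lines of the example file should give two batches: the first with 8 lines (three subjects) and the second with 3 lines.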

This is my code:
FileIO.fromPath(Paths.get("test"))
  .via(Framing.delimiter(ByteString("\n"), 256, true)
    .map(_.utf8String))
  .groupBy(Int.MaxValue, x => ({x.split(" ")(0)}, x))
  .mergeSubstreams
  .grouped(3)
  .runForeach(println)

And this is the output I get (lines with the same key are not kept together as one group; instead, every 3 lines are printed as a batch):
Vector(, http://data.com/1-99999999999999999 a http://data.com/Person ., http://data.com/1-77777777777777777 a http://data.com/Person .)
Vector(http://data.com/1-33333333 a http://data.com/Person ., http://data.com/1-22222222 v http://data.com/Person ., http://data.com/1-22222222 f http://data.com/Person .)
Vector(http://data.com/1-22222222 d http://data.com/Person ., http://data.com/1-22222222 s http://data.com/Person ., http://data.com/1-22222222 w http://data.com/Person .)
Vector(http://data.com/1-22222222 e http://data.com/Person ., http://data.com/1-22222222 q http://data.com/Person ., http://data.com/1-22222222 g http://data.com/Person .)
Vector(http://data.com/1-22222222 c http://data.com/Person ., , )
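
A possible explanation for this output: the key function passed to `groupBy` returns the tuple `(subject, line)`, so every distinct line ends up in its own substream, and `mergeSubstreams` then flattens the substreams back into single lines before `grouped(3)` sees them. One way this might be addressed is to key on the subject only and fold each substream into a single element before merging. The following is a minimal sketch, assuming an Akka 2.6-style API; `SubjectBatching` and `batchBySubject` are hypothetical names. It buffers all lines of each subject in memory until the input completes, and the order in which subjects are merged is not guaranteed.

```scala
import akka.NotUsed
import akka.stream.scaladsl.Flow

object SubjectBatching {
  // Emits batches that each contain all lines of up to `keysPerBatch` subjects.
  def batchBySubject(keysPerBatch: Int): Flow[String, Seq[String], NotUsed] =
    Flow[String]
      .filter(_.trim.nonEmpty)                     // drop empty lines
      .groupBy(Int.MaxValue, _.trim.split(" ")(0)) // one substream per subject
      .fold(Vector.empty[String])(_ :+ _)          // collect each subject's lines
      .mergeSubstreams                             // one Vector per subject
      .grouped(keysPerBatch)                       // keysPerBatch subjects per batch
      .map(_.flatten)                              // back to a flat batch of lines
}
```

It could be plugged into the pipeline above in place of the `groupBy` / `mergeSubstreams` / `grouped` steps, for example `.map(_.utf8String).via(SubjectBatching.batchBySubject(3)).runForeach(println)`. Because `fold` only emits when a substream completes, this approach fits a finite file but not an unbounded stream.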