Switching from `blob` to `mediumblob` for SQL DB table column types

I use Lagom 1.5.1 with Scala 2.12.7 & MySQL for event persistence.

I’ve followed the doc: https://www.lagomframework.com/documentation/1.5.x/scala/PersistentEntityRDBMS.html

In particular, I let Lagom create the DB tables for each microservice, that is,

lagom.persistence.jdbc.create-tables.auto = true

According to the doc, this means that Lagom creates the journal & snapshot tables for each microservice using the following SQL.
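This is essentially the akka-persistence-jdbc 3.5.0 MySQL schema (the same one linked in the script further below), so the generated tables look roughly like this:

CREATE TABLE IF NOT EXISTS journal (
  ordering SERIAL,
  persistence_id VARCHAR(255) NOT NULL,
  sequence_number BIGINT NOT NULL,
  deleted BOOLEAN DEFAULT FALSE,
  tags VARCHAR(255) DEFAULT NULL,
  message BLOB NOT NULL,
  PRIMARY KEY(persistence_id, sequence_number)
);

CREATE UNIQUE INDEX journal_ordering_idx ON journal(ordering);

CREATE TABLE IF NOT EXISTS snapshot (
  persistence_id VARCHAR(255) NOT NULL,
  sequence_number BIGINT NOT NULL,
  created BIGINT NOT NULL,
  snapshot BLOB NOT NULL,
  PRIMARY KEY (persistence_id, sequence_number)
);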

This means that the journal.message and snapshot.snapshot table columns are typed blob.
Per the MySQL doc, a blob value holds at most 65,535 bytes (~64 KB), whereas a mediumblob goes up to 16 MB:
https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html#data-types-storage-reqs-strings

Today, I hit that limit and got this:

com.lightbend.lagom.scaladsl.persistence.PersistentEntity$PersistException: Persist of [akka.persistence.journal.Tagged] failed in [org.open.caesar.service.integrationSessionExecution.CaesarIntegrationSessionExecutionEntity] with id [caesar_integration_session_execution], caused by: {Data truncation: Data too long for column 'message' at row 1
2019-06-19 10:02:01,145 [ERROR] from com.lightbend.lagom.internal.scaladsl.persistence.PersistentEntityActor in services-integrationSessionExecution-mysql-application-akka.actor.default-dispatcher-22 - Failed to persist event type [akka.persistence.journal.Tagged] with sequence number [207] for persistenceId [CaesarIntegrationSessionExecutionEntity|caesar_integration_session_execution].
com.mysql.cj.jdbc.exceptions.MysqlDataTruncation: Data truncation: Data too long for column 'message' at row 1
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:104)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:970)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1109)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1057)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeLargeUpdate(ClientPreparedStatement.java:1377)
...

To fix this, I disabled auto-creation:

lagom.persistence.jdbc.create-tables.auto = false

Instead, I wrote a simple shell script to create the tables for each microservice:

# For Lagom persistence schema, see:
# https://github.com/dnvriend/akka-persistence-jdbc/blob/v3.5.0/src/test/resources/schema/mysql/mysql-schema.sql

# usage: generateLagomSchemas <service name> <password>
generateLagomSchemas() {
    cat << EOF
#
# Service: $1
#
DROP DATABASE IF EXISTS $1;
CREATE DATABASE $1;
CREATE USER IF NOT EXISTS $1@'%' IDENTIFIED BY "$2";
GRANT ALL PRIVILEGES ON $1.* TO $1@'%';

DROP TABLE IF EXISTS $1.journal;

CREATE TABLE IF NOT EXISTS $1.journal (
  ordering SERIAL,
  persistence_id VARCHAR(255) NOT NULL,
  sequence_number BIGINT NOT NULL,
  deleted BOOLEAN DEFAULT FALSE,
  tags VARCHAR(255) DEFAULT NULL,
  message MEDIUMBLOB NOT NULL,
  PRIMARY KEY(persistence_id, sequence_number)
);

CREATE UNIQUE INDEX $1_journal_ordering_idx ON $1.journal(ordering);

DROP TABLE IF EXISTS $1.snapshot;

CREATE TABLE IF NOT EXISTS $1.snapshot (
  persistence_id VARCHAR(255) NOT NULL,
  sequence_number BIGINT NOT NULL,
  created BIGINT NOT NULL,
  snapshot MEDIUMBLOB NOT NULL,
  PRIMARY KEY (persistence_id, sequence_number)
);

EOF

}

Suppose there are three services: service1, service2, and service3. The schemas can then be created like this, assuming the MySQL admin user is $MYSQL_USERNAME and each service account gets the password $MYSQL_PASSWORD:

SCRIPT=$(mktemp /tmp/mysql-script.XXXXXX)
trap "/bin/rm -f $SCRIPT" 0
trap "echo 'Temporary script file: $SCRIPT'" 2 3 15

for schema in service1 service2 service3; do
  generateLagomSchemas $schema $MYSQL_PASSWORD >> $SCRIPT
done
mysql \
    -h "${MYSQL_HOST}" \
    -u "${MYSQL_USERNAME}" \
    -p \
    < $SCRIPT
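For what it's worth, if the tables already hold events you want to keep, the columns can also be widened in place instead of dropping and recreating the databases; for a hypothetical database named service1 that would be:

ALTER TABLE service1.journal MODIFY COLUMN message MEDIUMBLOB NOT NULL;
ALTER TABLE service1.snapshot MODIFY COLUMN snapshot MEDIUMBLOB NOT NULL;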

I hope this will help someone who may run into this obscure exception.

I’m also wondering why the default is just blob given that, let’s face it, 64 KB is really tiny for storing arbitrary Lagom events.

One could even imagine a compile-time plugin that calculates an estimated maximum size for Lagom events; after all, size information can be expressed in the Scala type system using refinement techniques like https://github.com/fthomas/refined and https://github.com/theiterators/kebs. I use both, so I can vouch for their compatibility with the latest and greatest Lagom 1.5.
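For example, here is a minimal sketch with refined, where the event and field names are made up; literal values are checked against the MaxSize bounds at compile time, and a plugin could in principle sum such bounds to estimate a worst-case serialized size:

import eu.timepit.refined.W
import eu.timepit.refined.api.Refined
import eu.timepit.refined.auto._
import eu.timepit.refined.collection.MaxSize

object BoundedEventSketch {
  // Field types that carry their maximum length in the type itself.
  type Name        = String Refined MaxSize[W.`255`.T]
  type Description = String Refined MaxSize[W.`4096`.T]

  // A made-up event whose string fields are bounded by construction.
  final case class ProcessSpec(name: Name, description: Description)

  // Compiles only because both literals fit their declared bounds.
  val ok: ProcessSpec =
    ProcessSpec("ingest-telemetry", "runs the nightly ingest job")
}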

  • Nicolas.

Hi @NicolasRouquette,

In general, events are a few bytes. If you find 64 KB is tiny, it’s possible you need to review the amount of information stored in each event.

Alternatively, you can use Lagom’s features to compress serialized data. The compression uses GZIP and, if the format is JSON, the compressed representation is usually about 8 times smaller.

Given the maturity of SQL databases these days and how cheap disk space is, your solution looks fine too.

Cheers,

Thanks!

Regarding reviewing the amount of info in events: my services are designed for executing workflows of automated systems-engineering processes. Some processes can have lots of data dependencies, and some workflows can have large topologies of processes and data flows. Information about configuring and executing these processes, organized in workflows, persists in events. So, in my case, the sensible choice is compression (regardless of the backend) plus mediumblob for the SQL backend.

I hadn’t looked at this part of the doc:

https://www.lagomframework.com/documentation/current/scala/Serialization.html#Compression

I’m really glad that tuning compression is so easy!
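For reference, this is roughly what it looks like in a serializer registry; the events here (ProcessConfigured, WorkflowExecuted) are made-up stand-ins, not my actual ones:

import scala.collection.immutable.Seq

import com.lightbend.lagom.scaladsl.playjson.{JsonSerializer, JsonSerializerRegistry}
import play.api.libs.json.{Format, Json}

final case class ProcessConfigured(processId: String, parameters: Map[String, String])
object ProcessConfigured {
  implicit val format: Format[ProcessConfigured] = Json.format[ProcessConfigured]
}

final case class WorkflowExecuted(workflowId: String, results: List[String])
object WorkflowExecuted {
  implicit val format: Format[WorkflowExecuted] = Json.format[WorkflowExecuted]
}

object IntegrationSessionSerializerRegistry extends JsonSerializerRegistry {
  override def serializers: Seq[JsonSerializer[_]] = Seq(
    JsonSerializer[ProcessConfigured],          // small event: plain JSON
    JsonSerializer.compressed[WorkflowExecuted] // potentially large event: GZIP-compressed JSON
  )
}

If I read the doc right, compression only kicks in once the serialized form exceeds a configurable threshold (lagom.serialization.json.compress-larger-than, 32 KiB by default), so small events stay uncompressed.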

In fact, the doc is useful on a different level: type astronomy! That is, scanning the source code for definitions of JsonSerializerRegistry.serializers and inspecting the list of serializer types to produce a report about the estimated sizes of persistent data of any kind (events, messages, commands, state).
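Short of that kind of compile-time analysis, even a crude runtime check is already useful; here is a sketch (not the compile-time scan, just measuring a representative sample of the made-up WorkflowExecuted event from above against the 64 KB blob limit):

import play.api.libs.json.{Json, OFormat}

final case class WorkflowExecuted(workflowId: String, results: List[String])
object WorkflowExecuted {
  implicit val format: OFormat[WorkflowExecuted] = Json.format[WorkflowExecuted]
}

object EstimateEventSize extends App {
  // Serialize a deliberately large sample event and report its JSON size in bytes.
  val sample = WorkflowExecuted("wf-42", List.fill(5000)("some/fairly/long/result/path"))
  val bytes  = Json.toJson(sample).toString.getBytes("UTF-8").length
  println(s"Estimated serialized size: $bytes bytes (blob tops out at 65535)")
}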

I’ve done something similar for generating an OpenAPI 3.0 document for my Lagom services that accounts for tagged types. I’ll add this to my todo list.

  • Nicolas.