
Microstream – the end of O/R mappers?

29.9.2022 | 14 minutes of reading time


Searching for alternatives to O/R mappers and persistence frameworks for NoSQL databases, I came across Microstream and quickly became interested. On the one hand because Microstream is being developed in my home region, the Oberpfalz, but mainly because I couldn't believe the numbers from demos in talks and the GitHub repository: data accesses are supposed to be up to 1,000 times faster in some cases than with an application implemented with JPA (including a cache), and at the same time to consume even fewer resources.

"How does Microstream do that?" I wondered, eager to try it out. So I decided to extend the to-do app from my article on Hotwire and swap out the persistence layer. Instead of storing the to-dos in an ArrayList, they should be stored in a Postgres database via Microstream. You can find the corresponding source code on GitHub.

In this article, I would like not only to report on my experiences with Microstream, but also to share some thoughts on how to use it in practice. This article is also available in German.

Impedance mismatch

Why did I start looking for alternatives to persistence frameworks in the first place? In object orientation, there are classes with variables and operations. Objects with an object ID are instances of such classes. There can be relationships between classes: 1-to-1, 1-to-N and M-to-N. Furthermore, in object orientation there is the concept of inheritance and polymorphism.

In relational databases, on the other hand, there are tables with columns. Each row in a table has a primary key as ID. There can be relationships between the tables. These are mapped using foreign keys. It quickly becomes clear: not all paradigms of object orientation can be mapped in the relational model ("impedance mismatch"):

Object orientation | Relational model
Class with variables and operations | Table with columns
Objects as instances of a class | Row in a table
Object ID | Primary key
1-to-1, 1-to-N, M-to-N relations | 1-to-1 and 1-to-N relations via foreign keys; M-to-N only via mapping tables
Inheritance and polymorphism | No direct equivalent

Thus, in order to store object graphs in a relational database, a complex transformation must take place. Nowadays, object-relational mappers (O/R mappers) are usually used for this conversion. In Java applications, there is even a standard for this, JPA, which is implemented by various frameworks (e.g. Hibernate or EclipseLink). O/R mappers not only hide the complexity of the mapping, but depending on the implementation also offer elegant ways to access the data in the database without having to write SQL yourself (e.g. Spring Data). Sounds great, doesn't it?
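As a small illustration of such "elegant access options", here is a minimal sketch of a Spring Data derived query (CustomerRepository and Customer are illustrative names, not part of the to-do sample project):

import java.util.List;
import org.springframework.data.repository.CrudRepository;

// Hypothetical repository: Spring Data derives the query from the method name,
// no SQL has to be written by hand.
interface CustomerRepository extends CrudRepository<Customer, Long> {
    List<Customer> findByCountry(String country);
}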

The flip side is that O/R mappers abstract database accesses almost too much. It is completely opaque which database accesses actually take place (unless you take a look at the corresponding log file and get a fright).

The following example illustrates the problem: a customer has their country of origin and a list of orders as attributes. In each order there is a list of articles with a price. For reporting purposes, the total order value is to be determined for each customer's country of origin. If the O/R mapper is not configured correctly (i.e. annotations are missing or set incorrectly), the following happens (the well-known N+1 select problem):

  1. Fetch all customers (1 DB access)
  2. Fetch all orders for each customer (one additional DB access per customer)
  3. Fetch all articles of each order (one additional DB access per order)

In the source code, these are all harmless getter calls, so for developers it is initially invisible what is actually happening under the hood. During local developer tests, the performance problem will probably not even be noticed, because it only shows up with larger amounts of data.
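To make this more concrete, here is a minimal sketch of such reporting code on top of hypothetical JPA entities (Customer, Order, Article and Report are illustrative names, not classes from the to-do sample project, and I use the jakarta.persistence API here). With the collections left at their default lazy fetching and no dedicated query, every access to a collection triggers another database round trip:

import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import jakarta.persistence.ManyToOne;
import jakarta.persistence.OneToMany;
import jakarta.persistence.Table;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entities, only meant to illustrate the hidden queries.
@Entity
class Customer {
    @Id @GeneratedValue Long id;
    String country;
    @OneToMany(mappedBy = "customer") // FetchType.LAZY by default
    List<Order> orders = new ArrayList<>();
    String getCountry() { return this.country; }
    List<Order> getOrders() { return this.orders; }
}

@Entity
@Table(name = "orders") // "order" is a reserved word in SQL
class Order {
    @Id @GeneratedValue Long id;
    @ManyToOne Customer customer;
    @OneToMany(mappedBy = "order") // FetchType.LAZY by default
    List<Article> articles = new ArrayList<>();
    List<Article> getArticles() { return this.articles; }
}

@Entity
class Article {
    @Id @GeneratedValue Long id;
    @ManyToOne Order order;
    BigDecimal price;
    BigDecimal getPrice() { return this.price; }
}

class Report {
    // Total order value per country: nothing but harmless-looking getter calls.
    Map<String, BigDecimal> totalPerCountry(EntityManager em) {
        Map<String, BigDecimal> result = new HashMap<>();
        List<Customer> customers =
                em.createQuery("select c from Customer c", Customer.class).getResultList(); // DB access 1
        for (Customer customer : customers) {
            for (Order order : customer.getOrders()) {           // one more DB access per customer
                for (Article article : order.getArticles()) {    // one more DB access per order
                    result.merge(customer.getCountry(), article.getPrice(), BigDecimal::add);
                }
            }
        }
        return result;
    }
}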

But even if the O/R mapper is configured correctly and the queries are optimized for the particular use case, the O/R mapper still has to map between the object graph and the relational model. This mapping costs time and is often one of the main performance killers.


Then let's use NoSQL, where we can store JSON documents or graphs.

NoSQL databases are also incompatible with Java object graphs. Or have you ever seen JSON with cyclic dependencies between objects? We have the same issue here: Java object graphs first have to be mapped before they can be stored in the database. From one of my earlier projects I can confirm that this mapping can be a performance killer: there, a graph database (Neo4j) was used with a Spring Boot backend (including Spring Data Neo4j). Even when a query on the database returned a result within a very short time, it took significantly longer for Spring Data Neo4j to map the result of this query into a Java object graph.

Microstream

This is where Microstream comes in. Microstream has been under development since 2013, went live with its first version in 2015, and has been open source since 2021. [1]

Microstream is a persistence framework that works directly with the Java object graph; the mapping that other frameworks do under the hood is omitted here. At its core, Microstream has implemented its own serialization and persists data in binary form in a storage. Various storage solutions can be connected, for example AWS S3, relational databases (such as Postgres), NoSQL databases (such as MongoDB), Kafka or the file system. [2]

Developers will quickly feel comfortable using Microstream: it is pure Java with all the advantages that come with it (clean object-oriented programming and type safety, among others). In addition, from now on you only have to worry about one model, the Java object graph; there is no need for a second, relational model anymore. Fittingly, the Java Streams API can be used as the query language.

Integration of Microstream

Microstream can be integrated via Maven, for example. Since I want to use a Postgres database as the storage for the to-do application, in addition to the libraries for Microstream itself and its configuration I also need the library for SQL file systems as well as the Postgres JDBC driver. I use the current version of Microstream, 07.00.00-MS-GA.

<dependency>
  <groupId>one.microstream</groupId>
  <artifactId>microstream-storage-embedded</artifactId>
  <version>${microstream.version}</version>
</dependency>
<dependency>
  <groupId>one.microstream</groupId>
  <artifactId>microstream-storage-embedded-configuration</artifactId>
  <version>${microstream.version}</version>
</dependency>
<dependency>
  <groupId>one.microstream</groupId>
  <artifactId>microstream-afs-sql</artifactId>
  <version>${microstream.version}</version>
</dependency>
<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>postgresql</artifactId>
  <version>42.2.26</version>
</dependency>

Storage Manager and root instances

The Microstream Storage Manager is the interface to the connected storage. The Storage Manager requires a so-called root instance as the entry point for accessing data. The root instance is the root of the Java object graph that is to be persisted.

In the example, I implemented a Java class DataRoot whose only attribute is a TodoList, a wrapper around a list of to-dos that provides some methods to interact with the to-do list (e.g. add, remove, search by to-do ID, search by user ID).

public class DataRoot {

    private final TodoList todoList = new TodoList();

    public DataRoot() {
        super();
    }

    public TodoList getTodoList() {
        return this.todoList;
    }
}

As you can see, DataRoot is simply a POJO with no special properties like annotations. The root instance must be made known to the Storage Manager. In addition, there are other configuration options:

private volatile EmbeddedStorageManager storageManager;

private StorageManagerAccessor(final String dbUrl, final String dbUser, final String dbPassword) {
    final PGSimpleDataSource dataSource = new PGSimpleDataSource();
    dataSource.setUrl(dbUrl);
    dataSource.setUser(dbUser);
    dataSource.setPassword(dbPassword);

    final SqlFileSystem fileSystem = SqlFileSystem.New(SqlConnector.Caching(SqlProviderPostgres.New(dataSource)));

    final EmbeddedStorageFoundation<?> foundation = EmbeddedStorageFoundation.New().setConfiguration(
            StorageConfiguration.Builder()
                    .setStorageFileProvider(
                            Storage.FileProviderBuilder(fileSystem)
                                    .setDirectory(fileSystem.ensureDirectoryPath("microstream_storage"))
                                    .createFileProvider())
                    .setChannelCountProvider(
                            StorageChannelCountProvider.New(Math.max(1, // minimum one channel, if only 1 core is available
                                    Integer.highestOneBit(Runtime.getRuntime().availableProcessors() - 1))))
                    .createConfiguration()
    );

    this.storageManager = foundation.createEmbeddedStorageManager().start();
    if (this.storageManager.root() == null) {
        LOG.info("Setting root for storage manager");
        this.storageManager.setRoot(new DataRoot());
        this.storageManager.storeRoot();
    }
}

First, the storage is configured, in my case the Postgres database. The storage directory (microstream_storage) is the place where the data is stored; in the example, Microstream creates a microstream_storage table in the database. Channels define the number of I/O threads that Microstream may use, which allows the I/O performance to be tuned. For each channel, another table is created in the Postgres database. If the file system is used instead of Postgres, there is a separate directory for each channel below the configured storage directory.

Finally, the root instance is made known and initially saved if it does not yet exist (e.g. this section is skipped when the application is restarted, since there are already saved objects).

This minimal setup is already sufficient to work with Microstream.

CRUD with Microstream

When a new object is to be saved, the "owner" of the object must be stored. So, in the example, if a new to-do is to be saved, the list of to-dos must be stored after the to-do has been added to the list:

private final List<Todo> todoList = new ArrayList<>();

public UUID add(Todo todo) {
    this.todoList.add(todo);
    StorageManagerAccessor.getInstance().getStorageManager().store(this.todoList);
    return todo.getId();
}

If an object is changed, only this object has to be stored. So if a to-do is marked as done, only this to-do and not the whole list will be stored:

public UUID update(Todo todo) {
    StorageManagerAccessor.getInstance().getStorageManager().store(todo);
    return todo.getId();
}

If an object is deleted, every reference to this object must be removed from the object graph and this change must be saved. In the example, the to-do to be deleted only needs to be removed from the list and the list then saved. If, in addition to the list, there was a map where to-dos are assigned to individual users, then the to-do would also have to be deleted from this map.

private final List<Todo> todoList = new ArrayList<>();

public void remove(UUID todoId) {
    Optional<Todo> existing = byId(todoId);
    existing.ifPresent(todo -> {
        this.todoList.remove(todo);
        StorageManagerAccessor.getInstance().getStorageManager().store(this.todoList);
    });
}
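For the hypothetical case mentioned above, where an additional map assigns to-dos to individual users, the deletion would have to cover both references. A sketch of how remove() could change (the userTodos map is not part of the sample application):

private final List<Todo> todoList = new ArrayList<>();
// hypothetical additional structure: to-dos grouped per user
private final Map<UUID, List<Todo>> userTodos = new HashMap<>();

public void remove(UUID todoId) {
    Optional<Todo> existing = byId(todoId);
    existing.ifPresent(todo -> {
        this.todoList.remove(todo);
        StorageManagerAccessor.getInstance().getStorageManager().store(this.todoList);

        List<Todo> todosOfUser = this.userTodos.get(todo.getUserId());
        if (todosOfUser != null && todosOfUser.remove(todo)) {
            // store the modified inner list as well, otherwise the deleted
            // to-do would still be reachable via the map
            StorageManagerAccessor.getInstance().getStorageManager().store(todosOfUser);
        }
    });
}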

One of the reasons Microstream is so fast is that the data is held in memory: Microstream loads the object graph into memory when the Storage Manager is initialized. You might now throw your hands up in horror and ban Microstream from your toolbox again because you have to work with so much data that your memory would burst if all of it were loaded. For such cases, Microstream offers lazy loading.

With lazy loading, not the entire object is loaded into memory, but only an ID. The object is then reloaded on demand, i.e. when it is accessed. While loading, Microstream records a timestamp of the last access to the object. In the background, Microstream cleans up all objects that were loaded lazily and whose last access was more than 15 minutes ago. Of course, this behavior can be configured; details can be found in the documentation. If you don't need lazy loading in your application, there is nothing special to pay attention to.
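As a sketch of what this looks like in code (the archivedTodos field is hypothetical and not part of the sample application; I assume Microstream's Lazy wrapper from one.microstream.reference): instead of referencing the list directly, it is wrapped in Lazy, and Lazy.get() loads it from the storage on first access.

import java.util.ArrayList;
import java.util.List;
import one.microstream.reference.Lazy;

public class TodoList {

    // loaded together with the root instance, as before
    private final List<Todo> todoList = new ArrayList<>();

    // hypothetical archive: initially only an object ID is loaded,
    // the list itself is fetched from the storage on first access
    private final Lazy<List<Todo>> archivedTodos = Lazy.Reference(new ArrayList<>());

    public List<Todo> archived() {
        // Lazy.get() resolves the reference and loads the list if necessary;
        // after the configured timeout it may be cleared from memory again
        return Lazy.get(this.archivedTodos);
    }

    public void archive(Todo todo) {
        this.todoList.remove(todo);
        List<Todo> archived = Lazy.get(this.archivedTodos);
        archived.add(todo);
        StorageManagerAccessor.getInstance().getStorageManager().store(this.todoList);
        StorageManagerAccessor.getInstance().getStorageManager().store(archived);
    }
}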

Read access is very simple: in the example, I access the to-do list with the Streams API:

private final List<Todo> todoList = new ArrayList<>();

public List<Todo> all() {
    return this.todoList;
}

public List<Todo> byUser(UUID userId) {
    return this.todoList.stream()
                        .filter(todo -> todo.getUserId() != null && todo.getUserId().equals(userId))
                        .collect(Collectors.toList());
}

public Optional<Todo> byId(UUID todoId) {
    return this.todoList.stream().filter(todo -> todo.getId().equals(todoId)).findFirst();
}

Integration with Quarkus

To initialize the Storage Manager at application startup and to shut it down cleanly when the Quarkus application is stopped, there is a class in the sample application that observes the Quarkus StartupEvent and ShutdownEvent.

@ApplicationScoped
public class StorageManagerController {

    private static final ILogger LOG = ILogger.getLogger(StorageManagerController.class);

    @ConfigProperty(name = "microstream.db.postgres.url")
    String dbUrl;
    @ConfigProperty(name = "microstream.db.postgres.user")
    String dbUser;
    @ConfigProperty(name = "microstream.db.postgres.password")
    String dbPassword;

    /**
     * Initialize storage manager on quarkus startup.
     *
     * @param startupEvent quarkus startup event.
     */
    public void onStartup(@Observes StartupEvent startupEvent) {
        LOG.info("Initializing storage manager");
        StorageManagerAccessor.init(this.dbUrl, this.dbUser, this.dbPassword);
    }

    /**
     * Shutdown storage manager on quarkus shutdown.
     *
     * @param shutdownEvent quarkus shutdown event.
     */
    public void onShutdown(@Observes ShutdownEvent shutdownEvent) {
        LOG.info("Shutting down storage manager");
        StorageManagerAccessor.getInstance().shutdown();
        LOG.info("Successfully shutdown storage manager");
    }
}

When class definitions change at runtime, this causes problems in Microstream's default configuration. With Quarkus, this can happen, for example, when the application is started in development mode and hot code replacement takes place. There is an issue about this on GitHub and a question on Stack Overflow. I was able to solve this problem in the to-do application by configuring a custom class loader, analogous to the example in the Microstream documentation:

// handle changing class definitions at runtime ("hot code replacement" by quarkus by running app in development mode)
foundation.onConnectionFoundation(connectionFoundation ->
        connectionFoundation.setClassLoaderProvider(ClassLoaderProvider.New(Thread.currentThread()
                                                                                  .getContextClassLoader())));

The end of O/R mappers?

Microstream did not disappoint me; the performance is really impressive. And not only the performance, but also the developer experience. Thanks to its pure Java approach, Microstream allows you to focus on the object graph as the only data model. Considerations of how to map this graph to tables in a relational database are eliminated, as are tons of JPA annotations and the definition of SQL queries to retrieve data from the database. Instead, you can simply use the Streams API.

The integrated housekeeping also prevents the storage from growing without bounds, even though Microstream only appends updates to existing objects or new objects to the end of the storage in binary form. And since not only the local file system can be used as storage, but also databases or object storage (such as AWS S3), the backup functions of these platforms can be used as well.

And it's not just me who is taken with Microstream, but others as well: for example, Microstream is already integrated with Helidon and Micronaut. Further integrations are to follow, for example an integration with Quarkus.

Microstream has clearly focused on storing object graphs and has written its own serializer for this purpose, and it masters these topics excellently. In return, other tasks that used to be handled by persistence frameworks or the database itself move into the domain of application development. These include, for example, locking (whether pessimistic or optimistic, you have to implement both yourself) and access to the same data from multiple threads. The sample application implements a very lightweight locking mechanism, but for "real" applications there is more to do. Multithreading is an issue because Microstream works directly with the original data. In comparison, when using JPA, only a copy of the data is loaded from the database for a single thread, modified within this thread, and stored back in the database. Microstream provides a way to work with shared data, which is documented here. However, when implementing an application, explicit attention must be paid to whether data can be updated by multiple threads at the same time.
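To illustrate what such application-level concurrency handling can look like, here is a minimal sketch that guards the to-do list with a ReentrantReadWriteLock. This is my own simplification, not the mechanism used in the sample application and not the approach from the Microstream documentation:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockedTodoList {

    private final List<Todo> todoList = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public UUID add(Todo todo) {
        this.lock.writeLock().lock();
        try {
            this.todoList.add(todo);
            StorageManagerAccessor.getInstance().getStorageManager().store(this.todoList);
            return todo.getId();
        } finally {
            this.lock.writeLock().unlock();
        }
    }

    public List<Todo> all() {
        this.lock.readLock().lock();
        try {
            // return a copy so callers never iterate the shared list without holding the lock
            return List.copyOf(this.todoList);
        } finally {
            this.lock.readLock().unlock();
        }
    }
}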

Another aspect I would like to point out is that the data of your application is only visible through the Java application. Sure, you can look at the binary data in the corresponding storage, but it is not human-readable. Microstream does offer a (CSV) export as well as a client that, in my opinion, is still in its infancy, but it is not possible to open the database and see with a few SELECT queries which data is currently persisted.

Of course, this also means that database-level interfaces no longer exist with Microstream. This may not be a problem in the era of microservices (from my point of view, the database should not be used as an interface between different applications or services), but I (unfortunately) know many application landscapes where the database is used as an interface (e.g. data from the database is pushed via ETL into a data warehouse, or an application accesses data from another application via a provided view).

As soon as lazy loading is used, Microstream influences the design of the object graph: on the one hand through the Lazy wrapper, which signals to Microstream that objects are to be reloaded; on the other hand, extra structures have to be created for lazy loading, for example several lists or maps whose keys are used to reload the corresponding values. In the context of the to-do app, I could imagine maintaining two lists: one for open to-dos, the other for already completed ones. The list of already completed to-dos is marked with a Lazy wrapper so that it is only loaded when needed. The fact that the technology influences the domain model contradicts the principles of various modern architectural approaches (Clean/Onion Architecture, Ports and Adapters). Therefore, you should consider this point in your design and, if necessary, accept it as a trade-off. One approach would be to move the object graph for Microstream to an outer ring or adapter and introduce a mapping layer between your domain model and the Microstream object graph (see the sketch below). This is also the way I would do it when using an O/R mapper.
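Such a mapping layer could look roughly like this. TodoEntity and TodoEntityMapper are illustrative names, and I assume accessors like getTitle() and isDone() as well as a matching constructor on the domain Todo, which are not shown in this article:

import java.util.UUID;

// Persistence-side type that lives in the Microstream adapter; it may use
// Lazy references and extra structures without affecting the domain model.
class TodoEntity {
    UUID id;
    UUID userId;
    String title;
    boolean done;
}

// Mapper in the adapter ring between the domain Todo and TodoEntity.
class TodoEntityMapper {

    TodoEntity toEntity(Todo todo) {
        TodoEntity entity = new TodoEntity();
        entity.id = todo.getId();
        entity.userId = todo.getUserId();
        entity.title = todo.getTitle(); // assumed accessor on the domain model
        entity.done = todo.isDone();    // assumed accessor on the domain model
        return entity;
    }

    Todo toDomain(TodoEntity entity) {
        // assumed constructor on the domain model
        return new Todo(entity.id, entity.userId, entity.title, entity.done);
    }
}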

Nevertheless, Microstream is definitely a technology to keep an eye on. Still, from my point of view, O/R mappers are here to stay for quite a while, at least until Microstream is used more widely in the community and there is more experience with it.

References
