Event sourcing is receiving more and more attention. This is partly due to the growing interest in domain-driven design (DDD) and CQRS, with which event sourcing fits well conceptually. But what else is it suitable for? And where does it not fit? To decide whether event sourcing or CRUD is the more appropriate approach, it helps to clarify first what exactly event sourcing is – and what it is not.
In many cases event sourcing is combined with domain-driven design (DDD) and the design pattern CQRS, but it is only partly related to the two concepts. Event sourcing is a specific procedure for storing data. Unlike the traditional approach with a relational database, event sourcing does not persist the current state of a record, but instead stores the individual changes as a series of deltas that led to the current state over time.
Determining the current state
The procedure is similar to the way a bank manages an account, for example. The bank does not save the current balance. Instead, it records the deposits and withdrawals that occur over time. The current balance can then be calculated from this data: if the account was first opened with a deposit of 500 EUR, then another 200 EUR were added, and then 300 EUR were debited, the following calculation takes place:
500 (deposit)
+ 200 (deposit)
- 300 (payment)
---
= 400 (balance)
The current account balance is 400 EUR. The procedure can be continued over an arbitrary period of time; only the number of summands grows gradually. If, instead of simple numbers, domain-related facts that carry certain semantics (the so-called events) are stored, any process can be mapped.
Restoring the current state by playing back the individual events is called replay. As a special feature of event sourcing, it is not only possible to determine the current state, but also any state from the past. To do this, it is only necessary to stop the replay at the desired time in the past and not to play back the events completely. It is also possible to determine the historical development of the state, which provides an easy way for time series analysis and other evaluations of historical data.
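To illustrate, a minimal replay might look like the following JavaScript sketch; the event names and their structure are invented for the account example and are not tied to any particular event store:

// Hypothetical domain events for the account example.
const events = [
  { type: 'opened', data: { amount: 500 } },
  { type: 'deposited', data: { amount: 200 } },
  { type: 'withdrawn', data: { amount: 300 } }
];

// Replay: fold the events, one after the other, into the current state.
const applyEvent = (state, event) => {
  switch (event.type) {
    case 'opened':
    case 'deposited': return { balance: state.balance + event.data.amount };
    case 'withdrawn': return { balance: state.balance - event.data.amount };
    default: return state;
  }
};

const currentState = events.reduce(applyEvent, { balance: 0 });
// => { balance: 400 }

Stopping the reduce after the first two events would yield the balance as it was before the withdrawal, which is exactly the replay to a point in the past described above.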
Optimizing performance
Unfortunately, a replay becomes more and more expensive as the number of events to be replayed grows. At first glance, event sourcing therefore seems to lead to read access becoming slower and slower. However, there is an easy way out of the problem.
Since events are always only added at the end of the existing list and existing events are never changed, a replay calculated once will always produce the very same result for a certain point in time. If you try to follow the analogy with account management, this is obvious: the account balance at a given point in time is always the same, regardless of whether there were any deposits or withdrawals afterwards.
You can take advantage of this by saving the state calculated so far as a so-called snapshot. The entire history then no longer has to be replayed every time: usually it is sufficient to start from the most recent snapshot and only look at the events that have been saved since then. As a snapshot only supplements the history and does not replace it, the older events are still available if they are required for an evaluation.
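Assuming each event carries a revision number and a snapshot records the state that was valid up to a certain revision, a replay from a snapshot might look like this sketch (applyEvent as defined above):

// The events now carry a revision number.
const events = [
  { revision: 1, type: 'opened', data: { amount: 500 } },
  { revision: 2, type: 'deposited', data: { amount: 200 } },
  { revision: 3, type: 'withdrawn', data: { amount: 300 } }
];

// A snapshot stores the state that was calculated up to a given revision.
const snapshot = { revision: 2, state: { balance: 700 } };

// It is enough to start from the snapshot and replay only the newer events.
const currentState = events
  .filter(event => event.revision > snapshot.revision)
  .reduce(applyEvent, snapshot.state);
// => { balance: 400 }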
Learning from the past
A similar mechanism can also be used to precalculate special tables for reading data, similar to materialized views. In this case, there is no longer any need for a replay at all, since a table with the required data already exists. However, this requires that these tables are updated whenever a new event is saved.
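What such a read table could look like is sketched below; the events and the in-memory "table" are again invented for the account example, and a handler updates the precalculated balances whenever a new event is stored:

// Read "table": aggregateId -> current balance, updated on every new event.
const accountBalances = new Map();

const updateReadTable = event => {
  const balance = accountBalances.get(event.aggregateId) || 0;

  switch (event.type) {
    case 'opened':
    case 'deposited':
      accountBalances.set(event.aggregateId, balance + event.data.amount);
      break;
    case 'withdrawn':
      accountBalances.set(event.aggregateId, balance - event.data.amount);
      break;
  }
};

// Queries now read from accountBalances directly, without any replay.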
It is particularly convenient that these read tables can also be completely recalculated if a different interpretation of the data is required. This means that not all evaluations that may be relevant need to be known from the very start: instead, they can also be calculated retrospectively if necessary. This reinterpretation of the data is possible for arbitrary queries as long as the original events provide the required semantics.
Event sourcing makes it possible to learn from the past in this way, because the events of business processes can be analysed and interpreted on the basis of new findings or questions. This only works because events are enriched with semantics and intention; only in this way can they provide the necessary data.
Implementing event sourcing
From a technical point of view, event sourcing is relatively simple: a storage for events is required, which only has to support adding and reading events. It is therefore a so-called append-only data store.
Of course, you can use a traditional relational database and limit its statements to INSERT and SELECT. Alternatively, there are numerous other storage options, such as NoSQL databases, XML files or simple text files that are stored directly in the file system.
Since, compared to CRUD, the statements UPDATE and DELETE are omitted, access is easy to implement and performs very well. The reason why these two actions are left out is simply that the storage for events is intended to be non-destructive. Since the previous data is lost with every update, and especially when records are removed, these actions must not be used.
A data store that works according to this principle and is suitable for event sourcing is called an event store.
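As a rough sketch of such an event store on top of PostgreSQL, using the pg module and nothing but INSERT and SELECT; the table and column names are chosen for the example and only loosely follow the schema shown in the next section:

// A minimal append-only event store, restricted to INSERT and SELECT.
const { Pool } = require('pg');

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

const appendEvent = async ({ aggregateId, revision, event }) => {
  await pool.query(
    'INSERT INTO "events" ("aggregateId", "revision", "event", "hasBeenPublished") VALUES ($1, $2, $3, false)',
    [ aggregateId, revision, event ]
  );
};

const readEvents = async aggregateId => {
  const { rows } = await pool.query(
    'SELECT "event" FROM "events" WHERE "aggregateId" = $1 ORDER BY "revision"',
    [ aggregateId ]
  );

  return rows.map(row => row.event);
};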
Using events as relational data
Incidentally, the data structure of an event store is actually relational. This seems to be a contradiction at first, since the actual payloads of the domain events rarely share a common format. The point is, however, that this data is not relevant for the event store: all the event store needs for its work is the record ID, the order of the events and, if necessary, a timestamp. Which data is contained in an event is irrelevant for the event store.
The open source module sparbuch for Node.js implements such an event store and supports MongoDB and PostgreSQL as databases out of the box. PostgreSQL is the better and more powerful choice. If you take a look at the schema definition of the events table, you will notice that all events can be processed using a single schema:
CREATE TABLE IF NOT EXISTS "${this.namespace}_events" (
"position" bigserial NOT NULL,
"aggregateId" uuid NOT NULL,
"revision" integer NOT NULL,
"event" jsonb NOT NULL,
"hasBeenPublished" boolean NOT NULL,
CONSTRAINT "${this.namespace}_events_pk" PRIMARY KEY("position"),
CONSTRAINT "${this.namespace}_aggregateId_revision" UNIQUE ("aggregateId", "revision")
);
The actual payload of the domain events is stored in the field event, which is of the type jsonb. This type is used in PostgreSQL to efficiently store arbitrary JSON data.
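Purely as an illustration, and not the exact structure that sparbuch uses, a domain event stored in that column might look something like this:

// Any JSON-serialisable structure fits into the jsonb column.
const event = {
  name: 'withdrawn',
  data: { amount: 300 },
  metadata: { timestamp: Date.now() }
};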
The schema definition of the snapshots table is similarly flexible and also uses the data type jsonb:
CREATE TABLE IF NOT EXISTS "${this.namespace}_snapshots" (
"aggregateId" uuid NOT NULL,
"revision" integer NOT NULL,
"state" jsonb NOT NULL,
CONSTRAINT "${this.namespace}_snapshots_pk" PRIMARY KEY("aggregateId", "revision")
);
What should be used when?
If you put it all together, this basically provides the criteria for deciding when to use event sourcing and when to use CRUD.
It is obvious that event sourcing is particularly suitable for use cases where the traceability of changes matters. This may already be relevant for regular business data, and it is essential for security-critical or sensitive data.
Rule 1: Event sourcing enables traceability of changes.
Instead of keeping a separate audit log, the individually stored events can be used to determine who was able to access which data at what point in time. You can even go so far as to treat changes to the authorization of data as events, so that they become part of the data set as well. Since the domain data and the security data merge in this way, very powerful and reliable possibilities result.
Rule 2: Event sourcing enables audit logs without any additional effort.
Event sourcing can also be extremely practical for debugging, as the legendary developer John Carmack noted back in 1998:
“The key point: Journaling of time along with other inputs turns a realtime application into a batch process, with all the attendant benefits for quality control and debugging. These problems, and many more, just go away. With a full input trace, you can accurately restart the session and play back to any point (conditional breakpoint on a frame number), or let a session play back at an arbitrarily degraded speed, but cover exactly the same code paths.”
An extremely interesting option of event sourcing is the ability to depict not just one reality, but also alternative realities. Since the calculated state depends on the interpretation of the individual events, events can be evaluated differently in retrospect. This also makes it possible to work with undo and redo steps, which event sourcing gives you for free, without any further effort.
Rule 3: Event sourcing makes it possible to reinterpret the past.
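Sticking with the account sketch from the beginning (events and applyEvent as defined there), stopping the replay early is all it takes:

// Replaying only the first n events yields the state as it was at that time.
const stateAfter = n => events.slice(0, n).reduce(applyEvent, { balance: 0 });

const beforeLastChange = stateAfter(events.length - 1); // effectively an undo
const afterLastChange = stateAfter(events.length);      // and a redo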
Since domain events do not always refer to all the data in a record, event sourcing also supports partial updates. Two or more events that do not conflict with each other can be applied at the same time. In this way the potential for conflicts between simultaneous changes decreases dramatically, which in turn makes it easier to use the software with many users.
Rule 4: Event sourcing reduces the conflict potential of simultaneously occurring changes.
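A small invented example: two events that touch different parts of a record do not conflict and can be applied in either order:

const applyEvent = (state, event) => {
  switch (event.type) {
    case 'renamed': return { ...state, name: event.data.name };
    case 'relocated': return { ...state, address: event.data.address };
    default: return state;
  }
};

const renamed = { type: 'renamed', data: { name: 'Jane Doe' } };
const relocated = { type: 'relocated', data: { address: 'Riverside 1' } };

const state = { name: 'Jane Roe', address: 'Main Street 7' };

// Both orders lead to the same result, so there is no conflict to resolve.
const resultA = [ renamed, relocated ].reduce(applyEvent, state);
const resultB = [ relocated, renamed ].reduce(applyEvent, state);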
In addition, schema changes are much easier to implement, because old versions of events can be upgraded while they are being loaded, if need be. The application only needs to be able to distinguish between two versions of an event type and to contain the code that transforms one version into the other. Complex and error-prone migrations of entire tables, as performed with ALTER TABLE, are not needed in event sourcing at all.
Rule 5: Event sourcing enables easy versioning of business logic.
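What such a transformation might look like is sketched below; the event name and the version numbers are invented, and the old version is upgraded while it is being loaded, before the replay takes place:

// Version 1 stored the address as a single string, version 2 splits it up.
const upgradeEvent = event => {
  if (event.name === 'relocated' && event.version === 1) {
    const [ street, city ] = event.data.address.split(', ');

    return { ...event, version: 2, data: { street, city } };
  }

  return event;
};

const storedEvents = [
  { name: 'relocated', version: 1, data: { address: 'Riverside 1, Springfield' } }
];

const events = storedEvents.map(upgradeEvent);
// => [ { name: 'relocated', version: 2, data: { street: 'Riverside 1', city: 'Springfield' } } ]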
Since the events can serve not only as pure data storage but also as input for a publish-subscribe system, event sourcing can also be used for integration with other systems that represent a different bounded context or even another domain.
Rule 6: Event sourcing is also suitable for integration with other systems.
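A minimal sketch with Node.js's built-in EventEmitter merely illustrates the idea; the event names are invented, and a real integration would typically use a message broker or the publishing mechanism of the event store:

const { EventEmitter } = require('events');

const eventBus = new EventEmitter();

// Another bounded context subscribes to the events it is interested in.
eventBus.on('withdrawn', event => {
  // e.g. update its own read model or notify an external system
});

// After an event has been appended to the event store, it is also published.
const publish = event => eventBus.emit(event.name, event);

publish({ name: 'withdrawn', data: { amount: 300 } });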
When to use CRUD
Ultimately, only two aspects speak for CRUD. On the one hand, CRUD is useful if the data to be stored does not contain any semantics because it is only raw data. This can be the case, for example, in the internet of things (IoT), where large amounts of sensor data have to be captured and persisted. In this case, it makes sense to store the data with the help of CRUD, evaluate it later, and then delete it if necessary. Event sourcing hardly brings any advantages here.
Rule 7: CRUD is used to efficiently store raw data that does not contain semantics.
The second aspect that speaks for CRUD is the ability to check for duplicates, for example via indices. Since only the individual deltas are stored in event sourcing, it is much more difficult to determine whether two records contain the same values at a given point in time. A precalculated read table can help here, but in CRUD this is solved far more easily. However, it is questionable whether the problem of uniqueness should be solved at the database level at all, or whether it is rather a concern of the business logic above it.
Rule 8: CRUD simplifies the search for duplicates.
The biggest criticism of CRUD, however, is the arbitrary restriction of one’s own language to just four verbs (create, read, update, delete), which can hardly do justice to a domain language. Steve Yegge described back in 2006, in his blog post Execution in the Kingdom of Nouns (well worth reading), that it is precisely the verbs that are relevant for a living language.
Rule 9: Event sourcing focuses on the domain and its semantics, while CRUD focuses on technology.
Leaving the comfort zone
If you compare the two approaches on the criteria and aspects mentioned above, CRUD scores alarmingly poorly. The ninth and final rule sums up the problem in a nutshell: CRUD is about technology – but very few applications are created to solve technological problems. Instead, software is usually written to solve real-world domain problems. The complexity inherent in the respective domain lies in its subject matter, which can hardly be described comprehensively with a handful of verbs. Here, CRUD simply falls short of the mark.
In addition, there is the loss of the entire history and the regular destruction of data through UPDATE and DELETE statements. Both are devastating for a later evaluation of business processes, since important findings can no longer be gained once the way in which the data came about can no longer be traced.
However, the biggest drawback of event sourcing has not yet been mentioned: very few developers are familiar with it. CRUD has been known to practically everyone forever, which is why using event sourcing means leaving your beloved comfort zone. The gain is massive, but you have to experience it first to realize that it is worth the effort (which is not actually that big).
If you use event sourcing for a while, for example in connection with CQRS and domain-driven design (DDD), the use of UPDATE and DELETE suddenly feels completely wrong, and you wonder how you could ever have worked with CRUD and believed that you had a suitable data model in front of you.
This article is written by Golo Roden. The author’s bio:
“Founder and CTO of the native web. Prefers JS & Node.js, and has written the first German book on this topic, “Node.js & co.”. He works for various IT magazines, and manages several conferences.”