How to balance a database between OLAP and OLTP workflows

In this blog article, we will be comparing OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing) databases. We will discuss their differences and similarities, discuss the ideal database architecture for a database that can handle both OLAP and OLTP workloads, and provide a synthesis of the two. By the end of this article, you will have a better understanding of the differences between OLAP and OLTP databases, as well as which database architecture is best suited for your needs. So let’s dive in!

OLTP (Online Transaction Processing) databases are used for fast read/write operations such as transactions, updates, and inserts. They are designed for fast retrieval of data in response to user queries and are optimized for read/write operations.

OLAP (Online Analytical Processing) databases are used for reporting and analysis, such as data mining and trend analysis. They are designed for multidimensional analysis of data and are optimized for read-only operations.

Most daily-use databases are OLTP databases, such as MySQL, PostgreSQL, and others.

OLAP databases use ETL (Extract, Transform, Load) processes to load data from an OLTP database. This allows for the data to be transformed into a format suitable for data analysis and reporting. The ETL process enables data to be extracted from the source database, transformed into the format desired for the analysis, and then loaded into the OLAP database.

However, wouldn’t it be nice to have a DBMS that can handle both, OLAP and OLTP workloads well?

Marrying both worlds

OLAP databases work best with columnar storage, which allows for large data sets to be stored and accessed quickly. This makes it ideal for data analysis and reporting. On the other hand, OLTP databases are optimized for record-wise data storage, which is better suited for read/write operations.

The ideal DBMS should be able to handle both OLAP and OLTP workloads well. It should be able to store and access data quickly and efficiently, while also being able to handle the read/write operations associated with OLTP workloads. It should also provide a robust set of features for data analysis and reporting.

So we have two kinds of data in the database:

A large set of old records where we want to do analysis on (called main storage)
A small set of recent changes that came from the OLTP workloads (called delta storage)

While the main storage can be compressed and optimized for read operations, the delta storage must be optimized for write operations.

So, the main storage is a column store while delta storage is a row store.

Whenever there is enough “old” data in the delta storage, the main storage can be rebuilt from main storage + delta storage and delta storage can start empty again.

Of course, queries on the database have to consider both: delta and main storage.

Implementation in MEMCP

MEMCP is an OpenSource GPLv3 relational DBMS that handles both OLAP and OLTP workloads well. It is written in Google’s language golang. It uses scheme as its internal query language.

Here’s a glimpse on the code:

type dataset map[string]scmer
type table struct {
        name string
        // main storage
        main_count uint // size of main storage
        columns map[string]ColumnStorage
        // delta storage
        inserts []dataset // items added to storage
        deletions map[uint]struct{} // items removed from main or inserts (based on main_count + i)
        // indexes
        indexes []StorageIndex
}

As you see, we have a rather fixed (main_count) set of main storage entries stored in columns and a delta storage made of inserts (a row storage) and deletions, an associative map which items have been deleted and thus must be ignored when traversing the storage.

The exact layout of the rebuild phase is described in the following article:

On designing an interface for columnar in-memory storage in golang

Marrying both worlds

Implementation in MEMCP

Designing a

How In-Memory

Comments are closed