On designing an interface for columnar in-memory storage in golang

The advantages for columnar storages over row based storages are the ability for good in-memory compression (low memory usage) and cache locality when accessing only few columns (performance).

Designing an Interface

When designing an interface for a storage engine for an in-memory database, a lot of considerations have to be made. Here are the design goals:

  • The interface must be simple so that using it does not require implementing all kinds of corner cases (especially with datatypes)
  • The interface must allow fast value fetches for random data access
  • The interface must allow analyzing the data and finding the perfect storage and compression format

Here’s what I came up with:

type ColumnStorage interface {
        getValue(uint) scmer // read function
        // buildup functions 1) prepare 2) scan, 3) proposeCompression(), if != nil repeat at 1, 4) init, 5) build; all values are passed through twice
        // analyze
        prepare()
        scan(uint, scmer)
        proposeCompression() ColumnStorage

        // store
        init(uint)
        build(uint, scmer)
        finish()
}

The read interface is pretty simple: getValue(i) will read the column value at recordId i and return the scmer value. scmer is a kind of „any“ datatype that is used in our scheme interpreter to further process the data.

Data Analysis on Columnar Storages

The write interface is a bit more complicated and basically works in two passes:

  1. analyze phase:
    • prepare() is called
    • every future value that shall be stored will be passed to scan(recordId, value)
    • the columnar storage will analyze the data. If it knows a better storage format than its own class, it will return an other column container when calling proposeCompression()
    • when proposeCompression() returns another container, repeat step 1 analyze phase with the new container
    • otherwise proceed with the store phase
  2. store phase
    • init(size) is called where the storage can allocate the memory for its columns
    • for each value, build(recordId, value) is called to write the data to memory
    • for cleanup of possible reverse dictionaries that are only needed in the store phase, finish() is called

During analyze phase, statistics about the data can be collected. This allows a storage engine to check how wide the integers are or if there are lots of string duplicates.

The classes implementing that interface are then structured as follows:

  • StorageSCMER is a universal value storage; fast but needs lot of RAM
  • StorageSCMER analyzes a column whether it better fits the StorageString or StorageInt scheme; otherwise (i.e. for float64 values, stay at StorageSCMER)
  • StorageInt is a bit compressed integer storage. When analyzed values do not exceed i.e. 15, the storage will only eat up 4 bits per stored integer. The numbers are stored in an array of 64 bit integers which are shifted when reading and writing
  • StorageString is a dictionary compressed string storage using golang slices

With this interface we have built a powerful interface to implement any columnar storage compression format. We can support bit compressed integers as well as dictionaries.

Evaluation

We loaded a bunch of 55MiB of .jsonl files into our storage. This is our memory usage using the storage interface:

StorageMemory Usage
MySQL InnoDB original storage55 MiB Disk Space
Export from MySQL in .jsonl format55 MiB Disk Space
Imported .jsonl as []map[string]scmer into delta storage233 MiB RAM
Moved to main storage only using StorageSCMER81 MiB RAM
Using StorageInt for integer columns46 MiB RAM
Using StorageString for string columns58 MiB RAM
Using StorageInt and StorageString23 MiB RAM
Evaluation

As you see, RAM compressed columnar storage needs only about half the RAM compared to disk based InnoDB storage or .json files.

de_DEGerman

Durch die weitere Nutzung der Seite stimmst du der Verwendung von Cookies zu. Weitere Informationen

Die Cookie-Einstellungen auf dieser Website sind auf "Cookies zulassen" eingestellt, um das beste Surferlebnis zu ermöglichen. Wenn du diese Website ohne Änderung der Cookie-Einstellungen verwendest oder auf "Akzeptieren" klickst, erklärst du sich damit einverstanden.

Schließen