Memory-Efficient Indices for In-Memory Storages

Most databases implement indices as a kind of tree. I will show you that columnar storages can do even better.

The most widely used kind of tree structure in databases is the B-Tree. A B-Tree is a n-ary tree whose nodes fit exactly into one „page“ – may it be a cache line of 64 bytes or a HDD page of 512 bytes or a memory page of 4K.

The advantage of btrees is that it balances the memory fetch time with the time spent on computing on a chunk of memory. A btree node is basically layouted like pointer|value|pointer|value|pointer|value|pointer

This means, half of the space of the index is wasted on pointers, the other half is wasted on values. Latter is problematic when values get wide – especially when dealing with strings.

We will provide a different approach for indices that are far more memory efficient and thus faster in in-memory storage engines: sorted list indices.

Sorted lists of Record IDs

In contrast to OLTP databases that need fast insert and delete procedures, in a separated main and delta storage, we don’t have to care about insert performance because we never insert into the main storage. When items are inserted, they are inserted into the delta storage and the main storage is rebuilt on demand. Deletions are also easy to handle. When scanning through a table, we can just ignore deleted items.

So instead of storing pointers and values (where on multidimensional indices, values can span multiple columns), we can just store the recordId. For a 10k items storage, this would require less then 20KiB of RAM ! The list of recordIds can then be sorted according to our sort criterion and compressed into a bit compressed integer storage.

Whenever we want to find a certain item or a range of items, we can now walk through the sorted list of recordIds and perform our binary search algorithms.

This is the data structure I came up with:

type StorageIndex struct {
        cols []string // sort equal-cols alphabetically, so similar conditions are canonical
        savings float64 // store the amount of time savings here -> add selectivity (outputted / size) on each
        sortedItems StorageInt // we can do binary searches here
        t *table
        inactive bool

Here some facts:

  • colsis the list of columns that are considered in the index
  • The list of sorted recordIds is stored in sortedItems
  • The items are sorted according to the value of cols[0]; in case of equality, cols[1] is considered and so on
  • In the beginning, we set inactive to true, so the index is considered but not built
  • In savings, we accumulate the amount of computation time we could save by having the index
  • As soon as savings exceeds a threshold, the index is built. Otherwise, we will do a full table scan.
  • The index implements func (s *StorageIndex) iterate(lower []scmer, upperLast scmer) chan uint which will start a index scan beginning at lower and will stop as soon as the last column reaches a value greater than upperLast. When inactive is set, the function will instead increase the savings value and stream all recordIds of the table

Finding the Perfect Index for a Certain Condition

In memcp, we use scheme – a lisp dialect – as the internal query plan language. This has one big advantage: Every piece of code is an array that can be examined by our optimizer.

We implemented a helper function func (t *table) iterateIndex(condition scmer) chan uint that takes a condition, finds the perfect index or creates it and then streams all recordIds that might fit into the condition into a ring buffer (chan uint)

So we define a datatype like:

type columnboundaries struct{
        col string
        lower scmer
        upper scmer

And then collect all conditions into a variable cols of the type []columnbounbdaries. For doing so, we implemented a recursive function traverseCondition(condition scmer) that scans the condition for equal?, <, >, <=, >= as well as BETWEEN ... AND ... and IN expressions. Whenever a column value is compared against a constant, we can fill in a columnboundaries object.

After collecting all boundaries, we sort cols according to the following rules:

  • equals? is more selective than < or > comparisons, so we put all boundaries with lower == upper first
  • An index scan can only consider one range column (as long as we don’t use spatial indices), so we can only keep one boundary with lower != upper
  • To save memory, we sort all equals?-boundaries alphabetically by the column name, so that similar SQL queries can use the same index
  • Instead of an index (col1, col2), the already existing index (col1, col2, col3) can be used as an alternative

As soon as the perfect index constellation is found, we create the StorageIndex object, but set it to inactive first. We then delegate the task of filling a chan uint to the StorageIndex object by calling func (s *StorageIndex) iterate(lower []scmer, upperLast scmer) chan uint

Automatic Index Building and Cost Calculation

We did some measurements on the performance of on-the-fly index building:

> (scan "PLZ" (lambda (Ort) (equal? Ort "Neugersdorf")) (lambda (PLZ Ort) (print PLZ " - " Ort)))
02727 - Neugersdorf
==> "8.275433ms"
> (scan "PLZ" (lambda (Ort) (equal? Ort "Neugersdorf")) (lambda (PLZ Ort) (print PLZ " - " Ort)))
building index on PLZ over [Ort]
02727 - Neugersdorf
==> "18.187862ms"
> (scan "PLZ" (lambda (Ort) (equal? Ort "Neugersdorf")) (lambda (PLZ Ort) (print PLZ " - " Ort)))
02727 - Neugersdorf
==> "43.871µs"
> (scan "PLZ" (lambda (Ort) (equal? Ort "Neugersdorf")) (lambda (PLZ Ort) (print PLZ " - " Ort)))
02727 - Neugersdorf
==> "45.681µs"
> (scan "PLZ" (lambda (Ort) (equal? Ort "Neugersdorf")) (lambda (PLZ Ort) (print PLZ " - " Ort)))
02727 - Neugersdorf
==> "39.021µs"

The first query did an unoptimized table scan with 8.3ms

The second query did the index build with 18.2ms

The third query already uses the index and only needs 42µs – this is a speedup of almost 200x!

So you see: A index build is about twice the cost as a full table scan. This means, as soon as a index is used the second time, we can build the index and remove the inactive flag. When we increase savings by 1.0 every time the index is used and build the index at a threshold of 2.0, we will have the optimal heuristic for index creation.

When a table is rebuild, we can of course take over our measurements stored in the savings variable to start the index build a bit earlier.


Durch die weitere Nutzung der Seite stimmst du der Verwendung von Cookies zu. Weitere Informationen

Die Cookie-Einstellungen auf dieser Website sind auf "Cookies zulassen" eingestellt, um das beste Surferlebnis zu ermöglichen. Wenn du diese Website ohne Änderung der Cookie-Einstellungen verwendest oder auf "Akzeptieren" klickst, erklärst du sich damit einverstanden.