{"id":4767,"date":"2023-01-19T13:11:17","date_gmt":"2023-01-19T12:11:17","guid":{"rendered":"https:\/\/launix.de\/launix\/?p=4767"},"modified":"2023-07-11T12:10:08","modified_gmt":"2023-07-11T10:10:08","slug":"memory-efficient-indices-for-in-memory-storages","status":"publish","type":"post","link":"https:\/\/launix.de\/launix\/memory-efficient-indices-for-in-memory-storages\/","title":{"rendered":"Memory-Efficient Indices for In-Memory Storages"},"content":{"rendered":"\n<p>Most databases implement indices as a kind of tree. I will show you that columnar storages can do even better.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>The most widely used kind of tree structure in databases is the B-Tree. A B-Tree is an n-ary tree whose nodes fit exactly into one &#8220;page&#8221; &#8211; be it a cache line of 64 bytes, an HDD sector of 512 bytes or a memory page of 4 KiB.<\/p>\n\n\n\n<p>The advantage of B-Trees is that they balance the memory fetch time against the time spent computing on a chunk of memory. A B-Tree node is basically laid out as <code>pointer|value|pointer|value|pointer|value|pointer<\/code>.<\/p>\n\n\n\n<p>This means half of the index space is spent on pointers and the other half on values. 
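To make this split concrete, here is a back-of-the-envelope sketch of the fanout and pointer overhead of one such node (the 8-byte pointer and value sizes are illustrative assumptions, not figures from any particular database):

```go
package main

import "fmt"

func main() {
	const pageSize = 4096 // one 4 KiB memory page per node
	const ptrSize = 8     // assumed child-pointer size
	const valSize = 8     // assumed value size (wider for strings)

	// Layout pointer|value|...|pointer: n values need n+1 pointers.
	n := (pageSize - ptrSize) / (ptrSize + valSize)
	fmt.Println("values per node:", n)                    // 255
	fmt.Println("bytes spent on pointers:", (n+1)*ptrSize) // 2048
}
```

So roughly half of every 4 KiB node is pointers, and the value half grows even larger once values are wider than 8 bytes.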
Storing values inline is problematic when they get wide &#8211; especially when dealing with strings.<\/p>\n\n\n\n<p>We will present a different approach to indices that is far more memory-efficient and thus faster for in-memory storage engines: <strong>sorted list indices<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Sorted lists of Record IDs<\/h2>\n\n\n\n<p>In contrast to OLTP databases that need fast insert and delete procedures, in a <a href=\"https:\/\/launix.de\/launix\/how-to-balance-a-database-between-olap-and-oltp-workflows\/\" target=\"_blank\" aria-label=\"separated main and delta storage (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">separated main and delta storage<\/a>, we don&#8217;t have to care about insert performance because we never insert into the main storage. When items are inserted, they go into the delta storage, and the main storage is rebuilt on demand. Deletions are also easy to handle: when scanning through a table, we can just ignore deleted items.<\/p>\n\n\n\n<p>So instead of storing pointers and values (where, on multidimensional indices, values can span multiple columns), we can just store the recordId. For a storage of 10k items, this requires less than 20 KiB of RAM! 
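As a minimal sketch of such a sorted list index (a hypothetical illustration, not MemCP's actual code): the index is just a permutation of record IDs sorted by the indexed column, and equality lookups become binary searches over that permutation.

```go
package main

import (
	"fmt"
	"sort"
)

// buildIndex returns the record IDs of a column, sorted by column value.
// The index is only a permutation of record IDs over the existing
// column storage - no keys or pointers are copied into it.
func buildIndex(col []string) []uint32 {
	idx := make([]uint32, len(col))
	for i := range idx {
		idx[i] = uint32(i)
	}
	sort.Slice(idx, func(a, b int) bool { return col[idx[a]] < col[idx[b]] })
	return idx
}

// lookup finds all record IDs whose column value equals key via binary search.
func lookup(col []string, idx []uint32, key string) []uint32 {
	lo := sort.Search(len(idx), func(i int) bool { return col[idx[i]] >= key })
	hi := sort.Search(len(idx), func(i int) bool { return col[idx[i]] > key })
	return idx[lo:hi]
}

func main() {
	ort := []string{"Dresden", "Neugersdorf", "Berlin", "Neugersdorf"}
	idx := buildIndex(ort)
	fmt.Println(lookup(ort, idx, "Neugersdorf")) // record IDs 1 and 3
}
```

The index stores no keys of its own; it only references the existing column storage, which is what keeps it so small.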
The list of recordIds can then be sorted according to our sort criterion and compressed into a <a href=\"https:\/\/launix.de\/launix\/on-designing-an-interface-for-columnar-in-memory-storage-in-golang\/\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">bit-compressed integer storage<\/a>.<\/p>\n\n\n\n<p>Whenever we want to find a certain item or a range of items, we can run a binary search over the sorted list of recordIds.<\/p>\n\n\n\n<p>This is the data structure I came up with:<\/p>\n\n\n\n<pre><font color=\"#E9AD0C\">type<\/font> StorageIndex <font color=\"#E9AD0C\">struct<\/font> {\n        cols []<font color=\"#87FFAF\">string<\/font> <font color=\"#33C7DE\">\/\/ sort equal-cols alphabetically, so similar conditions are canonical<\/font>\n        savings <font color=\"#87FFAF\">float64<\/font> <font color=\"#33C7DE\">\/\/ store the amount of time savings here -&gt; add selectivity (outputted \/ size) on each<\/font>\n        sortedItems StorageInt <font color=\"#33C7DE\">\/\/ we can do binary searches here<\/font>\n        t *table\n        inactive <font color=\"#87FFAF\">bool<\/font>\n}\n<\/pre>\n\n\n\n<p>Here are some facts:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>cols<\/code> is the list of columns that are considered in the index<\/li>\n\n\n\n<li>The list of sorted recordIds is stored in <code>sortedItems<\/code><\/li>\n\n\n\n<li>The items are sorted according to the value of <code>cols[0]<\/code>; in case of equality, <code>cols[1]<\/code> is considered and so on<\/li>\n\n\n\n<li>In the beginning, we set <code>inactive<\/code> to true, so the index is <em>considered<\/em> but not built<\/li>\n\n\n\n<li>In <code>savings<\/code>, we accumulate the amount of computation time we could save by having the index<\/li>\n\n\n\n<li>As soon as <code>savings<\/code> exceeds a threshold, the index is built. 
Otherwise, we will do a full table scan.<\/li>\n\n\n\n<li>The index implements <code>func (s *StorageIndex) iterate(lower []scmer, upperLast scmer) chan uint<\/code>, which starts an index scan beginning at <code>lower<\/code> and stops as soon as the last column reaches a value greater than <code>upperLast<\/code>. When <code>inactive<\/code> is set, the function instead increases the <code>savings<\/code> value and streams all recordIds of the table<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Finding the Perfect Index for a Certain Condition<\/h2>\n\n\n\n<p>In memcp, we use <a aria-label=\"scheme (opens in a new tab)\" href=\"https:\/\/launix.de\/launix\/designing-a-programming-language-for-distributed-systems-and-highly-parallel-algorithms\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\">scheme<\/a> &#8211; a Lisp dialect &#8211; as the internal query plan language. This has one big advantage: every piece of code is an array that can be examined by our optimizer.<\/p>\n\n\n\n<p>We implemented a helper function <code>func (t *table) iterateIndex(condition scmer) chan uint<\/code> that takes a condition, finds the perfect index (or creates it) and then streams all recordIds that <em>might<\/em> fit the condition into a ring buffer (<code>chan uint<\/code>).<\/p>\n\n\n\n<p>So we define a datatype like:<\/p>\n\n\n\n<pre><font color=\"#E9AD0C\">type<\/font> columnboundaries <font color=\"#E9AD0C\">struct<\/font>{\n        col <font color=\"#87FFAF\">string<\/font>\n        lower scmer\n        upper scmer\n}\n<\/pre>\n\n\n\n<p>We then collect all conditions into a variable <code>cols<\/code> of the type <code>[]columnboundaries<\/code>. To do so, we implemented a recursive function <code>traverseCondition(condition scmer)<\/code> that scans the condition for <code>equal?<\/code>, <code>&lt;<\/code>, <code>&gt;<\/code>, <code>&lt;=<\/code>, <code>&gt;=<\/code> as well as <code>BETWEEN ... 
AND ...<\/code> and <code>IN<\/code> expressions. Whenever a column value is compared against a constant, we can fill in a <code>columnboundaries<\/code> object.<\/p>\n\n\n\n<p>After collecting all boundaries, we sort <code>cols<\/code> according to the following rules:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>equal?<\/code> is more selective than <code>&lt;<\/code> or <code>&gt;<\/code> comparisons, so we put all boundaries with <code>lower == upper<\/code> first<\/li>\n\n\n\n<li>An index scan can only consider one range column (as long as we don&#8217;t use spatial indices), so we can only keep one boundary with <code>lower != upper<\/code><\/li>\n\n\n\n<li>To save memory, we sort all <code>equal?<\/code>-boundaries alphabetically by column name, so that similar SQL queries can use the same index<\/li>\n\n\n\n<li>Instead of building a new index on (col1, col2), an already existing index on (col1, col2, col3) can be used as an alternative<\/li>\n<\/ul>\n\n\n\n<p>As soon as the perfect index constellation is found, we create the <code>StorageIndex<\/code> object, but set it to <code>inactive<\/code> first. We then delegate the task of filling a <code>chan uint<\/code> to the <code>StorageIndex<\/code> object by calling <code>func (s *StorageIndex) iterate(lower []scmer, upperLast scmer) chan uint<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Automatic Index Building and Cost Calculation<\/h2>\n\n\n\n<p>We did some measurements on the performance of on-the-fly index building:<\/p>\n\n\n\n<pre style=\"overflow: auto; white-space: pre;\">&gt; (scan \"PLZ\" (lambda (Ort) (equal? Ort \"Neugersdorf\")) (lambda (PLZ Ort) (print PLZ \" - \" Ort)))\n02727 - Neugersdorf\n==&gt; \"8.275433ms\"\n&gt; (scan \"PLZ\" (lambda (Ort) (equal? Ort \"Neugersdorf\")) (lambda (PLZ Ort) (print PLZ \" - \" Ort)))\nbuilding index on PLZ over [Ort]\n02727 - Neugersdorf\n==&gt; \"18.187862ms\"\n&gt; (scan \"PLZ\" (lambda (Ort) (equal? 
Ort \"Neugersdorf\")) (lambda (PLZ Ort) (print PLZ \" - \" Ort)))\n02727 - Neugersdorf\n==&gt; \"43.871\u00b5s\"\n&gt; (scan \"PLZ\" (lambda (Ort) (equal? Ort \"Neugersdorf\")) (lambda (PLZ Ort) (print PLZ \" - \" Ort)))\n02727 - Neugersdorf\n==&gt; \"45.681\u00b5s\"\n&gt; (scan \"PLZ\" (lambda (Ort) (equal? Ort \"Neugersdorf\")) (lambda (PLZ Ort) (print PLZ \" - \" Ort)))\n02727 - Neugersdorf\n==&gt; \"39.021\u00b5s\"\n&gt; \n<\/pre>\n\n\n\n<p>The first query did an unoptimized table scan, taking 8.3ms.<\/p>\n\n\n\n<p>The second query triggered the index build, taking 18.2ms.<\/p>\n\n\n\n<p>The third query already uses the index and only needs 42\u00b5s &#8211; a speedup of almost 200x!<\/p>\n\n\n\n<p>So you see: an index build costs about twice as much as a full table scan. This means that as soon as an index is used a second time, we can build it and remove the <code>inactive<\/code> flag. If we increase <code>savings<\/code> by 1.0 every time the index is used and build the index at a threshold of 2.0, we get a simple and effective heuristic for index creation.<\/p>\n\n\n\n<p>When a table is rebuilt, we can of course carry over the measurements stored in the <code>savings<\/code> variable so that the index is built a bit earlier.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Most databases implement indices as a kind of tree. 
I will show you that columnar storages can do even better.<\/p>\n","protected":false},"author":2,"featured_media":4768,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":4,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","footnotes":""},"categories":[129],"tags":[],"class_list":["post-4767","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-memcp","single-item"],"featured_image_urls_v2":{"full":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920.jpg",1920,1280,false],"thumbnail":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-150x150.jpg",150,150,true],"medium":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-300x200.jpg",300,200,true],"medium_large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-768x512.jpg",751,501,true],"large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-1024x683.jpg",751,501,true],"1536x1536":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-1536x1024.jpg",1536,1024,true],"2048x2048":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920.jpg",1920,1280,false],"trp-custom-language-flag":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920.jpg",18,12,false],"xs-thumb":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-64x64.jpg",64,64,true],"appku-shop-single":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920.jpg",620,413,false]},"post_excerpt_stackable_v2":"<p>Most databases implement indices as a kind of tree. I will show you that columnar storages can do even better. 
The most widely used kind of tree structure in databases is the B-Tree. A B-Tree is a n-ary tree whose nodes fit exactly into one &#8220;page&#8221; &#8211; may it be a cache line of 64 bytes or a HDD page of 512 bytes or a memory page of 4K. The advantage of btrees is that it balances the memory fetch time with the time spent on computing on a chunk of memory. A btree node is basically layouted like pointer|value|pointer|value|pointer|value|pointer&hellip;<\/p>\n","category_list_v2":"<a href=\"https:\/\/launix.de\/launix\/category\/memcp\/\" rel=\"category tag\">MemCP<\/a>","author_info_v2":{"name":"Carl-Philip H\u00e4nsch","url":"https:\/\/launix.de\/launix\/author\/carli\/"},"comments_num_v2":"0 comments","uagb_featured_image_src":{"full":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920.jpg",1920,1280,false],"thumbnail":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-150x150.jpg",150,150,true],"medium":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-300x200.jpg",300,200,true],"medium_large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-768x512.jpg",751,501,true],"large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-1024x683.jpg",751,501,true],"1536x1536":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-1536x1024.jpg",1536,1024,true],"2048x2048":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920.jpg",1920,1280,false],"trp-custom-language-flag":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920.jpg",18,12,false],"xs-thumb":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920-64x64.jpg",64,64,true],"appku-shop-single":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/concept-18290_1920.jpg",620,413,false]},"uagb_author_info":{"display_name":"Carl-Philip 
H\u00e4nsch","author_link":"https:\/\/launix.de\/launix\/author\/carli\/"},"uagb_comment_info":0,"uagb_excerpt":"Most databases implement indices as a kind of tree. I will show you that columnar storages can do even better.","_links":{"self":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/4767","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/comments?post=4767"}],"version-history":[{"count":6,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/4767\/revisions"}],"predecessor-version":[{"id":4847,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/4767\/revisions\/4847"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/media\/4768"}],"wp:attachment":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/media?parent=4767"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/categories?post=4767"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/tags?post=4767"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}