{"id":4834,"date":"2023-01-21T10:49:14","date_gmt":"2023-01-21T09:49:14","guid":{"rendered":"https:\/\/launix.de\/launix\/?p=4834"},"modified":"2023-03-02T10:26:44","modified_gmt":"2023-03-02T09:26:44","slug":"on-compressing-null-values-in-bit-compressed-integer-storages","status":"publish","type":"post","link":"https:\/\/launix.de\/launix\/en\/on-compressing-null-values-in-bit-compressed-integer-storages\/","title":{"rendered":"On Compressing NULL values in bit-compressed Integer Storages"},"content":{"rendered":"<p>Usually, databases store NULL values in form of bitmasks. In this case, each value eats up 1 bit for the possibility to become NULL. I will prove that we can do better.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>Uncompressed column stores of integers are a continuous column of i.e. 64 bit integers. The problem with 64 bit integers is that every possible value has a meaning. You cannot just take one value (i.e. MAXINT-1) and declare it to be NULL. Malicious users could smuggle that value in and turn a value NULL potentilly breaking the software, in worst case introducing a security risk. So the only chance to decode NULL in 64 bit integers is to introduce 65 bits.<\/p>\n\n\n\n<p>In bit-compressed integer storages however, we exactly know the value range of our stored values. Whenever we know what the highest value is that will be stored into our storage, we can declare the next highest number to be NULL. This is how we do the detection:<\/p>\n\n\n\n<pre><b>diff --git a\/storage\/storage-int.go b\/storage\/storage-int.go<\/b>\n<b>index 7a9dff0..4fed6d3 100644<\/b>\n<b>--- a\/storage\/storage-int.go<\/b>\n<b>+++ b\/storage\/storage-int.go<\/b>\n<font color=\"#2AA1B3\">@@ -24,6 +24,8 @@<\/font> type StorageInt struct {\n        chunk []uint64\n        bitsize uint8\n        hasNegative bool\n<font color=\"#26A269\">+<\/font>       <font color=\"#26A269\">hasNull bool<\/font>\n<font color=\"#26A269\">+<\/font>       <font color=\"#26A269\">null uint64 \/\/ which value is null<\/font>\n }\n<\/pre>\n<pre><font color=\"#2AA1B3\">@@ -82,7 +92,15 @@<\/font>\n func (s *StorageInt) scan(i uint, value scm.Scmer) {\n        \/\/ storage is so simple, dont need scan\n<font color=\"#26A269\">+<\/font>       <font color=\"#26A269\">if value == nil {<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">s.hasNull = true<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">return<\/font>\n<font color=\"#26A269\">+<\/font>       <font color=\"#26A269\">}<\/font>\n        v := toInt(value)\n<font color=\"#26A269\">+<\/font>       <font color=\"#26A269\">if v &gt;= int64(s.null) {<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">\/\/ mark 1+highest value as null<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">s.null = uint64(v) + 1<\/font>\n<font color=\"#26A269\">+<\/font>       <font color=\"#26A269\">}<\/font>\n        if v &lt; 0 {\n                s.hasNegative = true\n                v = -v\n<\/pre>\n<pre><font color=\"#2AA1B3\">@@ -93,20 +111,32 @@<\/font>\n func (s *StorageInt) init(i uint) {\n<font color=\"#C01C28\">-       \/\/ allocate<\/font>\n<font color=\"#26A269\">+<\/font>       <font color=\"#26A269\">if s.hasNull {<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">\/\/ need an extra bit because of null??<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">l := uint8(bits.Len64(uint64(s.null)))<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">if l &gt; s.bitsize {<\/font>\n<font color=\"#26A269\">+<\/font>                       <font color=\"#26A269\">s.bitsize = l<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">}<\/font>\n<font color=\"#26A269\">+<\/font>       <font color=\"#26A269\">}<\/font>\n        if s.hasNegative {\n                s.bitsize = s.bitsize + 1\n        }\n<\/pre>\n\n\n\n<p>The last patch is to ensure that if we have e.g. booleans with 0 and 1 (1 bit), having NULL will expand the storage to 2 bits. Otherwise, we would have an encodign problem.<\/p>\n\n\n\n<p>The read operation on the storage is fairly easy:<\/p>\n\n\n\n<pre> func (s *StorageInt) getValue(i uint) scm.Scmer {\n<font color=\"#C01C28\">-       if (s.hasNegative) {<\/font>\n<font color=\"#C01C28\">-               return scm.Number(s.getValueInt(i))<\/font>\n<font color=\"#C01C28\">-       } else {<\/font>\n<font color=\"#C01C28\">-               return scm.Number(s.getValueUInt(i))<\/font>\n<font color=\"#26A269\">+<\/font>       <font color=\"#26A269\">if (s.hasNegative) { \/\/ with sign expansion<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">v := s.getValueInt(i)<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">if s.hasNull &amp;&amp; uint64(v) == s.null {<\/font>\n<font color=\"#26A269\">+<\/font>                       <font color=\"#26A269\">return nil<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">}<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">return scm.Number(v)<\/font>\n<font color=\"#26A269\">+<\/font>       <font color=\"#26A269\">} else { \/\/ without sign expansion<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">v := s.getValueUInt(i)<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">if s.hasNull &amp;&amp; v == s.null {<\/font>\n<font color=\"#26A269\">+<\/font>                       <font color=\"#26A269\">return nil<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">}<\/font>\n<font color=\"#26A269\">+<\/font>               <font color=\"#26A269\">return scm.Number(v)<\/font>\n        }\n }\n<\/pre>\n\n\n\n<p>So we found a way to encode NULL inside a bit compressed integer storage as a single value.<\/p>\n\n\n\n<p>I measured the RAM usage of our test workload. Before the NULL implementation, some columns of our biggest table had to use <code>StorageSCMER<\/code> because of the NULL values. This implementation took 23 MB of RAM.<\/p>\n\n\n\n<p>With the new NULL implementation, these columns have been able to be encoded in <code>StorageInt<\/code> which made the workload occupy only 16 MiB of RAM. <strong>That&#8217;s a 40% savings in overall memory!<\/strong><\/p>","protected":false},"excerpt":{"rendered":"<p>Usually, databases store NULL values in form of bitmasks. In this case, each value eats up 1 bit for the possibility to become NULL. I will prove that we can do better.<\/p>","protected":false},"author":2,"featured_media":4836,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":2,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","footnotes":""},"categories":[129,128],"tags":[],"class_list":["post-4834","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-memcp","category-programming","single-item"],"featured_image_urls_v2":{"full":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920.jpg",1920,1080,false],"thumbnail":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-150x150.jpg",150,150,true],"medium":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-300x169.jpg",300,169,true],"medium_large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-768x432.jpg",751,422,true],"large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-1024x576.jpg",751,422,true],"1536x1536":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-1536x864.jpg",1536,864,true],"2048x2048":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920.jpg",1920,1080,false],"trp-custom-language-flag":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920.jpg",18,10,false],"xs-thumb":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-64x64.jpg",64,64,true],"appku-shop-single":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920.jpg",620,349,false]},"post_excerpt_stackable_v2":"<p>Usually, databases store NULL values in form of bitmasks. In this case, each value eats up 1 bit for the possibility to become NULL. I will prove that we can do better. Uncompressed column stores of integers are a continuous column of i.e. 64 bit integers. The problem with 64 bit integers is that every possible value has a meaning. You cannot just take one value (i.e. MAXINT-1) and declare it to be NULL. Malicious users could smuggle that value in and turn a value NULL potentilly breaking the software, in worst case introducing a security risk. So the only&hellip;<\/p>\n","category_list_v2":"<a href=\"https:\/\/launix.de\/launix\/en\/category\/memcp\/\" rel=\"category tag\">MemCP<\/a>, <a href=\"https:\/\/launix.de\/launix\/en\/category\/programming\/\" rel=\"category tag\">Programming<\/a>","author_info_v2":{"name":"Carl-Philip H\u00e4nsch","url":"https:\/\/launix.de\/launix\/en\/author\/carli\/"},"comments_num_v2":"0 comments","uagb_featured_image_src":{"full":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920.jpg",1920,1080,false],"thumbnail":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-150x150.jpg",150,150,true],"medium":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-300x169.jpg",300,169,true],"medium_large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-768x432.jpg",751,422,true],"large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-1024x576.jpg",751,422,true],"1536x1536":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-1536x864.jpg",1536,864,true],"2048x2048":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920.jpg",1920,1080,false],"trp-custom-language-flag":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920.jpg",18,10,false],"xs-thumb":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920-64x64.jpg",64,64,true],"appku-shop-single":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/room-6614302_1920.jpg",620,349,false]},"uagb_author_info":{"display_name":"Carl-Philip H\u00e4nsch","author_link":"https:\/\/launix.de\/launix\/en\/author\/carli\/"},"uagb_comment_info":0,"uagb_excerpt":"Usually, databases store NULL values in form of bitmasks. In this case, each value eats up 1 bit for the possibility to become NULL. I will prove that we can do better.","_links":{"self":[{"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/posts\/4834","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/comments?post=4834"}],"version-history":[{"count":2,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/posts\/4834\/revisions"}],"predecessor-version":[{"id":4837,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/posts\/4834\/revisions\/4837"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/media\/4836"}],"wp:attachment":[{"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/media?parent=4834"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/categories?post=4834"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/tags?post=4834"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}