{"id":8128,"date":"2026-02-14T01:55:12","date_gmt":"2026-02-14T00:55:12","guid":{"rendered":"https:\/\/launix.de\/launix\/?p=8128"},"modified":"2026-02-14T01:55:13","modified_gmt":"2026-02-14T00:55:13","slug":"how-memcp-achieves-near-shannon-compression-for-enum-columns","status":"publish","type":"post","link":"https:\/\/launix.de\/launix\/how-memcp-achieves-near-shannon-compression-for-enum-columns\/","title":{"rendered":"How MemCP Achieves Near-Shannon Compression for Enum Columns"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">MemCP is a high-performance, in-memory, column-oriented database.<br>Columnar storage means every column is stored as a contiguous array, which opens up compression opportunities that row-based databases cannot exploit.<br>One of the most impactful cases is <strong>enum-like columns<\/strong>: booleans, nullable flags, status codes, categories &#8212; any column where a small set of values repeats millions of times.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This post compares three encoding strategies for such columns, using measurements from our microbenchmark suite running on an AMD Ryzen 9 7900X3D.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Three Encodings<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Byte Encoding (Traditional Databases)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The simplest approach: store one byte per value.<br>A MySQL <code>TINYINT<\/code> or <code>BOOL<\/code> column uses exactly this.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Storage<\/strong>: 8 bits\/element, regardless of distribution<\/li>\n\n\n\n<li><strong>Access<\/strong>: O(1) direct array indexing<\/li>\n\n\n\n<li><strong>Waste<\/strong>: For a boolean column that is 99% <code>false<\/code>, 7.92 of those 8 bits carry zero information<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For 60,000 booleans: <strong>60,000 bytes<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">PFOR Bit-Packing (MemCP old)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Patched Frame-Of-Reference packs each value into the minimum number of bits.<br>For k distinct values, that is ceil(log2(k)) bits per element.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Storage<\/strong>: ceil(log2(k)) bits\/element<\/li>\n\n\n\n<li><strong>Access<\/strong>: O(1) with bit-shift arithmetic<\/li>\n\n\n\n<li><strong>Improvement<\/strong>: A 2-value boolean column drops from 8 to 1 bit\/element &#8212; an 8x improvement<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For 60,000 booleans: <strong>7,500 bytes<\/strong> (1 bit each).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">But PFOR is blind to the actual distribution.<br>A column that is 99% <code>false<\/code> and 1% <code>true<\/code> still uses 1 bit per element, even though information theory says the optimal cost is only 0.081 bits\/element.<br>PFOR wastes 12x the theoretical minimum.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Bitcompress: k-ary rANS (MemCP new)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">MemCP&#8217;s new encoding uses <strong>rANS (range Asymmetric Numeral Systems)<\/strong>, an entropy coder that approaches Shannon&#8217;s theoretical limit.<br>Instead of assigning fixed-width bit slots, rANS encodes each symbol with a cost proportional to its information content: -log2(probability) bits.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Storage<\/strong>: approaches -sum(p_i * log2(p_i)) bits\/element (Shannon entropy)<\/li>\n\n\n\n<li><strong>Access<\/strong>: O(1) sequential, O(log n) random via 2-level jump index<\/li>\n\n\n\n<li><strong>Key advantage<\/strong>: the rarer a value is, the more bits it gets; the more common, the fewer &#8212; automatically<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For 60,000 booleans (99% false):<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Encoding<\/th><th>Size<\/th><th>bits\/elem<\/th><th>vs Shannon<\/th><\/tr><\/thead><tbody><tr><td>Byte<\/td><td>60,000 B<\/td><td>8.000<\/td><td>99x<\/td><\/tr><tr><td>PFOR<\/td><td>7,500 B<\/td><td>1.000<\/td><td>12x<\/td><\/tr><tr><td><strong>Bitcompress<\/strong><\/td><td><strong>1,086 B<\/strong><\/td><td><strong>0.088<\/strong><\/td><td><strong>1.09x<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">That is <strong>55x smaller than PFOR<\/strong> and <strong>within 9% of the theoretical limit<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How Bitcompress Works<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The codec operates on 8-bit precision.<br>The 256-slot interval [0, 256) is partitioned proportionally to each symbol&#8217;s probability.<br>A 99\/1 boolean gets slots [0, 253) for <code>false<\/code> and [253, 256) for <code>true<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Encoding<\/strong> packs symbols into a 64-bit buffer via modular arithmetic:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>bufferx, rest = fastDivMod(buffer, width, invWidth)\nbuffer = (bufferx &lt;&lt; 8) + lo + rest<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Division is replaced by multiplication using a precomputed inverse (Barrett reduction via <code>bits.Mul64<\/code>), making the hot path division-free.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Decoding<\/strong> extracts the bottom 8 bits to identify the symbol, then reverses the arithmetic:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>slice = buffer &amp; 0xFF\nsymbol = lookup(slice)\nbuffer = (buffer &gt;&gt; 8) * width + slice - lo<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">When the buffer would overflow, it is flushed as a 64-bit chunk.<br>A 2-level jump index enables O(log n) random access into the chunk stream:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>L2 (uint16)<\/strong>: per-chunk cumulative element count, relative to L1 base<\/li>\n\n\n\n<li><strong>L1 (uint32)<\/strong>: absolute cumulative count every K chunks<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">This supports billions of elements while using only 2 bytes of index per chunk.<br>A sequential decode cache makes forward iteration O(1) per element.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Measured Performance<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">All benchmarks on AMD Ryzen 9 7900X3D, Go 1.24, <code>-benchtime=2s<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Access Latency (60k elements, boolean 99\/1 skew)<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Access Pattern<\/th><th>Latency<\/th><\/tr><\/thead><tbody><tr><td>Sequential scan<\/td><td>4 ns\/elem<\/td><\/tr><tr><td>Sparse scan (50% density)<\/td><td>9 ns\/elem<\/td><\/tr><tr><td>Sparse scan (10% density)<\/td><td>16 ns\/elem<\/td><\/tr><tr><td>Random access<\/td><td>674 ns\/elem<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Sequential and sparse access &#8212; the dominant patterns in analytical queries (WHERE filters, aggregations) &#8212; are extremely fast.<br>Random access is slower because it requires binary search plus decoding within the target chunk, but this pattern is rare in columnar analytics.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scaling: 1 Million Elements<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The 2-level jump index scales cleanly.<br>For 1M booleans (99% false, 1% true):<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Encoding<\/th><th>Total Size<\/th><th>bits\/elem<\/th><\/tr><\/thead><tbody><tr><td>Byte<\/td><td>1,000,000 B<\/td><td>8.000<\/td><\/tr><tr><td>PFOR<\/td><td>125,000 B<\/td><td>1.000<\/td><\/tr><tr><td><strong>Bitcompress<\/strong><\/td><td><strong>14,270 B<\/strong><\/td><td><strong>0.088<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>1 million booleans in 14 KB.<\/strong> That is 70x smaller than PFOR.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Sequential scan still runs at 4 ns\/element &#8212; the same speed as at 60K.<br>The jump index adds only 2,934 bytes (1,379 L2 entries + 44 L1 entries).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Import\/Export Throughput<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Operation<\/th><th>5k elements<\/th><\/tr><\/thead><tbody><tr><td>Import (encode)<\/td><td>45 us<\/td><\/tr><tr><td>Export (decode)<\/td><td>22 us<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Encoding is compute-bound (one multiply per element); decoding is faster because it avoids division entirely.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Where Bitcompress Wins Big<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The advantage of entropy coding grows with skew.<br>Here is the compression ratio across different distributions:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Distribution<\/th><th>Byte<\/th><th>PFOR<\/th><th>Bitcompress<\/th><th>Shannon<\/th><\/tr><\/thead><tbody><tr><td>Boolean 50\/50 (uniform)<\/td><td>8.000<\/td><td>1.000<\/td><td>1.094<\/td><td>1.000<\/td><\/tr><tr><td>Boolean 95\/5<\/td><td>8.000<\/td><td>1.000<\/td><td>0.320<\/td><td>0.286<\/td><\/tr><tr><td>Boolean 99\/1<\/td><td>8.000<\/td><td>1.000<\/td><td>0.088<\/td><td>0.081<\/td><\/tr><tr><td>4-way 80\/15\/4\/1<\/td><td>8.000<\/td><td>2.000<\/td><td>1.011<\/td><td>0.920<\/td><\/tr><tr><td>5-way skewed<\/td><td>8.000<\/td><td>3.000<\/td><td>1.994<\/td><td>1.817<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The pattern: <strong>the more skewed the distribution, the bigger the gap between PFOR and Bitcompress.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For the uniform case (50\/50 boolean), Bitcompress is slightly worse than PFOR due to chunk overhead.<br>But real-world data is almost never uniform.<br>Nullable columns are overwhelmingly non-NULL.<br>Status columns cluster around a few common states.<br>Boolean flags are usually one value.<br>These are exactly the distributions where Bitcompress delivers 10-100x compression over PFOR.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Sweet Spot: Low-Entropy Enums in Large Databases<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Consider a production database with 10 million rows and a <code>is_deleted<\/code> boolean column that is 99.5% <code>false<\/code>.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Encoding<\/th><th>Column Size<\/th><\/tr><\/thead><tbody><tr><td>Byte (MySQL)<\/td><td>10 MB<\/td><\/tr><tr><td>PFOR (MemCP old)<\/td><td>1.25 MB<\/td><\/tr><tr><td><strong>Bitcompress (MemCP new)<\/strong><\/td><td><strong>~70 KB<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">At 70 KB, the entire column fits in L1 cache.<br>At 1.25 MB, it spills to L2.<br>At 10 MB, it spills to L3 or main memory.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For an in-memory database like MemCP, this is not just a storage win &#8212; it is a <strong>speed win<\/strong>.<br>Less memory means fewer cache misses, which means faster scans, joins, and aggregations.<br>And because sequential decoding runs at 4 ns\/element, the decompression cost is negligible compared to the cache-miss latency it avoids.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><\/th><th>Byte<\/th><th>PFOR<\/th><th>Bitcompress<\/th><\/tr><\/thead><tbody><tr><td>Bits\/elem (99\/1 bool)<\/td><td>8.000<\/td><td>1.000<\/td><td>0.088<\/td><\/tr><tr><td>Compression vs Shannon<\/td><td>99x<\/td><td>12x<\/td><td>1.09x<\/td><\/tr><tr><td>Sequential access<\/td><td>1 ns<\/td><td>2 ns<\/td><td>4 ns<\/td><\/tr><tr><td>Supports skewed data<\/td><td>No benefit<\/td><td>No benefit<\/td><td>Massive benefit<\/td><\/tr><tr><td>Max symbols<\/td><td>256<\/td><td>any<\/td><td>256<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Bitcompress trades a small constant overhead on sequential access (4 ns vs 1-2 ns for uncompressed) for compression ratios that approach the theoretical limit.<br>For the skewed distributions that dominate real-world enum columns, this trade-off is overwhelmingly positive: smaller working sets, fewer cache misses, and faster end-to-end query execution.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The implementation is open source as part of <a href=\"https:\/\/memcp.org\">MemCP<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>MemCP is a high-performance, in-memory, column-oriented database.Columnar storage means every column is stored as a contiguous array, which opens up compression opportunities that row-based databases cannot exploit.One of the most impactful cases is enum-like columns: booleans, nullable flags, status codes, categories &#8212; any column where a small set of values repeats millions of times. This&#8230;<\/p>\n","protected":false},"author":2,"featured_media":8130,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","footnotes":""},"categories":[129,128],"tags":[],"class_list":["post-8128","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-memcp","category-programming","single-item"],"featured_image_urls_v2":{"full":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11.png",1536,1024,false],"thumbnail":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-150x150.png",150,150,true],"medium":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-300x200.png",300,200,true],"medium_large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-768x512.png",751,501,true],"large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-1024x683.png",751,501,true],"1536x1536":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11.png",1536,1024,false],"2048x2048":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11.png",1536,1024,false],"trp-custom-language-flag":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-18x12.png",18,12,true],"xs-thumb":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-64x64.png",64,64,true],"appku-shop-single":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-620x500.png",620,500,true]},"post_excerpt_stackable_v2":"<p>MemCP is a high-performance, in-memory, column-oriented database.Columnar storage means every column is stored as a contiguous array, which opens up compression opportunities that row-based databases cannot exploit.One of the most impactful cases is enum-like columns: booleans, nullable flags, status codes, categories &#8212; any column where a small set of values repeats millions of times. This post compares three encoding strategies for such columns, using measurements from our microbenchmark suite running on an AMD Ryzen 9 7900X3D. The Three Encodings Byte Encoding (Traditional Databases) The simplest approach: store one byte per value.A MySQL TINYINT or BOOL column uses exactly this. Storage:&hellip;<\/p>\n","category_list_v2":"<a href=\"https:\/\/launix.de\/launix\/category\/memcp\/\" rel=\"category tag\">MemCP<\/a>, <a href=\"https:\/\/launix.de\/launix\/category\/programming\/\" rel=\"category tag\">Programming<\/a>","author_info_v2":{"name":"Carl-Philip H\u00e4nsch","url":"https:\/\/launix.de\/launix\/author\/carli\/"},"comments_num_v2":"0 comments","uagb_featured_image_src":{"full":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11.png",1536,1024,false],"thumbnail":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-150x150.png",150,150,true],"medium":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-300x200.png",300,200,true],"medium_large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-768x512.png",751,501,true],"large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-1024x683.png",751,501,true],"1536x1536":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11.png",1536,1024,false],"2048x2048":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11.png",1536,1024,false],"trp-custom-language-flag":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-18x12.png",18,12,true],"xs-thumb":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-64x64.png",64,64,true],"appku-shop-single":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2026\/02\/5e9d5858-ec22-43fe-b3c2-e2ad9dfd2b11-620x500.png",620,500,true]},"uagb_author_info":{"display_name":"Carl-Philip H\u00e4nsch","author_link":"https:\/\/launix.de\/launix\/author\/carli\/"},"uagb_comment_info":0,"uagb_excerpt":"MemCP is a high-performance, in-memory, column-oriented database.Columnar storage means every column is stored as a contiguous array, which opens up compression opportunities that row-based databases cannot exploit.One of the most impactful cases is enum-like columns: booleans, nullable flags, status codes, categories &#8212; any column where a small set of values repeats millions of times. This...","_links":{"self":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/8128","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/comments?post=8128"}],"version-history":[{"count":1,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/8128\/revisions"}],"predecessor-version":[{"id":8129,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/8128\/revisions\/8129"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/media\/8130"}],"wp:attachment":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/media?parent=8128"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/categories?post=8128"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/tags?post=8128"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}