{"id":4705,"date":"2023-01-18T00:32:28","date_gmt":"2023-01-17T23:32:28","guid":{"rendered":"https:\/\/launix.de\/launix\/?p=4705"},"modified":"2023-07-11T12:10:13","modified_gmt":"2023-07-11T10:10:13","slug":"designing-a-programming-language-for-distributed-systems-and-highly-parallel-algorithms","status":"publish","type":"post","link":"https:\/\/launix.de\/launix\/designing-a-programming-language-for-distributed-systems-and-highly-parallel-algorithms\/","title":{"rendered":"Designing a programming language for distributed systems and highly parallel algorithms"},"content":{"rendered":"\n<p>Almost 99% of all newly invented are imperative programming languages. But imperative languages have one drawback: their parallelization is hard.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">Drawbacks of Imperative Programming Languages<\/h2>\n\n\n\n<p>But why is this the case for imperative languages?<\/p>\n\n\n\n<p>The answer is: state. The concept of an imperative language is that commands are executed which change the content of variables or complex objects in the memory. When trying to create an optimizing compiler that from itself finds parallelizable parts in the code, the compiler has to keep track of data dependencies and the random side effects of each command and function call.<\/p>\n\n\n\n<p>The possibly simplest solution to this problem is to tell the compiler exactly which loops are parallelizable. This however forces the developer to write nearly side-effect-free code. So why not going the pure way &#8211; to design a programming language that <strong>does not allow side-effects<\/strong>?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Functional World<\/h2>\n\n\n\n<p>Welcome to the world of pure-functional programming languages. <strong>Pure functional programming languages are pure when every function will compute its result only and only from its inputs.<\/strong> This makes up a great basis for highly parallel map-reduce algorithms like we need in our clusterable in-memory database.<\/p>\n\n\n\n<p>I took the scheme interpreter of <a aria-label=\" (opens in a new tab)\" href=\"https:\/\/pkelchte.wordpress.com\/2013\/12\/31\/scm-go\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\">Pieter Kelchtermans<\/a> written in golang and added some extra features:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>I removed the <code>set!<\/code> instruction because it is the only function to cause global side effects<br>All other functions are local to the current environment and as long as you don&#8217;t change the environment, every piece of code can be run in parallel without affecting each other<\/li>\n\n\n\n<li>I made <code>begin<\/code> to open its own environment, so self recursion can be done by defining a function in a begin block<\/li>\n\n\n\n<li>I fixed <code>if<\/code><\/li>\n\n\n\n<li>I also allowed strings as native datatypes as well as the <code>concat<\/code> function which will concatenate all strings to one string<\/li>\n\n\n\n<li>I added a serialization mechanism to fully recover values and turn them into valid scheme code again.<\/li>\n<\/ul>\n\n\n\n<pre><font color=\"#26A269\"><b>carli@launix-MS-7C51<\/b><\/font>:<font color=\"#12488B\"><b>~\/projekte\/memcp\/server-node-golang<\/b><\/font>$ make\ngo run *.go\n&gt; 45\n==&gt; 45\n&gt; (+ 1 2)\n==&gt; 3\n&gt; (define currified_add (lambda (a) (lambda (b) (+ a b))))\n==&gt; \"ok\"\n&gt; ((currified_add 4) 5)\n==&gt; 9\n&gt; (define add_1 (currified_add 1))\n==&gt; \"ok\"\n&gt; (add_1 6)\n==&gt; 7\n&gt; (add_1 (add_1 3))\n==&gt; 5\n&gt; (define name \"Peter\") \n==&gt; \"ok\"\n&gt; (concat \"Hello \" name)\n==&gt; \"Hello Peter\"\n&gt; \n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Serialization Makes Programs Clusterable<\/h2>\n\n\n\n<p>Consider a in-memory storage of multiple Terabyte where every node can only handle 128GiB of RAM. Wouldn&#8217;t it be nice to transfer one computation to all nodes of the cluster?<\/p>\n\n\n\n<p>Here, we need some kind of serialization that unchains our code and current execution state from the machine and memory it is currently loaded on.<\/p>\n\n\n\n<p>Serialization means, to make a networkable stream of bytes out of a random structured object zoo. Our previously defined helper function <code>add_1<\/code> can be serialized as follows:<\/p>\n\n\n\n<pre><font color=\"#26A269\"><b>carli@launix-MS-7C51<\/b><\/font>:<font color=\"#12488B\"><b>~\/projekte\/memcp\/server-node-golang<\/b><\/font>$ make\ngo run *.go\n&gt; (define currified_add (lambda (a) (lambda (b) (+ a b))))\n==&gt; \"ok\"\n&gt; (define add_1 (currified_add 1))\n==&gt; \"ok\"\n&gt; add_1\n==&gt; (lambda (b) (begin (define a 1) (+ a b)))\n<\/pre>\n\n\n\n<p>And this is the code I expanded <a href=\"https:\/\/pkelchte.wordpress.com\/\" class=\"ek-link\">Pieter Kelchtermans<\/a> scheme interpreter with:<\/p>\n\n\n\n<pre><font color=\"#E9AD0C\">func<\/font> Serialize(b *bytes.Buffer, v scmer, en *env) {\n        <font color=\"#E9AD0C\">if<\/font> en != &amp;globalenv {\n                b.WriteString(<font color=\"#C061CB\">\"(begin \"<\/font>)\n                <font color=\"#E9AD0C\">for<\/font> k, v := <font color=\"#E9AD0C\">range<\/font> en.vars {\n                        <font color=\"#33C7DE\">\/\/ if symbol is defined in a lambda, print the real value<\/font>\n                        b.WriteString(<font color=\"#C061CB\">\"(define \"<\/font>)\n                        b.WriteString(<font color=\"#87FFAF\">string<\/font>(k)) <font color=\"#33C7DE\">\/\/ what if k contains spaces?? can it?<\/font>\n                        b.WriteString(<font color=\"#C061CB\">\" \"<\/font>)\n                        Serialize(b, v, en.outer)\n                        b.WriteString(<font color=\"#C061CB\">\") \"<\/font>)\n                }\n                Serialize(b, v, en.outer)\n                b.WriteString(<font color=\"#C061CB\">\")\"<\/font>)\n                <font color=\"#E9AD0C\">return<\/font>\n        }\n        <font color=\"#E9AD0C\">switch<\/font> v := v.(<font color=\"#E9AD0C\">type<\/font>) {\n        <font color=\"#E9AD0C\">case<\/font> []scmer:\n                b.WriteByte(<font color=\"#C061CB\">'('<\/font>)\n                <font color=\"#E9AD0C\">for<\/font> i, x := <font color=\"#E9AD0C\">range<\/font> v {\n                        <font color=\"#E9AD0C\">if<\/font> i != <font color=\"#C061CB\">0<\/font> {\n                                b.WriteByte(<font color=\"#C061CB\">' '<\/font>)\n                        }\n                        Serialize(b, x, en)\n                }\n                b.WriteByte(<font color=\"#C061CB\">')'<\/font>)\n        <font color=\"#E9AD0C\">case<\/font> <font color=\"#E9AD0C\">func<\/font>(...scmer) scmer:\n                <font color=\"#33C7DE\">\/\/ native func serialization is the hardest; reverse the env!<\/font>\n                <font color=\"#33C7DE\">\/\/ when later functional JIT is done, this must also handle deoptimization<\/font>\n                en2 := en\n                <font color=\"#E9AD0C\">for<\/font> en2 != <font color=\"#C061CB\">nil<\/font> {\n                        <font color=\"#E9AD0C\">for<\/font> k, v2 := <font color=\"#E9AD0C\">range<\/font> en2.vars {\n                                <font color=\"#33C7DE\">\/\/ compare function pointers (hacky but golang dosent give another opt)<\/font>\n                                <font color=\"#E9AD0C\">if<\/font> fmt.Sprint(v) == fmt.Sprint(v2) {\n                                        <font color=\"#33C7DE\">\/\/ found the right global function<\/font>\n                                        b.WriteString(fmt.Sprint(k)) <font color=\"#33C7DE\">\/\/ print out variable name<\/font>\n                                        <font color=\"#E9AD0C\">return<\/font>\n                                }\n                        }\n                        en2 = en2.outer\n                }\n                b.WriteString(<font color=\"#C061CB\">\"[unserializable native func]\"<\/font>)\n        <font color=\"#E9AD0C\">case<\/font> proc:\n                b.WriteString(<font color=\"#C061CB\">\"(lambda \"<\/font>)\n                Serialize(b, v.params, &amp;globalenv)\n                b.WriteByte(<font color=\"#C061CB\">' '<\/font>)\n                Serialize(b, v.body, v.en)\n                b.WriteByte(<font color=\"#C061CB\">')'<\/font>)\n        <font color=\"#E9AD0C\">case<\/font> symbol:\n                <font color=\"#33C7DE\">\/\/ print as symbol (because we already used a begin-block for defining our env)<\/font>\n                b.WriteString(fmt.Sprint(v))\n        <font color=\"#E9AD0C\">case<\/font> <font color=\"#87FFAF\">string<\/font>:\n                b.WriteByte(<font color=\"#C061CB\">'\"'<\/font>)\n                b.WriteString(strings.NewReplacer(<font color=\"#C061CB\">\"<\/font><font color=\"#FFD7D7\">\\\"<\/font><font color=\"#C061CB\">\"<\/font>, <font color=\"#C061CB\">\"<\/font><font color=\"#FFD7D7\">\\\\\\\"<\/font><font color=\"#C061CB\">\"<\/font>, <font color=\"#C061CB\">\"<\/font><font color=\"#FFD7D7\">\\\\<\/font><font color=\"#C061CB\">\"<\/font>, <font color=\"#C061CB\">\"<\/font><font color=\"#FFD7D7\">\\\\\\\\<\/font><font color=\"#C061CB\">\"<\/font>, <font color=\"#C061CB\">\"<\/font><font color=\"#FFD7D7\">\\r<\/font><font color=\"#C061CB\">\"<\/font>, <font color=\"#C061CB\">\"<\/font><font color=\"#FFD7D7\">\\\\<\/font><font color=\"#C061CB\">r\"<\/font>, <font color=\"#C061CB\">\"<\/font><font color=\"#FFD7D7\">\\n<\/font><font color=\"#C061CB\">\"<\/font>, <font color=\"#C061CB\">\"<\/font><font color=\"#FFD7D7\">\\\\<\/font><font color=\"#C061CB\">n\"<\/font>).Replace(v))\n                b.WriteByte(<font color=\"#C061CB\">'\"'<\/font>)\n        <font color=\"#E9AD0C\">default<\/font>:\n                b.WriteString(fmt.Sprint(v))\n        }\n}\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>What did we achieve?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We chose scheme to be our language of choice<\/li>\n\n\n\n<li>We added some useful functions to scheme to fit our needs (string processing&#8230;)<\/li>\n\n\n\n<li>We implemented a serialization function that can recreate scheme code from memory objects that can be loaded on other machines<\/li>\n\n\n\n<li>Now we can start implementing our highly-parallel map-reduce algorithms that can take map and reduce lambda-functions, execute them in parallel and enjoy the highly parallel result<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Almost 99% of all newly invented are imperative programming languages. But imperative languages have one drawback: their parallelization is hard.<\/p>\n","protected":false},"author":2,"featured_media":4706,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":2,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","footnotes":""},"categories":[129],"tags":[],"class_list":["post-4705","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-memcp","single-item"],"featured_image_urls_v2":{"full":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920.jpg",1920,1440,false],"thumbnail":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-150x150.jpg",150,150,true],"medium":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-300x225.jpg",300,225,true],"medium_large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-768x576.jpg",751,563,true],"large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-1024x768.jpg",751,563,true],"1536x1536":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-1536x1152.jpg",1536,1152,true],"2048x2048":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920.jpg",1920,1440,false],"trp-custom-language-flag":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920.jpg",16,12,false],"xs-thumb":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-64x64.jpg",64,64,true],"appku-shop-single":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920.jpg",620,465,false]},"post_excerpt_stackable_v2":"<p>Almost 99% of all newly invented are imperative programming languages. But imperative languages have one drawback: their parallelization is hard. Drawbacks of Imperative Programming Languages But why is this the case for imperative languages? The answer is: state. The concept of an imperative language is that commands are executed which change the content of variables or complex objects in the memory. When trying to create an optimizing compiler that from itself finds parallelizable parts in the code, the compiler has to keep track of data dependencies and the random side effects of each command and function call. The possibly simplest&hellip;<\/p>\n","category_list_v2":"<a href=\"https:\/\/launix.de\/launix\/category\/memcp\/\" rel=\"category tag\">MemCP<\/a>","author_info_v2":{"name":"Carl-Philip H\u00e4nsch","url":"https:\/\/launix.de\/launix\/author\/carli\/"},"comments_num_v2":"0 comments","uagb_featured_image_src":{"full":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920.jpg",1920,1440,false],"thumbnail":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-150x150.jpg",150,150,true],"medium":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-300x225.jpg",300,225,true],"medium_large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-768x576.jpg",751,563,true],"large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-1024x768.jpg",751,563,true],"1536x1536":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-1536x1152.jpg",1536,1152,true],"2048x2048":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920.jpg",1920,1440,false],"trp-custom-language-flag":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920.jpg",16,12,false],"xs-thumb":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920-64x64.jpg",64,64,true],"appku-shop-single":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/01\/rails-7182_1920.jpg",620,465,false]},"uagb_author_info":{"display_name":"Carl-Philip H\u00e4nsch","author_link":"https:\/\/launix.de\/launix\/author\/carli\/"},"uagb_comment_info":0,"uagb_excerpt":"Almost 99% of all newly invented are imperative programming languages. But imperative languages have one drawback: their parallelization is hard.","_links":{"self":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/4705","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/comments?post=4705"}],"version-history":[{"count":4,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/4705\/revisions"}],"predecessor-version":[{"id":4851,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/4705\/revisions\/4851"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/media\/4706"}],"wp:attachment":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/media?parent=4705"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/categories?post=4705"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/tags?post=4705"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}