{"id":4869,"date":"2023-02-02T23:46:44","date_gmt":"2023-02-02T22:46:44","guid":{"rendered":"https:\/\/launix.de\/launix\/?p=4869"},"modified":"2023-03-02T10:26:44","modified_gmt":"2023-03-02T09:26:44","slug":"writing-a-sql-parser-in-scheme","status":"publish","type":"post","link":"https:\/\/launix.de\/launix\/en\/writing-a-sql-parser-in-scheme\/","title":{"rendered":"Writing a SQL parser in scheme"},"content":{"rendered":"\n<p>Functional programming languages are a neat tool to write parsers, especially when equipped with a pattern matching mechanism.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>Here&#8217;s the beginning of our SQL parser in memdb:<\/p>\n\n\n\n<pre>(<font color=\"#E9AD0C\">define<\/font> parse_sql (<font color=\"#E9AD0C\">lambda<\/font> (s) (<font color=\"#E9AD0C\">begin<\/font>\n                        \n        (<font color=\"#E9AD0C\">define<\/font> identifier (<font color=\"#E9AD0C\">lambda<\/font> (s) (match s\n                (regex <font color=\"#C061CB\">&quot;(?is)^`(.*)`(.*)&quot;<\/font> _ id rest) <font color=\"#FFD7D7\">&apos;(<\/font>id<font color=\"#FFD7D7\"> <\/font>rest<font color=\"#FFD7D7\">)<\/font>\n                (regex <font color=\"#C061CB\">&quot;(?is)^([a-zA-Z_][a-zA-Z_0-9]*)(.*)&quot;<\/font> _ id rest) <font color=\"#FFD7D7\">&apos;(<\/font>id<font color=\"#FFD7D7\"> <\/font>rest<font color=\"#FFD7D7\">)<\/font>\n                (<font color=\"#33C7DE\"><b>error<\/b><\/font> (concat <font color=\"#C061CB\">&quot;expected identifier, found &quot;<\/font> s))\n        )))                             \n                                        \n        (match s                        \n                (regex <font color=\"#C061CB\">&quot;(?s)^\\\\s*(?m:--.*?$)(.*)&quot;<\/font> _ rest) \/* comment *\/ (parse_sql rest)\n                (concat <font color=\"#C061CB\">&quot;\\n&quot;<\/font> rest) (parse_sql rest)\n                (regex <font color=\"#C061CB\">&quot;(?is)^\\\\s+(.*)&quot;<\/font> _ rest) (parse_sql rest)\n                (regex <font color=\"#C061CB\">&quot;(?is)^CREATE(?:\\\\s|\\\\n)+TABLE(?:\\\\s|\\\\n)+(.*)&quot;<\/font> _ rest) (match (identifier rest) <font color=\"#FFD7D7\">&apos;(<\/font>id<font color=\"#FFD7D7\"> <\/font>rest<font color=\"#FFD7D7\">)<\/font> (concat <font color=\"#C061CB\">&quot;tablecreate &quot;<\/font> id <font color=\"#C061CB\">&quot;, decl: &quot;<\/font> rest) (<font color=\"#33C7DE\"><b>error<\/b><\/font> <font color=\"#C061CB\">&quot;expected identifier&quot;<\/font>))\n                (<font color=\"#33C7DE\"><b>error<\/b><\/font> (concat <font color=\"#C061CB\">&quot;unknown SQL syntax: &quot;<\/font> s))\n        )               \n))) <\/pre>\n\n\n\n<p>As you can see, you can nearly write EBNF syntax in scheme despite the exception that you are bound to left linearity.<\/p>\n\n\n\n<p>I extended scheme with the <code>match<\/code> instruction which takes the following parameters:<\/p>\n\n\n\n<p><code>(match value [pattern result] ... [default])<\/code><\/p>\n\n\n\n<p>Where <code>pattern<\/code> can be one of:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>\"string\"<\/code> matches string equality<\/li>\n\n\n\n<li><code>variable<\/code> matches any value and stores the value into the variable<\/li>\n\n\n\n<li><code>(concat \"string\" variable)<\/code> splits away a fixed prefix<\/li>\n\n\n\n<li><code>(concat variable \"string\" variable)<\/code> splits away a fixed infix<\/li>\n\n\n\n<li><code>(concat variable \"string\")<\/code> splits away a fixed postfix<\/li>\n\n\n\n<li><code>'(pattern pattern pattern ...)<\/code> matches only a list with the same amount of items and matching patterns<\/li>\n\n\n\n<li><code>(regex \"pattern\" variable variable variable)<\/code> will parse the value as a regex and store all captured groups into variables<\/li>\n<\/ul>\n\n\n\n<p>When a value matches the pattern, the result is evaluated. Matched variables can be used in the <code>result<\/code> expression. If no match has been found, <code>default<\/code> is evaluated.<\/p>\n\n\n\n<p>This way, we can write parsers very intuitively, but also optimizers that traverse tree structure patterns become very easy to implement.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Functional programming languages are a neat tool to write parsers, especially when equipped with a pattern matching mechanism.<\/p>","protected":false},"author":2,"featured_media":4870,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","footnotes":""},"categories":[129,128],"tags":[],"class_list":["post-4869","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-memcp","category-programming","single-item"],"featured_image_urls_v2":{"full":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920.jpg",1920,1440,false],"thumbnail":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-150x150.jpg",150,150,true],"medium":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-300x225.jpg",300,225,true],"medium_large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-768x576.jpg",751,563,true],"large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-1024x768.jpg",751,563,true],"1536x1536":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-1536x1152.jpg",1536,1152,true],"2048x2048":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920.jpg",1920,1440,false],"trp-custom-language-flag":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920.jpg",16,12,false],"xs-thumb":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-64x64.jpg",64,64,true],"appku-shop-single":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920.jpg",620,465,false]},"post_excerpt_stackable_v2":"<p>Functional programming languages are a neat tool to write parsers, especially when equipped with a pattern matching mechanism. Here&#8217;s the beginning of our SQL parser in memdb: (define parse_sql (lambda (s) (begin (define identifier (lambda (s) (match s (regex &quot;(?is)^`(.*)`(.*)&quot; _ id rest) &apos;(id rest) (regex &quot;(?is)^([a-zA-Z_][a-zA-Z_0-9]*)(.*)&quot; _ id rest) &apos;(id rest) (error (concat &quot;expected identifier, found &quot; s)) ))) (match s (regex &quot;(?s)^\\\\s*(?m:&#8211;.*?$)(.*)&quot; _ rest) \/* comment *\/ (parse_sql rest) (concat &quot;\\n&quot; rest) (parse_sql rest) (regex &quot;(?is)^\\\\s+(.*)&quot; _ rest) (parse_sql rest) (regex &quot;(?is)^CREATE(?:\\\\s|\\\\n)+TABLE(?:\\\\s|\\\\n)+(.*)&quot; _ rest) (match (identifier rest) &apos;(id rest) (concat &quot;tablecreate &quot; id &quot;, decl: &quot; rest) (error&hellip;<\/p>\n","category_list_v2":"<a href=\"https:\/\/launix.de\/launix\/en\/category\/memcp\/\" rel=\"category tag\">MemCP<\/a>, <a href=\"https:\/\/launix.de\/launix\/en\/category\/programming\/\" rel=\"category tag\">Programming<\/a>","author_info_v2":{"name":"Carl-Philip H\u00e4nsch","url":"https:\/\/launix.de\/launix\/en\/author\/carli\/"},"comments_num_v2":"0 comments","uagb_featured_image_src":{"full":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920.jpg",1920,1440,false],"thumbnail":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-150x150.jpg",150,150,true],"medium":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-300x225.jpg",300,225,true],"medium_large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-768x576.jpg",751,563,true],"large":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-1024x768.jpg",751,563,true],"1536x1536":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-1536x1152.jpg",1536,1152,true],"2048x2048":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920.jpg",1920,1440,false],"trp-custom-language-flag":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920.jpg",16,12,false],"xs-thumb":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920-64x64.jpg",64,64,true],"appku-shop-single":["https:\/\/launix.de\/launix\/wp-content\/uploads\/2023\/02\/data-5933101_1920.jpg",620,465,false]},"uagb_author_info":{"display_name":"Carl-Philip H\u00e4nsch","author_link":"https:\/\/launix.de\/launix\/en\/author\/carli\/"},"uagb_comment_info":0,"uagb_excerpt":"Functional programming languages are a neat tool to write parsers, especially when equipped with a pattern matching mechanism.","_links":{"self":[{"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/posts\/4869","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/comments?post=4869"}],"version-history":[{"count":1,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/posts\/4869\/revisions"}],"predecessor-version":[{"id":4871,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/posts\/4869\/revisions\/4871"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/media\/4870"}],"wp:attachment":[{"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/media?parent=4869"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/categories?post=4869"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/launix.de\/launix\/en\/wp-json\/wp\/v2\/tags?post=4869"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}