Roughly 99% of all newly invented programming languages are imperative. But imperative languages have one drawback: they are hard to parallelize.
Drawbacks of Imperative Programming Languages
But why is this the case for imperative languages?
The answer is: state. In an imperative language, the program executes commands that change the contents of variables or complex objects in memory. An optimizing compiler that is supposed to find parallelizable parts of the code on its own therefore has to keep track of the data dependencies and the arbitrary side effects of every command and function call.
Probably the simplest solution to this problem is to tell the compiler explicitly which loops are parallelizable. However, this forces the developer to write nearly side-effect-free code anyway. So why not go the pure way and design a programming language that does not allow side effects at all?
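As a sketch of what "telling the compiler which loops are parallelizable" amounts to, here is a hypothetical parallelFor helper in Go that simply trusts the caller's promise that every iteration is independent. The squares loop keeps that promise because each iteration writes only its own slot; the running-sum loop does not, because each iteration depends on the state written by the previous one.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelFor runs body(i) for every i in [0, n) on its own goroutine.
// It is only correct if the caller guarantees that body has no side effects
// other than writing to a slot that belongs to index i alone.
func parallelFor(n int, body func(i int)) {
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer wg.Done()
			body(i)
		}(i)
	}
	wg.Wait()
}

func main() {
	in := []int{1, 2, 3, 4, 5}

	// Safe to parallelize: iteration i reads in[i] and writes only squares[i].
	squares := make([]int, len(in))
	parallelFor(len(in), func(i int) { squares[i] = in[i] * in[i] })
	fmt.Println(squares) // [1 4 9 16 25]

	// NOT safe to hand to parallelFor: every iteration reads sum, which the
	// previous iteration wrote (a loop-carried dependency through shared state).
	prefix := make([]int, len(in))
	sum := 0
	for i, x := range in {
		sum += x
		prefix[i] = sum
	}
	fmt.Println(prefix) // [1 3 6 10 15]
}
```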
The Functional World
Welcome to the world of purely functional programming languages. A functional language is pure when every function computes its result solely from its inputs. This is a great basis for highly parallel map-reduce algorithms like the ones we need for our clusterable in-memory database.
I took Pieter Kelchtermans' Scheme interpreter, written in Go, and added some extra features:
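To make the connection to map-reduce concrete, here is a minimal Go sketch (mapReduce is a hypothetical helper, not part of the database) that applies a pure map function to chunks of data in parallel and combines the results with an associative reduce function. Because neither function touches shared state, each chunk can be processed on any core and the combined result stays the same.

```go
package main

import (
	"fmt"
	"sync"
)

// mapReduce splits data into chunks, maps and folds each chunk on its own
// goroutine, then folds the per-chunk results. It assumes mapFn is pure and
// reduceFn is associative with neutral element zero.
func mapReduce(data []int, chunks int, mapFn func(int) int, reduceFn func(int, int) int, zero int) int {
	if chunks < 1 {
		chunks = 1
	}
	size := (len(data) + chunks - 1) / chunks
	partial := make([]int, chunks) // each goroutine owns exactly one slot
	var wg sync.WaitGroup
	for c := 0; c < chunks; c++ {
		lo, hi := c*size, c*size+size
		if lo > len(data) {
			lo = len(data)
		}
		if hi > len(data) {
			hi = len(data)
		}
		wg.Add(1)
		go func(c, lo, hi int) {
			defer wg.Done()
			acc := zero
			for _, x := range data[lo:hi] {
				acc = reduceFn(acc, mapFn(x))
			}
			partial[c] = acc
		}(c, lo, hi)
	}
	wg.Wait()
	result := zero
	for _, p := range partial {
		result = reduceFn(result, p)
	}
	return result
}

func main() {
	data := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	square := func(x int) int { return x * x } // pure: depends only on x
	add := func(a, b int) int { return a + b } // associative, neutral element 0
	fmt.Println(mapReduce(data, 4, square, add, 0)) // 385
}
```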
- I removed the set! instruction because it is the only function that causes global side effects. All other functions are local to the current environment, and as long as you don't change the environment, every piece of code can run in parallel without affecting the others.
- I made begin open its own environment, so self-recursion can be done by defining a function inside a begin block.
- I fixed the if form.
- I added strings as a native datatype, as well as a concat function that concatenates all its string arguments into one string.
- I added a serialization mechanism that fully recovers values and turns them back into valid Scheme code.
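Why removing set! and giving begin its own environment matters can be illustrated with a small model of the environment chain. The vars and outer fields follow the en.vars and en.outer accesses in the Serialize code of this post; define, lookup and newScope are hypothetical helpers for this sketch, not the interpreter's actual functions. Definitions always land in the innermost environment, so two begin blocks that only read a shared global environment cannot observe each other's writes.

```go
package main

import "fmt"

type scmer interface{}
type symbol string

// env models one environment: its own variables plus a link to the outer one.
type env struct {
	vars  map[symbol]scmer
	outer *env
}

// define writes into the current environment only, never into an outer one.
func (e *env) define(name symbol, val scmer) { e.vars[name] = val }

// lookup walks outward from the current environment until the name is found.
func (e *env) lookup(name symbol) (scmer, bool) {
	for cur := e; cur != nil; cur = cur.outer {
		if v, ok := cur.vars[name]; ok {
			return v, true
		}
	}
	return nil, false
}

// newScope models what begin does after the change: open a fresh child env.
func newScope(outer *env) *env {
	return &env{vars: map[symbol]scmer{}, outer: outer}
}

func main() {
	global := newScope(nil)
	global.define("x", 1)

	// Two independent begin blocks that could run in parallel.
	a := newScope(global)
	b := newScope(global)
	a.define("x", 10) // shadows x only inside a
	b.define("y", 20)

	ax, _ := a.lookup("x")
	bx, _ := b.lookup("x")
	gx, _ := global.lookup("x")
	fmt.Println(ax, bx, gx) // 10 1 1
}
```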
carli@launix-MS-7C51:~/projekte/memcp/server-node-golang$ make
go run *.go
> 45
==> 45
> (+ 1 2)
==> 3
> (define currified_add (lambda (a) (lambda (b) (+ a b))))
==> "ok"
> ((currified_add 4) 5)
==> 9
> (define add_1 (currified_add 1))
==> "ok"
> (add_1 6)
==> 7
> (add_1 (add_1 3))
==> 5
> (define name "Peter")
==> "ok"
> (concat "Hello " name)
==> "Hello Peter"
>
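The concat builtin used in the session above could look roughly like the following. This is a sketch, not the code from the modified interpreter: it assumes that scmer is an empty interface and that native functions have the signature func(...scmer) scmer, which is the type the Serialize code checks for. Formatting non-string arguments with fmt.Sprint is an additional assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

type scmer interface{}

// concat joins all of its arguments into one string. Strings are appended
// as-is; other values are formatted with fmt.Sprint (an assumption here).
func concat(a ...scmer) scmer {
	var sb strings.Builder
	for _, x := range a {
		if s, ok := x.(string); ok {
			sb.WriteString(s)
		} else {
			sb.WriteString(fmt.Sprint(x))
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(concat("Hello ", "Peter")) // Hello Peter
}
```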
Serialization Makes Programs Clusterable
Consider an in-memory store of several terabytes in which every node can hold only 128 GiB of RAM. Wouldn't it be nice to send one computation to all nodes of the cluster?
Here, we need some kind of serialization that detaches our code and its current execution state from the machine and memory it is currently loaded on.
Serialization means turning an arbitrary graph of structured objects into a stream of bytes that can be sent over the network. Our previously defined helper function add_1
can be serialized as follows:
carli@launix-MS-7C51:~/projekte/memcp/server-node-golang$ make
go run *.go
> (define currified_add (lambda (a) (lambda (b) (+ a b))))
==> "ok"
> (define add_1 (currified_add 1))
==> "ok"
> add_1
==> (lambda (b) (begin (define a 1) (+ a b)))
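Because the output is ordinary Scheme text, a receiving node only needs an S-expression reader to get the structure back. The following minimal reader is written just for this illustration (tokenize, readFrom and write are hypothetical helpers, not the interpreter's parser); it ignores string literals and keeps numbers as symbols, and it checks that the serialized add_1 round-trips unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

type scmer interface{}
type symbol string

// tokenize pads parentheses with spaces and splits on whitespace.
// Simplification: string literals containing spaces are not supported.
func tokenize(s string) []string {
	s = strings.ReplaceAll(s, "(", " ( ")
	s = strings.ReplaceAll(s, ")", " ) ")
	return strings.Fields(s)
}

// readFrom parses one expression and returns it with the remaining tokens.
// It assumes well-formed input and panics otherwise.
func readFrom(tokens []string) (scmer, []string) {
	tok, rest := tokens[0], tokens[1:]
	if tok != "(" {
		return symbol(tok), rest // atoms stay symbols in this sketch
	}
	list := []scmer{}
	for rest[0] != ")" {
		var x scmer
		x, rest = readFrom(rest)
		list = append(list, x)
	}
	return list, rest[1:] // drop the closing ")"
}

// write prints a parsed expression back as S-expression text.
func write(x scmer) string {
	if l, ok := x.([]scmer); ok {
		parts := make([]string, len(l))
		for i, e := range l {
			parts[i] = write(e)
		}
		return "(" + strings.Join(parts, " ") + ")"
	}
	return fmt.Sprint(x)
}

func main() {
	wire := "(lambda (b) (begin (define a 1) (+ a b)))" // as received from another node
	expr, _ := readFrom(tokenize(wire))
	fmt.Println(write(expr) == wire) // true
}
```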
And this is the code I added to Pieter Kelchtermans' Scheme interpreter:
import (
	"bytes"
	"fmt"
	"strings"
)

// Serialize writes v to b as Scheme source code. en is the environment in
// which v lives; a non-global environment is emitted as a begin block that
// re-defines its variables before the value itself.
func Serialize(b *bytes.Buffer, v scmer, en *env) {
	if en != &globalenv {
		b.WriteString("(begin ")
		for k, val := range en.vars {
			// if a symbol is defined in a lambda's environment, print its real value
			b.WriteString("(define ")
			b.WriteString(string(k)) // TODO: what if k contains spaces? Can it?
			b.WriteString(" ")
			Serialize(b, val, en.outer)
			b.WriteString(") ")
		}
		Serialize(b, v, en.outer)
		b.WriteString(")")
		return
	}
	switch v := v.(type) {
	case []scmer:
		// a list: serialize its elements separated by spaces
		b.WriteByte('(')
		for i, x := range v {
			if i != 0 {
				b.WriteByte(' ')
			}
			Serialize(b, x, en)
		}
		b.WriteByte(')')
	case func(...scmer) scmer:
		// Native funcs are the hardest to serialize: search the environment
		// chain for the name they are bound to.
		// Once a functional JIT exists, this must also handle deoptimization.
		en2 := en
		for en2 != nil {
			for k, v2 := range en2.vars {
				// compare function pointers (hacky, but Go offers no direct func comparison)
				if fmt.Sprint(v) == fmt.Sprint(v2) {
					// found the matching global function: print its variable name
					b.WriteString(fmt.Sprint(k))
					return
				}
			}
			en2 = en2.outer
		}
		b.WriteString("[unserializable native func]")
	case proc:
		// a lambda: parameters are plain symbols, the body is serialized
		// together with the environment it closes over
		b.WriteString("(lambda ")
		Serialize(b, v.params, &globalenv)
		b.WriteByte(' ')
		Serialize(b, v.body, v.en)
		b.WriteByte(')')
	case symbol:
		// print as symbol (because we already used a begin block for defining our env)
		b.WriteString(fmt.Sprint(v))
	case string:
		// quote and escape the string so it reads back as the same literal
		b.WriteByte('"')
		b.WriteString(strings.NewReplacer("\"", "\\\"", "\\", "\\\\", "\r", "\\r", "\n", "\\n").Replace(v))
		b.WriteByte('"')
	default:
		b.WriteString(fmt.Sprint(v))
	}
}
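The native-function branch compares fmt.Sprint output to recover a builtin's name. A possible alternative is to compare code pointers through the reflect package directly, sketched below with a hypothetical builtins table standing in for globalenv. The reflect documentation warns that a func's code pointer is not necessarily unique to one function (closures created from the same literal share it), so this is only dependable for distinct top-level builtins; since fmt also prints a func value as its address, the fmt-based comparison should have the same limitation.

```go
package main

import (
	"fmt"
	"reflect"
)

type scmer interface{}

// Two example builtins with distinct bodies (hypothetical, for illustration).
func add(a ...scmer) scmer {
	sum := 0.0
	for _, x := range a {
		sum += x.(float64)
	}
	return sum
}

func mul(a ...scmer) scmer {
	prod := 1.0
	for _, x := range a {
		prod *= x.(float64)
	}
	return prod
}

// unregistered is a native func that is not bound to any name.
func unregistered(a ...scmer) scmer { return nil }

// builtins stands in for the global environment's native functions.
var builtins = map[string]func(...scmer) scmer{
	"+": add,
	"*": mul,
}

// nameOf returns the name fn is registered under, or "" if there is none.
// It compares code pointers obtained via reflect.Value.Pointer.
func nameOf(fn func(...scmer) scmer) string {
	p := reflect.ValueOf(fn).Pointer()
	for name, f := range builtins {
		if reflect.ValueOf(f).Pointer() == p {
			return name
		}
	}
	return ""
}

func main() {
	fmt.Println(nameOf(mul))                // *
	fmt.Println(nameOf(unregistered) == "") // true
}
```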
Conclusion
What did we achieve?
- We chose Scheme as our language
- We added some useful functions to Scheme to fit our needs (string processing, …)
- We implemented a serialization function that recreates Scheme code from in-memory objects, so they can be loaded on other machines
- Now we can start implementing our highly parallel map-reduce algorithms, which take map and reduce lambda functions, execute them in parallel, and let us enjoy the highly parallel result