Julia and the Blockchain

The blockchain cannot be described just as a revolution. It is a tsunami-like phenomenon, slowly advancing and gradually enveloping everything along its way by the force of its progression. 
 - William Mougayar

I think the whole narrative of blockchain without bitcoin will amount to very little. 
- Fred Ehrsam

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.
Why We Created Julia - Jeff Bezanson, Stefan Karpinski, Viral Shah, Alan Edelman

Although the title of this post sounds like it might be some sort of bizarre children's book, it's actually my attempt to look at two subjects that interested me for some time: the Julia language and blockchains. My usua; approach to learning something new is to pick a simple problem and write some code to get a basic understanding of the subject.. In this case, I thought I might try to combine a couple interests and code a simple blockchain in Julia.

Julia


Julia is a high-level dynamic language designed to operate in the same domains as R, Python, and MATLAB. Julia is a relatively new language. The Julia project was started in 2009. The first release was in 2012. Julia is designed for numerical computing. It provides distributed parallel execution and has extensive libraries for linear algebra, random number generation, signal processing, and string processing and more.

What's to like

  • Julia features multiple dispatch allowing polymorphism.
  • It has a dynamic types. Types can be inferred, but they can also be declared. I like having the ability to type function arguments. It make for better code documentation. In Julia, declaring type helps the optimizer to perform more extensive optimization.
  • It's FAST. Julia has a JIT compiler and generates fast code. It can run at speeds approaching C on benchmarks. It's possible to make R, or Python faster, but this usually involves vectorizing the code and for Python external libraries like numpy. Julia is fast out of the box. Most programmers find it easy to think in terms of loops. Julia loops are fast.
  • It's designed for parallel computing. 
  • You can easily call Python, C, and other languages.
  • Syntax is clean and easy for someone who programs in Python to pick up.
  • Julia has a nice REPL (Read, Execute, Print, Loop) for interactive programming.
  • Julia has Lisp-like macros.

What's not to like

  • Julia is still new. The current version is 0.6.  That means some libraries are incomplete. Docs can be incomplete, missing, or out of date. Unlike with Python or R, googling Julia problems might not find anything, even on Stack Overflow.
  • Arrays are 1-indexed, not 0-indexed. 
  • List comprehensions can't use conditionals.
  • Julia's package manager links to GitHub. Packages must usually be compiled on first use. 
  • Loading some packages, for example Gadfly, is slow.
  • Julia's OOP design is different from Python and Java. It's possible to do inheritance with some work, but Julia types rely on composition rather than inheritance.
  • Julia allows variable interpolation is strings. For example, name ="Bill";  s = "Hello, $name" Maybe I don't understand this feature well enough, but it seems like it might lead to security issues.
  • Julia is one of those programming language names like R, you have to be careful when searching. If you just google Julia, you are more likely to get hist for the movie Julia than the language.

Blockchain


Even if you have been living in a cave for the last few years, you have heard of BitCoin and probably other cryptocurrencies. Blockchains are an important piece of technology underlying cryptocurrencies. Cryptocurrencies depend on much more than blockchains and blockchains have other possible applications than cryptocurrency. Blockchains have been proposed as the basis for smart contracts, IoT, tamper proof documents and much more. Blockchain fever is rampant in certain areas of the tech world. How much is hype? Probably most, but blockchains are a useful basis for many types of distributed apps that require security.

At its most basic, a blockchain is simply a list of records called blocks. However, unlike something like a linked list, the records aren't linked by address or index. Rather, the blocks are linked by cryptographic hashes. Each block contains a hash of the previous block, a timestamp, and data. A block's hash is constructed from the block's data, timestamp, and the hash of the previous block. Having each block contain the hash of the previous block, allows us to know if the chain has been tampered with.

With a cryptocurrency system, keeping track of blockchains, checking chain validity, maintaining wallets, public and private keys etc. in a distributed environment is an extremely complicated IT project. Writing a tiny blockchain app, on the other hand, is relatively simple. There are good examples on the web in R, Python, and Java among others.

DNABlocks

An outfit called Genecoin is advertising "Make a Backup of Yourself Using Bitcoin". I'm not really sure how many bitcoins I'm worth. I think their idea is to use distributed blockchains to store genomic information. I'll try a simple version of that idea - a server that uses a blockchain to store DNA and RNA sequence data.

For each sequence, I'll store a header and the sequence string. The header is just a FASTA format header that can contain a single line of information. The block will also store a timestep of its creation, a not necessarily unique numerical ID, the hash of the current block, and the hash of the previous block.


type Block
    header :: String
    seq :: String
    id :: Int64
    previous_hash :: String
    timestamp :: DateTime
    hash :: String
end



Julia structures are delimited with end statements. In the block definition, I have declared the type of each field. This isn't absolutely necessary, but it's  great aid to debugging.

Hash

A hash function takes some input data and maps the data to a value of a fixed length, usually smaller than the original data. Hashes are used extensively in cryptographic applications. Designing a proper hash function is not easy. A good cryptographic hash should be collision resistant, i.e. different data should not produce the same output. The US National Security Agency (NSA) publishes a set of algorithms called Secure Hash Algorithms (SHA-2). Julia provides the SHA library to compute hashes. For example, using the Julia REPL

julia> using SHA  # load the SHA library

julia> bytes2hex(sha256("GATTACA"))  # hash some data and turn it to a hex string
"d74f6c423e80cbf69d76149048e458a10c96f927c896ea9ff4f44616b643eb22"



To calculate a block's hash, we can use sha256 function to generate the hash from the current data and the hash of the previous block.

function calculate_hash(header ::String,
                        seq :: String,
                        previous_hash :: String,
                        now_time :: DateTime,
                        id :: Int64) :: String
    return(bytes2hex(sha256(previous_hash *
                            Dates.format(now_time, "yyyy-mm-dd HH:MM:SS") *
                            header *
                            seq *
                            string(id))))
end

function new_block(header::String, seq :: String, previous_hash :: String, id :: Int64) :: Block
    now_time = now()
    hash = calculate_hash(header, seq, previous_hash, now_time, id)
    return Block(header, seq, id, previous_hash, now_time, hash)
end


Notice that we can tell the compiler the function return type. This helps with error checking and code optimization. "*" is the Julia string concatenation operator.

To build a block chain, we just add the newly created blocks to the end of an array.

header = ">Ecoli-111 a simple sequence"
seq = ""TAATGTTTGTGCTGGTTTTTGTGGCATCGGGCGAGAA"
id = 21
push!(blockchain, new_block(header, seq,  blockchain[end].hash, id))


blockchain[end] is the final block so far in the chain.

How does this secure the blockchain? We compute a hash by concatenating the data, the timestamp, and the hash of the previous block. We can only compute a valid hash if we know the hash of the previous block. The hash of the previous block contains the hash of the preceding block and so on. This provides us with a chain of evidence about the integrity of the blockchain. If we alter one block, we would have to calculate all the subsequent hashes. In a distributed environment, techniques are used that make it difficult to recreate the entire chain; it's extremely computationally expensive to create new blocks, so you couldn't do it fast enough. Also, everyone has access to the chain and has their own copy of the chain so they can always check the validity of any changes.

In the gene blocks app, we will store the blocks in an array. To check validity, we simple loop through the array and compare each block's stored value for the previous hash to the calculated value of the previous block's hash. If they don't match, we display an error message.

function is_chain_valid(blockchain :: Array{Block}) :: Bool
    for i in length(blockchain):-1:2
        current_block = blockchain[i]
        prev_block = blockchain[i-1]

        current_hash = calculate_hash(current_block.header,
                                      current_block.seq,
                                      current_block.previous_hash,
                                      current_block.timestamp,
                                      current_block.id)
        if current_hash != current_block.hash
            println("ERROR: Current hashes don't match ", i)
            return false
        end
        
        if current_block.previous_hash != prev_block.hash
            println("ERROR: Previous hashes don't match - block ", i)
            println("current_block: ", current_block.previous_hash)
            println("prev_block: ",  prev_block.hash)
            return false
        end
    end

    return true
end



blockchain[1] contains the oldest block. This block is called the genesis block. It is a dummy block with hard coded values to get the chain started.

A Blockchain Server

The code above is enough to create a blockchain. We'll get a little fancier and put the blockchain on a server to show the flexibility of Julia.

There is a nice web framework for Julia called Genie. It's an MVC framework that provides boilerplate code to set up a server. You fill in the pieces needed to complete web app. Genie provides a REPL for interactive testing and a server app to run the code. The good thing about Genie is that it seems to provide a pretty complete framework. The bad thing is that the docs are far from complete. This means that you sometimes have to dig through the source to find out how to do various operation.  It also means that I'm probably not using Genie as efficiently as I could.

I won't create a full MVC app. I will just use the controller to make a server app that accepts DNA or RNA sequence strings, maintain them in a blockchain, and also save the blockchain to a file in JSON format. This is probably one of the least efficient uses of a blockchain imaginable but I wanted to test out the Genie framework.

Creating an app with Julia and Genie is easy.

julia> using Genie
julia> Genie.REPL.new_app("dna_demo")  # create a new app

genie> Genie.REPL.new_controller("DNA") # create a controller for the app

The commands above will create a new directory called dna_demo. There will be a number of subdirectories under dna_demo/ containing general Julia code for the app. The third line creates a new controller dna_demo/app/resources/dnas/DNAsController.jl. This file is where I will add our code.

Next, we edit dna_demo/config/routes.jl to include the commands we want to process. In our case, we want to send a sequence to the server using GET and a FASTA file using a POST command. We also want to tell the server to save the blockchain in a file in JSON format and shutdown. For the latter task, we'll require a username and password.

using Router

route("/") do
  Router.serve_static_file("/welcome.html")
end

route("/dnas/seq/:id", DNAsController.sequence)

route("/dnas/fasta/:id", DNAsController.fasta, method=POST)

route("/dnas/quit/:id", DNAsController.save_and_quit, method=POST)


The first route command is provided by Genie. It just serves up a welcome message. The next route statement says send a GET command to the controller in the dnas directory and pass the data to the routine DNAsController.sequence. :id is an integer id. It can be anything. The next two commands are similar except that they use the POST method.

Next we need to edit dna_demo/app/resources/dnas/DNAsController.jl to process our commands. I have left out comments in the interest of brevity. Download the full source to get some descriptive comments.

function sequence()
    id = parse(Int64, @params(:id))
    header = @params(:header)
    if length(header) == 0
        header = ">" * string(abs(rand(Int64)))
    elseif header[1] != '>'
        header = ">" * header
    end
    seq = @params(:seq)
    push!(blockchain, new_block(header, seq,  blockchain[end].hash, id))

    return nothing
end

function fasta()
    req = @params(:REQUEST)
    id = parse(Int64, @params(:id))
    
    buffer = join([Char(x) for x in req.data])
    seqs = read_fasta(buffer)

    for seq in seqs
        push!(blockchain, new_block(seq.header, seq.seq,  blockchain[end].hash, id))
    end
    
    return nothing
end

function save_and_quit()
    auth = "Basic YmlsbDoxMjM0NQ=="  # bill 12345 not NSA level security

    req = @params(:REQUEST)
    user_auth = req.headers["Authorization"]

    if user_auth != auth
        return "Invalid username/password."
    end
        
    write_block_file(block_file, blockchain)
    exit(0)
end


Data can be retrieved  by the controller using the @params macro. This is one area where the Genie docs are lacking. I had to dig through source code and web forums to figure out how to actually get the data. GET data is passed by id. POST data is retrieved from @params(:REQUEST). The file contents are passed to fasta() as an array of Int64s. We turn it into a string buffer and then read the FASTA data from the buffer. Whatever you do, don't use the method of authentication that I use here.

We need a few helper functions.

type Sequence
    header :: String
    seq :: String
end

function is_nucleotide(s :: String) :: Bool
    nucleotide_pattern = r"^[ATCGU]+$"  # regex
    if ! ismatch(nucleotide_pattern, s)
        println("ERROR: Invalid data.")
        return false
    end
    
    return true
end

function read_fasta(buffer :: String) :: Array{Sequence}
    seqs = Sequence[]
    
    b = IOBuffer(buffer)
    header = ""
    seq_str = ""
    while ! eof(b)
        s = readline(b)
        s = lstrip(rstrip(chomp(s)))
        if length(s) > 0
            if s[1] == '>'
                if header != ""
                    if is_nucleotide(seq_str)
                        push!(seqs, Sequence(header, seq_str))
                    else
                        return Sequence[]
                    end                    
                end
                header = s
                seq_str = ""                    
            else
                seq_str *= uppercase(s)
            end
        end
    end
    close(b)
    
    if header != ""
        push!(seqs, Sequence(header, seq_str))
    end

    return seqs
end

function read_block_file(block_file :: String) :: Array{Block}
    blockchain = Block[]
    
    f = open(block_file)
    for line in eachline(f)
        j = JSON.parse(line)
        push!(blockchain,
              Block(j["header"],
                    j["seq"],
                    parse(Int64, j["id"]),
                    j["previous_hash"],
                    DateTime(j["timestamp"]),
                    j["hash"]))
    end
    close(f)

    return blockchain
end

function write_block_file(block_file :: String, blockchain :: Array{Block}) :: Void
    open(block_file, "w") do f
        for i in 1:length(blockchain)
            s = JSON.json(Dict("header" => blockchain[i].header,
                               "seq" => blockchain[i].seq,
                               "id" => string(blockchain[i].id),
                               "previous_hash" => blockchain[i].previous_hash,
                               "timestamp" => string(blockchain[i].timestamp),
                               "hash" => blockchain[i].hash))
            write(f, s * "\n")
        end
    end
end

function genesis_block() :: Block
    return new_block("0", "0", "0", 0)
end


read_block_file reads a series of JSON lines containing the previously saved block information. JSON.parse copies the JSON data into a dictionary. write_block_file saves the blockchain in JSON format. JSON.json takes a dictionary and returns a JSON string.

To complete the controller, we add some code to get started. On startup we either load a pre-existing file in JSON format or start the  blockchain with a genesis block.

println("Starting!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")

blockchain = Block[]

if isfile(block_file)
    blockchain = read_block_file(block_file)
    if ! is_chain_valid(blockchain)
        println("Fatal Error!")
        exit()
    end
    println("Blockchain loaded.")
else
    push!(blockchain, genesis_block())        
    println("New blockchain initialized.")
end


We now have a complete server app. To run it, cd to dna_demo and run bin/repl or bin/server. In the REPL, I usually tell the logging module Lumberjack to suppress log messages to the console.  AppStart starts the server listening on port 8000.

using Lumberjack
remove_truck("console")
AppServer.startup()


We can use curl to talk to the server.

# send a single sequence
curl "localhost:8000/dnas/seq/10?header=test&seq=ATCGGGGACA"

# send a file
curl -T "crp.fasta" -X POST "localhost:8000/dnas/fasta/11"

# save the blockchain and quit
curl -u bill -X POST "localhost:8000/dnas/quit/10"
Enter host password for user 'bill':


The next time the server is started, the data will be loaded into the blockchain from the file.

The source code is available here.

About Julia

In conclusion, I like Julia and Genie a lot. Despite some rough edges I enjoyed coding in Julia more than I have in any language in a long time. This includes Python, which is my main "goto" language. I doubt it will overtake Python or R in popularity any time soon. There is just too much established code in those languages, but Julia being open source, fast, and easy to learn is a strong contender for their niche. If I were a vendor of a proprietary system like MATLAB or SAS, I would be worried about what will be happening in five to ten years when Julia really hits its stride.

1 comment:

  1. Your article has piqued a lot of positive interest. I can see why since you have done such a good job of making it interesting. tikpay

    ReplyDelete