- William Mougayar
I think the whole narrative of blockchain without bitcoin will amount to very little.
- Fred Ehrsam
We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.
Why We Created Julia - Jeff Bezanson, Stefan Karpinski, Viral Shah, Alan Edelman
Although the title of this post sounds like it might be some sort of bizarre children's book, it's actually my attempt to look at two subjects that interested me for some time: the Julia language and blockchains. My usua; approach to learning something new is to pick a simple problem and write some code to get a basic understanding of the subject.. In this case, I thought I might try to combine a couple interests and code a simple blockchain in Julia.
Julia
Julia is a high-level dynamic language designed to operate in the same domains as R, Python, and MATLAB. Julia is a relatively new language. The Julia project was started in 2009. The first release was in 2012. Julia is designed for numerical computing. It provides distributed parallel execution and has extensive libraries for linear algebra, random number generation, signal processing, and string processing and more.
What's to like
- Julia features multiple dispatch allowing polymorphism.
- It has a dynamic types. Types can be inferred, but they can also be declared. I like having the ability to type function arguments. It make for better code documentation. In Julia, declaring type helps the optimizer to perform more extensive optimization.
- It's FAST. Julia has a JIT compiler and generates fast code. It can run at speeds approaching C on benchmarks. It's possible to make R, or Python faster, but this usually involves vectorizing the code and for Python external libraries like numpy. Julia is fast out of the box. Most programmers find it easy to think in terms of loops. Julia loops are fast.
- It's designed for parallel computing.
- You can easily call Python, C, and other languages.
- Syntax is clean and easy for someone who programs in Python to pick up.
- Julia has a nice REPL (Read, Execute, Print, Loop) for interactive programming.
- Julia has Lisp-like macros.
What's not to like
- Julia is still new. The current version is 0.6. That means some libraries are incomplete. Docs can be incomplete, missing, or out of date. Unlike with Python or R, googling Julia problems might not find anything, even on Stack Overflow.
- Arrays are 1-indexed, not 0-indexed.
- List comprehensions can't use conditionals.
- Julia's package manager links to GitHub. Packages must usually be compiled on first use.
- Loading some packages, for example Gadfly, is slow.
- Julia's OOP design is different from Python and Java. It's possible to do inheritance with some work, but Julia types rely on composition rather than inheritance.
- Julia allows variable interpolation is strings. For example, name ="Bill"; s = "Hello, $name" Maybe I don't understand this feature well enough, but it seems like it might lead to security issues.
- Julia is one of those programming language names like R, you have to be careful when searching. If you just google Julia, you are more likely to get hist for the movie Julia than the language.
Blockchain
Even if you have been living in a cave for the last few years, you have heard of BitCoin and probably other cryptocurrencies. Blockchains are an important piece of technology underlying cryptocurrencies. Cryptocurrencies depend on much more than blockchains and blockchains have other possible applications than cryptocurrency. Blockchains have been proposed as the basis for smart contracts, IoT, tamper proof documents and much more. Blockchain fever is rampant in certain areas of the tech world. How much is hype? Probably most, but blockchains are a useful basis for many types of distributed apps that require security.
At its most basic, a blockchain is simply a list of records called blocks. However, unlike something like a linked list, the records aren't linked by address or index. Rather, the blocks are linked by cryptographic hashes. Each block contains a hash of the previous block, a timestamp, and data. A block's hash is constructed from the block's data, timestamp, and the hash of the previous block. Having each block contain the hash of the previous block, allows us to know if the chain has been tampered with.
With a cryptocurrency system, keeping track of blockchains, checking chain validity, maintaining wallets, public and private keys etc. in a distributed environment is an extremely complicated IT project. Writing a tiny blockchain app, on the other hand, is relatively simple. There are good examples on the web in R, Python, and Java among others.
Julia structures are delimited with end statements. In the block definition, I have declared the type of each field. This isn't absolutely necessary, but it's great aid to debugging.
With a cryptocurrency system, keeping track of blockchains, checking chain validity, maintaining wallets, public and private keys etc. in a distributed environment is an extremely complicated IT project. Writing a tiny blockchain app, on the other hand, is relatively simple. There are good examples on the web in R, Python, and Java among others.
DNABlocks
An outfit called Genecoin is advertising "Make a Backup of Yourself Using Bitcoin". I'm not really sure how many bitcoins I'm worth. I think their idea is to use distributed blockchains to store genomic information. I'll try a simple version of that idea - a server that uses a blockchain to store DNA and RNA sequence data.
For each sequence, I'll store a header and the sequence string. The header is just a FASTA format header that can contain a single line of information. The block will also store a timestep of its creation, a not necessarily unique numerical ID, the hash of the current block, and the hash of the previous block.
type Block
header :: String
seq :: String
id :: Int64
previous_hash :: String
timestamp :: DateTime
hash :: String
end
Julia structures are delimited with end statements. In the block definition, I have declared the type of each field. This isn't absolutely necessary, but it's great aid to debugging.
Hash
A hash function takes some input data and maps the data to a value of a fixed length, usually smaller than the original data. Hashes are used extensively in cryptographic applications. Designing a proper hash function is not easy. A good cryptographic hash should be collision resistant, i.e. different data should not produce the same output. The US National Security Agency (NSA) publishes a set of algorithms called Secure Hash Algorithms (SHA-2). Julia provides the SHA library to compute hashes. For example, using the Julia REPL
julia> using SHA # load the SHA library
julia> bytes2hex(sha256("GATTACA")) # hash some data and turn it to a hex string
"d74f6c423e80cbf69d76149048e458a10c96f927c896ea9ff4f44616b643eb22"
To calculate a block's hash, we can use sha256 function to generate the hash from the current data and the hash of the previous block.
function calculate_hash(header ::String,
seq :: String,
previous_hash :: String,
now_time :: DateTime,
id :: Int64) :: String
return(bytes2hex(sha256(previous_hash *
Dates.format(now_time, "yyyy-mm-dd HH:MM:SS") *
header *
seq *
string(id))))
end
function new_block(header::String, seq :: String, previous_hash :: String, id :: Int64) :: Block
now_time = now()
hash = calculate_hash(header, seq, previous_hash, now_time, id)
return Block(header, seq, id, previous_hash, now_time, hash)
end
Notice that we can tell the compiler the function return type. This helps with error checking and code optimization. "*" is the Julia string concatenation operator.
To build a block chain, we just add the newly created blocks to the end of an array.
header = ">Ecoli-111 a simple sequence"
seq = ""TAATGTTTGTGCTGGTTTTTGTGGCATCGGGCGAGAA"
id = 21
push!(blockchain, new_block(header, seq, blockchain[end].hash, id))
blockchain[end] is the final block so far in the chain.
How does this secure the blockchain? We compute a hash by concatenating the data, the timestamp, and the hash of the previous block. We can only compute a valid hash if we know the hash of the previous block. The hash of the previous block contains the hash of the preceding block and so on. This provides us with a chain of evidence about the integrity of the blockchain. If we alter one block, we would have to calculate all the subsequent hashes. In a distributed environment, techniques are used that make it difficult to recreate the entire chain; it's extremely computationally expensive to create new blocks, so you couldn't do it fast enough. Also, everyone has access to the chain and has their own copy of the chain so they can always check the validity of any changes.
In the gene blocks app, we will store the blocks in an array. To check validity, we simple loop through the array and compare each block's stored value for the previous hash to the calculated value of the previous block's hash. If they don't match, we display an error message.
blockchain[1] contains the oldest block. This block is called the genesis block. It is a dummy block with hard coded values to get the chain started.
There is a nice web framework for Julia called Genie. It's an MVC framework that provides boilerplate code to set up a server. You fill in the pieces needed to complete web app. Genie provides a REPL for interactive testing and a server app to run the code. The good thing about Genie is that it seems to provide a pretty complete framework. The bad thing is that the docs are far from complete. This means that you sometimes have to dig through the source to find out how to do various operation. It also means that I'm probably not using Genie as efficiently as I could.
I won't create a full MVC app. I will just use the controller to make a server app that accepts DNA or RNA sequence strings, maintain them in a blockchain, and also save the blockchain to a file in JSON format. This is probably one of the least efficient uses of a blockchain imaginable but I wanted to test out the Genie framework.
Creating an app with Julia and Genie is easy.
The commands above will create a new directory called dna_demo. There will be a number of subdirectories under dna_demo/ containing general Julia code for the app. The third line creates a new controller dna_demo/app/resources/dnas/DNAsController.jl. This file is where I will add our code.
Next, we edit dna_demo/config/routes.jl to include the commands we want to process. In our case, we want to send a sequence to the server using GET and a FASTA file using a POST command. We also want to tell the server to save the blockchain in a file in JSON format and shutdown. For the latter task, we'll require a username and password.
The first route command is provided by Genie. It just serves up a welcome message. The next route statement says send a GET command to the controller in the dnas directory and pass the data to the routine DNAsController.sequence. :id is an integer id. It can be anything. The next two commands are similar except that they use the POST method.
Next we need to edit dna_demo/app/resources/dnas/DNAsController.jl to process our commands. I have left out comments in the interest of brevity. Download the full source to get some descriptive comments.
Data can be retrieved by the controller using the @params macro. This is one area where the Genie docs are lacking. I had to dig through source code and web forums to figure out how to actually get the data. GET data is passed by id. POST data is retrieved from @params(:REQUEST). The file contents are passed to fasta() as an array of Int64s. We turn it into a string buffer and then read the FASTA data from the buffer. Whatever you do, don't use the method of authentication that I use here.
We need a few helper functions.
read_block_file reads a series of JSON lines containing the previously saved block information. JSON.parse copies the JSON data into a dictionary. write_block_file saves the blockchain in JSON format. JSON.json takes a dictionary and returns a JSON string.
To complete the controller, we add some code to get started. On startup we either load a pre-existing file in JSON format or start the blockchain with a genesis block.
We now have a complete server app. To run it, cd to dna_demo and run bin/repl or bin/server. In the REPL, I usually tell the logging module Lumberjack to suppress log messages to the console. AppStart starts the server listening on port 8000.
We can use curl to talk to the server.
The next time the server is started, the data will be loaded into the blockchain from the file.
The source code is available here.
In the gene blocks app, we will store the blocks in an array. To check validity, we simple loop through the array and compare each block's stored value for the previous hash to the calculated value of the previous block's hash. If they don't match, we display an error message.
function is_chain_valid(blockchain :: Array{Block}) :: Bool
for i in length(blockchain):-1:2
current_block = blockchain[i]
prev_block = blockchain[i-1]
current_hash = calculate_hash(current_block.header,
current_block.seq,
current_block.previous_hash,
current_block.timestamp,
current_block.id)
if current_hash != current_block.hash
println("ERROR: Current hashes don't match ", i)
return false
end
if current_block.previous_hash != prev_block.hash
println("ERROR: Previous hashes don't match - block ", i)
println("current_block: ", current_block.previous_hash)
println("prev_block: ", prev_block.hash)
return false
end
end
return true
end
blockchain[1] contains the oldest block. This block is called the genesis block. It is a dummy block with hard coded values to get the chain started.
A Blockchain Server
The code above is enough to create a blockchain. We'll get a little fancier and put the blockchain on a server to show the flexibility of Julia.There is a nice web framework for Julia called Genie. It's an MVC framework that provides boilerplate code to set up a server. You fill in the pieces needed to complete web app. Genie provides a REPL for interactive testing and a server app to run the code. The good thing about Genie is that it seems to provide a pretty complete framework. The bad thing is that the docs are far from complete. This means that you sometimes have to dig through the source to find out how to do various operation. It also means that I'm probably not using Genie as efficiently as I could.
I won't create a full MVC app. I will just use the controller to make a server app that accepts DNA or RNA sequence strings, maintain them in a blockchain, and also save the blockchain to a file in JSON format. This is probably one of the least efficient uses of a blockchain imaginable but I wanted to test out the Genie framework.
Creating an app with Julia and Genie is easy.
julia> using Genie
julia> Genie.REPL.new_app("dna_demo") # create a new app
genie> Genie.REPL.new_controller("DNA") # create a controller for the app
The commands above will create a new directory called dna_demo. There will be a number of subdirectories under dna_demo/ containing general Julia code for the app. The third line creates a new controller dna_demo/app/resources/dnas/DNAsController.jl. This file is where I will add our code.
Next, we edit dna_demo/config/routes.jl to include the commands we want to process. In our case, we want to send a sequence to the server using GET and a FASTA file using a POST command. We also want to tell the server to save the blockchain in a file in JSON format and shutdown. For the latter task, we'll require a username and password.
using Router
route("/") do
Router.serve_static_file("/welcome.html")
end
route("/dnas/seq/:id", DNAsController.sequence)
route("/dnas/fasta/:id", DNAsController.fasta, method=POST)
route("/dnas/quit/:id", DNAsController.save_and_quit, method=POST)
The first route command is provided by Genie. It just serves up a welcome message. The next route statement says send a GET command to the controller in the dnas directory and pass the data to the routine DNAsController.sequence. :id is an integer id. It can be anything. The next two commands are similar except that they use the POST method.
Next we need to edit dna_demo/app/resources/dnas/DNAsController.jl to process our commands. I have left out comments in the interest of brevity. Download the full source to get some descriptive comments.
function sequence()
id = parse(Int64, @params(:id))
header = @params(:header)
if length(header) == 0
header = ">" * string(abs(rand(Int64)))
elseif header[1] != '>'
header = ">" * header
end
seq = @params(:seq)
push!(blockchain, new_block(header, seq, blockchain[end].hash, id))
return nothing
end
function fasta()
req = @params(:REQUEST)
id = parse(Int64, @params(:id))
buffer = join([Char(x) for x in req.data])
seqs = read_fasta(buffer)
for seq in seqs
push!(blockchain, new_block(seq.header, seq.seq, blockchain[end].hash, id))
end
return nothing
end
function save_and_quit()
auth = "Basic YmlsbDoxMjM0NQ==" # bill 12345 not NSA level security
req = @params(:REQUEST)
user_auth = req.headers["Authorization"]
if user_auth != auth
return "Invalid username/password."
end
write_block_file(block_file, blockchain)
exit(0)
end
Data can be retrieved by the controller using the @params macro. This is one area where the Genie docs are lacking. I had to dig through source code and web forums to figure out how to actually get the data. GET data is passed by id. POST data is retrieved from @params(:REQUEST). The file contents are passed to fasta() as an array of Int64s. We turn it into a string buffer and then read the FASTA data from the buffer. Whatever you do, don't use the method of authentication that I use here.
We need a few helper functions.
type Sequence
header :: String
seq :: String
end
function is_nucleotide(s :: String) :: Bool
nucleotide_pattern = r"^[ATCGU]+$" # regex
if ! ismatch(nucleotide_pattern, s)
println("ERROR: Invalid data.")
return false
end
return true
end
function read_fasta(buffer :: String) :: Array{Sequence}
seqs = Sequence[]
b = IOBuffer(buffer)
header = ""
seq_str = ""
while ! eof(b)
s = readline(b)
s = lstrip(rstrip(chomp(s)))
if length(s) > 0
if s[1] == '>'
if header != ""
if is_nucleotide(seq_str)
push!(seqs, Sequence(header, seq_str))
else
return Sequence[]
end
end
header = s
seq_str = ""
else
seq_str *= uppercase(s)
end
end
end
close(b)
if header != ""
push!(seqs, Sequence(header, seq_str))
end
return seqs
end
function read_block_file(block_file :: String) :: Array{Block}
blockchain = Block[]
f = open(block_file)
for line in eachline(f)
j = JSON.parse(line)
push!(blockchain,
Block(j["header"],
j["seq"],
parse(Int64, j["id"]),
j["previous_hash"],
DateTime(j["timestamp"]),
j["hash"]))
end
close(f)
return blockchain
end
function write_block_file(block_file :: String, blockchain :: Array{Block}) :: Void
open(block_file, "w") do f
for i in 1:length(blockchain)
s = JSON.json(Dict("header" => blockchain[i].header,
"seq" => blockchain[i].seq,
"id" => string(blockchain[i].id),
"previous_hash" => blockchain[i].previous_hash,
"timestamp" => string(blockchain[i].timestamp),
"hash" => blockchain[i].hash))
write(f, s * "\n")
end
end
end
function genesis_block() :: Block
return new_block("0", "0", "0", 0)
end
read_block_file reads a series of JSON lines containing the previously saved block information. JSON.parse copies the JSON data into a dictionary. write_block_file saves the blockchain in JSON format. JSON.json takes a dictionary and returns a JSON string.
To complete the controller, we add some code to get started. On startup we either load a pre-existing file in JSON format or start the blockchain with a genesis block.
println("Starting!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
blockchain = Block[]
if isfile(block_file)
blockchain = read_block_file(block_file)
if ! is_chain_valid(blockchain)
println("Fatal Error!")
exit()
end
println("Blockchain loaded.")
else
push!(blockchain, genesis_block())
println("New blockchain initialized.")
end
We now have a complete server app. To run it, cd to dna_demo and run bin/repl or bin/server. In the REPL, I usually tell the logging module Lumberjack to suppress log messages to the console. AppStart starts the server listening on port 8000.
using Lumberjack
remove_truck("console")
AppServer.startup()
We can use curl to talk to the server.
# send a single sequence
curl "localhost:8000/dnas/seq/10?header=test&seq=ATCGGGGACA"
# send a file
curl -T "crp.fasta" -X POST "localhost:8000/dnas/fasta/11"
# save the blockchain and quit
curl -u bill -X POST "localhost:8000/dnas/quit/10"
Enter host password for user 'bill':
The next time the server is started, the data will be loaded into the blockchain from the file.
The source code is available here.
Your article has piqued a lot of positive interest. I can see why since you have done such a good job of making it interesting. tikpay
ReplyDelete