Julia as Glue

 A language that doesn't affect the way you think about programming is not worth knowing.
~ Alan J. Perlis

Is it possible that software is not like anything else, that it is meant to be discarded: that the whole point is to always see it as a soap bubble?
~ Alan J. Perlis


Is Julia the solution to the two-language problem? The two-language problem is usually described as arising when you use a language like Python or R and discover that it is too slow for the task at hand. To get adequate performance, key routines are written in C or C++. A prime example is NumPy for Python. Another example is using Python for data munging and R for statistical analysis and visualization.

I have been using the Python and R combo for some time now. I have started looking at Julia as a possible replacement for both which, if it is as fast as advertised, has the bonus of speeding up processing. One of the key tasks in data munging is gluing together unrelated programs. Is Julia effective glue?

I recently ran into a problem downloading stock market closing prices from Yahoo Finance. I have a program, written in Python, that reads a list of stock symbols and downloads monthly closing prices. Recently, it began to fail, timing out when fetching data. This coincided with other network problems: occasional timeouts on websites. I decided to see if I could track down what was going on with Yahoo and perhaps solve the other problem too.

I wanted to find out whether I had a connection problem on the first couple of hops, in which case the problem was with my ISP, or whether the problem was with Yahoo Finance in particular. I suspected the former, and I wanted to gather a little data before contacting technical support.

I run Windows 10 as my OS, but all of my programming is done using Linux via WSL 2. Windows provides a nice network utility called PathPing. It shows you the hops between your modem and the destination, then tests the route 100 times and counts dropped packets. If things are working well, there should be no or very few missing packets.

Since the problem appeared inconsistently, I would have to run the PathPing app several times and accumulate results. My normal approach would be to write some "glue" code in Python to run the app a few times and save the results. I wondered how difficult it would be to do this in Julia, and whether I could spawn a Windows app from a Linux Julia program.

Julia Glue

It turns out that it's pretty easy to run tasks and grab the output. The code below is simple: it runs PathPing max_iter times and accumulates the results in a CSV file.

#=
timeout.jl - Run Windows 10 utility PathPing.exe and accumulate dropped packet errors

author: Bill Thompson
copyright: April 23, 2021
license: GPL 3
=#

using DataFrames
using CSV
using ArgParse

function GetArgs()
    # read arguments from command line
    # from https://argparsejl.readthedocs.io/en/latest/argparse.html
    s = ArgParseSettings(description = "Run Windows PathPing and collect data.",
                         version = "1.0",
                         add_version = true)

    @add_arg_table! s begin  # ArgParse 1.0+ spells the macro with a trailing !
        "--target", "-t"
        required = false
        help = "Target website. (default = finance.yahoo.com)."
        arg_type = String
        default = "finance.yahoo.com"

        "--csv_file", "-c"
        required = true
        help = "CSV file with results. (required)."
        arg_type = String

        "--num_traces", "-n"
        required = false
        help = "Number of traces to run."
        arg_type = Int
        default = 10

        "--wait_time", "-w"
        required = false
        help = "Number of seconds to wait between runs."
        arg_type = Int
        default = 10
    end

    return parse_args(s)
end

function main()
    args = GetArgs()
    csv_file = args["csv_file"]
    target = args["target"]
    max_iter = args["num_traces"]
    wait_time = args["wait_time"]

    # see https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/pathping
    cmd = `PathPing.exe /4 $target`

    iter = 1
    while iter <= max_iter
        println("Iteration ", iter, " of ", max_iter)
        
        # Run the command and grab the output.
        trace = read(cmd)
        trace_list = split(String(trace), "\r\n")  # Windows tacks a \r on each line end
        
        # Get existing data if it's there otherwise create a dataframe.
        # Do this inside loop so we don't lose anything
        # if we quit early.
        if isfile(csv_file)
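            # stringtype = String (CSV.jl 0.9+) keeps the text columns as plain
            # Strings so that longer host names can still be pushed later.
            df = DataFrame(CSV.File(csv_file; stringtype = String))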
        else
            df = DataFrame(ID = String[], Errors = Int[], IP = String[])
        end

        # Loop through each line and look for lost packets,
        # host names and IP addresses.
        # regex - captures lost packets, host name and IPv4 address
        r = r"(\d+)\/ 100.+%\s+([A-Za-z0-9\.\-]+) \[(\d+\.\d+\.\d+\.\d+)\]"
        for line in trace_list
            m = match(r, line)
            if !isnothing(m)
                println(m[1], " ", m[2], " ", m[3])
                errs = parse(Int, m[1])
                if errs > 0
                    if m[2] in df[!, :ID]
                        df[df[!, :ID] .== m[2], :Errors] .+= errs
                    else
                        push!(df, [m[2], errs, m[3]])
                    end
                end
            end
        end

        # save the counts
        if nrow(df) > 0
            CSV.write(csv_file, df)
        end

        sleep(wait_time)  # wait and try again
        iter += 1
    end
end

main()
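
The parsing and tallying step can be tried on its own, without Windows, PathPing, DataFrames or CSV. The following is a minimal sketch under some assumptions: the sample lines are made up to resemble the general shape of PathPing's statistics table (they are not captured from a real run), and the names HOP_RE and tally_losses! are hypothetical helpers that do not appear in timeout.jl. A Dict stands in for the data frame.

```julia
# Hypothetical sample lines shaped like PathPing's statistics table.
# They are invented for illustration, not taken from a real run.
sample = [
    "  0                                           mypc.local [192.168.1.10]",
    "                                0/ 100 =  0%   |",
    "  1    3ms     2/ 100 =  2%     2/ 100 =  2%  router.local [192.168.1.1]",
    "  2   12ms     0/ 100 =  0%     0/ 100 =  0%  isp-gw.example.net [10.20.30.40]",
    "  3   15ms     5/ 100 =  5%     3/ 100 =  3%  core.example.net [10.20.30.41]",
]

# Same pattern as in timeout.jl: lost packets, host name, IPv4 address.
const HOP_RE = r"(\d+)\/ 100.+%\s+([A-Za-z0-9\.\-]+) \[(\d+\.\d+\.\d+\.\d+)\]"

# Add the lost-packet count of every matching line to a per-host tally.
# Lines that do not match, or that report zero losses, are skipped.
function tally_losses!(tally::Dict{String,Int}, lines)
    for line in lines
        m = match(HOP_RE, line)
        m === nothing && continue
        lost = parse(Int, m[1])      # first "lost/ 100" figure on the line
        lost > 0 || continue
        host = String(m[2])          # captures are SubStrings; make a String key
        tally[host] = get(tally, host, 0) + lost
    end
    return tally
end

tally = tally_losses!(Dict{String,Int}(), sample)
tally_losses!(tally, sample)         # a second "run" accumulates into the same tally

for (host, lost) in tally
    println(host, " lost ", lost, " packets")
end
```

Feeding the same sample twice mimics two PathPing runs: hosts with losses accumulate across runs, while hosts that never drop a packet stay out of the tally, which is the behavior the data frame version aims for.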

The Good and the Bad


The Good

  • Julia has all the functions needed to run external apps.
  • It's as easy as Python to process the app's output.
  • String interpolation is handy.
  • Julia has regexes built in.
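
As a small illustration of these points, here is a sketch that runs an external command, interpolates a value into it, and picks the output apart with a regex. It assumes a Unix-like system (such as WSL) where echo is an executable on the PATH; echo is only a stand-in for a real tool like PathPing.

```julia
# Assumption: a Unix-like system where `echo` is an executable on the PATH.
target = "finance.yahoo.com"

# Backticks build a Cmd object; nothing runs yet. The interpolated value is
# passed as one argument and never goes through a shell.
cmd = `echo pinging $target`

output = read(cmd, String)            # run the command, capture stdout as a String
lines = split(chomp(output), '\n')    # drop the trailing newline, split into lines

# Built-in regex support: pull the host name back out of the output.
m = match(r"pinging (\S+)", lines[1])
host = m === nothing ? "" : m[1]
println(host)
```

Because the command is not handed to a shell, a target containing spaces or shell metacharacters is still delivered as a single argument, which is one less thing to worry about in glue code.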

The Bad

  • Finding answers online is more difficult than with Python or R. For those languages, the correct answer usually jumps out immediately from Stack Overflow. This is probably a function of Julia's age.
  • The way Julia applies types in different functions can be confusing.
  • Multiple dispatch means that a common function like collect() has many implementations and the docs tend to be generic, not specific to a type.
  • Error messages don't always indicate the line where the error occurred.
  • The Julia debugger in VS Code is unacceptably S-L-O-W.
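
To make the points about types and multiple dispatch a bit more concrete, here is a small sketch using only Base functions. It shows collect() picking different behavior depending on the argument type, two standard tools (methods and which) that can help narrow generic docs down to the method actually being called, and the SubString type that regex captures return. The variable names are illustrative only.

```julia
# collect has many methods; the one used depends on the argument type.
a = collect(1:3)                  # range  -> Vector{Int}
b = collect("abc")                # String -> Vector{Char}
c = collect(Dict("x" => 1))       # Dict   -> Vector{Pair{String,Int}}

# Tools for finding out which implementation is in play:
n = length(methods(collect))      # number of collect methods in this session
w = which(collect, (String,))     # the method chosen for a String argument
println(n, " methods; collect(::String) dispatches to ", w)

# Regex captures are SubStrings, not Strings, a common source of type surprises.
m = match(r"(\d+)", "lost 42 packets")
println(typeof(m[1]))             # SubString{String}
lost = parse(Int, m[1])           # convert explicitly when an Int is needed
```

In the REPL, typing ? followed by a function name shows its docstrings, and which or @which can then point to the source of the specific method.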

My summary: So far, my biggest problem with Julia is finding answers online. I usually know what I want to do, but it's not always easy to find an answer to the question "How do I do X in Julia?"
