I really do. I want to like Julia. There is much to like: a great REPL; multiple dispatch; concurrent, parallel, and distributed computing; direct calling of C and Fortran libraries; a dynamic type system; a nice package manager; macros; and more.
There are problems with Julia that may be showstoppers for me. My usual workflow with Python, R, Java, or C++ is to write code in small pieces and incrementally test, building the program one routine at a time. For example, I typically write the input routine, test; write the data processing steps, one step at a time and test; write plotting routines and test. Test the whole program and fix any problems. I really should write the tests first like I tell students, but sometimes I cheat.
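To make the write-a-piece-then-test habit concrete, here is a minimal Python sketch of the first step: an input routine with a quick check right next to it. The function names `read_anomalies` and `test_read_anomalies` are hypothetical, and the sample rows are made up purely to exercise the parser; only the column names come from the data file used later in this post.

```python
import csv
import io


def read_anomalies(stream):
    """Return (year, anomaly) pairs parsed from a CSV text stream."""
    reader = csv.DictReader(stream)
    return [(int(row["Year"]), float(row["Temperature_Anomaly"])) for row in reader]


def test_read_anomalies():
    # Made-up sample rows, used only to exercise the parser.
    sample = io.StringIO("Year,Temperature_Anomaly\n1850,-0.5\n1851,-0.25\n")
    assert read_anomalies(sample) == [(1850, -0.5), (1851, -0.25)]


test_read_anomalies()
print("input routine OK")
```

Each later piece (processing, plotting) would get the same treatment, which is why the cost of every edit-and-run cycle matters so much.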
The process described above is common, and Julia makes it frustrating. The source of the frustration is compile-time latency. Julia's JIT compiler produces great execution speeds, but you pay a price each time you start Julia or load a library module. I understand why this happens: Julia performs type analysis at compile time to support multiple dispatch. Multiple dispatch is key to how Julia works, so although compile times may improve in the future, I expect Julia to remain slower to start than a language like Python, which doesn't do as much static analysis.
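The shape of that cost (paid once per process, then cached) can be seen even in Python, at a much smaller scale. The sketch below is only an analogy: it times a module import, not Julia's JIT compilation, and the module `colorsys` is an arbitrary standard-library choice for illustration. The helper name `timed_import` is hypothetical.

```python
import importlib
import sys
import time


def timed_import(name):
    """Import a module by name and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start


# Arbitrary small standard-library module, chosen only for illustration.
module_name = "colorsys"
sys.modules.pop(module_name, None)  # drop any cached copy so the first call really loads it

first = timed_import(module_name)   # locates and executes the module
second = timed_import(module_name)  # served from the sys.modules cache

print(f"first import:  {first * 1e3:.3f} ms")
print(f"second import: {second * 1e3:.3f} ms")
```

The second call is typically far cheaper because the work is already done for this process. A script launched fresh from the shell never gets that benefit, which is exactly the situation in an edit-and-rerun workflow.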
How Bad Is It?
Recently, I was trying to use Julia for a small project. The details don't matter. The program flow was to read a small CSV file, do some analysis using two of the data columns, and produce four or five plots. I wasn't familiar with Julia's plotting routines, so I decided to test plotting before proceeding with the analysis. I started in my usual way: I fired up VS Code and wrote some code to read the CSV file and use two of the data columns to make a scatter plot.
using DataFrames
using CSV
using PyCall
@pyimport matplotlib
@pyimport matplotlib.pyplot as plt

function plot_temperature(df)
    size = 3
    f1 = plt.figure()
    ax1 = f1.gca()
    ax1.plot(df.Year, df.Temperature_Anomaly, "o", markersize = size)
    plt.xlabel("Year")
    plt.ylabel("Temperature Anomaly")
    plt.title("Temperature Anomaly")
    plt.show(block = false)
end

function main()
    input_file = "../data/HadCRUT.4.6.0.0.annual_ns_avg.csv"
    df = DataFrame(CSV.File(input_file))
    plot_temperature(df)
end

main()
$ time julia test_plot.jl
┌ Warning: `vendor()` is deprecated, use `BLAS.get_config()` and inspect the output instead
│   caller = npyinitialize() at numpy.jl:67
└ @ PyCall ~/.julia/packages/PyCall/L0fLP/src/numpy.jl:67

real    0m25.150s
user    0m24.921s
sys     0m1.319s
import pandas as pd
import matplotlib.pyplot as plt

def plot_temperature(df):
    size = 3
    f1 = plt.figure()
    ax1 = f1.gca()
    ax1.plot(df["Year"], df["Temperature_Anomaly"], "o", markersize = size)
    plt.xlabel("Year")
    plt.ylabel("Temperature Anomaly")
    plt.title("Temperature Anomaly")
    plt.show(block = False)

def main():
    input_file = "../data/HadCRUT.4.6.0.0.annual_ns_avg.csv"
    df = pd.read_csv(input_file)
    plot_temperature(df)

if __name__ == "__main__":
    main()
$ time python test_plot.py

real    0m0.671s
user    0m0.691s
sys     0m1.132s
test_plot <- function() {
    library(ggplot2)
    df <- read.csv("../data/HadCRUT.4.6.0.0.annual_ns_avg.csv")
    p <- ggplot(df) +
        geom_point(aes(x = Year, y = Temperature_Anomaly)) +
        labs(x = 'Year', y = 'Temperature Anomaly') +
        ggtitle('Temperature Anomaly')
    x11()
    plot(p)
}

test_plot()
$ time Rscript test_plot.R

real    0m0.715s
user    0m0.518s
sys     0m0.050s
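A single `time` run can be noisy, so it may be worth repeating each measurement. The following is a minimal sketch of a harness for that, not the method used for the numbers above; the function name `time_command` and the default of three runs are my own assumptions. The placeholder command just starts a Python interpreter so the sketch runs anywhere.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd in a fresh process `runs` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded so that printing does not distort the timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command; swap in e.g. ["julia", "test_plot.jl"] or
    # ["Rscript", "test_plot.R"] to repeat the measurements shown above.
    cmd = [sys.executable, "-c", "pass"]
    timings = time_command(cmd)
    print(f"min {min(timings):.3f}s  median {statistics.median(timings):.3f}s")
```

Because every run starts a new process, this measures the full cold-start cost each time, which is the number that matters for an edit-and-rerun workflow.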
What Else?
using Plots
gr()

f = plot(1:10, rand(10), reuse = false)
f2 = plot(11:20, rand(10), reuse = false)
display(f)
display(f2)

println("Press <Enter> to exit...")
readline()
using Plots; gaston()

f = plot(1:10, rand(10), reuse = false)
f2 = plot(11:20, rand(10), reuse = false)
display(f)
display(f2)

println("Press <Enter> to exit...")
readline()