I have been wanting to try TensorFlow for some time. TensorFlow is a library, developed by Google, for dataflow programming. I got interested in dataflow long ago, in my misspent youth, when I was a primary developer of something called KnowledgePro, which among other things implemented a simplified version of dataflow.
TensorFlow's killer app is deep learning. However, it has many other possible applications in areas such as machine learning and search. It is built around the notion of stateful dataflow graphs. The idea is that you build a graph of a computation where the nodes are data, constants, or operations. The nodes are stateful in the sense that they maintain their values. The arcs link the nodes into a computational graph. Tensors flow along the arcs. Tensors are multidimensional arrays holding the constants, variables, and results of computation.
Luckily, you don't have to actually set up the graph as arcs and nodes. TensorFlow programs look like programs in any other language. The underlying code builds the graph from statements. There are bindings for the TensorFlow library in Python, C++, R, and other languages. The real power of TensorFlow comes not only from the programming model, but also from the fact that it can automatically use CUDA if you have a GPU installed, speeding up computation considerably.
To use TensorFlow, you write a series of statements describing the computation and then create a session. The session initializes variables and processes the graph, including running the calculations on a GPU if available.
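The split between describing a computation and running it can be illustrated without TensorFlow at all. The sketch below is plain base R, not the TensorFlow API: the helpers `node_const`, `node_op`, and `run_node` are hypothetical names made up for illustration, and a real TensorFlow graph does far more (state, gradients, device placement).

```r
# A toy "define, then run" graph in base R (not the TensorFlow API).
# node_const, node_op, and run_node are hypothetical helpers for illustration.

# A constant node just stores a value.
node_const <- function(value) {
  list(type = "const", value = value)
}

# An operation node stores a function and its input nodes; nothing is computed yet.
node_op <- function(fn, ...) {
  list(type = "op", fn = fn, inputs = list(...))
}

# Running a node evaluates its inputs recursively, then applies the function.
run_node <- function(node) {
  if (node$type == "const") {
    return(node$value)
  }
  args <- lapply(node$inputs, run_node)
  do.call(node$fn, args)
}

# Declarative part: describe y = w * x + b without computing it.
x <- node_const(c(1, 2, 3))
w <- node_const(1.5)
b <- node_const(2)
y <- node_op(`+`, node_op(`*`, x, w), b)

# "Session" part: only now is the arithmetic carried out.
result <- run_node(y)
print(result)
```

Building `y` only records what to compute; the call to `run_node` plays the role that the session plays in TensorFlow.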
As an example, I tried a very simple linear regression using the R version of the TensorFlow API.
First, we generate some data.
library(ggplot2)
x <- seq(0,1,length=100)
y <- 2 + 1.5*x + rnorm(100, 0, 0.1)
data <- data.frame(X=x, Y=y)
ggplot(data, aes(x=X, y=Y))+geom_point()
Here's what the data looks like.
Finding the slope and intercept in R would be trivial with the lm() function. Instead, we'll use TensorFlow and try to find a best fit line by gradient descent.
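For reference, the `lm()` fit mentioned above might look like the following. This is a self-contained sketch: the data are regenerated the same way as before, and the `set.seed()` call is an addition so that the run is repeatable, which means the numbers will not match the ones reported later in this post.

```r
# Reference fit with lm(), using data generated the same way as in the post.
# set.seed() is an addition here so that the sketch is repeatable.
set.seed(42)
x <- seq(0, 1, length.out = 100)
y <- 2 + 1.5 * x + rnorm(100, 0, 0.1)
data <- data.frame(X = x, Y = y)

# Ordinary least squares: Y = intercept + slope * X
fit <- lm(Y ~ X, data = data)
print(coef(fit))  # named vector with entries "(Intercept)" and "X"
```

The estimates should land near the true intercept of 2 and slope of 1.5, up to the noise added by `rnorm()`.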
Installing TensorFlow in R is pretty easy with RStudio.
install.packages("devtools")
devtools::install_github("rstudio/tensorflow")
library(tensorflow) # makes the tf object available
We will try to fit the data shown above to a linear function $y = wx + b$. To do this, we'll use TensorFlow's built-in GradientDescentOptimizer to estimate w, the slope, and b, the intercept, by minimizing the mean of the squared differences between the fitted values and the data.
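Written out for $n$ data points $(x_i, y_i)$, the quantity being minimized and the gradient descent update look roughly like this. The code that follows uses tf$reduce_mean, so the loss is taken here as the mean squared error, which has the same minimizer as the sum. The symbols $n$, $L$, and $\eta$ (the learning rate) are notation introduced for this sketch.

$$L(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (w x_i + b) \right)^2$$

$$\frac{\partial L}{\partial w} = -\frac{2}{n} \sum_{i=1}^{n} x_i \left( y_i - (w x_i + b) \right), \qquad \frac{\partial L}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - (w x_i + b) \right)$$

$$w \leftarrow w - \eta \, \frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta \, \frac{\partial L}{\partial b}$$

Each pass of the optimizer applies one such update to both parameters.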
A TensorFlow program consists of a declarative part describing the computational graph and a session to run the graph.
To start, we define a couple of constants to hold X and Y. tf is an R object that we use to access the TensorFlow library. data is our data frame with the X and Y data.
X = tf$constant(data$X)
Y = tf$constant(data$Y)
Next, we create some variables for slope and intercept.
W = tf$Variable(0.0, name = "weights")
b = tf$Variable(0.0, name = "bias")
We will create an operation to initialize the variables. Defining a variable does not give it a value. We will use a built-in function to do the initialization. Note that here we are only defining the operation, not running it. That comes later. We will also define our predictive function and our loss function.
# an op to initialize the variables when the session starts
init_op = tf$global_variables_initializer()
# predict Y given b and W
pred = tf$add(tf$multiply(X, W),b)
# the loss function
sq = tf$square(Y-pred)
loss = tf$reduce_mean(sq, name = "loss") # mean of the squared errors
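The prediction could also have been written with ordinary arithmetic operators, as pred = X * W + b, since the R bindings overload them for tensors (the loss above already relies on this for Y-pred).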
The final declaration defines the optimizer.
# the optimizer - minimize the loss
optimizer = tf$train$GradientDescentOptimizer(learning_rate=learning_rate)$minimize(loss)
tf$train$GradientDescentOptimizer() is a built-in optimization method. learning_rate controls the step size as the optimizer attempts to move down the gradient of the loss function. Choose a rate that is too high and you might skip over an optimum. Make it too low, and the routine will be slow to converge. See this discussion for some guidelines. For our simple example, we'll just guess.
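To get a feel for the effect of the learning rate, the same kind of descent can be written by hand in base R. This is a sketch under assumptions: `gd_fit` is a hypothetical helper, the updates are plain gradient steps on the mean squared error starting from w = b = 0, and it makes no claim about what GradientDescentOptimizer does internally. The seed is arbitrary.

```r
# Plain-R gradient descent on the mean squared error, to compare learning rates.
# A hand-written sketch, not the TensorFlow optimizer.
set.seed(42)  # added so the sketch is repeatable
x <- seq(0, 1, length.out = 100)
y <- 2 + 1.5 * x + rnorm(100, 0, 0.1)

# gd_fit is a hypothetical helper: run a fixed number of gradient steps
# starting from w = 0, b = 0.
gd_fit <- function(x, y, learning_rate, epochs = 1000) {
  w <- 0
  b <- 0
  for (i in seq_len(epochs)) {
    err <- y - (w * x + b)          # current residuals
    grad_w <- -2 * mean(x * err)    # d(loss)/dw
    grad_b <- -2 * mean(err)        # d(loss)/db
    w <- w - learning_rate * grad_w
    b <- b - learning_rate * grad_b
  }
  c(intercept = b, slope = w, loss = mean((y - (w * x + b))^2))
}

slow <- gd_fit(x, y, learning_rate = 0.01)
fast <- gd_fit(x, y, learning_rate = 0.1)
print(rbind(slow, fast))
```

In this setup, 1000 steps at the smaller rate should still leave the slope noticeably short of the least-squares value, while the larger rate gets essentially all the way there.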
To run the graph, we create a session.
loss_val = 1.0e+30
with(tf$Session() %as% sess, {
  sess$run(init_op) # initialize the variables
  # training
  for (i in 1:epochs) { # run for many epochs
    # run the optimizer
    # sess$run can return a list of values. In this case we want to know the loss
    # and maybe print b and W.
    res = sess$run(list(optimizer, loss, b, W))
    loss_diff = abs(loss_val - res[[2]])
    loss_val = res[[2]]
    if (loss_diff < tol) { # are we done?
      break
    }
    if (debug && (i %% 50 == 0)) {
      cat(sprintf('epoch = %d loss = %f, b = %f, w = %f\n',
                  i, loss_val, res[[3]], res[[4]]))
    }
  }
  # display the final results
  w_value = sess$run(W)
  b_value = sess$run(b)
  cat(sprintf('intercept = %f slope = %f\n', b_value, w_value))
})
The session exits when we have exhausted the number of iterations or the change in the loss has flattened.
I tried the above method with $learning\_rate = 0.01$. Here are the results. This computation exhausted the maximum of 1000 epochs that I allowed. Possibly, the learning rate is too low.
intercept = 2.025227 slope = 1.448584
SSE = 1.298184 R2 = 0.939887
For comparison, R's nls function produced the following. The TensorFlow results are close but not quite there yet.
Formula: Y ~ a + b * x
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 1.97172 0.02208 89.31 <2e-16 ***
b 1.54842 0.03814 40.59 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1112 on 98 degrees of freedom
Number of iterations to convergence: 1
Achieved convergence tolerance: 8.312e-10
f SSE = 1.212168 f R2 = 0.943870
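That nls converges in a single iteration is not surprising: the model is linear in a and b, so the least-squares estimates have a closed form, and one step of nls's default Gauss-Newton method reaches it. As a sketch, the normal equations can be solved directly in base R, with SSE and R-squared computed the same way as in this post. The data are regenerated with an arbitrary seed, so the numbers will differ from those above.

```r
# Closed-form least squares via the normal equations (base R only).
set.seed(42)  # arbitrary seed, added so the sketch is repeatable
x <- seq(0, 1, length.out = 100)
y <- 2 + 1.5 * x + rnorm(100, 0, 0.1)

# Design matrix with a column of ones for the intercept.
A <- cbind(1, x)

# Solve (A'A) beta = A'y for beta = (intercept, slope).
beta <- solve(crossprod(A), crossprod(A, y))

# Goodness of fit: sum of squared errors and R-squared.
y_hat <- as.vector(A %*% beta)
sse <- sum((y - y_hat)^2)
r2 <- 1 - sse / sum((y - mean(y))^2)

cat(sprintf("intercept = %f slope = %f\n", beta[1], beta[2]))
cat(sprintf("SSE = %f R2 = %f\n", sse, r2))
```

Gradient descent is approaching this same solution step by step, which is why a well-chosen learning rate brings the TensorFlow estimates so close to the nls ones.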
Here's a plot of the predicted lines.
Changing the learning rate to 0.1 gives a result closer to that of nls.
intercept = 1.972263 slope = 1.547410
SSE = 1.212176 R2 = 0.943870
Visually, there is no difference in the predicted lines.
We only see one line because the TensorFlow line is covered by the predicted line from nls.
Here's the whole R method as a function.
tf_grad <- function(data, epochs=1000, learning_rate=0.01, tol = 1.0e-10, debug=FALSE) {
  # tf_grad - perform a linear fit of data using TensorFlow's gradient descent optimizer
  # arguments:
  # data - a data.frame with columns X and Y
  # epochs - the maximum number of iterations calling gradient descent
  # learning_rate - the step size passed to the gradient descent optimizer
  # tol - exit when the difference between losses is less than this value
  # debug - if TRUE, prints fit and loss values every 50 iterations
  # returns:
  # a list - w, the slope, b, the intercept, and f, the nls model
  require(tensorflow)
  require(ggplot2)
  m <- dim(data)[1]
  # define some constants to hold X and Y data
  X = tf$constant(data$X)
  Y = tf$constant(data$Y)
  # some variables
  W = tf$Variable(0.0, name = "weights")
  b = tf$Variable(0.0, name = "bias")
  # an op to initialize the variables when the session starts
  init_op = tf$global_variables_initializer()
  # predict Y given b and W
  pred = tf$add(tf$multiply(X, W),b)
  # the loss function: mean of the squared errors
  sq = tf$square(Y-pred)
  loss = tf$reduce_mean(sq, name = "loss")
  # the optimizer - minimize the loss
  optimizer = tf$train$GradientDescentOptimizer(learning_rate=learning_rate)$minimize(loss)
  # create a session
  loss_val = 1.0e+30
  with(tf$Session() %as% sess, {
    sess$run(init_op) # initialize the variables
    # training
    for (i in 1:epochs) { # run for many epochs
      # run the optimizer
      # sess$run can return a list of values. In this case we want to know the loss
      # and maybe print b and W.
      res = sess$run(list(optimizer, loss, b, W))
      loss_diff = abs(loss_val - res[[2]])
      loss_val = res[[2]]
      if (loss_diff < tol) { # are we done?
        break
      }
      if (debug && (i %% 50 == 0)) {
        cat(sprintf('epoch = %d loss = %f, b = %f, w = %f\n',
                    i, loss_val, res[[3]], res[[4]]))
      }
    }
    # display the final results
    w_value = sess$run(W)
    b_value = sess$run(b)
    cat(sprintf('intercept = %f slope = %f\n', b_value, w_value))
  })
  # get the SSE and R-squared
  y = w_value * data$X + b_value
  R2 <- 1 - (sum((data$Y-y)^2)/sum((data$Y-mean(data$Y))^2))
  cat(sprintf('SSE = %f R2 = %f\n', sum((y-data$Y)^2), R2))
  # use non-linear least squares for comparison
  f = nls(Y ~ a + b * X, data=data, start=c(a=1, b=data$X[1]))
  print(summary(f))
  fR2 <- 1 - (sum(resid(f)^2)/sum((data$Y-mean(data$Y))^2))
  cat(sprintf('f SSE = %f f R2 = %f\n', sum(resid(f)^2), fR2))
  # plot the results
  q <- ggplot(data, aes(x=X, y=Y)) +
    geom_point() +
    geom_abline(aes(slope=w_value, intercept=b_value,
                    color='tensorflow')) +
    geom_abline(aes(slope=coef(f)[2], intercept=coef(f)[1],
                    color='nls')) +
    labs(color='Method')
  print(q)
  return(list(w = w_value, b = b_value, f = f))
}
Incidentally, not everyone likes TensorFlow.