# Keras with Eager Execution

Eager execution is a way to train a Keras model without building a graph. Operations return values, not tensors. Consequently, you can inspect what goes in and comes out of an operation simply by printing a variable’s contents. This is an important advantage in model development and debugging.

You can use eager execution with Keras as long as you use the TensorFlow implementation. This guide gives an outline of the workflow by way of a simple regression example. Specifically, you will see how to:

• Set up your environment for eager execution
• Define the main ingredients: a Keras model, an optimizer and a loss function
• Feed data to the training routine
• Write a simple training loop that does backprop on the model’s weights
• Make predictions on the test set
• Save the model’s weights

## Requirements

To use eager execution with Keras, you need a current version of the R package keras with a TensorFlow backend of version at least 1.9.

The following preamble is required when using eager execution:

library(keras)
# make sure we use the tensorflow implementation of Keras
# this line has to be executed immediately after loading the library
use_implementation("tensorflow")

library(tensorflow)
# enable eager execution
# the argument device_policy is needed only when using a GPU
tfe_enable_eager_execution(device_policy = "silent")

When in doubt, check if you are in fact using eager execution:

tf$executing_eagerly() ## Define a model Models for use with eager execution are defined as Keras custom models. Custom models are usually made up of normal Keras layers, which you configure as usual. However, you are free to implement custom logic in the model’s (implicit) call function. Our simple regression example will use iris to predict Sepal.Width from Petal.Length, Sepal.Length and Petal.Width. Here is a model that can be used for that purpose: # model instantiator iris_regression_model <- function(name = NULL) { keras_model_custom(name = name, function(self) { # define any number of layers here self$dense1 <- layer_dense(units = 32)
self$dropout <- layer_dropout(rate = 0.5) self$dense2 <- layer_dense(units = 1)

# this is the "call" function that defines what happens when the model is called
function (x, mask = NULL) {
x %>%
self$dense1() %>% self$dropout() %>%
self$dense2() } }) } The model is created simply by instantiating it via its wrapper: model <- iris_regression_model() At this point, the shapes of the model’s weights are still unknown (note how no input_shape has been defined for its first layer). You can, however, already call the model on some dummy data: model(k_constant(matrix(1:6, nrow = 2, ncol = 3))) tf.Tensor( [[-1.1474639] [-1.0472134]], shape=(2, 1), dtype=float32) After that call, you can inspect the model’s weights using model$weights

This will not just display the tensor shapes, but the actual weight values.

## Losses and optimizers

An appropriate loss function for a regression task like this is mean squared error:

mse_loss <- function(y_true, y_pred, x) {
# it's required to use a TensorFlow function here, not loss_mean_squared_error() from Keras
mse <- tf$losses$mean_squared_error(y_true, y_pred)
# here you could compute and add other losses
mse
}

Note how we have to use loss functions from TensorFlow, not the Keras equivalents. In the same vein, we need to use an optimizer from the tf$train module. # have to use an optimizer from tf$train, not Keras
optimizer <- tf$train$AdamOptimizer()

## Use tfdatasets to feed the data

In eager execution, you use tfdatasets to stream input and target data to the model. In our simple iris example, we use tensor_slices_dataset to directly create a dataset from the underlying R matrices x_train and y_train.

However, a wide variety of other dataset creation functions is available. Datasets also allow for a variety of pre-processing transformations.

x_train <-
iris[1:120, c("Petal.Length", "Sepal.Length", "Petal.Width")] %>% as.matrix()
y_train <-
iris[1:120, c("Sepal.Width")] %>% as.matrix()

# Convert to approriate tensor floating point type for backend
x_train <- k_constant(x_train)
y_train <- k_constant(y_train)

# same for test set
x_test <-
iris[121:150, c("Petal.Length", "Sepal.Length", "Petal.Width")] %>% as.matrix()
y_test <-
iris[121:150, c("Sepal.Width")] %>% as.matrix()
x_test <- k_constant(x_test)
y_test <- k_constant(y_test)

library(tfdatasets)
train_dataset <- tensor_slices_dataset(list (x_train, y_train)) %>%
dataset_batch(10)
test_dataset <- tensor_slices_dataset(list (x_test, y_test)) %>%
dataset_batch(10)

Data is accessed from a dataset via make_iterator_one_shot (to create an iterator) and iterator_get_next (to obtain the next batch).

iter <- make_iterator_one_shot(train_dataset)
batch <-  iterator_get_next(iter)

Datasets are available in non-eager (graph) execution as well. However, in eager mode, we can examine the actual values returned from the iterator:

batch
[[1]]
tf.Tensor(
[[1.4 5.1 0.2]
[1.4 4.9 0.2]
[1.3 4.7 0.2]
[1.5 4.6 0.2]
[1.4 5.  0.2]
[1.7 5.4 0.4]
[1.4 4.6 0.3]
[1.5 5.  0.2]
[1.4 4.4 0.2]
[1.5 4.9 0.1]], shape=(10, 3), dtype=float32)

[[2]]
tf.Tensor(
[[3.5]
[3. ]
[3.2]
[3.1]
[3.6]
[3.9]
[3.4]
[3.4]
[2.9]
[3.1]], shape=(10, 1), dtype=float32)

## Training loop

With eager execution, you take full control over the training process.

In general, you will have at least two loops: an outer loop over epochs, and an inner loop over batches of data returned by the iterator (implemented implicitly by until_out_of_range). The iterator is recreated at the start of each new epoch.

n_epochs <- 10

for (i in seq_len(n_epochs)) {

iter <- make_iterator_one_shot(train_dataset)
total_loss <- 0

until_out_of_range({

# get a new batch and run forward pass on it

# calculate loss

# calculate gradients of loss w.r.t. model weights

# update model weights

})

cat("Total loss (epoch): ", i, ": ", as.numeric(total_loss), "\n")
}

Filling in the missing pieces in the above outline, we will see that

• Forward propagation is simply a call to model().
• This call has to happen inside the context of a GradientTape that records all operations.
• Loss is calculated using the loss function defined before.
• From the loss on the one hand and the model’s current weights on the other hand, GradientTape then determines the gradients.
• Finally, the optimizer applies the gradients to the weights in its algorithm-specific way.

Here is the complete code for the training loop:

n_epochs <- 10

# loop over epochs
for (i in seq_len(n_epochs)) {

# create fresh iterator from dataset
iter <- make_iterator_one_shot(train_dataset)

# accumulate current epoch's loss (for display purposes only)
total_loss <- 0

# loop once through the dataset
until_out_of_range({

# get next batch
batch <-  iterator_get_next(iter)
x <- batch[[1]]
y <- batch[[2]]

# forward pass is recorded by tf$GradientTape with(tf$GradientTape() %as% tape, {

# run model on current batch
preds <- model(x)

# compute the loss
loss <- mse_loss(y, preds, x)
})

# update total loss
total_loss <- total_loss + loss

# get gradients of loss w.r.t. model weights
gradients <- tape$gradient(loss, model$variables)

# update model weights
optimizer$apply_gradients( purrr::transpose(list(gradients, model$variables)),
global_step = tf$train$get_or_create_global_step()
)

})

cat("Total loss (epoch): ", i, ": ", as.numeric(total_loss), "\n")
}

## Predictions on the test set

Getting predictions on the test set is just a call to model, just like training has been.

model(x_test)

## Saving and restoring model weights

To save model weights, create an instance of tf$Checkpoint and pass it the objects to be saved: in our case, the model and the optimizer. This has to happen after the respective objects have been created, but before the training loop. checkpoint_dir <- "./checkpoints" checkpoint_prefix <- file.path(checkpoint_dir, "ckpt") checkpoint <- tf$train$Checkpoint( optimizer = optimizer, model = model ) Then at the end of each epoch, you save the model’s current weights, like so: checkpoint$save(file_prefix = checkpoint_prefix)

This call saves model weights only, not the complete graph. Thus on restore, you re-create all components in the same way as above, and then load saved the model weights using e.g.

# restore from recent checkpoint, you can also use a different one
checkpoint$restore(tf$train$latest_checkpoint(checkpoint_dir)) You can then obtain predictions from the restored model, on the test set as a whole or batch-wise, using an iterator. model(x_test) iter <- make_iterator_one_shot(test_dataset) until_out_of_range({ batch <- iterator_get_next(iter) preds <- model(batch[[1]]) print(preds) }) ## Complete example Here is the complete example. library(keras) use_implementation("tensorflow") library(tensorflow) tfe_enable_eager_execution(device_policy = "silent") library(tfdatasets) # Prepare training and test sets ------------------------------------------ x_train <- iris[1:120, c("Petal.Length", "Sepal.Length", "Petal.Width")] %>% as.matrix() x_train <- k_constant(x_train) y_train <- iris[1:120, c("Sepal.Width")] %>% as.matrix() y_train <- k_constant(y_train) x_test <- iris[121:150, c("Petal.Length", "Sepal.Length", "Petal.Width")] %>% as.matrix() x_test <- k_constant(x_test) y_test <- iris[121:150, c("Sepal.Width")] %>% as.matrix() y_test <- k_constant(y_test) # Create datasets for training and testing -------------------------------- train_dataset <- tensor_slices_dataset(list (x_train, y_train)) %>% dataset_batch(10) test_dataset <- tensor_slices_dataset(list (x_test, y_test)) %>% dataset_batch(10) # Create model ------------------------------------------------------------ iris_regression_model <- function(name = NULL) { keras_model_custom(name = name, function(self) { self$dense1 <- layer_dense(units = 32, input_shape = 3)
self$dropout <- layer_dropout(rate = 0.5) self$dense2 <- layer_dense(units = 1)

function (x, mask = NULL) {
self$dense1(x) %>% self$dropout() %>%
self$dense2() } }) } model <- iris_regression_model() # Define loss function and optimizer -------------------------------------- mse_loss <- function(y_true, y_pred, x) { mse <- tf$losses$mean_squared_error(y_true, y_pred) mse } optimizer <- tf$train$AdamOptimizer() # Set up checkpointing ---------------------------------------------------- checkpoint_dir <- "./checkpoints" checkpoint_prefix <- file.path(checkpoint_dir, "ckpt") checkpoint <- tf$train$Checkpoint(optimizer = optimizer, model = model) n_epochs <- 10 # change to TRUE if you want to restore weights restore <- FALSE if (!restore) { for (i in seq_len(n_epochs)) { iter <- make_iterator_one_shot(train_dataset) total_loss <- 0 until_out_of_range({ batch <- iterator_get_next(iter) x <- batch[[1]] y <- batch[[2]] with(tf$GradientTape() %as% tape, {
preds <- model(x)
loss <- mse_loss(y, preds, x)
})

total_loss <- total_loss + loss
gradients <- tape$gradient(loss, model$variables)

optimizer$apply_gradients(purrr::transpose(list(gradients, model$variables)),
global_step = tf$train$get_or_create_global_step())

})

cat("Total loss (epoch): ", i, ": ", as.numeric(total_loss), "\n")

checkpoint$save(file_prefix = checkpoint_prefix) } } else { checkpoint$restore(tf$train$latest_checkpoint(checkpoint_dir))
}

# Get model predictions on test set ---------------------------------------

model(x_test)

iter <- make_iterator_one_shot(test_dataset)
until_out_of_range({
batch <-  iterator_get_next(iter)
preds <- model(batch[[1]])
print(preds)
})

## Where to from here

In this guide, the task - and consequently, the custom model, associated loss and training routine - have been chosen for their simplicity. Visit the TensorFlow for R blog for case studies and paper implementations that use more intricate custom logic.