class: middle, inverse

## Torch for R

### Daniel Falbel
### RStudio, PBC

January 27, 2021

---
class: normal, middle

## Who am I?

.pull-left[
- Daniel Falbel
- Live in São Paulo, Brazil
- Software engineer at RStudio, PBC
- Working on the Multiverse team
]

---
class: normal, middle

## Outline

.pull-left[
- What's Torch?
- The Torch components
- Contributing
- Future work
]

---
class: normal, middle

## What's Torch?

.pull-left[
Torch is an R package with 2 core features:

- Array computation with strong GPU acceleration
- Deep neural networks built on a tape-based autograd system
]

---
class: normal, middle

## Why Torch?

.pull-left[
Torch is based on PyTorch, a framework whose popularity is rapidly increasing among deep learning researchers.

We believe others can build on torch's GPU acceleration to implement fast machine learning algorithms through its convenient interface.
]

.pull-right[
Papers with Code [trends](https://paperswithcode.com/trends) section.
]

---
class: normal, middle

## Torch and TensorFlow

.pull-left[
- Torch for R is at an early development stage. TensorFlow is more mature.
- Torch binds to LibTorch (the C++ library), while TensorFlow for R uses the Python implementation via reticulate.
]

.pull-right[
- Torch has a lower-level API. Keras has a very high-level and concise API.
]

---
class: normal, middle

## Implementation

.pull-left[
- Almost all `torch_*` functions are autogenerated from the LibTorch declaration file.
- Most of the neural network modules, optimizers, datasets and dataloaders code is written in R.
]

.pull-right[
]

---
class: normal, middle

## example_00.R

Click [here](https://gist.github.com/dfalbel/10f2fca89dd1e7713be62785435e9064#file-example_00-r) for a link.

---
class: normal, middle

## Torch components

---
class: normal, middle

## Tensors

---
class: normal, middle

## Tensors

The `torch_tensor` is the core data structure in torch.

---
class: normal, middle

## Creating tensors from R objects

.pull-left[
- Tensors can be created from R objects like `numeric` vectors, matrices and arrays.
- Currently only integers, doubles and logicals are supported.
- **Note**: doubles are converted to float, because most operations in torch are optimized for it.
]

.pull-right[
```r
torch_tensor(c(1L, 2L, 3L))
```

```
## torch_tensor
##  1
##  2
##  3
## [ CPULongType{3} ]
```

```r
m <- matrix(c(1,2,3,4), ncol = 2)
torch_tensor(m)
```

```
## torch_tensor
##  1  3
##  2  4
## [ CPUFloatType{2,2} ]
```
]

---
class: normal, middle

## Initialization functions

.pull-left[
- Tensors can also be created with the initialization functions.
- These functions provide a convenient interface for creating multi-dimensional arrays of any size.
- See more info [here](https://torch.mlverse.org/docs/articles/tensor-creation.html#using-creation-functions-1).
]

.pull-right[
```r
# 2x2 matrix, standard normal
torch_randn(2, 2)
```

```
## torch_tensor
## -0.4151  0.8593
## -0.4053 -0.2437
## [ CPUFloatType{2,2} ]
```

```r
# length 3 vector, [0,1] uniform
torch_rand(3)
```

```
## torch_tensor
##  0.8956
##  0.0358
##  0.6335
## [ CPUFloatType{3} ]
```
]
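---
class: normal, middle

## Creation functions: dtype and device (a sketch)

.pull-left[
- A quick, hedged sketch: creation functions also accept `dtype` and `device` arguments, and a tensor can be converted back to an R object with `as.array()`. The arguments shown on the right follow the tensor-creation article; double-check the reference if in doubt.
]

.pull-right[
```r
# a 2x3 integer (long) tensor of zeros
x <- torch_zeros(2, 3, dtype = torch_long())

# a tensor placed explicitly on the CPU
torch_ones(2, 2, device = "cpu")

# back to an ordinary R array
as.array(x)
```
]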
---
class: normal, middle

## Indexing

.pull-left[
- Indexing tensors is supported, but it differs from R indexing in a few cases.
- Negative indexing doesn't remove elements; instead it selects starting from the end.
- See the docs [here](https://torch.mlverse.org/docs/articles/indexing.html).
]

.pull-right[
```r
x <- torch_tensor(1:5)
x[1]
```

```
## torch_tensor
## 1
## [ CPULongType{} ]
```

```r
x[-1]
```

```
## torch_tensor
## 5
## [ CPULongType{} ]
```
]

---
class: normal, middle

## Indexing

.pull-left[
- Interval selection works as expected.
]

.pull-right[
```r
x <- torch_tensor(1:5)
x[1:3]
```

```
## torch_tensor
##  1
##  2
##  3
## [ CPULongType{3} ]
```

```r
x[-3:N]
```

```
## torch_tensor
##  3
##  4
##  5
## [ CPULongType{3} ]
```
]

---
class: normal, middle

## Indexing

.pull-left[
- You can select without specifying all the dimensions.
- Adding new dimensions is also supported.
]

.pull-right[
```r
x <- torch_randn(2,2,3)
x[.., 1]$shape
```

```
## [1] 2 2
```

```r
x[.., 1, drop = FALSE]$shape
```

```
## [1] 2 2 1
```

```r
x[.., newaxis]$shape
```

```
## [1] 2 2 3 1
```
]

---
class: normal, middle

## Indexing

.pull-left[
- Subset assignment is also supported.
]

.pull-right[
```r
x <- torch_tensor(c(1,2,3))
x[1] <- 10
x[2:3] <- c(9, 8)
x
```

```
## torch_tensor
##  10
##   9
##   8
## [ CPUFloatType{3} ]
```
]

---
class: normal, middle

## Accessing attributes

.pull-left[
- Tensor attributes can be accessed using the `$` operator.
- All tensors have a data type (`dtype`), a shape, a device and the `requires_grad` flag.
]

.pull-right[
```r
x <- torch_randn(2,2)
x$shape
```

```
## [1] 2 2
```

```r
x$dtype
```

```
## torch_Float
```

```r
x$device
```

```
## torch_device(type='cpu')
```

```r
x$requires_grad
```

```
## [1] FALSE
```
]

---
class: normal, middle

## Modifying attributes

.pull-left[
- You can change all tensor attributes using the `$to()` method.
- Use named arguments.
]

.pull-right[
```r
x <- torch_tensor(1:5)
x$dtype
```

```
## torch_Long
```

```r
x <- x$to(dtype = torch_float())
x$dtype
```

```
## torch_Float
```

```r
x <- x$to(device = "cuda")
x$device
## torch_device(type='cuda', index=0)
```
]

---
class: normal, middle

## Array computation

---
class: normal, middle

## Array computation

.pull-left[
- torch features a comprehensive tensor computation library.
- More than 200 functions and methods to manipulate tensors.
- Often you can choose between using the method or the function directly.
- Functions have the `torch_*` prefix.
]

.pull-right[
```r
x <- torch_randn(2,3)
torch_mean(x)
```

```
## torch_tensor
## 0.551289
## [ CPUFloatType{} ]
```

```r
x$mean()
```

```
## torch_tensor
## 0.551289
## [ CPUFloatType{} ]
```

```r
torch_sum(x, dim = 2)
```

```
## torch_tensor
##  0.0732
##  3.2346
## [ CPUFloatType{2} ]
```
]
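---
class: normal, middle

## Broadcasting (a quick sketch)

.pull-left[
- A small, hedged example: as in PyTorch, tensors with compatible shapes are broadcast in elementwise operations, so you rarely need to replicate data by hand.
]

.pull-right[
```r
m <- torch_ones(2, 3)          # shape {2,3}
v <- torch_tensor(c(1, 2, 3))  # shape {3}

# v is recycled ("broadcast") across the rows of m,
# yielding a {2,3} tensor whose rows are 2 3 4
m + v
```
]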
---
class: normal, middle

## Other useful functions

.pull-left[
- There are functions for pretty much every math operation you can think of.
- See the full list [here](https://torch.mlverse.org/docs/reference/index.html#section-mathematical-operations-on-tensors).
]

.pull-right[
```r
x <- torch_tensor(c(1, 2))
y <- torch_tensor(c(3, 4))
torch_cat(list(x, y))
```

```
## torch_tensor
##  1
##  2
##  3
##  4
## [ CPUFloatType{4} ]
```

```r
torch_unbind(x, dim = 1)
```

```
## [[1]]
## torch_tensor
## 1
## [ CPUFloatType{} ]
## 
## [[2]]
## torch_tensor
## 2
## [ CPUFloatType{} ]
```
]

---
class: normal, middle

## Methods

.pull-left[
- Tensor methods are accessed using the `$` operator.
- Methods with names ending in `_` operate **in-place**.
- Full list available [here](https://torch.mlverse.org/docs/articles/tensor/index.html).
]

.pull-right[
```r
x <- torch_tensor(c(1,2))
x$mean()
```

```
## torch_tensor
## 1.5
## [ CPUFloatType{} ]
```

```r
y <- x$add_(1L); x
```

```
## torch_tensor
##  2
##  3
## [ CPUFloatType{2} ]
```
]

---
class: normal, middle

## Autograd

---
class: normal, middle

## What's autograd?

.pull-left[
- Autograd can automatically compute exact derivatives of tensor operations.
- It's the core feature that allows torch to be used for training neural networks.
- You need to set `requires_grad = TRUE` if you want torch to track the operations and be able to compute derivatives.
]

.pull-right[
```r
x <- torch_tensor(
  2,
  requires_grad = TRUE
)
y <- x ^ 3

# torch will compute dy/d*
y$backward()

x$grad # 3 * x ^ 2
```

```
## torch_tensor
##  12
## [ CPUFloatType{1} ]
```
]
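---
class: normal, middle

## Autograd in action (a sketch)

.pull-left[
- A minimal, hedged sketch of what autograd buys you: fit `y = 2 * x` by gradient descent, letting `backward()` compute the gradient. The toy data and learning rate are just for illustration; in practice you would use the optimizers shown later.
]

.pull-right[
```r
x <- torch_randn(100, 1)
y <- 2 * x

w <- torch_zeros(1, requires_grad = TRUE)

for (i in 1:100) {
  loss <- torch_mean((x * w - y)^2)
  loss$backward()          # fills w$grad with d(loss)/dw
  with_no_grad({
    w$sub_(0.1 * w$grad)   # gradient step
    w$grad$zero_()
  })
}

w # should approach 2
```
]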
---
class: normal, middle

## Autograd

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Automatic differentiation is really pretty fantastic. There are so many things where I would think “In principle that is differentiable, but there is no way in hell I am going to work it out, so I’ll do something else instead”, but it Just Works with AD.</p>— John Carmack (@ID_AA_Carmack) <a href="https://twitter.com/ID_AA_Carmack/status/1353027631130832896?ref_src=twsrc%5Etfw">January 23, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

---
class: normal, middle

## example_01.R

Click [here](https://gist.github.com/dfalbel/10f2fca89dd1e7713be62785435e9064#file-example_01-r) for a link.

---
class: normal, middle

## Extensions

.pull-left[
- The autograd system can be extended if you need to add a function that can't be composed of other torch functions.
- See the documentation [here](https://torch.mlverse.org/docs/articles/extending-autograd.html).
]

.pull-right[
```r
mul_constant <- autograd_function(
  forward = function(ctx, tensor, constant) {
    ctx$save_for_backward(constant = constant)
    tensor * constant
  },
  backward = function(ctx, grad_output) {
    v <- ctx$saved_variables
    list(
      tensor = grad_output * v$constant
    )
  }
)

x <- torch_tensor(1, requires_grad = TRUE)
o <- mul_constant(x, 2)
o$backward()
x$grad
```

```
## torch_tensor
##  2
## [ CPUFloatType{1} ]
```
]

---
class: normal, middle

## Read more

.pull-left[
- Read Sigrid's blog post [introducing autograd](https://blogs.rstudio.com/ai/posts/2020-10-05-torch-network-with-autograd/) for further discussion.
- The white paper describing PyTorch's implementation of automatic differentiation is [here](https://openreview.net/forum?id=BJJsrmfCZ).
]

.pull-right[
]

---
class: normal, middle

## Neural network modules

---
class: normal, middle

## Neural network modules

.pull-left[
- All models and layers are built as `nn_module`s.
- Models and layers are functions that transform input data; they carry 'weights' (parameters) as their state.
- `nn_module`s make it easy to handle the state of a model.
- Modules are also a convenient way to reuse code.
]

.pull-right[
.panelset[
.panel[.panel-name[Without nn_module]
```r
Linear <- function(in_feat, out_feat) {
  w <- torch_randn(in_feat, out_feat)
  b <- torch_zeros(out_feat)
  function(input) {
    torch_mm(input, w) + b
  }
}
linear <- Linear(10, 1)
input <- torch_randn(2, 10)
linear(input)
```

```
## torch_tensor
## -0.2812
##  1.7096
## [ CPUFloatType{2,1} ]
```
]
.panel[.panel-name[With nn_module]
```r
Linear <- nn_module(
  initialize = function(in_feat, out_feat) {
    self$w <- nn_parameter(torch_randn(in_feat, out_feat))
    self$b <- nn_parameter(torch_zeros(out_feat))
  },
  forward = function(input) {
    torch_mm(input, self$w) + self$b
  }
)
linear <- Linear(10, 1)
input <- torch_randn(2, 10)
linear(input)
```

```
## torch_tensor
## -0.1271
## -0.2898
## [ CPUFloatType{2,1} ]
```
]
]
]

---
class: normal, middle

## Handling the state

.pull-left[
- It's easy to access the parameters of the model.
- It's easy to move the model parameters to the 'cuda' device, or back to the 'cpu'.
]

.pull-right[
```r
# list all parameters
str(linear$parameters)
```

```
## List of 2
##  $ w:Float [1:10, 1:1]
##  $ b:Float [1:1]
```

```r
# access parameters individually
linear$w
linear$b

# move the model to the cuda device
model$to(device = "cuda")
```
]
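---
class: normal, middle

## Saving and restoring models (a sketch)

.pull-left[
- A hedged sketch: modules (and tensors) can be written to disk and read back with `torch_save()` / `torch_load()`. The file name is just for illustration; see the serialization guide on the documentation website for the authoritative details.
]

.pull-right[
```r
linear <- Linear(10, 1)

# save the whole module to a file
torch_save(linear, "linear_model.pt")

# ... and load it back later
linear2 <- torch_load("linear_model.pt")
str(linear2$parameters)
```
]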
---
class: normal, middle

## It's all implemented in R

.pull-left[
- All modules in the torch package are implemented this way.
- There are many code examples for you to use and learn from.
]

.pull-right[
(The implementation of the [linear module](https://github.com/mlverse/torch/blob/master/R/nn-linear.R#L59-L84))
]

---
class: normal, middle

## Modules can handle sub-modules

.pull-left[
- Parameters of submodules are correctly tracked.
- Modules are a good abstraction for models, i.e. combinations of other modules (or layers).
- Some modules, like `nn_relu()`, have no parameters. In that case you could also use the `nnf_relu` function in the forward call.
]

.pull-right[
```r
mlp_module <- nn_module(
  initialize = function(in_feat, hidden_feat, out_feat) {
    self$fc1 <- nn_linear(in_feat, hidden_feat)
    self$relu <- nn_relu()
    self$fc2 <- nn_linear(hidden_feat, out_feat)
  },
  forward = function(input) {
    input %>%
      self$fc1() %>%
      self$relu() %>%
      self$fc2()
  }
)

mlp <- mlp_module(10, 20, 1)
str(mlp$parameters)
```

```
## List of 4
##  $ fc1.weight:Float [1:20, 1:10]
##  $ fc1.bias  :Float [1:20]
##  $ fc2.weight:Float [1:1, 1:20]
##  $ fc2.bias  :Float [1:1]
```
]

---
class: normal, middle

## Sequential models

.pull-left[
- You can use `nn_sequential` if the forward function of your model just calls all submodules in order and you don't need an initialize function.
- You can also have sequential models inside `nn_module`s.
]

.pull-right[
```r
mlp <- nn_sequential(
  nn_linear(10, 20),
  nn_relu(),
  nn_linear(20, 1)
)
str(mlp$parameters)
```

```
## List of 4
##  $ 0.weight:Float [1:20, 1:10]
##  $ 0.bias  :Float [1:20]
##  $ 2.weight:Float [1:1, 1:20]
##  $ 2.bias  :Float [1:1]
```
]

---
class: normal, middle

## The functional API

.pull-left[
- Most `nn_modules` use the corresponding functional interface in their implementation. For example, nn_relu uses nnf_relu and nn_conv2d uses nnf_conv2d.
- The functional version is usually the forward method of the `nn_module`.
- You can choose whichever interface you prefer and is better for your use case.
- **Note:** We have almost complete feature parity with PyTorch for the `nnf_*` functions, but not yet for the `nn_*` modules.
]

.pull-right[
```r
input <- torch_tensor(c(-1, 1))
nnf_relu(input)
```

```
## torch_tensor
##  0
##  1
## [ CPUFloatType{2} ]
```

```r
relu <- nn_relu()
relu(input)
```

```
## torch_tensor
##  0
##  1
## [ CPUFloatType{2} ]
```
]

---
class: normal, middle

## Optimizers

---
class: normal, middle

## Optimizers

.pull-left[
- Optimizers are torch's abstraction for defining the optimization step.
- They encapsulate the code responsible for updating the weights of a model.
- They are also implemented in R! See the SGD implementation [here](https://github.com/mlverse/torch/blob/master/R/optim-sgd.R).
- Most optimizers are quite simple to implement, but it gets tricky when the optimizer must store some state, like Adam, SGD with momentum and others.
]

.pull-right[
.panelset[
.panel[.panel-name[Without optim]
```r
parameters <- ...
learning_rate <- 0.001

for (parameter in parameters) {
  # you need to temporarily disable autograd tracking
  # as this operation is not part of the model training.
  with_no_grad({
    parameter$sub_(parameter$grad * learning_rate)
    parameter$grad$zero_()
  })
}
```
]
.panel[.panel-name[With optim]
```r
# the optimizer can keep track of state, etc.
optim <- optim_sgd(parameters, lr = 0.001)

optim$zero_grad()
... # loss backward..
optim$step()
```
]
]
]

---
class: normal, middle

## Optimizers

.pull-left[
- Many optimizers are already implemented thanks to [Krzysztof Joachimiak](https://github.com/krzjoa).
- If you want to contribute, comment [here](https://github.com/mlverse/torch/issues/147).
]

.pull-right[
- optim_sgd
- optim_adam
- optim_adagrad
- optim_adadelta
- optim_asgd
- optim_rmsprop
- optim_rprop
- optim_lbfgs
]

---
class: normal, middle

## Learning rate schedulers

.pull-left[
- Related to the optimizers, we also implemented some learning rate schedulers.
- Varying the learning rate during training is a common technique for faster convergence.
- See a learning rate scheduler in action in [this post](https://blogs.rstudio.com/ai/posts/2020-10-19-torch-image-classification/) by Sigrid.
]

.pull-right[
]
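---
class: normal, middle

## Learning rate schedulers (a sketch)

.pull-left[
- A hedged sketch of the scheduler API: a step scheduler decays the learning rate every `step_size` epochs. The constructor name and arguments (`lr_step`, `step_size`, `gamma`) are assumptions here; check the package reference for the schedulers that are actually available.
]

.pull-right[
```r
optim <- optim_sgd(mlp$parameters, lr = 0.1)

# halve the learning rate every 10 epochs
# (lr_step and its arguments are assumed, see the reference)
scheduler <- lr_step(optim, step_size = 10, gamma = 0.5)

for (epoch in 1:30) {
  # ... train for one epoch ...
  scheduler$step() # update the learning rate
}
```
]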
---
class: normal, middle

## Datasets

---
class: normal, middle

## Datasets

.pull-left[
- Your dataset may not fit completely in RAM, and that's fine, because you usually only need a single batch in RAM at a time.
- You need to provide implementations for 3 methods:
  - `initialize` takes the inputs for the dataset
  - `.getitem` receives an index and returns the corresponding data
  - `.length` returns the number of observations in the dataset
]

.pull-right[
```r
mydataset <- dataset(
  initialize = function(paths_to_imgs, labels) {
    self$paths <- paths_to_imgs
    self$labels <- labels
  },
  .getitem = function(i) {
    img <- jpeg::readJPEG(self$paths[i])
    list(x = img, y = self$labels[i])
  },
  .length = function() {
    length(self$paths)
  }
)

# initialize the dataset
ds <- mydataset(c("hello.jpg", "bye.jpg"), labels = c(0, 1))
ds[1]      # take the item with index 1 of the dataset
length(ds) # returns the length of the dataset
```
]

---
class: normal, middle

## Dataset

.pull-left[
- Your dataset's `initialize` function can do anything you want. A common pattern is to have it download the data and cache it in a local directory.
- It's also common for the initialize function to prepare the data in a format that can be easily consumed by `.getitem`.
- Currently we only support `map`-style datasets. We plan to support other kinds of datasets. Watch [this talk](https://www.youtube.com/watch?v=sCsPzVumtR8&list=PL_lsbAsL_o2BY-RrqVDKDcywKnuUTp-f3&index=6) for other examples and details.
]

.pull-right[
- The `.getitem` method can also do anything you want, including transforming and normalizing examples. For example, in torchvision we implement a large number of transforms that can be used for image data augmentation. These transforms are usually applied in this method.
- Some common uses of the `.getitem` method are: reading data from disk, subsetting data held in RAM, querying a database, and others.
- See a few examples of implemented datasets [here](https://github.com/mlverse/torchvision/blob/main/R/dataset-cifar.R), [here](https://github.com/mlverse/torchdatasets/blob/master/R/bird-species.R) and [here](https://github.com/curso-r/torchaudio/blob/master/R/dataset-speechcommands.R).
]

---
class: normal, middle

## Dataloaders

.pull-left[
- Dataloaders are a convenient way to pull batches from a dataset. They support any kind of dataset and can shuffle the data.
- You can easily iterate over a dataloader with the `enumerate` function.
- You can also use `dataloader_make_iter` and `dataloader_next` to iterate over it manually.
]

.pull-right[
```r
x <- torch_randn(100, 10)
y <- torch_randn(100, 1)
ds <- tensor_dataset(x = x, y = y)

dl <- dataloader(ds, batch_size = 50)

for (batch in enumerate(dl)) {
  str(batch$x)
  str(batch$y)
}
```

```
## Float [1:50, 1:10]
## Float [1:50, 1:1]
## Float [1:50, 1:10]
## Float [1:50, 1:1]
```
]

---
class: normal, middle

## Dataloaders

.pull-left[
- Dataloaders can make use of the `num_workers` argument to load batches in parallel.
- There are 3 main things that affect the performance of parallel dataloaders:
  - the number of workers
  - the time per batch, i.e. the time to run `.getitem` batch-size times
  - the size of the returned tensor
]

.pull-right[
<img src="index_files/figure-html/unnamed-chunk-26-1.png" width="500px" height="350px" />
]

---
class: normal, middle

## A full toy example
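A hedged sketch putting the pieces together (dataset, dataloader, module and optimizer); the data, architecture and hyperparameters below are purely illustrative:

```r
x <- torch_randn(100, 10)
y <- torch_randn(100, 1)
dl <- dataloader(tensor_dataset(x = x, y = y), batch_size = 25)

model <- nn_sequential(nn_linear(10, 20), nn_relu(), nn_linear(20, 1))
optim <- optim_sgd(model$parameters, lr = 0.01)

for (epoch in 1:10) {
  for (batch in enumerate(dl)) {
    optim$zero_grad()
    loss <- nnf_mse_loss(model(batch$x), batch$y)
    loss$backward() # compute gradients
    optim$step()    # update the weights
  }
}
```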
---
class: normal, middle

## JIT

.pull-left[
- In torch 0.2.0 we added initial support for JIT ('just-in-time') compiling torch programs to TorchScript.
- Currently we only support *tracing* R functions. When tracing, we invoke an R function with example inputs and record all the operations that occur while the function runs.
]

.pull-right[
```r
w <- torch_randn(10, 1)
b <- torch_randn(1)

fn <- function(x) {
  a <- torch_mm(x, w)
  a + b
}

fn(torch_ones(2, 10))
```

```
## torch_tensor
##  1.4917
##  1.4917
## [ CPUFloatType{2,1} ]
```
]

---
class: normal, middle

## JIT

Now we use the `jit_trace` function to compile this R function into TorchScript:

```r
x <- torch_ones(2, 10)
tr_fn <- jit_trace(fn, x)
tr_fn(x)
```

```
## torch_tensor
##  1.4917
##  1.4917
## [ CPUFloatType{2,1} ]
```

```r
tr_fn$graph
```

```
## graph(%0 : Float(2:10, 10:1, requires_grad=0, device=cpu)):
##   %1 : Float(10:1, 1:1, requires_grad=0, device=cpu) = prim::Constant[value= 0.4607 -1.0476  1.0510  0.1441  1.4061 -0.9854 -0.3466 -0.1256  0.0654  1.9389 [ CPUFloatType{10,1} ]]()
##   %2 : Float(2:1, 1:1, requires_grad=0, device=cpu) = aten::mm(%0, %1)
##   %3 : Float(1:1, requires_grad=0, device=cpu) = prim::Constant[value={-1.06925}]()
##   %4 : int = prim::Constant[value=1]()
##   %5 : Float(2:1, 1:1, requires_grad=0, device=cpu) = aten::add(%2, %3, %4)
##   return (%5)
```

---
class: normal, middle

## JIT

.pull-left[
- We can now save the traced function to disk with the `jit_save` function.
- Then we can reload the saved function in Python, just like any TorchScript program.
- It's also [possible to use](https://community.rstudio.com/t/r-model-serving-using-python-torchserve/94303/2?u=dfalbel) [torchserve](https://github.com/pytorch/serve) for high-performance environments.
]

.pull-right[
```r
jit_save(tr_fn, "linear.pt")
```

```python
import torch
fn = torch.jit.load("linear.pt")
fn(torch.ones(2, 10))
```

```
## tensor([[1.4917],
##         [1.4917]])
```
]

---
class: normal, middle

## Future work

- Currently only functions can be traced from R. You can trace `nn_modules` using some kind of [hack](https://github.com/mlverse/torch/blob/master/tests/testthat/test-trace.R#L27-L70).
- We will support tracing `nn_modules` in the future to enable speedups in training, as well as easy serialization of your model for deployment.

---
class: normal, middle

## Contributing

---
class: normal, middle

## Contributing

.pull-left[
- No matter what your current skills are, you can contribute to torch development.
- We have a few [open issues](https://github.com/mlverse/torch). Feel free to comment if you want to fix one of them! I will help as much as I can :)
- If you think the documentation is not clear or some details are missing, please open an issue! This helps a lot!
]

.pull-right[
- Also open issues for bug reports, feature requests and/or questions.
- If you are planning to add a new feature and don't know how to start, open an issue and we can discuss how to do it.
- You can also contribute extensions to torch!
]

---
class: normal, middle

## Extensions and support for torch

.pull-left[
- [torchvision](https://github.com/mlverse/torchvision) is an extension package for computer vision tasks. It implements many different transformations for image data, datasets and pre-trained models.
- [torchaudio](https://github.com/curso-r/torchaudio) is an extension package for audio-related tasks. It's developed by @athospd and already supports many functions for audio data transformation, datasets and models.
- The [targets](https://github.com/wlandau/targets) package already supports serializing torch models. targets provides function-oriented, Make-like declarative workflows for R.
]

.pull-right[
- We have been working on the [torchdatasets](https://github.com/mlverse/torchdatasets) package, which collects datasets that are useful for examples etc. but don't fit in the other, more specific packages.
- There's also the [tabnet](https://github.com/mlverse/tabnet) package that implements the TabNet model with a tidymodels-like interface.
- [lantern](https://github.com/tidymodels/lantern): a tidymodels interface for fitting multilayer perceptrons and linear models.
- Your idea?
]

---
class: normal, middle

## Future work

.pull-left[
* Better interop with PyTorch, interchanging tensors between both languages at zero cost.
* Improve performance, especially the performance of the dispatcher.
]

.pull-right[
* Better support for JIT tracing `nn_modules`.
* Support for ONNX. Similarly to the JIT, we should be able to trace models and export them to the ONNX format.
]

---
class: normal, middle

## Learn more

- The [Torch for R website](https://torch.mlverse.org) includes many tutorials and links to blog posts.
- The [RStudio AI blog](https://blogs.rstudio.com/ai/) contains many end-to-end examples. This is also where to get news about torch.
- The [Torch book](https://mlverse.github.io/torchbook) is a work-in-progress book about deep learning with torch.
- The [documentation website](https://torch.mlverse.org/docs/) has guides for serialization, indexing, and more.

---
class: middle, center, inverse

## Thanks very much!