Working with R
==============

.. contents::

Using omega-ml in R
-------------------

omega-ml works with R through the reticulate package, which gives R users
the same API and fidelity to omega-ml as in Python.

.. code:: r

    # load omegaml
    library(reticulate)
    om <- import("omegaml")
    # try a few things
    om$datasets$list()
    om$models$list()
    # store
    data(mtcars)
    om$datasets$put(mtcars, 'mtcars')
    => Metadata(name=mtcars,bucket=omegaml,prefix=data/,kind=pandas.dfrows,created=2022-02-23 18:21:09.250569)
    # retrieve
    om$datasets$get('mtcars')
    ...
    # create a model
    library(caret)
    model <- train(mpg ~ wt, data = mtcars, method = "lm")
    predict(model, mtcars)
    om$models$put('r$model', 'mtcars-model')
    om$datasets$put(mtcars, 'mtcars')
    # predict from runtime
    result <- om$runtime$model('mtcars-model')$predict('mtcars')
    result$get()

Specifics for omega-ml in R
---------------------------

While all omega-ml functionality is available to R users and in principle
works the same as in Python, there are a few specifics that you should be
aware of.

Storing R objects as datasets
+++++++++++++++++++++++++++++

.. _reticulate's type conversion: https://rstudio.github.io/reticulate/index.html#type-conversions

The following R objects can be stored in :code:`om$datasets`. The data
conversion is done according to `reticulate's type conversion`_:

* list, vector => stored as a Python list or dict
* matrix/array => stored as a Python numpy array
* data.frame => stored as a Python pandas.DataFrame

Getting help
++++++++++++

To retrieve a description of any stored object, use the :code:`$help`
method on any of the stores:

.. code:: r

    cat(om$models$help('mtcars-model'))
    => Python Library Documentation: RModelBackend in module omegaml.backends.rsystem.rmodels object

    class RModelBackend(omegaml.backends.basemodel.BaseModelBackend)
     |  RModelBackend(model_store=None, data_store=None, tracking=None, **kwargs)
     |
     |  Method resolution order:
     |      RModelBackend
     |      omegaml.backends.basemodel.BaseModelBackend
     |      omegaml.backends.basecommon.BackendBaseCommon
     |      builtins.object
     |
     |  Methods defined here:
     |
     |  predict(self, modelname, Xname, rName=None, pure_python=True, **kwargs)
     |      predict using data stored in Xname
     |
     |      :param modelname: the name of the model object
     |      :param Xname: the name of the X data set
     |      :param rName: the name of the result data object or None
     |      :param pure_python: if True return a python object. If False return
     |          a dataframe. Defaults to True to support any client.
     |      :param kwargs: kwargs passed to the model's predict method
     |      :return: return the predicted outcome
     |

To retrieve help on any Python object, use the :code:`om$help()` function.
This is the equivalent of reticulate's :code:`help()`, but it also works on
custom objects.

.. code:: r

    library(reticulate)
    om <- import("omegaml")
    # enable help
    om <- om$setup()
    cat(om$help(om))
    => Python Library Documentation: Omega in module omegaml.omega object

    class Omega(omegaml.store.combined.CombinedOmegaStoreMixin)
     |  Omega(defaults=None, mongo_url=None, celeryconf=None, bucket=None, **kwargs)
     |
     |  Client API to omegaml
     |
     |  Provides the following APIs:
     |
     |  * :code:`datasets` - access to datasets stored in the cluster
     |  * :code:`models` - access to models stored in the cluster
     |  * :code:`runtimes` - access to the cluster compute resources
     |  * :code:`jobs` - access to jobs stored and executed in the cluster
     |  * :code:`scripts` - access to lambda modules stored and executed in the cluster
     ...

Storing and retrieving R models
-------------------------------

R models are serialized by :code:`saveRDS` and :code:`readRDS`.
For this reason, models cannot be passed to :code:`om$models$put` directly.
Instead, you must specify the R variable that holds the model:

.. code:: r

    library(caret)
    model <- train(...)
    om$models$put('r$model', 'mtcars-model')
    => Metadata(name=mtcars-model,bucket=omegaml,prefix=models/,kind=model.r,created=2021-11-23 18:40:14.055000)

Similarly, when a model is retrieved, it is returned as a Python object.
This enables the Python part of omega-ml to interact with the model
transparently. To retrieve the R model itself, use the :code:`rmodel()`
function:

.. code:: r

    rmodel(om$models$get('mtcars-model'))
    Linear Regression
    32 samples
    1 predictor
    ...

Using the runtime to fit R models
---------------------------------

The only method supported by fitted R models stored in omega-ml is
:code:`model$predict()`. To fit models using the omega-ml runtime, you
should write a job (notebook) or a script.

Processing large datasets
-------------------------

Using :code:`MDataFrame.transform`, omega-ml can process datasets that are
larger than memory in chunks, applying a function to each chunk. In Python,
this processing can be done in parallel. In R, it can only be done
sequentially (note the :code:`n_jobs=1L`, which specifies sequential
processing).

.. code:: r

    mdf <- om$datasets$getl('r-dataframe')
    convert <- function(df, n) {
        df
    }
    mdf$transform(convert, n_jobs=1L)$persist('foo', store=om$datasets)

Submitting jobs in parallel
---------------------------

Submitting R notebooks for parallel processing via
:code:`om$runtime$job()$map()` is not currently supported. However, it is
possible to submit R notebooks in parallel using the runtime's
:code:`parallel()` and :code:`mapreduce()`.

Consider this example notebook, stored as 'r-parallel':

.. code:: r

    # r-parallel.ipynb
    print("hello from R")

Let's run this in parallel. The result is a list of the :code:`Metadata`
entries created for the resulting notebooks.

.. code:: r

    with(om$runtime$parallel() %as% crt, {
        crt$job('r-parallel')$run()
        crt$job('r-parallel')$run()
        crt$job('r-parallel')$run()
        result <- crt$run()
    })
    result$get()
    => '' '' ''

Running a worker for R models and scripts
-----------------------------------------

omega-ml workers are distributed processes that wait for commands, such as
*fit model M with data X, Y* or *predict from data X*. Once a command is
received, the worker retrieves these objects (M, X, Y) using the *Metadata*
and then executes the requested command.

To set up the omega-ml runtime for R, use the following command. This starts
an R session that runs the omega-ml worker. It works the same as the
omega-ml worker in Python, except that it is enabled to process R models and
datasets.

.. code:: bash

    $ om runtime celery rworker

To run dedicated R and Python workers, e.g. on different VMs, specify the
worker label using the :code:`CELERY_Q` environment variable. Then use
:code:`om$runtime$require()` to specify this label when issuing runtime
tasks.

.. code:: bash

    # start R worker
    $ CELERY_Q=default:R om runtime celery rworker
    # in R
    om$runtime$require('default:R')$ping()
    => $message
    'ping return message'
    $time
    '2022-02-24T10:46:58.642168'
    ...
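
The worker label can be combined with the model calls shown in the first
example. The following is a minimal sketch, assuming that an R worker was
started with :code:`CELERY_Q=default:R` and that :code:`'mtcars-model'` and
:code:`'mtcars'` were stored as described at the top of this page. Chaining
:code:`require()` with :code:`model()` is an assumption here, made by analogy
with the :code:`require()` and :code:`ping()` chain above; check it against
your omega-ml version.

.. code:: r

    library(reticulate)
    om <- import("omegaml")
    # route the task to workers that listen on the 'default:R' label
    # (assumption: require() chains with model() the same way as with ping())
    result <- om$runtime$require('default:R')$model('mtcars-model')$predict('mtcars')
    # block until the worker has finished, then fetch the predictions
    result$get()

If no worker serves the requested label, the call to :code:`result$get()` is
likely to wait until one becomes available, so it is worth verifying the
label with :code:`ping()` first.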