Working with R

Using omega-ml in R

omega-ml works with R through the reticulate package, which gives R users the same omega-ml API, with the same fidelity, as in Python.

# load omegaml
library(reticulate)
om <- import("omegaml")

# try a few things

# store
om$datasets$put(mtcars, 'mtcars')
=> Metadata(name=mtcars,bucket=omegaml,prefix=data/,kind=pandas.dfrows,created=2022-02-23 18:21:09.250569)

# retrieve
om$datasets$get('mtcars')

# create a model (train() is provided by the caret package)
library(caret)
model <- train(mpg ~ wt,
               data = mtcars,
               method = "lm")
predict(model, mtcars)
om$models$put('r$model', 'mtcars-model')
om$datasets$put(mtcars, 'mtcars')

# predict from runtime
result <- om$runtime$model('mtcars-model')$predict('mtcars')

Specifics for omega-ml in R

While all omega-ml functionality is available to R users and in principle works the same as in Python, there are a few specifics that you should be aware of.

Storing R objects as datasets

The following R objects can be stored in om$datasets. The data conversion is done according to reticulate’s type conversion:

  • list, vector => will be stored as Python list, dict

  • matrix/array => will be stored as Python numpy arrays

  • data.frame => will be stored as Python’s pandas.DataFrame
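
The conversions listed above can be inspected locally with reticulate's r_to_py(), without involving omega-ml at all. This is a minimal sketch; it assumes that the Python environment used by reticulate has numpy and pandas installed, and the variable names are illustrative only.

```r
library(reticulate)

# convert R objects to Python explicitly to inspect the resulting types
py_vec <- r_to_py(c(1, 2, 3))             # unnamed vector
py_lst <- r_to_py(list(a = 1, b = "x"))   # named list
py_mat <- r_to_py(matrix(1:6, nrow = 2))  # matrix
py_df  <- r_to_py(mtcars)                 # data.frame

# the first element of class() names the Python type of each object
for (obj in list(py_vec, py_lst, py_mat, py_df)) {
    print(class(obj)[1])
}
```

Passing an R object to a Python function such as om$datasets$put applies the same conversion implicitly, so an explicit r_to_py() call is normally not needed.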

Getting help

To retrieve a description of any stored object, use the $help method on any of the stores:

om$models$help('mtcars-model')

Python Library Documentation: RModelBackend in module omegaml.backends.rsystem.rmodels object

class RModelBackend(omegaml.backends.basemodel.BaseModelBackend)
 |  RModelBackend(model_store=None, data_store=None, tracking=None, **kwargs)
 |  Method resolution order:
 |      RModelBackend
 |      omegaml.backends.basemodel.BaseModelBackend
 |      omegaml.backends.basecommon.BackendBaseCommon
 |      builtins.object
 |  Methods defined here:
 |  predict(self, modelname, Xname, rName=None, pure_python=True, **kwargs)
 |      predict using data stored in Xname
 |      :param modelname: the name of the model object
 |      :param Xname: the name of the X data set
 |      :param rName: the name of the result data object or None
 |      :param pure_python: if True return a python object. If False return
 |          a dataframe. Defaults to True to support any client.
 |      :param kwargs: kwargs passed to the model's predict method
 |      :return: return the predicted outcome

To retrieve help on any Python object, use the om$help() function. This is equivalent to reticulate’s py_help(), but it also works on custom objects.

om <- import("omegaml")

# enable help
om <- om$setup()
om$help(om)
Python Library Documentation: Omega in module object

class Omega(
 |  Omega(defaults=None, mongo_url=None, celeryconf=None, bucket=None, **kwargs)
 |  Client API to omegaml
 |  Provides the following APIs:
 |  * :code:`datasets` - access to datasets stored in the cluster
 |  * :code:`models` - access to models stored in the cluster
 |  * :code:`runtimes` - access to the cluster compute resources
 |  * :code:`jobs` - access to jobs stored and executed in the cluster
 |  * :code:`scripts` - access to lambda modules stored and executed in the cluster

Storing and retrieving R models

R models are serialized with saveRDS and restored with readRDS. For this reason, a model cannot be passed to om$models$put directly. Instead, you must specify the name of the R variable that holds the model, prefixed with r$:

model <- train(...)
om$models$put('r$model', 'mtcars-model')
=> Metadata(name=mtcars-model,bucket=omegaml,prefix=models/,kind=model.r,created=2021-11-23 18:40:14.055000)
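
The serialization round trip itself can be tried locally with base R alone. The following is a minimal sketch and not omega-ml's implementation: it fits a plain lm() model instead of using caret's train(), and the temporary file is only a stand-in for the model store.

```r
# fit a simple linear model on the built-in mtcars data
model <- lm(mpg ~ wt, data = mtcars)

# serialize the model to a file; tempfile() stands in for the model store
path <- tempfile(fileext = ".rds")
saveRDS(model, path)

# restore the model, as another R session could do from the same file
restored <- readRDS(path)

# compare the predictions of the original and the restored model
same <- isTRUE(all.equal(predict(model, mtcars), predict(restored, mtcars)))
print(same)
```

Because saveRDS works on an R object in the current session, the caller has to identify that object by name, which is consistent with passing 'r$model' rather than the model itself.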

Similarly, a retrieved model is returned as a Python object. This enables the Python part of omega-ml to interact with the model transparently. To retrieve the R model itself, use the rmodel() function:

Linear Regression

32 samples
 1 predictor

Using the runtime to fit R models

The only method supported by fitted R models stored in omega-ml is model$predict(). To fit models using the omega-ml runtime, you should write a job (notebook) or a script.

Processing large datasets

Using MDataFrame.transform, omega-ml can process datasets that are larger than memory in chunks, applying a function to each chunk. In Python, the chunks can be processed in parallel. In R, they can only be processed sequentially (note n_jobs=1L, which specifies sequential processing).

mdf <- om$datasets$getl('r-dataframe')
convert <- function(df, n) {
    # process the chunk df here, then return it
    df
}

mdf$transform(convert, n_jobs=1L)$persist('foo', store=om$datasets)
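
To illustrate the idea of sequential chunk processing, here is a minimal base R sketch. It is not how MDataFrame.transform is implemented: process_in_chunks and its chunksize argument are hypothetical names, the data is held in memory for simplicity, and the second argument of the chunk function is assumed to be the chunk number.

```r
# apply fn(chunk, n) to consecutive row chunks of df, one chunk at a time
process_in_chunks <- function(df, fn, chunksize = 10L) {
    # assign every row to a chunk number 1, 2, 3, ...
    chunk_ids <- ceiling(seq_len(nrow(df)) / chunksize)
    chunks <- split(df, chunk_ids)
    # process the chunks in sequence, passing each chunk and its number
    results <- Map(fn, chunks, seq_along(chunks))
    # combine the per-chunk results into a single data frame
    do.call(rbind, unname(results))
}

# example chunk function: add a derived column and record the chunk number
convert <- function(df, n) {
    df$kpl <- df$mpg * 0.425144   # miles per gallon to km per litre
    df$chunk <- n
    df
}

result <- process_in_chunks(mtcars, convert, chunksize = 10L)
print(table(result$chunk))
```

Each call to the chunk function sees only one chunk, so memory use is bounded by the chunk size rather than by the size of the whole dataset when the chunks are read from storage.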

Submitting jobs in parallel

Submitting R notebooks for parallel processing via om$runtime$job()$map() is not currently supported. However, R notebooks can be run in parallel using the runtime’s parallel() and mapreduce().

Consider this example notebook, stored as ‘r-parallel’:

# r-parallel.ipynb
print("hello from R")

Let’s run this in parallel. The result is a list of the Metadata entries created for the resulting notebooks.

with(om$runtime$parallel() %as% crt, {
    result <- crt$run()
})

'<Metadata: Metadata(name=results/r-parallel_2022-02-24 17:24:15.230259.ipynb,bucket=omegaml,prefix=jobs/,kind=script.ipynb,created=2022-02-24 17:24:16.519483)>'
'<Metadata: Metadata(name=results/r-parallel_2022-02-24 17:24:16.610554.ipynb,bucket=omegaml,prefix=jobs/,kind=script.ipynb,created=2022-02-24 17:24:17.856297)>'
'<Metadata: Metadata(name=results/r-parallel_2022-02-24 17:24:17.933696.ipynb,bucket=omegaml,prefix=jobs/,kind=script.ipynb,created=2022-02-24 17:24:19.096552)>'

Running a worker for R models and scripts

omega-ml workers are distributed processes that wait for commands, such as fit model M with data X, Y or predict from data X. Once a command is received, the worker retrieves these objects (M, X, Y) using the Metadata and then executes the requested command.
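
The retrieve-then-execute pattern described above can be sketched in a few lines of base R. This is a toy illustration under stated assumptions, not omega-ml's worker: the in-memory list store, the command structure and the handle_command function are all hypothetical.

```r
# a toy in-memory store: objects are looked up by name
store <- list(
    "mtcars" = mtcars,
    "mtcars-model" = lm(mpg ~ wt, data = mtcars)
)

# execute one command: resolve the named objects, then run the action
handle_command <- function(command, store) {
    model <- store[[command$model]]
    X <- store[[command$X]]
    switch(command$action,
        predict = predict(model, X),
        stop("unsupported action: ", command$action)
    )
}

# a command as a worker might receive it: it carries names only, no data
command <- list(action = "predict", model = "mtcars-model", X = "mtcars")
result <- handle_command(command, store)
print(head(result))
```

Sending names instead of objects keeps commands small, and it is what allows a worker on another machine to load the model and data from shared storage.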

To set up the omega-ml runtime for R, use the following command. It starts an R session that runs the omega-ml worker. This works the same as the omega-ml worker in Python, except that it is able to process R models and datasets.

$ om runtime celery rworker

To dedicate separate workers to R and Python, e.g. on different VMs, specify the worker label using the CELERY_Q environment variable. Then use om$runtime$require() to specify this label when issuing runtime tasks.

# start R worker
$ CELERY_Q=default:R om runtime celery rworker

# in R
$message 'ping return message'
$time '2022-02-24T10:46:58.642168'