Working with R¶
Using omega-ml in R¶
omega-ml works with R through the reticulate package, which enables R users to have the same API and fidelity to omega-ml as in Python.
# load omegaml
library(reticulate)
om <- import("omegaml")
# try a few things
$om$datasets$list()
$om$models$list()
# store
data(mtcars)
om$datasets$put(mtcars, 'mtcars')
=> Metadata(name=mtcars,bucket=omegaml,prefix=data/,kind=pandas.dfrows,created=2022-02-23 18:21:09.250569)
# retrieve
om$datasets$get('mtcars')
...
# create a model
library(caret)
model <- train(mpg ~ wt,
data = mtcars,
method = "lm")
predict(model, mtcars)
om$models$put('r$model', 'mtcars-model')
om$datasets$put(mtcars, 'mtcars')
# predict from runtime
result <- om$runtime$model('mtcars-model')$predict('mtcars')
result$get()
Specifics for omega-ml in R¶
While all omega-ml functionality is available to R users and in principle works the same as in Python there a few specifics that you should be aware of.
Storing R objects as datasets¶
The following R objects can be stored in om$datasets
. The data
conversion is done according to reticulate’s type conversion:
list, vector => will be stored as Python list, dict
matrix/array => will be stored as Python numpy arrays
data.frame => will be stored as Python’s pandas.DataFrame
Getting help¶
To retrieve a description on any stored object, use the $help
method
on any of the stores:
cat(om$models$help('mtcars-model'))
=>
Python Library Documentation: RModelBackend in module omegaml.backends.rsystem.rmodels object
class RModelBackend(omegaml.backends.basemodel.BaseModelBackend)
| RModelBackend(model_store=None, data_store=None, tracking=None, **kwargs)
|
| Method resolution order:
| RModelBackend
| omegaml.backends.basemodel.BaseModelBackend
| omegaml.backends.basecommon.BackendBaseCommon
| builtins.object
|
| Methods defined here:
|
| predict(self, modelname, Xname, rName=None, pure_python=True, **kwargs)
| predict using data stored in Xname
|
| :param modelname: the name of the model object
| :param Xname: the name of the X data set
| :param rName: the name of the result data object or None
| :param pure_python: if True return a python object. If False return
| a dataframe. Defaults to True to support any client.
| :param kwargs: kwargs passed to the model's predict method
| :return: return the predicted outcome
|
To retrieve help on any Python object, use the om$help()
function.
This is the equivalent to reticulate’s help()
, however also works
on custom objects.
library(reticulate)
om <- import("omegaml")
# enable help
om <- om$setup()
cat(om$help(om))
=>
Python Library Documentation: Omega in module omegaml.omega object
class Omega(omegaml.store.combined.CombinedOmegaStoreMixin)
| Omega(defaults=None, mongo_url=None, celeryconf=None, bucket=None, **kwargs)
|
| Client API to omegaml
|
| Provides the following APIs:
|
| * :code:`datasets` - access to datasets stored in the cluster
| * :code:`models` - access to models stored in the cluster
| * :code:`runtimes` - access to the cluster compute resources
| * :code:`jobs` - access to jobs stored and executed in the cluster
| * :code:`scripts` - access to lambda modules stored and executed in the cluster
...
Storing and retrieving R models¶
R models are serialized by saveRDS
and readRDS
. To this end,
the models cannot be passed to om$models.put
directly. Instead you
must specify the R variable that holds the model:
library(caret)
model <- train(...)
om$models$put('r$model', 'mtcars-model')
=> Metadata(name=mtcars-model,bucket=omegaml,prefix=models/,kind=model.r,created=2021-11-23 18:40:14.055000)
Similarly, when retrieving a model, it is returned as a Python object. This
enables the Python part of omega-ml to transparently interact with the model.
To retrieve the R model itself, use the rmodel()
function:
rmodel(om$models$get('mtcars-model'))
Linear Regression
32 samples
1 predictor
...
Using the runtime to fit R models¶
The only method supported by fitted R models stored in omega-ml is model$predict()
.
To fit models using the omega-ml runtime, you should write a job (notebook) or
a script.
Processing large datasets¶
Using MDataFrame.transform
omega-ml can process datasets that are
larger than memory in chunks, applying a function to each chunk. In Python,
this processing can be done in parallel. In R, this can only be done in
sequence (note the n_jobs=1L
, specifying sequential processing).
mdf <- om$datasets$getl('r-dataframe')
convert <- function(df, n) {
df
}
mdf$transform(convert, n_jobs=1L)$persist('foo', store=om$datasets)
Submitting jobs in parallel¶
It is not currently supported to submit R notebooks for parallel processing
via om$runtime$job()$map()
. However, it is possible to submit
R notebooks in parallel using the runtime’s parallel()
and mapreduce()
.
Consider this example notebook, stored as ‘r-parallel’:
# r-parallel.ipynb
print("hello from R")
Let’s run this in parallel. The result is a list of the Metadata
entries created for the resulting notebooks.
with(om$runtime$parallel() %as% crt, {
crt$job('r-parallel')$run()
crt$job('r-parallel')$run()
crt$job('r-parallel')$run()
result <- crt$run()
})
result$get()
=>
'<Metadata: Metadata(name=results/r-parallel_2022-02-24 17:24:15.230259.ipynb,bucket=omegaml,prefix=jobs/,kind=script.ipynb,created=2022-02-24 17:24:16.519483)>'
'<Metadata: Metadata(name=results/r-parallel_2022-02-24 17:24:16.610554.ipynb,bucket=omegaml,prefix=jobs/,kind=script.ipynb,created=2022-02-24 17:24:17.856297)>'
'<Metadata: Metadata(name=results/r-parallel_2022-02-24 17:24:17.933696.ipynb,bucket=omegaml,prefix=jobs/,kind=script.ipynb,created=2022-02-24 17:24:19.096552)>'
Running a worker for R models and scripts¶
omega-ml workers are distributed processes that wait for commands, such as fit model M with data X, Y or predict from data X. Once a command is received, the worker retrieves these objects (M, X, Y) using the Metadata and then executes the requested command.
To setup the omega-ml runtime for R, use the following command. This will start an R session that runs the omega-ml worker. This works the same as the omega-ml worker in Python, except that it is enabled to process R models and datasets.
$ om runtime celery rworker
In order to dedicate R and Python workers, e.g. on different VMs, specify
the worker label using the CELERY_Q
envvar. Then use
om$runtime$require()
to specify this label when issuing runtime
tasks.
# start R worker
$ CELERY_Q=default:R om runtime celery rworker
# in R
om$runtime$require('default:R')$ping()
=>
$message 'ping return message'$time'2022-02-24T10:46:58.642168'
...