Developer API¶

omega|ml¶

class omegaml.Omega(defaults=None, mongo_url=None, celeryconf=None, bucket=None, **kwargs)[source]¶

Client API to omegaml

Provides the following APIs:

datasets - access to datasets stored in the cluster
models - access to models stored in the cluster
runtime - access to the cluster compute resources
jobs - access to jobs stored and executed in the cluster
scripts - access to lambda modules stored and executed in the cluster

status(check=None, data=False, by_status=False, wait=False)[source]¶

get the status of the omegaml cluster

Parameters:

check (str) – the check to run, e.g. ‘storage’, ‘runtime’
data (bool) – return the monitoring data for the check
by_status (bool) – return data by status
wait (bool) – wait for the check to complete

Returns:

dict
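The following is a minimal sketch of running several checks through status(). The helper name collect_status is hypothetical and not part of omegaml; om is assumed to be an already configured omegaml.Omega instance. Only the check keyword documented above is used, and nothing is assumed about the contents of each returned dict.

```python
def collect_status(om, checks=("storage", "runtime")):
    """Run each named check and collect the returned dicts by check name.

    `om` is assumed to be a configured omegaml.Omega instance. This helper
    is illustrative only and is not part of omegaml.
    """
    return {check: om.status(check=check) for check in checks}
```

Passing the client in as an argument keeps the helper independent of how the cluster connection was configured.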

omegaml.store¶

Native storage for OmegaML using mongodb as the storage layer

An OmegaStore instance is a MongoDB database. It has at least the metadata collection which lists all objects stored in it. A metadata document refers to the following types of objects (metadata.kind):

pandas.dfrows - a Pandas DataFrame stored as a collection of rows
sklearn.joblib - a scikit-learn estimator/pipeline dumped using joblib.dump()
python.data - an arbitrary python dict, tuple, list stored as a document

Note that storing Pandas and scikit-learn objects requires the availability of the respective packages. If either cannot be imported, the OmegaStore degrades to a python.data store only. It will still .list() and .get() any object, but returns pure Python objects. In this case it is up to the client to convert the data into an appropriate format for processing.

Pandas and scikit-learn objects can only be stored if these packages are available. put() raises a TypeError if you pass such objects and these modules cannot be loaded.

All data are stored within the same MongoDB database, in per-object collections as follows:

.metadata
all metadata. Each object is one document. See omegaml.documents.Metadata for details.

.<bucket>.files
this is the GridFS instance used to store blobs (models, numpy, hdf). The actual file name will be <prefix>/<name>.<ext>, where ext is optionally generated by put() / get().

.<bucket>.<prefix>.<name>.data
every other dataset is stored in a separate collection (dataframes, dicts, lists, tuples). Any forward slash in prefix is ignored (e.g. ‘data/’ becomes ‘data’)
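The naming scheme above can be sketched as two small functions. Both function names and the example values are hypothetical; this is not how omegaml builds the names internally, only an illustration of the documented pattern. The sketch assumes that "ignored" means forward slashes are dropped from the prefix, and it omits the leading dot, which denotes the enclosing database.

```python
def collection_name(bucket, prefix, name):
    """Build '<bucket>.<prefix>.<name>.data', dropping slashes in prefix."""
    parts = [bucket, prefix.replace("/", ""), name, "data"]
    # Skip empty parts so that an empty prefix does not produce '..'
    # (an assumption of this sketch, not documented behavior).
    return ".".join(part for part in parts if part)


def gridfs_filename(prefix, name, ext=None):
    """Build '<prefix>/<name>.<ext>'; the extension is optional."""
    base = f"{prefix.rstrip('/')}/{name}"
    return f"{base}.{ext}" if ext else base
```

For example, collection_name("mybucket", "data/", "sales") yields "mybucket.data.sales.data".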

By default, DataFrames are stored in their own collection, where every row becomes a document. To store a DataFrame as a binary file, use put(..., as_hdf=True). .get() will always return a DataFrame.

Python dicts, lists, tuples are stored as a single document with a .data attribute holding the JSON-converted representation. .get() will always return the corresponding python object of .data.

Models are joblib.dump()’ed and zipped prior to transferring into GridFS. .get() will always unzip and joblib.load() before returning the model. Note this requires that the process using .get() supports joblib as well as all Python classes referred to. If joblib is not supported, .get() returns a file-like object.
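The dump-then-zip round trip can be illustrated with the standard library alone. In this sketch pickle stands in for joblib and an in-memory buffer stands in for GridFS; the function names and the archive member name are hypothetical. It shows the general technique, not omegaml's actual implementation.

```python
import io
import pickle
import zipfile


def pack_model(model, member="model.pkl"):
    """Serialize `model` and wrap it in an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member, pickle.dumps(model))
    return buffer.getvalue()


def unpack_model(blob, member="model.pkl"):
    """Unzip the archive and deserialize the model it contains."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return pickle.loads(archive.read(member))
```

As with joblib, deserializing only works if the loading process can import every class the stored object refers to.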

The .metadata entry specifies the format used to store each object as well as its location:

metadata.kind
the type of object

metadata.name
the name of the object, as given on put()

metadata.gridfile
the gridfs object (if any, null otherwise)

metadata.collection
the name of the collection

metadata.attributes
arbitrary custom attributes set in put(attributes=obj). This is used e.g. by OmegaRuntime’s fit() method to record the data used in the model’s training.

.put() and .get() use helper methods specific to the object’s type and to metadata.kind, respectively. In the future a plugin system will enable extension to other types.
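The dispatch described above, by object type on put() and by metadata.kind on get(), might look roughly like the following. MetadataSketch, kind_for, LOADERS and load are hypothetical names, and a plain dict stands in for the database; the real Metadata class is documented under omegaml.documents.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MetadataSketch:
    """Illustrative stand-in for a metadata document (not omegaml's class)."""
    kind: str
    name: str
    gridfile: Optional[str] = None
    collection: Optional[str] = None
    attributes: dict = field(default_factory=dict)


def kind_for(obj: Any) -> str:
    """put() side: choose a metadata.kind from the object's type."""
    if isinstance(obj, (dict, list, tuple)):
        return "python.data"
    raise TypeError(f"unsupported object type: {type(obj).__name__}")


# get() side: helpers are looked up by metadata.kind
LOADERS = {
    "python.data": lambda meta, storage: storage[meta.collection]["data"],
}


def load(meta: MetadataSketch, storage: dict) -> Any:
    """Dispatch to the loader registered for meta.kind."""
    return LOADERS[meta.kind](meta, storage)
```

Keeping the loaders in a lookup table means a new kind can be supported by registering one more entry, which is the kind of extension a plugin system would formalize.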

class omegaml.store.base.OmegaStore(mongo_url=None, bucket=None, prefix=None, kind=None, defaults=None, dbalias=None)[source]¶: The storage backend for models and data

omegaml.backends¶

class omegaml.backends.basedata.BaseDataBackend(model_store=None, data_store=None, tracking=None, **kwargs)[source]¶

OmegaML BaseDataBackend to be subclassed by other arbitrary backends

This provides the abstract interface for any data backend to be implemented

class omegaml.backends.basemodel.BaseModelBackend(model_store=None, data_store=None, tracking=None, **kwargs)[source]¶

OmegaML BaseModelBackend to be subclassed by other arbitrary backends

This provides the abstract interface for any model backend to be implemented. Subclass to implement custom backends.

Essentially a model backend:

provides methods to serialize and deserialize a machine learning model for a given ML framework

offers fit() and predict() methods to be called by the runtime

offers additional methods such as score(), partial_fit(), transform()

Model backends are the middleware that connects the om.models API to specific frameworks. This class makes it simple to implement a model backend by offering a common syntax as well as a default implementation for get() and put().

Methods to implement:

# for model serialization (mandatory)
@classmethod supports() - determine if backend supports given model instance
_package_model() - serialize a model instance into a temporary file
_extract_model() - deserialize the model from a file-like object

By default BaseModelBackend uses joblib.dump/load to store the model as serialized Python objects. If this is not sufficient or applicable to your type of models, override these methods.

Both methods provide readily set up temporary file names so that all you have to do is actually save the model to the given output file and restore the model from the given input file, respectively. All other logic has already been implemented (see get_model and put_model methods).

# for fitting and predicting (mandatory)
fit()
predict()

# other methods (optional)
fit_transform() - fit and return a transformed dataset
partial_fit() - fit incrementally
predict_proba() - predict probabilities
score() - score fitted classifier against a test dataset
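To make the serialization hooks concrete, here is a standalone sketch of the three mandatory serialization methods. It deliberately does not import omegaml: a real backend would subclass omegaml.backends.basemodel.BaseModelBackend, and the method signatures shown here are simplified assumptions rather than the library's exact ones. pickle is used in place of joblib, and fit() and predict() are left out for brevity.

```python
import pickle


class DictModelBackendSketch:
    """Standalone sketch of the serialization hooks of a model backend.

    A real backend would subclass BaseModelBackend; the signatures below
    are simplified assumptions for illustration only.
    """

    @classmethod
    def supports(cls, obj, name, **kwargs):
        # Claim only the objects this backend knows how to serialize.
        return isinstance(obj, dict) and "weights" in obj

    def _package_model(self, model, tmpfn):
        # Write the model to the temporary file name provided by the caller.
        with open(tmpfn, "wb") as fout:
            pickle.dump(model, fout)
        return tmpfn

    def _extract_model(self, infile):
        # Restore the model from an open file-like object.
        return pickle.load(infile)
```

The hooks only deal with writing to and reading from the given files; storing and retrieving those files is left to the base class, as described above.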

omegaml.mixins¶

class omegaml.mixins.store.ProjectedMixin[source]¶: An OmegaStore mixin to process column specifications in the dataset name

class omegaml.mixins.mdf.FilterOpsMixin[source]¶: filter operators on MSeries

class omegaml.mixins.mdf.ApplyMixin(*args, **kwargs)[source]¶

Implements the apply() mixin supporting arbitrary functions to build aggregation pipelines

Note that .apply() does not execute immediately. Instead it builds an aggregation pipeline that is executed on MDataFrame.value. Calls to .apply() cannot be cascaded yet, i.e. a later .apply() will override a previous .apply().

See ApplyContext for usage examples.
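The deferred behavior of .apply() can be pictured with a toy class. LazyFrameSketch is hypothetical and works on a plain Python list with ordinary functions, whereas the real mixin builds a MongoDB aggregation pipeline; the sketch only illustrates that nothing runs until .value is read and that a later .apply() replaces an earlier one.

```python
class LazyFrameSketch:
    """Toy frame that records an operation and runs it only on .value."""

    def __init__(self, rows):
        self._rows = rows
        self._pending = None  # at most one deferred operation

    def apply(self, fn):
        # Record the function; nothing is executed yet. A later call
        # replaces the earlier one, mirroring the no-cascading note.
        self._pending = fn
        return self

    @property
    def value(self):
        # Execution happens here, when the result is requested.
        if self._pending is None:
            return list(self._rows)
        return [self._pending(row) for row in self._rows]
```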

class omegaml.mixins.mdf.ApplyArithmetics[source]¶

Math operators for ApplyContext

__mul__ (*)
__add__ (+)
__sub__ (-)
__div__ (/)
__floordiv__ (//)
__mod__ (%)
__pow__ (pow)
__ceil__ (ceil)
__floor__ (floor)
__trunc__ (trunc)
__abs__ (abs)
sqrt (math.sqrt)

__mul__(other)¶: multiply
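As a rough illustration of how operator methods like these can translate into aggregation expressions, the sketch below overloads a few operators to build MongoDB expression documents. ColumnExpr is a hypothetical class, not omegaml's ApplyContext; the operator names $multiply, $add, $subtract and $abs are standard MongoDB aggregation operators.

```python
class ColumnExpr:
    """Toy column expression that records MongoDB aggregation operators."""

    def __init__(self, expr):
        self.expr = expr

    @staticmethod
    def _unwrap(other):
        # Accept either another expression or a plain constant.
        return other.expr if isinstance(other, ColumnExpr) else other

    def __mul__(self, other):
        return ColumnExpr({"$multiply": [self.expr, self._unwrap(other)]})

    def __add__(self, other):
        return ColumnExpr({"$add": [self.expr, self._unwrap(other)]})

    def __sub__(self, other):
        return ColumnExpr({"$subtract": [self.expr, self._unwrap(other)]})

    def __abs__(self):
        return ColumnExpr({"$abs": self.expr})
```

With this class, (ColumnExpr("$price") * 2 + 1).expr evaluates to {"$add": [{"$multiply": ["$price", 2]}, 1]}, an expression that could be placed in a $project stage.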

class omegaml.mixins.mdf.ApplyDateTime[source]¶: Datetime operators for ApplyContext

class omegaml.mixins.mdf.ApplyString[source]¶: String operators

class omegaml.mixins.mdf.ApplyAccumulators[source]¶

omegaml.runtimes¶

class omegaml.runtimes.OmegaRuntime(omega, bucket=None, defaults=None, celeryconf=None)[source]¶: omegaml compute cluster gateway

class omegaml.runtimes.OmegaModelProxy(modelname, runtime=None)[source]¶

proxy to a remote model in a celery worker

The proxy provides the same methods as the model but executes them using celery tasks and returns celery AsyncResult objects.

Usage:

om = Omega()
# train a model
# result is AsyncResult, use .get() to return its result
result = om.runtime.model('foo').fit('datax', 'datay')
result.get()

# predict
result = om.runtime.model('foo').predict('datax')
# result is AsyncResult, use .get() to return its result
print(result.get())

Notes

The actual methods of OmegaModelProxy are defined in its mixins.
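The proxy pattern itself can be sketched without a cluster. In the following, a thread pool from concurrent.futures stands in for the celery workers, a dict stands in for the model store, and Future.result() plays the role of AsyncResult.get(). ModelProxySketch and its arguments are hypothetical; this is not how OmegaModelProxy is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


class ModelProxySketch:
    """Forward method calls on a named model to a worker pool."""

    def __init__(self, modelname, registry, executor):
        self.modelname = modelname
        self._registry = registry  # stand-in for models stored in the cluster
        self._executor = executor  # stand-in for the celery workers

    def __getattr__(self, method):
        # Only reached for attributes not found normally, i.e. model methods.
        def submit(*args, **kwargs):
            model = self._registry[self.modelname]
            return self._executor.submit(getattr(model, method), *args, **kwargs)
        return submit
```

A call such as proxy.predict(data) returns immediately with a future, and the caller decides when to block for the result.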

omegaml.documents¶

class omegaml.documents.Metadata(**kwargs)[source]¶: Metadata stores information about objects in OmegaStore

omegaml.jobs¶

omegajobs¶

class omegaml.notebook.omegacontentsmgr.OmegaStoreContentsManager(**kwargs: Any)[source]¶

Jupyter notebook storage manager for omegaml

Adapted from notebook/services/contents/filemanager.py

This requires a properly configured omegaml instance. See https://jupyter-server.readthedocs.io/en/latest/developers/contents.html