core models

class OmegaStore(…, bucket=None, prefix=None, kind=None, defaults=None, dbalias=None)

The storage backend for models and data

collection(name=None, bucket=None, prefix=None)

Returns a mongo db collection as a datastore

If there is an existing object of name, will return the .collection of the object. Otherwise returns the collection according to naming convention {bucket}.{prefix}.{name}.datastore


name – the collection to use. If None, defaults to the collection name given on instantiation. The actual collection name used is always prefix + name + ‘.data’

drop(name, force=False, version=-1, report=False, **kwargs)

Drop the object

  • name – The name of the object. If the name is a pattern, it is expanded using .list() and .drop() is called on every object found.

  • force – If True, ignores the DoesNotExist exception. Defaults to False, in which case a DoesNotExist exception is raised if the name does not exist

  • report – if True returns a dict name=>status, where status is True if deleted, False if not deleted


True if object was deleted, False if not. If force is True and the object does not exist it will still return True


DoesNotExist if the object does not exist and `force=False`
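
The interplay of pattern expansion, force and report can be sketched with a small in-memory stand-in. This is an illustration only, not the library's implementation: ToyStore and its DoesNotExist class are hypothetical, and the rule that a name containing a wildcard character counts as a pattern is an assumption.

```python
from fnmatch import fnmatchcase


class DoesNotExist(Exception):
    """Hypothetical stand-in for the store's DoesNotExist exception."""


class ToyStore:
    """In-memory stand-in for a store; not the real implementation."""

    def __init__(self, names):
        self._names = set(names)

    def list(self, pattern=None):
        # return all names, or only those matching the unix-style pattern
        return sorted(n for n in self._names
                      if pattern is None or fnmatchcase(n, pattern))

    def drop(self, name, force=False, report=False):
        # assumption: a name containing a wildcard character is a pattern
        is_pattern = any(ch in name for ch in "*?[")
        targets = self.list(name) if is_pattern else [name]
        status = {}
        for target in targets:
            existed = target in self._names
            if not existed and not force:
                raise DoesNotExist(target)
            self._names.discard(target)
            status[target] = existed  # True only if actually deleted
        # assumption: without report, not raising counts as success
        return status if report else True
```

With force=True a missing name yields True instead of an exception, and with report=True the per-name status dict is returned.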

exists(name, hidden=False)

check if object exists

  • name (str) – name of object

  • hidden (bool) – if True, include hidden files, defaults to False, unless name starts with ‘.’


bool, True if object exists

Changed in version 0.16.4: hidden defaults to True if name starts with ‘.’

property fs

Retrieve a gridfs instance using url and collection provided


a gridfs instance

get(name, version=-1, force_python=False, kind=None, **kwargs)

Retrieve an object

  • name – The name of the object

  • version – Version of the stored object (not supported)

  • force_python – Return as a python object

  • kwargs – kwargs depending on object kind


an object, estimator, pipelines, data array or pandas dataframe previously stored with put()

get_backend(name, model_store=None, data_store=None, **kwargs)

return the backend by a given object name

  • name – The object name

  • model_store – the OmegaStore instance used to store models

  • data_store – the OmegaStore instance used to store data

  • kwargs – the kwargs passed to the backend initialization


the backend

get_backend_bykind(kind, model_store=None, data_store=None, **kwargs)

return the backend by a given object kind

  • kind – The object kind

  • model_store – the OmegaStore instance used to store models

  • data_store – the OmegaStore instance used to store data

  • kwargs – the kwargs passed to the backend initialization


the backend

get_backend_byobj(obj, name=None, kind=None, attributes=None, model_store=None, data_store=None, **kwargs)

return the matching backend for the given obj


the first backend that supports the given parameters or None

get_dataframe_dfgroup(name, version=-1, sanitize=True, kwargs=None)

Return a grouped dataframe

  • name – the name of the object

  • version – not supported

  • kwargs – mongo db query arguments to be passed to collection.find() as a filter.

  • sanitize – remove any $op operators in kwargs
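
What removing $op operators from a filter might look like can be sketched as follows. The helper name strip_operators is hypothetical, and the recursive treatment of nested dicts and lists is an assumption; the library's actual sanitizing logic may differ.

```python
def strip_operators(query):
    """Return a copy of query without any keys that start with '$'."""
    if isinstance(query, dict):
        return {key: strip_operators(value)
                for key, value in query.items()
                if not str(key).startswith("$")}
    if isinstance(query, list):
        return [strip_operators(item) for item in query]
    # scalars are returned unchanged
    return query
```

For example, a filter such as {"a": 1, "$where": "…"} would be reduced to {"a": 1} before it reaches collection.find().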

get_dataframe_documents(name, columns=None, lazy=False, filter=None, version=-1, is_series=False, chunksize=None, sanitize=True, trusted=None, **kwargs)

Internal method to return DataFrame from documents

  • name – the name of the object (str)

  • columns – the column projection as a list of column names

  • lazy – if True returns a lazy representation as an MDataFrame. If False retrieves all data and returns a DataFrame (default)

  • filter – the filter to be applied as a column__op=value dict

  • sanitize – sanitize filter by removing all $op filter keys, defaults to True. Specify False to allow $op filter keys. $where is always removed as it is considered unsafe.

  • version – the version to retrieve (not supported)

  • is_series – if True returns a Series instead of a DataFrame

  • kwargs – remaining kwargs are used as a filter. The filter kwarg overrides other kwargs.


the retrieved object (DataFrame, Series or MDataFrame)
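
As a rough illustration of the column__op=value convention and of the rule that filter overrides other kwargs, the sketch below turns such a specification into a MongoDB-style query dict. The function build_query and the parameter name filter_spec are hypothetical, and mapping every op to "$" + op is an assumption rather than the library's documented behavior.

```python
def build_query(filter_spec=None, **kwargs):
    """Translate column__op=value pairs into a MongoDB-style query dict."""
    # entries in filter_spec override keyword arguments with the same key
    spec = {**kwargs, **(filter_spec or {})}
    query = {}
    for key, value in spec.items():
        column, sep, op = key.partition("__")
        # a plain column=value is expressed as an explicit $eq condition
        operator = "$" + op if sep else "$eq"
        query.setdefault(column, {})[operator] = value
    return query
```

Calling build_query(x__gt=5, y=1) gives a range condition on x and an equality condition on y.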

get_dataframe_hdf(name, version=-1)

Retrieve dataframe from hdf

  • name – The name of object

  • version – The version of object (not supported)


Returns a python pandas dataframe



get_object_as_python(meta, version=-1)

Retrieve object as python object

  • meta – The metadata object

  • version – The version of the object


Returns data as python object

get_python_data(name, filter=None, version=-1, lazy=False, trusted=False, **kwargs)

Retrieve objects as python data

  • name – The name of object

  • version – The version of object


Returns the object as python list object

getl(*args, **kwargs)

return a lazy MDataFrame for a given object

Same as .get, but returns an MDataFrame

help(name_or_obj=None, kind=None, raw=False, display=None, renderer=None)

get help for an object by looking up its backend and calling help() on it

Retrieves the object’s metadata and looks up its corresponding backend. If the metadata.attributes[‘docs’] is a string it will display this as the help() contents. If the string starts with ‘http://’ or ‘https://’ it will open the web page.

  • name_or_obj (str|obj) – the name or actual object to get help for

  • kind (str) – optional, if specified forces retrieval of backend for the given kind

  • raw (bool) – optional, if True forces help to be the backend type of the object. If False returns the attributes[docs] on the object’s metadata, if available. Defaults to False

  • display (fn) – optional, callable for interactive display, defaults to help if sys.flags.interactive is True, else uses pydoc.render_doc with plaintext

  • renderer (fn) – optional, the renderer= argument for pydoc.render_doc to use if sys.flags.interactive is False and display is not provided


  • help(obj) if python is in interactive mode

  • text(str) if python is not in interactive mode
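
The choice between interactive help and rendered text described for display and renderer can be sketched with the standard library alone. render_help is a hypothetical name, and the sketch leaves out the metadata and backend lookup entirely.

```python
import pydoc
import sys


def render_help(obj, display=None, renderer=None):
    """Show help for obj interactively, or return it as plain text."""
    if display is not None:
        # an explicit display callable always wins
        return display(obj)
    if sys.flags.interactive:
        # interactive session: use the builtin help()
        return help(obj)
    # non-interactive: render to a string, plain text by default
    return pydoc.render_doc(obj, renderer=renderer or pydoc.plaintext)
```

When run as a script, render_help(len) returns the documentation of len as a string instead of paging it.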

list(pattern=None, regexp=None, kind=None, raw=False, hidden=None, include_temp=False, bucket=None, prefix=None, filter=None)

List all files in store

Specify pattern as a unix pattern (e.g. models/*), or specify regexp.

  • pattern – the unix file pattern or None for all

  • regexp – the regexp. takes precedence over pattern

  • raw – if True return the meta data objects

  • filter – specify additional filter criteria, optional


List of files in store
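
How pattern and regexp might be applied to a list of names, with regexp taking precedence, can be sketched as follows. filter_names is a hypothetical helper, and anchoring the regexp at the start of the name via re.match is an assumption.

```python
import re
from fnmatch import fnmatchcase


def filter_names(names, pattern=None, regexp=None):
    """Select names by regexp if given, else by unix pattern, else all."""
    if regexp is not None:
        # regexp takes precedence over pattern
        compiled = re.compile(regexp)
        return [name for name in names if compiled.match(name)]
    if pattern is not None:
        return [name for name in names if fnmatchcase(name, pattern)]
    return list(names)
```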

make_metadata(name, kind, bucket=None, prefix=None, **kwargs)

create or update a metadata object

this retrieves a Metadata object if it exists given the kwargs. Only the name, prefix and bucket arguments are considered

for existing Metadata objects, the attributes kw is treated as follows:

  • attributes=None, the existing attributes are left as is

  • attributes={}, the attributes value on an existing metadata object is reset to the empty dict

  • attributes={ some : value }, the existing attributes are updated

For new metadata objects, attributes defaults to {} if not specified, else is set as provided.

  • name – the object name

  • bucket – the bucket, optional, defaults to self.bucket

  • prefix – the prefix, optional, defaults to self.prefix
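
The three cases for the attributes keyword can be summarized in a few lines. resolve_attributes is a hypothetical helper that only mirrors the rules stated above, with existing=None standing in for a new metadata object.

```python
def resolve_attributes(existing, attributes=None):
    """Return the attributes a metadata object ends up with.

    existing is the current attributes dict, or None for a new object.
    """
    if existing is None:
        # new object: default to an empty dict, else use what was provided
        return {} if attributes is None else dict(attributes)
    if attributes is None:
        # existing object: leave attributes as they are
        return existing
    if attributes == {}:
        # existing object: reset to the empty dict
        return {}
    # existing object: update with the given keys
    return {**existing, **attributes}
```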

metadata(name=None, bucket=None, prefix=None, version=-1, **kwargs)

Returns a metadata document for the given entry name

property mongodb

Returns a mongo database object

object_store_key(name, ext, hashed=None)

Returns the store key

Unless you write a mixin or a backend you should not use this method

  • name – The name of object

  • ext – The extension of the filename

  • hashed – hash the key to support arbitrary name length, defaults to defaults.OMEGA_STORE_HASHEDNAMES, True by default since 0.13.7


A filename with relative bucket, prefix and name

put(obj, name, attributes=None, kind=None, replace=False, **kwargs)

Stores an object. Supported objects include estimators, pipelines, numpy arrays and pandas dataframes

put_dataframe_as_dfgroup(obj, name, groupby, attributes=None)

store a dataframe grouped by columns in a mongo document


    # each group
    {
        # group keys
        key: val,
        _data: [
            # only data keys
            { key: val, … }
        ]
    }
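
The per-group document layout shown above can be mimicked with plain dicts. This sketch works on a list of row dicts instead of a dataframe, and group_rows is a hypothetical helper, not part of the store.

```python
from collections import defaultdict


def group_rows(rows, groupby):
    """Build one document per group: group keys plus a _data list."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in groupby)
        # _data holds only the non-group columns of each row
        groups[key].append({k: v for k, v in row.items() if k not in groupby})
    return [{**dict(zip(groupby, key)), "_data": data}
            for key, data in groups.items()]
```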

put_dataframe_as_documents(obj, name, append=None, attributes=None, index=None, timestamp=None, chunksize=None, ensure_compat=True, _fast_insert=<function fast_insert>, **kwargs)

store a dataframe as a row-wise collection of documents

  • obj – the dataframe to store

  • name – the name of the item in the store

  • append – if False collection will be dropped before inserting, if True existing documents will persist. Defaults to True. If not specified and rows have been previously inserted, will issue a warning.

  • index – list of columns, using +, -, @ as a column prefix to specify ASCENDING, DESCENDING, GEOSPHERE respectively. For @ the column has to represent a valid GeoJSON object.

  • timestamp – if True or a field name adds a timestamp. If the value is a boolean or datetime, uses _created as the field name. The timestamp is always datetime.datetime.utcnow(). May be overridden by specifying the tuple (col, datetime).

  • ensure_compat – if True attempt to convert obj to mongodb compatibility, set to False only if you are sure to have only compatible values in dataframe. defaults to True. False may reduce memory and increase speed on large dataframes.


the Metadata object created
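
The column prefixes accepted by index can be illustrated by a small parser that produces (column, direction) pairs of the kind MongoDB index definitions use. parse_index is a hypothetical helper; the constants carry the values pymongo uses for ASCENDING, DESCENDING and GEOSPHERE, and treating an unprefixed column as ascending is an assumption.

```python
# same values as pymongo.ASCENDING, pymongo.DESCENDING, pymongo.GEOSPHERE
ASCENDING, DESCENDING, GEOSPHERE = 1, -1, "2dsphere"

PREFIXES = {"+": ASCENDING, "-": DESCENDING, "@": GEOSPHERE}


def parse_index(columns):
    """Turn ['+a', '-b', '@loc'] into a list of (column, direction) pairs."""
    spec = []
    for column in columns:
        if column[:1] in PREFIXES:
            spec.append((column[1:], PREFIXES[column[0]]))
        else:
            # assumption: no prefix means ascending order
            spec.append((column, ASCENDING))
    return spec
```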

put_ndarray_as_hdf(obj, name, attributes=None)

store numpy array as hdf

this is a hack, converting the array to a dataframe then storing it

put_pyobj_as_document(obj, name, attributes=None, append=True, index=None, as_many=None, **kwargs)

store a dict as a document

Similar to put_dataframe_as_documents, no data is replaced by default. That is, obj is appended as new documents to the object’s mongo collection. To replace the data, specify append=False.
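
The append behavior can be pictured with a list standing in for the mongo collection. put_document is a hypothetical helper that covers only the append flag, not index or as_many.

```python
def put_document(collection, obj, append=True):
    """Add obj to collection, replacing prior documents if append is False."""
    if not append:
        # replace: drop everything stored so far
        collection.clear()
    collection.append(dict(obj))
    return collection
```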

put_pyobj_as_hdf(obj, name, attributes=None)

store list, tuple, dict as hdf

this requires the list, tuple or dict to be convertible into a dataframe

rebuild_params(kwargs, collection)

Returns a modified set of parameters for querying mongodb based on how the mongo document is structured and the fields the document is grouped by.

Note: Explicitly to be used with get_grouped_data only

  • kwargs – Mongo filter arguments

  • collection – The name of mongodb collection


Returns a set of parameters as dictionary.

register_backend(kind, backend)

register a backend class

  • kind – (str) the backend kind

  • backend – (class) the backend class


register backends in defaults.OMEGA_STORE_BACKENDS


register a mixin class


mixincls – (class) the mixin class

property tmppath

return an instance-specific temporary path

class omegaml.documents.Metadata(**kwargs)

Metadata stores information about objects in OmegaStore


customer-defined other meta attributes




for PANDAS_DFROWS this is the collection


created datetime


for PANDAS_HDF and SKLEARN_JOBLIB this is the gridfile


kind of data


omegaml technical attributes, e.g. column indices


created datetime


this is the name of the data


for PYTHON_DATA this is the actual document




s3file attributes


location URI