core models

omegaml.store

class omegaml.store.base.OmegaStore(mongo_url=None, bucket=None, prefix=None, kind=None, defaults=None, dbalias=None)

The storage backend for models and data

collection(name=None, bucket=None, prefix=None)

Returns a mongo db collection as a datastore

If an object with the given name already exists, returns that object's .collection. Otherwise returns the collection according to the naming convention {bucket}.{prefix}.{name}.datastore

Parameters:

name – the collection to use. If None, defaults to the collection name given on instantiation. The actual collection name used is always prefix + name + ‘.data’

drop(name, force=False, version=-1, **kwargs)

Drop the object

Parameters:
  • name – The name of the object

  • force – If True, ignores the DoesNotExist exception. Defaults to False, meaning a DoesNotExist exception is raised if the name does not exist

Returns:

True if object was deleted, False if not. If force is True and the object does not exist it will still return True

Raises:

DoesNotExist if the object does not exist and `force=False`
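The interaction between force and the DoesNotExist exception can be sketched with a plain dict standing in for the store. This is a minimal illustration of the documented contract, not the OmegaStore implementation: the module-level drop function, the objects dict and the local DoesNotExist class are all hypothetical stand-ins, and the case where the real method returns False is not modeled.

```python
class DoesNotExist(Exception):
    """Hypothetical stand-in for the store's DoesNotExist exception."""


def drop(objects, name, force=False):
    """Remove name from the objects dict, mirroring the documented contract."""
    if name not in objects:
        if force:
            # force=True: a missing object is tolerated and True is returned
            return True
        raise DoesNotExist(name)
    del objects[name]
    return True


store = {"mymodel": object()}
first = drop(store, "mymodel")               # object existed and was removed
second = drop(store, "mymodel", force=True)  # already gone, tolerated
try:
    drop(store, "mymodel")                   # already gone, force=False
    raised = False
except DoesNotExist:
    raised = True
```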

exists(name, hidden=False)

check if object exists

Parameters:
  • name (str) – name of object

  • hidden (bool) – if True, include hidden files

Returns:

bool, True if object exists

property fs

Retrieve a gridfs instance using url and collection provided

Returns:

a gridfs instance

get(name, version=-1, force_python=False, kind=None, **kwargs)

Retrieve an object

Parameters:
  • name – The name of the object

  • version – Version of the stored object (not supported)

  • force_python – Return as a python object

  • kwargs – kwargs depending on object kind

Returns:

an object, estimator, pipelines, data array or pandas dataframe previously stored with put()

get_backend(name, model_store=None, data_store=None, **kwargs)

return the backend by a given object name

Parameters:
  • name – The name of the object

  • model_store – the OmegaStore instance used to store models

  • data_store – the OmegaStore instance used to store data

  • kwargs – the kwargs passed to the backend initialization

Returns:

the backend

get_backend_bykind(kind, model_store=None, data_store=None, **kwargs)

return the backend by a given object kind

Parameters:
  • kind – The object kind

  • model_store – the OmegaStore instance used to store models

  • data_store – the OmegaStore instance used to store data

  • kwargs – the kwargs passed to the backend initialization

Returns:

the backend

get_backend_byobj(obj, name=None, kind=None, attributes=None, model_store=None, data_store=None, **kwargs)

return the matching backend for the given obj

Returns:

the first backend that supports the given parameters or None
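The "first backend that supports the given parameters" rule can be pictured as a loop over the registered backends. The sketch below assumes each backend class exposes a supports() classmethod; the two backend classes, their kind names and the module-level registry are hypothetical and only illustrate the lookup order.

```python
class ListBackend:
    """Hypothetical backend that handles python lists."""

    @classmethod
    def supports(cls, obj, name, **kwargs):
        return isinstance(obj, list)


class DictBackend:
    """Hypothetical backend that handles python dicts."""

    @classmethod
    def supports(cls, obj, name, **kwargs):
        return isinstance(obj, dict)


backends = {}  # kind -> backend class, kept in registration order


def register_backend(kind, backend):
    """Register a backend class under the given kind."""
    backends[kind] = backend


def get_backend_byobj(obj, name=None, **kwargs):
    """Return the first registered backend that supports obj, else None."""
    for backend in backends.values():
        if backend.supports(obj, name, **kwargs):
            return backend
    return None


register_backend("example.list", ListBackend)
register_backend("example.dict", DictBackend)
```

With this registry, get_backend_byobj([1, 2]) returns ListBackend, while an object that no backend supports, such as an int, yields None.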

get_dataframe_dfgroup(name, version=-1, sanitize=True, kwargs=None)

Return a grouped dataframe

Parameters:
  • name – the name of the object

  • version – not supported

  • kwargs – mongo db query arguments to be passed to collection.find() as a filter.

  • sanitize – remove any $op operators in kwargs

get_dataframe_documents(name, columns=None, lazy=False, filter=None, version=-1, is_series=False, chunksize=None, sanitize=True, trusted=None, **kwargs)

Internal method to return DataFrame from documents

Parameters:
  • name – the name of the object (str)

  • columns – the column projection as a list of column names

  • lazy – if True returns a lazy representation as an MDataFrame. If False retrieves all data and returns a DataFrame (default)

  • filter – the filter to be applied as a column__op=value dict

  • sanitize – sanitize filter by removing all $op filter keys, defaults to True. Specify False to allow $op filter keys. $where is always removed as it is considered unsafe.

  • version – the version to retrieve (not supported)

  • is_series – if True returns a Series instead of a DataFrame

  • kwargs – remaining kwargs are used as a filter. The filter kwarg overrides other kwargs.

Returns:

the retrieved object (DataFrame, Series or MDataFrame)
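The effect of sanitize on a filter can be illustrated with a small recursive helper. This is a sketch of the documented behavior only: the function name sanitize_filter is hypothetical, and whether the actual implementation recurses into nested values is an assumption.

```python
def sanitize_filter(query, sanitize=True):
    """Return a copy of query with unsafe operator keys removed.

    $where is always dropped; any other key starting with $ is dropped
    only when sanitize is True.
    """
    if isinstance(query, dict):
        cleaned = {}
        for key, value in query.items():
            is_operator = isinstance(key, str) and key.startswith("$")
            if key == "$where" or (sanitize and is_operator):
                continue  # drop the operator key together with its value
            cleaned[key] = sanitize_filter(value, sanitize)
        return cleaned
    if isinstance(query, list):
        return [sanitize_filter(item, sanitize) for item in query]
    return query


query = {"x": 5, "$where": "sleep(1000)", "y": {"$gt": 1}}
strict = sanitize_filter(query)                   # all $op keys removed
relaxed = sanitize_filter(query, sanitize=False)  # only $where removed
```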

get_dataframe_hdf(name, version=-1)

Retrieve dataframe from hdf

Parameters:
  • name – The name of object

  • version – The version of object (not supported)

Returns:

Returns a python pandas dataframe

Raises:

gridfs.errors.NoFile

get_object_as_python(meta, version=-1)

Retrieve object as python object

Parameters:
  • meta – The metadata object

  • version – The version of the object

Returns:

Returns data as python object

get_python_data(name, filter=None, version=-1, lazy=False, trusted=False, **kwargs)

Retrieve objects as python data

Parameters:
  • name – The name of object

  • version – The version of object

Returns:

Returns the object as python list object

getl(*args, **kwargs)

return a lazy MDataFrame for a given object

Same as .get, but returns a MDataFrame

help(name_or_obj=None, kind=None, raw=False)

get help for an object by looking up its backend and calling help() on it

Retrieves the object’s metadata and looks up its corresponding backend. If the metadata.attributes[‘docs’] is a string it will display this as the help() contents. If the string starts with ‘http://’ or ‘https://’ it will open the web page.

Parameters:
  • name_or_obj (str|obj) – the name or actual object to get help for

  • kind (str) – optional, if specified forces retrieval of backend for the given kind

  • raw (bool) – optional, if True forces help to be the backend type of the object. If False returns the attributes[docs] on the object’s metadata, if available. Defaults to False

Returns:

  • help(obj) if python is in interactive mode

  • text(str) if python is not in interactive mode
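The decision between showing the docs attribute, opening a web page and falling back to the backend's own help can be sketched as a pure function. The name resolve_help, its arguments and the returned (action, payload) pairs are hypothetical and only model the rules described above.

```python
def resolve_help(docs, backend_doc, raw=False):
    """Return an (action, payload) pair describing what help would present.

    docs is the value of metadata.attributes['docs'], backend_doc is the
    help text of the object's backend.
    """
    if not raw and isinstance(docs, str):
        if docs.startswith(("http://", "https://")):
            return ("open_url", docs)   # docs is a link: open the web page
        return ("show_text", docs)      # docs is plain text: display it
    # raw=True or no usable docs attribute: fall back to the backend help
    return ("show_text", backend_doc)


link = resolve_help("https://example.com/model", "backend help")
text = resolve_help("predicts iris species", "backend help")
forced = resolve_help("predicts iris species", "backend help", raw=True)
missing = resolve_help(None, "backend help")
```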

list(pattern=None, regexp=None, kind=None, raw=False, hidden=None, include_temp=False, bucket=None, prefix=None, filter=None)

List all files in store

Specify pattern as a unix pattern (e.g. models/*), or specify regexp.

Parameters:
  • pattern – the unix file pattern or None for all

  • regexp – the regexp. takes precedence over pattern

  • raw – if True return the meta data objects

  • filter – specify additional filter criteria, optional

Returns:

List of files in store
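How pattern and regexp interact can be sketched with the standard library. The helper filter_names below is hypothetical; it assumes unix patterns behave like fnmatch patterns and that the regexp is matched from the start of the name, neither of which is confirmed by the text above.

```python
import fnmatch
import re


def filter_names(names, pattern=None, regexp=None):
    """Select names by regexp if given, else by unix pattern, else all."""
    if regexp is not None:
        # regexp takes precedence over pattern
        matcher = re.compile(regexp)
        return [n for n in names if matcher.match(n)]
    if pattern is not None:
        return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
    return list(names)


names = ["models/iris", "models/wine", "data/iris"]
by_pattern = filter_names(names, pattern="models/*")
by_regexp = filter_names(names, regexp=r".*iris$")
both = filter_names(names, pattern="models/*", regexp=r"data/.*")
```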

make_metadata(name, kind, bucket=None, prefix=None, **kwargs)

create or update a metadata object

this retrieves a Metadata object if it exists given the kwargs. Only the name, prefix and bucket arguments are considered

for existing Metadata objects, the attributes kw is treated as follows:

  • attributes=None, the existing attributes are left as is

  • attributes={}, the attributes value on an existing metadata object is reset to the empty dict

  • attributes={ some : value }, the existing attributes are updated

For new metadata objects, attributes defaults to {} if not specified, else is set as provided.

Parameters:
  • name – the object name

  • bucket – the bucket, optional, defaults to self.bucket

  • prefix – the prefix, optional, defaults to self.prefix
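The three cases for the attributes keyword on an existing Metadata object can be sketched with plain dicts. The helper names are hypothetical, and treating "updated" as a dict.update style merge is an assumption.

```python
def merge_attributes(existing, attributes):
    """Apply the documented attributes rules to an existing metadata object."""
    if attributes is None:
        return existing          # left as is
    if attributes == {}:
        return {}                # reset to the empty dict
    merged = dict(existing)
    merged.update(attributes)    # assumption: update means a key-wise merge
    return merged


def new_attributes(attributes=None):
    """Attributes for a new metadata object default to an empty dict."""
    return {} if attributes is None else attributes


current = {"owner": "alice"}
kept = merge_attributes(current, None)
reset = merge_attributes(current, {})
updated = merge_attributes(current, {"stage": "prod"})
```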

metadata(name=None, bucket=None, prefix=None, version=-1, **kwargs)

Returns a metadata document for the given entry name

property mongodb

Returns a mongo database object

object_store_key(name, ext, hashed=None)

Returns the store key

Unless you write a mixin or a backend you should not use this method

Parameters:
  • name – The name of object

  • ext – The extension of the filename

  • hashed – hash the key to support arbitrary name length, defaults to defaults.OMEGA_STORE_HASHEDNAMES, True by default since 0.13.7

Returns:

A filename with relative bucket, prefix and name
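Hashing the key yields a fixed-length identifier regardless of how long the object name is. The sketch below illustrates the idea only: the store_key function, the key layout and the choice of SHA-1 are assumptions and may differ from what OmegaStore actually does.

```python
import hashlib


def store_key(bucket, prefix, name, ext, hashed=True):
    """Build a store key and optionally hash it to a fixed length."""
    key = f"{bucket}.{prefix}.{name}.{ext}"  # assumed layout, for illustration
    if hashed:
        # a hex digest has the same length for any input length
        key = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return key


plain = store_key("omegaml", "models", "iris", "pkl", hashed=False)
short = store_key("omegaml", "models", "iris", "pkl")
long_ = store_key("omegaml", "models", "x" * 5000, "pkl")
```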

put(obj, name, attributes=None, kind=None, replace=False, **kwargs)

Stores an object, store estimators, pipelines, numpy arrays or pandas dataframes

put_dataframe_as_dfgroup(obj, name, groupby, attributes=None)

store a dataframe grouped by columns in a mongo document

Example:

> # each group
> {
>     # group keys
>     key: val,
>     _data: [
>         # only data keys
>         { key: val, … }
>     ]
> }
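To make the group document layout concrete, the following sketch builds such documents from a list of row dicts in pure Python. The function group_documents and the sample rows are hypothetical; the actual method takes a dataframe and writes to MongoDB.

```python
def group_documents(rows, groupby_cols):
    """Build one document per group: group keys plus a _data list of rows."""
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in groupby_cols)
        # _data holds only the non-group ("data") keys of each row
        data = {k: v for k, v in row.items() if k not in groupby_cols}
        groups.setdefault(key, []).append(data)
    docs = []
    for key, data in groups.items():
        doc = dict(zip(groupby_cols, key))  # the group keys
        doc["_data"] = data
        docs.append(doc)
    return docs


rows = [
    {"city": "Bern", "year": 2020, "sales": 10},
    {"city": "Bern", "year": 2021, "sales": 12},
    {"city": "Basel", "year": 2020, "sales": 7},
]
docs = group_documents(rows, ["city"])
```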

put_dataframe_as_documents(obj, name, append=None, attributes=None, index=None, timestamp=None, chunksize=None, ensure_compat=True, _fast_insert=<function fast_insert>, **kwargs)

store a dataframe as a row-wise collection of documents

Parameters:
  • obj – the dataframe to store

  • name – the name of the item in the store

  • append – if False collection will be dropped before inserting, if True existing documents will persist. Defaults to True. If not specified and rows have been previously inserted, will issue a warning.

  • index – list of columns, using +, -, @ as a column prefix to specify ASCENDING, DESCENDING, GEOSPHERE respectively. For @ the column has to represent a valid GeoJSON object.

  • timestamp – if True or a field name adds a timestamp. If the value is a boolean or datetime, uses _created as the field name. The timestamp is always datetime.datetime.utcnow(). May be overridden by specifying the tuple (col, datetime).

  • ensure_compat – if True attempt to convert obj to mongodb compatibility, set to False only if you are sure to have only compatible values in dataframe. defaults to True. False may reduce memory and increase speed on large dataframes.

Returns:

the Metadata object created
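The index prefixes can be read as a small mapping from prefix character to index type. The sketch below parses such a list into (column, direction) pairs; parse_index is a hypothetical helper, the constants mirror the values of pymongo's ASCENDING, DESCENDING and GEOSPHERE, and treating an unprefixed column as ascending is an assumption.

```python
# same values as pymongo.ASCENDING, pymongo.DESCENDING, pymongo.GEOSPHERE
ASCENDING = 1
DESCENDING = -1
GEOSPHERE = "2dsphere"

PREFIXES = {"+": ASCENDING, "-": DESCENDING, "@": GEOSPHERE}


def parse_index(columns):
    """Translate prefixed column names into (column, direction) pairs."""
    spec = []
    for column in columns:
        prefix = column[:1]
        if prefix in PREFIXES:
            spec.append((column[1:], PREFIXES[prefix]))
        else:
            # assumption: a column without a prefix is indexed ascending
            spec.append((column, ASCENDING))
    return spec


spec = parse_index(["+date", "-price", "@location", "name"])
```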

put_ndarray_as_hdf(obj, name, attributes=None)

store numpy array as hdf

this is a hack, converting the array to a dataframe then storing it

put_pyobj_as_document(obj, name, attributes=None, append=True, index=None, as_many=None, **kwargs)

store a dict as a document

Similar to put_dataframe_as_documents, no data is replaced by default. That is, obj is appended as new documents into the object’s mongo collection. To replace the data, specify append=False.
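The difference between appending and replacing can be sketched with a list standing in for the mongo collection. The put_documents helper is hypothetical and only models the append semantics described here.

```python
def put_documents(collection, docs, append=True):
    """Insert docs into collection, dropping existing ones if append=False."""
    if not append:
        collection.clear()  # replace: existing documents are removed first
    collection.extend(dict(doc) for doc in docs)
    return len(collection)


collection = []
put_documents(collection, [{"a": 1}])
after_append = put_documents(collection, [{"a": 2}])                 # default appends
after_replace = put_documents(collection, [{"a": 3}], append=False)  # replaces
```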

put_pyobj_as_hdf(obj, name, attributes=None)

store list, tuple, dict as hdf

this requires the list, tuple or dict to be convertible into a dataframe

rebuild_params(kwargs, collection)

Returns a modified set of parameters for querying mongodb based on how the mongo document is structured and the fields the document is grouped by.

Note: Explicitly to be used with get_grouped_data only

Parameters:
  • kwargs – Mongo filter arguments

  • collection – The name of mongodb collection

Returns:

Returns a set of parameters as dictionary.

register_backend(kind, backend)

register a backend class

Parameters:
  • kind – (str) the backend kind

  • backend – (class) the backend class

register_backends()

register backends in defaults.OMEGA_STORE_BACKENDS

register_mixin(mixincls)

register a mixin class

Parameters:

mixincls – (class) the mixin class

property tmppath

return an instance-specific temporary path

class omegaml.documents.Metadata(**kwargs)

Metadata stores information about objects in OmegaStore

attributes

customer-defined other meta attributes

bucket

bucket

collection

for PANDAS_DFROWS this is the collection

created

created datetime

gridfile

for PANDAS_HDF and SKLEARN_JOBLIB this is the gridfile

kind

kind of data

kind_meta

omegaml technical attributes, e.g. column indices

modified

modified datetime

name

this is the name of the data

objid

for PYTHON_DATA this is the actual document

prefix

prefix

s3file

s3file attributes

uri

location URI