core models ¶

omegaml.store ¶

class omegaml.store.base.OmegaStore(mongo_url=None, bucket=None, prefix=None, kind=None, defaults=None, dbalias=None)¶

The storage backend for models and data

collection(name=None, bucket=None, prefix=None)¶

Returns a mongo db collection as a datastore

If there is an existing object of name, will return the .collection of the object. Otherwise returns the collection according to naming convention {bucket}.{prefix}.{name}.datastore

Parameters:

name – the collection to use. If None, defaults to the collection name given on instantiation. The actual collection name used is always prefix + name + ‘.data’

drop(name, force=False, version=-1, **kwargs)¶

Drop the object

Parameters:

name – The name of the object
force – If True, ignores the DoesNotExist exception. Defaults to False, meaning a DoesNotExist exception is raised if the name does not exist

Returns:

True if object was deleted, False if not. If force is True and the object does not exist it will still return True

Raises:

DoesNotExist if the object does not exist and `force=False`
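
The interaction between `force`, the return value and `DoesNotExist` can be sketched with a small in-memory stand-in. `ToyStore` and the local `DoesNotExist` class are hypothetical and exist only for this illustration; they are not OmegaStore and do not touch MongoDB.

```python
class DoesNotExist(Exception):
    """Stand-in for the exception named in the text (hypothetical class)."""


class ToyStore:
    """In-memory stand-in; it only mimics the documented drop() contract."""

    def __init__(self):
        self._objects = {}

    def put(self, obj, name):
        self._objects[name] = obj

    def drop(self, name, force=False):
        if name not in self._objects:
            if force:
                # force=True: a missing object is not an error, still True
                return True
            raise DoesNotExist(name)
        del self._objects[name]
        return True


store = ToyStore()
store.put([1, 2, 3], "mydata")
print(store.drop("mydata"))              # object existed and was deleted
print(store.drop("mydata", force=True))  # already gone, but force=True
try:
    store.drop("mydata")                 # already gone, force=False
except DoesNotExist as exc:
    print("DoesNotExist:", exc)
```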

exists(name, hidden=False)¶

check if object exists

Parameters:

name (str) – name of object
hidden (bool) – if True, include hidden files

Returns:

bool, True if object exists

property fs¶

Retrieve a gridfs instance using url and collection provided

Returns:

a gridfs instance

get(name, version=-1, force_python=False, kind=None, **kwargs)¶

Retrieve an object

Parameters:

name – The name of the object
version – Version of the stored object (not supported)
force_python – Return as a python object
kwargs – kwargs depending on object kind

Returns:

an object, estimator, pipelines, data array or pandas dataframe previously stored with put()

get_backend(name, model_store=None, data_store=None, **kwargs)¶

return the backend by a given object name

Parameters:

name – The object name
model_store – the OmegaStore instance used to store models
data_store – the OmegaStore instance used to store data
kwargs – the kwargs passed to the backend initialization

Returns:

the backend

get_backend_bykind(kind, model_store=None, data_store=None, **kwargs)¶

return the backend by a given object kind

Parameters:

kind – The object kind
model_store – the OmegaStore instance used to store models
data_store – the OmegaStore instance used to store data
kwargs – the kwargs passed to the backend initialization

Returns:

the backend

get_backend_byobj(obj, name=None, kind=None, attributes=None, model_store=None, data_store=None, **kwargs)¶

return the matching backend for the given obj

Returns:

the first backend that supports the given parameters or None

get_dataframe_dfgroup(name, version=-1, kwargs=None)¶

Return a grouped dataframe

Parameters:

name – the name of the object
version – not supported
kwargs – mongo db query arguments to be passed to collection.find() as a filter.

get_dataframe_documents(name, columns=None, lazy=False, filter=None, version=-1, is_series=False, chunksize=None, **kwargs)¶

Internal method to return DataFrame from documents

Parameters:

name – the name of the object (str)
columns – the column projection as a list of column names
lazy – if True returns a lazy representation as an MDataFrame. If False retrieves all data and returns a DataFrame (default)
filter – the filter to be applied as a column__op=value dict
version – the version to retrieve (not supported)
is_series – if True returns a Series instead of a DataFrame
kwargs – remaining kwargs are used as a filter. The filter kwarg overrides other kwargs.

Returns:

the retrieved object (DataFrame, Series or MDataFrame)
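
To make the `column__op=value` filter notation concrete, here is a minimal sketch of how such a dict could be translated into a MongoDB query document. The helper `to_mongo_filter` is hypothetical, and mapping `op` directly to `$op` is an assumption for illustration; the set of operators the library actually accepts is not stated here. The precedence of `filter` over the remaining kwargs follows the parameter description above.

```python
def to_mongo_filter(filter=None, **kwargs):
    """Translate column__op=value pairs into a MongoDB-style query dict.

    Hypothetical helper: 'op' is mapped to '$op' as an assumption.
    """
    spec = dict(kwargs)
    spec.update(filter or {})  # entries in filter override kwargs
    query = {}
    for key, value in spec.items():
        column, sep, op = key.partition("__")
        # a bare column name (no '__op' suffix) is treated as equality
        operator = "$" + op if sep else "$eq"
        query.setdefault(column, {})[operator] = value
    return query


print(to_mongo_filter(x__gt=5, y=2))
print(to_mongo_filter(filter={"x__gt": 1}, x__gt=5))
```

A dict of this shape is what `collection.find()` accepts as its filter argument in pymongo.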

get_dataframe_hdf(name, version=-1)¶

Retrieve dataframe from hdf

Parameters:

name – The name of object
version – The version of object (not supported)

Returns:

Returns a python pandas dataframe

Raises:

gridfs.errors.NoFile

get_object_as_python(meta, version=-1)¶

Retrieve object as python object

Parameters:

meta – The metadata object
version – The version of the object

Returns:

Returns data as python object

get_python_data(name, version=-1, **kwargs)¶

Retrieve objects as python data

Parameters:

name – The name of object
version – The version of object

Returns:

Returns the object as python list object

getl(*args, **kwargs)¶

return a lazy MDataFrame for a given object

Same as .get, but returns a MDataFrame

help(name_or_obj=None, kind=None, raw=False)¶

get help for an object by looking up its backend and calling help() on it

Retrieves the object’s metadata and looks up its corresponding backend. If the metadata.attributes[‘docs’] is a string it will display this as the help() contents. If the string starts with ‘http://’ or ‘https://’ it will open the web page.

Parameters:

name_or_obj (str|obj) – the name or actual object to get help for
kind (str) – optional, if specified forces retrieval of backend for the given kind
raw (bool) – optional, if True forces help to be the backend type of the object. If False returns the attributes[docs] on the object’s metadata, if available. Defaults to False

Returns:

help(obj) if python is in interactive mode
text(str) if python is not in interactive mode
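
The decision described above (docs string, URL, or backend help) can be sketched as a small dispatch function. `resolve_help` and the action labels it returns are hypothetical names for illustration only; the sketch does not open a browser or call `help()`.

```python
def resolve_help(docs, raw=False):
    """Decide what help() would show, given metadata.attributes['docs'].

    Hypothetical helper returning an (action, payload) tuple.
    """
    if not raw and isinstance(docs, str):
        if docs.startswith(("http://", "https://")):
            return ("open_url", docs)    # docs is a link to a web page
        return ("show_text", docs)       # docs is the help text itself
    # raw=True, or no usable docs string: fall back to the backend's help
    return ("backend_help", None)


print(resolve_help("Predicts churn per customer"))
print(resolve_help("https://example.com/model-docs"))
print(resolve_help("Predicts churn per customer", raw=True))
```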

list(pattern=None, regexp=None, kind=None, raw=False, hidden=None, include_temp=False, bucket=None, prefix=None, filter=None)¶

List all files in store

Specify pattern as a unix pattern (e.g. models/*), or specify regexp.

Parameters:

pattern – the unix file pattern or None for all
regexp – the regexp. takes precedence over pattern
raw – if True return the meta data objects
filter – specify additional filter criteria, optional

Returns:

List of files in store
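
The difference between `pattern` and `regexp`, and the precedence of `regexp`, can be illustrated with the standard library alone. `filter_names` is a hypothetical helper working on a plain list of names; using `re.search` (rather than a full match) is an assumption of this sketch.

```python
import fnmatch
import re


def filter_names(names, pattern=None, regexp=None):
    """Filter names by unix pattern or regexp; regexp takes precedence."""
    if regexp is not None:
        rx = re.compile(regexp)
        return [n for n in names if rx.search(n)]
    if pattern is not None:
        return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
    return list(names)  # no criteria: everything is listed


names = ["models/churn", "models/sales", "data/customers"]
print(filter_names(names, pattern="models/*"))
print(filter_names(names, pattern="models/*", regexp=r"^data/"))
```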

make_metadata(name, kind, bucket=None, prefix=None, **kwargs)¶

create or update a metadata object

this retrieves a Metadata object if it exists given the kwargs. Only the name, prefix and bucket arguments are considered

for existing Metadata objects, the attributes kw is treated as follows:

attributes=None, the existing attributes are left as is
attributes={}, the attributes value on an existing metadata object is reset to the empty dict
attributes={ some : value }, the existing attributes are updated

For new metadata objects, attributes defaults to {} if not specified, else is set as provided.

Parameters:

name – the object name
bucket – the bucket, optional, defaults to self.bucket
prefix – the prefix, optional, defaults to self.prefix
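
The three cases for the attributes keyword on an existing Metadata object can be written out as a short function. `merge_attributes` is a hypothetical helper operating on plain dicts; it only restates the rules listed above and is not taken from the library.

```python
def merge_attributes(existing, attributes):
    """Apply the documented attributes rules to an existing attributes dict."""
    if attributes is None:
        return existing            # left as is
    if attributes == {}:
        return {}                  # reset to the empty dict
    merged = dict(existing)
    merged.update(attributes)      # existing attributes are updated
    return merged


current = {"owner": "alice", "stage": "dev"}
print(merge_attributes(current, None))
print(merge_attributes(current, {}))
print(merge_attributes(current, {"stage": "prod"}))
```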

metadata(name=None, bucket=None, prefix=None, version=-1)¶

Returns a metadata document for the given entry name

FIXME: version attribute does not do anything

FIXME: metadata should be stored in a bucket-specific collection to enable access control, see https://docs.mongodb.com/manual/reference/method/db.createRole/#db.createRole

property mongodb¶: Returns a mongo database object

object_store_key(name, ext, hashed=None)¶

Returns the store key

Unless you write a mixin or a backend you should not use this method

Parameters:

name – The name of object
ext – The extension of the filename
hashed – hash the key to support arbitrary name length, defaults to defaults.OMEGA_STORE_HASHEDNAMES, True by default since 0.13.7

Returns:

A filename with relative bucket, prefix and name

put(obj, name, attributes=None, kind=None, replace=False, **kwargs)¶: Stores an object, such as an estimator, pipeline, numpy array or pandas dataframe

put_dataframe_as_dfgroup(obj, name, groupby, attributes=None)¶

store a dataframe grouped by columns in a mongo document

Example:

    # each group
    {
        # group keys
        key: val,
        _data: [
            # only data keys
            { key: val, … }
        ]
    }
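
As a rough illustration of the grouped layout, the following sketch builds one document per group from a list of row dicts, using plain Python instead of a dataframe. `group_rows` is a hypothetical helper; how the library performs the grouping internally is not shown here.

```python
def group_rows(rows, groupby):
    """Build one document per group: group keys plus a _data list."""
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in groupby)
        # _data holds only the non-group columns of each row
        data = {k: v for k, v in row.items() if k not in groupby}
        groups.setdefault(key, []).append(data)
    docs = []
    for key, data in groups.items():
        doc = dict(zip(groupby, key))
        doc["_data"] = data
        docs.append(doc)
    return docs


rows = [
    {"region": "eu", "year": 2020, "sales": 1},
    {"region": "eu", "year": 2021, "sales": 2},
    {"region": "us", "year": 2020, "sales": 3},
]
for doc in group_rows(rows, ["region"]):
    print(doc)
```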

put_dataframe_as_documents(obj, name, append=None, attributes=None, index=None, timestamp=None, chunksize=None, ensure_compat=True, _fast_insert=<function fast_insert>, **kwargs)¶

store a dataframe as a row-wise collection of documents

Parameters:

obj – the dataframe to store
name – the name of the item in the store
append – if False collection will be dropped before inserting, if True existing documents will persist. Defaults to True. If not specified and rows have been previously inserted, will issue a warning.
index – list of columns, using +, -, @ as a column prefix to specify ASCENDING, DESCENDING, GEOSPHERE respectively. For @ the column has to represent a valid GeoJSON object.
timestamp – if True or a field name adds a timestamp. If the value is a boolean or datetime, uses _created as the field name. The timestamp is always datetime.datetime.utcnow(). May be overridden by specifying the tuple (col, datetime).
ensure_compat – if True attempt to convert obj to mongodb compatibility, set to False only if you are sure to have only compatible values in dataframe. defaults to True. False may reduce memory and increase speed on large dataframes.

Returns:

the Metadata object created
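
The column prefixes accepted by the index parameter can be illustrated with a small parser. `parse_index` is a hypothetical helper; the three constants mirror the values of pymongo's `ASCENDING`, `DESCENDING` and `GEOSPHERE` so that the sketch runs without pymongo installed. Treating an unprefixed column as ascending is an assumption.

```python
# Same values as pymongo.ASCENDING, pymongo.DESCENDING, pymongo.GEOSPHERE
ASCENDING, DESCENDING, GEOSPHERE = 1, -1, "2dsphere"

PREFIXES = {"+": ASCENDING, "-": DESCENDING, "@": GEOSPHERE}


def parse_index(columns):
    """Turn ['+a', '-b', '@loc'] into a list of (column, direction) pairs."""
    spec = []
    for col in columns:
        if col[:1] in PREFIXES:
            spec.append((col[1:], PREFIXES[col[0]]))
        else:
            # assumption: no prefix means ascending
            spec.append((col, ASCENDING))
    return spec


print(parse_index(["+customer", "-created", "@location"]))
```

A list of `(column, direction)` pairs like this is the form pymongo's `create_index` accepts for compound indexes.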

put_ndarray_as_hdf(obj, name, attributes=None)¶

store numpy array as hdf

this is a hack, converting the array to a dataframe then storing it

put_pyobj_as_document(obj, name, attributes=None, append=True)¶

store a dict as a document

Similar to put_dataframe_as_documents, no data will be replaced by default. That is, obj is appended as new documents into the object’s mongo collection. To replace the data, specify append=False.

put_pyobj_as_hdf(obj, name, attributes=None)¶

store list, tuple, dict as hdf

this requires the list, tuple or dict to be convertible into a dataframe

rebuild_params(kwargs, collection)¶

Returns a modified set of parameters for querying mongodb based on how the mongo document is structured and the fields the document is grouped by.

Note: Explicitly to be used with get_grouped_data only

Parameters:

kwargs – Mongo filter arguments
collection – The name of mongodb collection

Returns:

Returns a set of parameters as dictionary.

register_backend(kind, backend)¶

register a backend class

Parameters:

kind – (str) the backend kind
backend – (class) the backend class
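
How registration relates to the backend lookup methods described earlier can be sketched with a toy registry. `ToyRegistry`, `ListBackend` and the `supports()` classmethod are hypothetical names chosen for this illustration, and the kind string is made up; this is not the OmegaStore implementation.

```python
class ListBackend:
    """Hypothetical backend class that handles python lists."""

    @classmethod
    def supports(cls, obj):
        return isinstance(obj, list)


class ToyRegistry:
    """Minimal stand-in for the kind-to-backend mapping."""

    def __init__(self):
        self.backends = {}

    def register_backend(self, kind, backend):
        self.backends[kind] = backend

    def get_backend_bykind(self, kind):
        return self.backends[kind]()

    def get_backend_byobj(self, obj):
        # return the first registered backend that supports obj, else None
        for backend in self.backends.values():
            if backend.supports(obj):
                return backend()
        return None


registry = ToyRegistry()
registry.register_backend("toy.list", ListBackend)
print(type(registry.get_backend_bykind("toy.list")).__name__)
print(registry.get_backend_byobj({"not": "a list"}))
```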

register_backends()¶: register backends in defaults.OMEGA_STORE_BACKENDS

register_mixin(mixincls)¶

register a mixin class

Parameters:

mixincls – (class) the mixin class

property tmppath¶: return an instance-specific temporary path

class omegaml.documents.Metadata(**kwargs)¶

Metadata stores information about objects in OmegaStore

attributes¶: customer-defined other meta attributes

bucket¶: bucket

collection¶: for PANDAS_DFROWS this is the collection

created¶: created datetime

gridfile¶: for PANDAS_HDF and SKLEARN_JOBLIB this is the gridfile

kind¶: kind of data

kind_meta¶: omegaml technical attributes, e.g. column indices

modified¶: modified datetime

name¶: this is the name of the data

objid¶: for PYTHON_DATA this is the actual document

prefix¶: prefix

s3file¶: s3file attributes

uri¶: location URI
