core models¶
omegaml.store¶
- class omegaml.store.base.OmegaStore(mongo_url=None, bucket=None, prefix=None, kind=None, defaults=None, dbalias=None)¶
The storage backend for models and data
- collection(name=None, bucket=None, prefix=None)¶
Returns a mongo db collection as a datastore
If there is an existing object with the given name, returns the .collection of that object. Otherwise returns the collection named according to the convention {bucket}.{prefix}.{name}.datastore
- Parameters:
name – the collection to use. If None, defaults to the collection name given on instantiation. The actual collection name used is always prefix + name + ‘.data’
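The naming convention given in the description can be sketched as plain string formatting. This is only an illustration of the `{bucket}.{prefix}.{name}.datastore` pattern; the helper name and the sample values are hypothetical and not part of omegaml.

```python
def datastore_collection_name(bucket, prefix, name):
    """Build a collection name following {bucket}.{prefix}.{name}.datastore.

    Hypothetical helper for illustration only, not an omegaml function.
    """
    return f"{bucket}.{prefix}.{name}.datastore"


# sample values are made up for the example
print(datastore_collection_name("omegaml", "data", "sales"))
```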
- drop(name, force=False, version=-1, **kwargs)¶
Drop the object
- Parameters:
name – The name of the object
force – If True, ignores the DoesNotExist exception. Defaults to False, meaning a DoesNotExist exception is raised if the name does not exist
- Returns:
True if object was deleted, False if not. If force is True and the object does not exist it will still return True
- Raises:
DoesNotExist if the object does not exist and `force=False`
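The interaction between `force`, the return value and the exception can be illustrated with a small in-memory stand-in. `ToyStore` and this `DoesNotExist` class are hypothetical and only mimic the contract documented above; they do not talk to MongoDB or use omegaml.

```python
class DoesNotExist(Exception):
    """Stand-in for the store's DoesNotExist exception (hypothetical)."""


class ToyStore:
    """In-memory stand-in that only mimics the documented drop() contract."""

    def __init__(self):
        self._objects = {}

    def put(self, obj, name):
        self._objects[name] = obj

    def drop(self, name, force=False):
        if name not in self._objects:
            if force:
                # a missing object is ignored and success is still reported
                return True
            raise DoesNotExist(name)
        del self._objects[name]
        return True


store = ToyStore()
store.put([1, 2, 3], "mydata")
print(store.drop("mydata"))              # object existed and was deleted
print(store.drop("mydata", force=True))  # already gone, force suppresses the error
try:
    store.drop("mydata")                 # already gone, force defaults to False
except DoesNotExist as exc:
    print("raised DoesNotExist for", exc)
```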
- exists(name, hidden=False)¶
check if object exists
- Parameters:
name (str) – name of object
hidden (bool) – if True, include hidden files
- Returns:
bool, True if object exists
- property fs¶
Retrieve a gridfs instance using url and collection provided
- Returns:
a gridfs instance
- get(name, version=-1, force_python=False, kind=None, **kwargs)¶
Retrieve an object
- Parameters:
name – The name of the object
version – Version of the stored object (not supported)
force_python – Return as a python object
kwargs – kwargs depending on object kind
- Returns:
an object, estimator, pipelines, data array or pandas dataframe previously stored with put()
- get_backend(name, model_store=None, data_store=None, **kwargs)¶
return the backend by a given object name
- Parameters:
name – The name of the object
model_store – the OmegaStore instance used to store models
data_store – the OmegaStore instance used to store data
kwargs – the kwargs passed to the backend initialization
- Returns:
the backend
- get_backend_bykind(kind, model_store=None, data_store=None, **kwargs)¶
return the backend by a given object kind
- Parameters:
kind – The object kind
model_store – the OmegaStore instance used to store models
data_store – the OmegaStore instance used to store data
kwargs – the kwargs passed to the backend initialization
- Returns:
the backend
- get_backend_byobj(obj, name=None, kind=None, attributes=None, model_store=None, data_store=None, **kwargs)¶
return the matching backend for the given obj
- Returns:
the first backend that supports the given parameters or None
- get_dataframe_dfgroup(name, version=-1, sanitize=True, kwargs=None)¶
Return a grouped dataframe
- Parameters:
name – the name of the object
version – not supported
kwargs – mongo db query arguments to be passed to collection.find() as a filter.
sanitize – remove any $op operators in kwargs
- get_dataframe_documents(name, columns=None, lazy=False, filter=None, version=-1, is_series=False, chunksize=None, sanitize=True, trusted=None, **kwargs)¶
Internal method to return DataFrame from documents
- Parameters:
name – the name of the object (str)
columns – the column projection as a list of column names
lazy – if True returns a lazy representation as an MDataFrame. If False retrieves all data and returns a DataFrame (default)
filter – the filter to be applied as a column__op=value dict
sanitize – sanitize filter by removing all $op filter keys, defaults to True. Specify False to allow $op filter keys. $where is always removed as it is considered unsafe.
version – the version to retrieve (not supported)
is_series – if True returns a Series instead of a DataFrame
kwargs – remaining kwargs are used as a filter. The filter kwarg overrides other kwargs.
- Returns:
the retrieved object (DataFrame, Series or MDataFrame)
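To make the `column__op=value` filter form and the `sanitize` flag more concrete, here is a minimal sketch of how such a filter could be translated into a MongoDB-style query and how `$op` keys could be stripped. Both functions are hypothetical illustrations, not omegaml's implementation, and the set of operators omegaml actually accepts is not specified here.

```python
def to_mongo_filter(**kwargs):
    """Translate column__op=value kwargs into a MongoDB-style filter dict.

    Assumes a column is given either as plain equality or with operators,
    not both at once.
    """
    query = {}
    for key, value in kwargs.items():
        column, sep, op = key.partition("__")
        if sep:
            # column__gt=5 becomes {"column": {"$gt": 5}}
            query.setdefault(column, {})["$" + op] = value
        else:
            # plain column=value is an equality match
            query[key] = value
    return query


def sanitize_filter(query, sanitize=True):
    """Drop $op keys when sanitize is True; $where is dropped in every case."""
    if isinstance(query, dict):
        return {
            key: sanitize_filter(value, sanitize)
            for key, value in query.items()
            if key != "$where" and not (sanitize and str(key).startswith("$"))
        }
    if isinstance(query, list):
        return [sanitize_filter(item, sanitize) for item in query]
    return query


print(to_mongo_filter(price__gt=5, region="EU"))

raw = {"region": "EU", "$where": "sleep(1000)", "$or": [{"price": 1}, {"price": 2}]}
print(sanitize_filter(raw))                  # all $op keys removed
print(sanitize_filter(raw, sanitize=False))  # $or kept, $where still removed
```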
- get_dataframe_hdf(name, version=-1)¶
Retrieve dataframe from hdf
- Parameters:
name – The name of object
version – The version of object (not supported)
- Returns:
Returns a python pandas dataframe
- Raises:
gridfs.errors.NoFile
- get_object_as_python(meta, version=-1)¶
Retrieve object as python object
- Parameters:
meta – The metadata object
version – The version of the object
- Returns:
Returns data as python object
- get_python_data(name, filter=None, version=-1, lazy=False, trusted=False, **kwargs)¶
Retrieve objects as python data
- Parameters:
name – The name of object
version – The version of object
- Returns:
Returns the object as python list object
- getl(*args, **kwargs)¶
return a lazy MDataFrame for a given object
Same as .get, but returns an MDataFrame
- help(name_or_obj=None, kind=None, raw=False)¶
get help for an object by looking up its backend and calling help() on it
Retrieves the object’s metadata and looks up its corresponding backend. If the metadata.attributes[‘docs’] is a string it will display this as the help() contents. If the string starts with ‘http://’ or ‘https://’ it will open the web page.
- Parameters:
name_or_obj (str|obj) – the name or actual object to get help for
kind (str) – optional, if specified forces retrieval of backend for the given kind
raw (bool) – optional, if True forces help to be the backend type of the object. If False returns the attributes[docs] on the object’s metadata, if available. Defaults to False
- Returns:
help(obj) if python is in interactive mode
text(str) if python is not in interactive mode
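The decision described for `attributes['docs']` (plain text, a web page, or a fallback to the backend) can be sketched as a small classifier. `resolve_docs` and its return labels are hypothetical names chosen for this illustration; the actual method may structure this differently.

```python
def resolve_docs(attributes):
    """Classify attributes['docs'] as a URL, plain text, or missing."""
    docs = (attributes or {}).get("docs")
    if not isinstance(docs, str):
        # no usable docs string: fall back to the backend's own help
        return ("backend", None)
    if docs.startswith(("http://", "https://")):
        # a URL would be opened as a web page
        return ("url", docs)
    # any other string is displayed as the help contents
    return ("text", docs)


print(resolve_docs({"docs": "Predicts customer churn from usage data"}))
print(resolve_docs({"docs": "https://example.com/models/churn"}))
print(resolve_docs({}))
```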
- list(pattern=None, regexp=None, kind=None, raw=False, hidden=None, include_temp=False, bucket=None, prefix=None, filter=None)¶
List all files in store
Specify pattern as a unix pattern (e.g. models/*), or specify regexp.
- Parameters:
pattern – the unix file pattern or None for all
regexp – the regexp. takes precedence over pattern
raw – if True return the meta data objects
filter – specify additional filter criteria, optional
- Returns:
List of files in store
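The relationship between `pattern` and `regexp` can be sketched with the standard library. This is an assumption-laden illustration using `fnmatch` and `re`; `filter_names` and the sample names are hypothetical, and the store may apply the patterns on the database side instead.

```python
import fnmatch
import re


def filter_names(names, pattern=None, regexp=None):
    """Filter names by regexp if given, else by unix pattern, else return all."""
    if regexp is not None:
        # regexp takes precedence over pattern
        return [n for n in names if re.match(regexp, n)]
    if pattern is not None:
        return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
    return list(names)


names = ["models/churn", "models/fraud", "data/sales"]
print(filter_names(names, pattern="models/*"))
print(filter_names(names, regexp=r"data/.*"))
print(filter_names(names, pattern="models/*", regexp=r"data/.*"))  # regexp wins
```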
- make_metadata(name, kind, bucket=None, prefix=None, **kwargs)¶
create or update a metadata object
this retrieves a Metadata object if it exists given the kwargs. Only the name, prefix and bucket arguments are considered
for existing Metadata objects, the attributes kw is treated as follows:
attributes=None, the existing attributes are left as is
attributes={}, the attributes value on an existing metadata object is reset to the empty dict
attributes={ some : value }, the existing attributes are updated
For new metadata objects, attributes defaults to {} if not specified, else is set as provided.
- Parameters:
name – the object name
bucket – the bucket, optional, defaults to self.bucket
prefix – the prefix, optional, defaults to self.prefix
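The three cases for the attributes keyword on an existing Metadata object can be sketched as follows. `merge_attributes` is a hypothetical helper, and it assumes "updated" means a shallow `dict.update`, which the description does not state explicitly.

```python
def merge_attributes(existing, attributes=None):
    """Apply the documented attributes rules to an existing attributes dict."""
    if attributes is None:
        # attributes=None: leave existing attributes as is
        return existing
    if attributes == {}:
        # attributes={}: reset to the empty dict
        return {}
    # attributes={...}: update existing attributes (shallow update assumed)
    merged = dict(existing)
    merged.update(attributes)
    return merged


current = {"owner": "ann", "stage": "dev"}
print(merge_attributes(current))                     # unchanged
print(merge_attributes(current, {}))                 # reset
print(merge_attributes(current, {"stage": "prod"}))  # updated
```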
- metadata(name=None, bucket=None, prefix=None, version=-1, **kwargs)¶
Returns a metadata document for the given entry name
- property mongodb¶
Returns a mongo database object
- object_store_key(name, ext, hashed=None)¶
Returns the store key
Unless you write a mixin or a backend you should not use this method
- Parameters:
name – The name of object
ext – The extension of the filename
hashed – hash the key to support arbitrary name length, defaults to defaults.OMEGA_STORE_HASHEDNAMES, True by default since 0.13.7
- Returns:
A filename with relative bucket, prefix and name
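Why hashing supports arbitrary name lengths can be seen in a short sketch: a digest has a fixed length regardless of the input. The key layout and the choice of SHA-1 below are assumptions made for illustration; `store_key` is a hypothetical helper and not the library's method.

```python
import hashlib


def store_key(bucket, prefix, name, ext, hashed=True):
    """Build a store key; optionally hash it to a fixed-length hex string."""
    # key layout is an assumption for this sketch
    key = f"{bucket}.{prefix}.{name}.{ext}"
    if hashed:
        # hash choice (SHA-1) is an assumption; any digest gives a fixed length
        key = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return key


short = store_key("omegaml", "data", "sales", "hdf")
long_ = store_key("omegaml", "data", "x" * 500, "hdf")
print(len(short), len(long_))  # both keys have the same length
print(store_key("omegaml", "data", "sales", "hdf", hashed=False))
```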
- put(obj, name, attributes=None, kind=None, replace=False, **kwargs)¶
Stores an object: estimators, pipelines, numpy arrays or pandas dataframes
- put_dataframe_as_dfgroup(obj, name, groupby, attributes=None)¶
store a dataframe grouped by columns in a mongo document
- Example:
> # each group
> {
>   # group keys
>   key: val,
>   _data: [
>     # only data keys
>     { key: val, … }
>   ]
> }
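The grouped document layout in the example can be reproduced with plain Python on a list of row dicts. This sketch only illustrates the resulting structure (group keys at the top level, remaining columns under `_data`); `group_rows` and the sample rows are hypothetical, and the actual method works on a pandas dataframe.

```python
from collections import defaultdict


def group_rows(rows, groupby):
    """Turn row dicts into one document per group, with non-key data in _data."""
    groups = defaultdict(list)
    for row in rows:
        group_key = tuple(row[col] for col in groupby)
        # keep only the columns that are not group keys
        data = {k: v for k, v in row.items() if k not in groupby}
        groups[group_key].append(data)
    return [
        {**dict(zip(groupby, group_key)), "_data": data}
        for group_key, data in groups.items()
    ]


rows = [
    {"region": "EU", "year": 2023, "sales": 10},
    {"region": "EU", "year": 2024, "sales": 12},
    {"region": "US", "year": 2023, "sales": 7},
]
for doc in group_rows(rows, ["region"]):
    print(doc)
```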
- put_dataframe_as_documents(obj, name, append=None, attributes=None, index=None, timestamp=None, chunksize=None, ensure_compat=True, _fast_insert=<function fast_insert>, **kwargs)¶
store a dataframe as a row-wise collection of documents
- Parameters:
obj – the dataframe to store
name – the name of the item in the store
append – if False collection will be dropped before inserting, if True existing documents will persist. Defaults to True. If not specified and rows have been previously inserted, will issue a warning.
index – list of columns, using +, -, @ as a column prefix to specify ASCENDING, DESCENDING, GEOSPHERE respectively. For @ the column has to represent a valid GeoJSON object.
timestamp – if True or a field name, adds a timestamp. If the value is a boolean or datetime, uses _created as the field name. The timestamp is always datetime.datetime.utcnow(). May be overridden by specifying the tuple (col, datetime).
ensure_compat – if True attempt to convert obj to mongodb compatibility, set to False only if you are sure to have only compatible values in dataframe. defaults to True. False may reduce memory and increase speed on large dataframes.
- Returns:
the Metadata object created
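The column prefixes accepted by `index` can be sketched as a small parser that produces `(column, direction)` pairs in the form pymongo's index creation accepts. The constants below carry the same values as pymongo's `ASCENDING`, `DESCENDING` and `GEOSPHERE`; `parse_index` is a hypothetical helper, and treating an unprefixed column as ascending is an assumption.

```python
# same values as pymongo.ASCENDING, pymongo.DESCENDING, pymongo.GEOSPHERE
ASCENDING, DESCENDING, GEOSPHERE = 1, -1, "2dsphere"

PREFIXES = {"+": ASCENDING, "-": DESCENDING, "@": GEOSPHERE}


def parse_index(columns):
    """Convert prefixed column names into (column, direction) pairs."""
    spec = []
    for col in columns:
        direction = PREFIXES.get(col[:1])
        if direction is None:
            # assumption: a column without prefix is indexed ascending
            spec.append((col, ASCENDING))
        else:
            spec.append((col[1:], direction))
    return spec


print(parse_index(["+created", "-price", "@location", "region"]))
```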
- put_ndarray_as_hdf(obj, name, attributes=None)¶
store numpy array as hdf
this is a hack, converting the array to a dataframe and then storing it
- put_pyobj_as_document(obj, name, attributes=None, append=True, index=None, as_many=None, **kwargs)¶
store a dict as a document
Similar to put_dataframe_as_documents, no data will be replaced by default. That is, obj is appended as new documents to the object’s mongo collection. To replace the data, specify append=False.
- put_pyobj_as_hdf(obj, name, attributes=None)¶
store list, tuple, dict as hdf
this requires the list, tuple or dict to be convertible into a dataframe
- rebuild_params(kwargs, collection)¶
Returns a modified set of parameters for querying mongodb based on how the mongo document is structured and the fields the document is grouped by.
Note: Explicitly to be used with get_grouped_data only
- Parameters:
kwargs – Mongo filter arguments
collection – The name of mongodb collection
- Returns:
Returns a set of parameters as dictionary.
- register_backend(kind, backend)¶
register a backend class
- Parameters:
kind – (str) the backend kind
backend – (class) the backend class
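How a kind-to-backend registry and the "first backend that supports the object" lookup of get_backend_byobj fit together can be sketched as follows. Everything here is hypothetical: the registry class, the two toy backends, the kind names and the `supports` classmethod are assumptions for illustration, and the sketch returns backend classes rather than initialized backends.

```python
class ListBackend:
    """Toy backend that claims Python lists (hypothetical)."""

    @classmethod
    def supports(cls, obj, name, **kwargs):
        return isinstance(obj, list)


class TextBackend:
    """Toy backend that claims strings (hypothetical)."""

    @classmethod
    def supports(cls, obj, name, **kwargs):
        return isinstance(obj, str)


class BackendRegistry:
    """Minimal kind -> backend class registry."""

    def __init__(self):
        self._backends = {}

    def register_backend(self, kind, backend):
        self._backends[kind] = backend

    def get_backend_bykind(self, kind):
        return self._backends[kind]

    def get_backend_byobj(self, obj, name=None, **kwargs):
        # return the first registered backend that supports obj, else None
        for backend in self._backends.values():
            if backend.supports(obj, name, **kwargs):
                return backend
        return None


registry = BackendRegistry()
registry.register_backend("toy.list", ListBackend)  # kind names are made up
registry.register_backend("toy.text", TextBackend)

print(registry.get_backend_bykind("toy.text").__name__)
print(registry.get_backend_byobj([1, 2, 3]).__name__)
print(registry.get_backend_byobj(3.14))  # no backend supports a float here
```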
- register_backends()¶
register backends in defaults.OMEGA_STORE_BACKENDS
- register_mixin(mixincls)¶
register a mixin class
- Parameters:
mixincls – (class) the mixin class
- property tmppath¶
return an instance-specific temporary path
- class omegaml.documents.Metadata(**kwargs)¶
Metadata stores information about objects in OmegaStore
- attributes¶
customer-defined other meta attributes
- bucket¶
bucket
- collection¶
for PANDAS_DFROWS this is the collection
- created¶
created datetime
- gridfile¶
for PANDAS_HDF and SKLEARN_JOBLIB this is the gridfile
- kind¶
kind of data
- kind_meta¶
omegaml technical attributes, e.g. column indices
- modified¶
modified datetime
- name¶
this is the name of the data
- objid¶
for PYTHON_DATA this is the actual document
- prefix¶
prefix
- s3file¶
s3file attributes
- uri¶
location URI