omegaml.runtime.experiment

Concepts

  • ExperimentBackend provides the storage layer (backend to om.models)

  • TrackingProvider provides the metrics logging API

  • TrackingProxy provides live metrics tracking in runtime tasks

Backends

class omegaml.backends.experiment.ExperimentBackend(model_store=None, data_store=None, tracking=None, **kwargs)[source]

ExperimentBackend provides storage of tracker configurations

Usage:

To log metrics and other data:

with om.runtime.experiment('myexp') as exp:
    om.runtime.model('mymodel').fit(X, Y)
    om.runtime.model('mymodel').score(X, Y) # automatically log score result
    exp.log_metric('mymetric', value)
    exp.log_param('myparam', value)
    exp.log_artifact(X, 'X')
    exp.log_artifact(Y, 'Y')
    exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')

To log data and automatically profile system data:

with om.runtime.experiment('myexp', provider='profiling') as exp:
    om.runtime.model('mymodel').fit(X, Y)
    om.runtime.model('mymodel').score(X, Y) # automatically log score result
    exp.log_metric('mymetric', value)
    exp.log_param('myparam', value)
    exp.log_artifact(X, 'X')
    exp.log_artifact(Y, 'Y')
    exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')

# profiling data contains metrics for cpu, memory and disk use
data = exp.data(event='profile')

To get back experiment data without running an experiment:

# recommended way
exp = om.runtime.experiment('myexp').use()
exp_df = exp.data()

# experiments exist in the models store
exp = om.models.get('experiments/myexp')
exp_df = exp.data()

See also

  • omegaml.backends.tracking.OmegaSimpleTracker

  • omegaml.backends.tracking.OmegaProfilingTracker

KIND = 'experiment.tracker'
get(name, raw=False, data_store=None, **kwargs)[source]

retrieve a model

Parameters:
  • name – the name of the object

  • version – the version of the object (not supported)

put(obj, name, **kwargs)[source]

store a model

Parameters:
  • obj – the model object to be stored

  • name – the name of the object

  • attributes – attributes for meta data

classmethod supports(obj, name, **kwargs)[source]

test if this backend supports this obj

Metrics Logging

class omegaml.backends.experiment.TrackingProvider(experiment, store=None, model_store=None, autotrack=False)[source]

TrackingProvider implements an abstract interface to experiment tracking

Concrete integrations with tools like MLFlow, Sacred or Neptune.ai can be implemented based on TrackingProvider. In combination with the runtime’s OmegaTrackingProxy this provides a powerful tracking interface that scales with your needs.

How it works:

  1. Experiments created using om.runtime.experiment() are stored as instances of a TrackingProvider concrete implementation

  2. Upon retrieval of an experiment, any call to its API is proxied to the actual implementation, e.g. MLFlow

  3. On calling a model method via the runtime, e.g. om.runtime.model().fit(), the TrackingProvider information is passed on to the runtime worker, and made available as the backend.tracking property. Thus within a model backend, you can always log to the tracker by using:

    with self.tracking as exp:
        exp.log_metric() # call any TrackingProvider method
    
  4. omega-ml provides the OmegaSimpleTracker, which implements a tracking interface similar to packages like MLFlow, Sacred. See ExperimentBackend for an example.

as_monitor(obj, alerts=None, schedule=None, store=None, provider=None, **kwargs)[source]

Return and attach a drift monitor to this experiment

Parameters:
  • obj (str) – the name of the model

  • alerts (list) – a list of alert definitions. Each alert definition is a dict with keys ‘event’, ‘recipients’. ‘event’ is the event to get from the tracking log, ‘recipients’ is a list of recipients (e.g. email address, notification channel)

  • schedule (str) – the job scheduling interval for the monitoring job, as used in om.jobs.schedule() when the job is created

  • store (OmegaStore) – the store to use, defaults to self._model_store

  • provider (str) – the name of the monitoring provider, defaults to store.prefix

Returns:

a drift monitor for the object

Return type:

monitor (DriftMonitor)
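
For illustration, a minimal sketch of attaching a drift monitor, assuming a model named 'mymodel' is tracked by this experiment; the alert event name, recipient address and schedule string are illustrative values, not defaults:

with om.runtime.experiment('myexp') as exp:
    # alert definitions use the 'event' and 'recipients' keys described above
    mon = exp.as_monitor('mymodel',
                         alerts=[{'event': 'drift',
                                  'recipients': ['ops@example.com']}],
                         schedule='daily')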

track(obj, store=None, label=None, monitor=False, monitor_kwargs=None, **kwargs)[source]

attach this experiment to the named object

Usage:

# use in experiment context
with om.runtime.experiment('myexp') as exp:
    exp.track('mymodel')

# use on the experiment directly
exp = om.runtime.experiment('myexp')
exp.track('mymodel')

Parameters:
  • obj (str) – the name of the object

  • store (OmegaStore) – optional, om.models, om.scripts, om.jobs. If not provided will use om.models

  • label (str) – optional, the label of the worker, default is ‘default’

  • monitor (bool|str) – optional, if truthy sets up a monitor to track drift in the object; if a string is provided it is used as the monitoring provider

  • monitor_kwargs (dict) – optional, additional keyword arguments to pass to the monitor

Note

This modifies the object’s metadata.attributes:

{ 'tracking': { label: self._experiment } }

If monitor is set, a monitor definition is added to the object’s metadata:

{ 'tracking': { 'monitors': [ { 'experiment': self._experiment,
                   'provider': monitor } ] } }
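
For illustration, a sketch of attaching an experiment to a model with drift monitoring enabled, assuming 'mymodel' exists in om.models; monitor=True uses the default monitoring provider as described above:

exp = om.runtime.experiment('myexp')
# record all future runtime calls to 'mymodel' against this experiment
# and set up a drift monitor using the default provider
exp.track('mymodel', label='default', monitor=True)
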
class omegaml.backends.experiment.OmegaSimpleTracker(*args, **kwargs)[source]

A tracking provider that logs to an omegaml dataset

Usage:

with om.runtime.experiment(provider='default') as exp:
    ...
    exp.log_metric('accuracy', .78)

Changed in version 0.17: any extra

active_run(run=None)[source]

set the latest run as the active run

Parameters:

run (int|str) – optional, the run number or unique task id; if None, the latest run is set as the active run, or a new run is created if no active run exists.

Returns:

current run (int)

clear(force=False)[source]

clear all data

All data is removed from the experiment’s dataset. This is not recoverable.

Parameters:

force (bool) – if True, clears all data, otherwise raises an error

Caution

  • this will clear all data and is not recoverable

Raises:

AssertionError – if force is not True

Added in version 0.16.2.
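
For illustration, a minimal sketch of clearing an experiment’s data; force=True is required as described above:

exp = om.runtime.experiment('myexp').use()
exp.clear(force=True)   # omitting force=True raises an AssertionError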

data(experiment=None, run=None, event=None, step=None, key=None, raw=False, lazy=False, since=None, end=None, batchsize=None, slice=None, **extra)[source]

build a dataframe of all stored data

Parameters:
  • experiment (str|list) – the name of the experiment, defaults to its current value

  • run (int|list|str|slice) – the run(s) for which to return data, defaults to the current run. Use ‘all’ for all runs. Runs are 1-indexed from the first run or -1-indexed from the latest run, and both forms can be combined. If a negative index would point before the first run, run 1 is returned. A slice(start, stop) can be used to specify a range of runs.

  • event (str|list) – the event(s) to include

  • step (int|list) – the step(s) to include

  • key (str|list) – the key(s) to include

  • raw (bool) – if True returns the raw data instead of a DataFrame

  • lazy (bool) – if True returns the Cursor instead of data, ignores raw

  • since (datetime|timedelta|str) – only return data since this date. If both since and run are specified, only matches since the given date are returned. If since is a string, it must be parseable by pd.to_datetime, be given in the format ‘<n><unit:[smhdwMqy]>’ for relative times, or be a timedelta object. See dtrelative() for details on relative times.

  • end (datetime) – only return data until this date

  • batchsize (int) – if specified, returns a generator yielding data in batches of batchsize, note that raw is respected, i.e. raw=False yields a DataFrame for every batch, raw=True yields a list of dicts

  • slice (tuple) – if specified, returns a slice of the data, e.g. slice=(10, 25) returns rows 10-25, the slice is applied after all other filters

Returns:

For lazy == False:
  • data (DataFrame) if raw == False

  • data (list of dicts) if raw == True

  • None if no data exists

For lazy == True, without batchsize:
  • data (Cursor), regardless of raw

For lazy == True, with batchsize:
  • data (generator of DataFrame) if raw == False

  • data (generator of list[dict]) if raw == True

Changed in version 0.16.2: run supports negative indexing

Changed in version 0.17: added batchsize

Changed in version 0.17: enabled the use of run=’*’ to retrieve all runs, equivalent of run=’all’

Changed in version 0.17: enabled data(run=, start=, end=, since=), accepting range queries on run, dt and event
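
For illustration, a few examples of querying experiment data using the filters described above; the experiment and key names are assumptions:

exp = om.runtime.experiment('myexp').use()

# metrics of the current run as a DataFrame
df = exp.data(event='metric')

# all runs, as a raw list of dicts
records = exp.data(run='all', raw=True)

# runs 1 through 5, only the 'accuracy' key
df = exp.data(run=slice(1, 5), key='accuracy')

# data logged in the last 24 hours, yielded as DataFrames of 1000 rows each
for batch in exp.data(since='1d', batchsize=1000):
    ...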

log_artifact(obj, name, step=None, dt=None, event=None, key=None, **extra)[source]

log any object to the current run

Usage:

# log an artifact
exp.log_artifact(mydict, 'somedata')

# retrieve back
mydict_ = exp.restore_artifact('somedata')
Parameters:
  • obj (obj) – any object to log

  • name (str) – the name of artifact

  • step (int) – the step, if any

  • **extra – any extra data to log

Notes

  • bool, str, int, float, list, dict are stored as format=type

  • Metadata is stored as format=metadata

  • objects supported by om.models are stored as format=model

  • objects supported by om.datasets are stored as format=dataset

  • all other objects are pickled and stored as format=pickle

log_data(key, value, step=None, dt=None, event=None, **extra)[source]

log x/y data for model predictions

This is syntactic sugar for log_artifact() using the ‘data’ event.

Parameters:
  • key (str) – the name of the artifact

  • value (any) – the x/y data

  • step (int) – the step

  • dt (datetime) – the datetime

  • event (str) – the event, defaults to ‘data’

  • **extra – any other values to store with event

Returns:

None
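
For illustration, a minimal sketch of logging prediction inputs and outputs; X_test and model are assumed to exist:

# log the inputs and predictions of a model call for later analysis
exp.log_data('X', X_test)
exp.log_data('yhat', model.predict(X_test))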

log_events(event, key, values, step=None, dt=None, **extra)[source]

log a series of events

This is a convenience method to log multiple values for the same event. All values are logged with the same common log data, i.e. the same datetime, step, and any extra values.

Parameters:
  • event (str) – the event name

  • key (str) – the key for the event

  • values (list) – a list of values to log

  • step (int) – the step, if any

  • dt (datetime) – the datetime, defaults to now

  • **extra – any other values to store with event

Added in version NEXT.
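
For illustration, a minimal sketch of logging several values for one event in a single call; the event and key names are illustrative:

# log three loss values under the same event, datetime and step
exp.log_events('training', 'loss', [0.9, 0.5, 0.3], step=1)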

log_extra(remove=False, **kwargs)[source]

add additional log information for every subsequent logging call

Parameters:
  • remove (bool) – if True, removes the extra log information

  • kwargs – any key-value pairs to log
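
For illustration, a sketch of attaching contextual information to subsequent logging calls; the key names are illustrative, and clearing the extras with remove=True alone is an assumption based on the remove flag described above:

exp.log_extra(dataset='housing', git_commit='abc123')
exp.log_metric('accuracy', .78)   # stored together with dataset and git_commit
exp.log_extra(remove=True)        # stop attaching the extra information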

log_metric(key, value, step=None, dt=None, **extra)[source]

log a metric value

Parameters:
  • key (str) – the metric name

  • value (str|float|int|bool|dict) – the metric value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=metric

log_param(key, value, step=None, dt=None, **extra)[source]

log an experiment parameter

Parameters:
  • key (str) – the parameter name

  • value (str|float|int|bool|dict) – the parameter value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=param

log_system(key=None, value=None, step=None, dt=None, **extra)[source]

log system data

Parameters:
  • key (str) – the key to use, defaults to ‘system’

  • value (str|float|int|bool|dict) – the parameter value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=system

  • logs platform, python version and list of installed packages
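
For illustration, a minimal call that records the system information described above for the current run:

exp.log_system()   # logs platform, python version and installed packages as event=system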

restore_artifact(*args, **kwargs)[source]

restore a specific logged artifact

Changed in version 0.17: deprecated, use exp.restore_artifacts() instead

restore_artifacts(key=None, experiment=None, run=None, since=None, step=None, value=None, event=None, name=None)[source]

restore logged artifacts

Parameters:
  • key (str) – the name of the artifact as provided in log_artifact

  • run (int) – the run for which to query, defaults to current run

  • since (datetime) – only return data since this date

  • step (int) – the step for which to query, defaults to all steps in run

  • value (dict|list) – a dict or list of dicts; if provided, this value is used instead of querying data, e.g. to restore an artifact from the contents of .data()

Returns:

list of restored objects

Notes

  • this will restore the artifact according to its type assigned by .log_artifact(). If the type cannot be determined, the actual data is returned

Changed in version 0.17: returns a list of objects instead of the last object.
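
For illustration, a sketch of restoring previously logged artifacts; the key name 'X' is an assumption:

# restore all objects logged as 'X' in the current run
objs = exp.restore_artifacts(key='X')

# restore from rows previously retrieved via .data(), instead of re-querying
rows = exp.data(key='X', raw=True)
objs = exp.restore_artifacts(value=rows)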

restore_data(key, run=None, event=None, since=None, concat=True, **extra)[source]

restore x/y data for model predictions

This is syntactic sugar for restore_artifacts() using the ‘data’ event.

Parameters:
  • key (str) – the name of the artifact

  • run (int) – the run for which to query, defaults to current run

  • event (str) – the event, defaults to ‘data’

  • since (datetime) – only return data since this date

  • concat (bool) – if True, concatenates the data into a single object, in this case all data must be of the same type. Defaults to True.

  • **extra – any other values to store with event

Returns:

list of restored objects
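
For illustration, a sketch of restoring x/y data logged via log_data(); the keys match the log_data() example above:

# restore the logged inputs and predictions of the current run (concatenated by default)
X = exp.restore_data('X')
yhat = exp.restore_data('yhat')

# keep the individually logged objects instead of concatenating them
parts = exp.restore_data('X', concat=False)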

start(run=None, immediate=True)[source]

start a new run

This starts a new run and logs the start event

status(run=None)[source]

status of a run

Parameters:

run (int) – the run number, defaults to the currently active run

Returns:

status, one of ‘STARTED’, ‘STOPPED’

stop(flush=True)[source]

stop the current run

This stops the current run and records the stop event

use(run=None)[source]

reuse the latest run instead of starting a new one

syntactic sugar for self.active_run()

Returns:

self
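
For illustration, a sketch of the explicit run lifecycle when an experiment is used outside a with block (as in the track() example above); the status values follow the ‘STARTED’/‘STOPPED’ states described under status():

exp = om.runtime.experiment('myexp')
exp.start()                       # begin a new run and log the start event
exp.log_metric('accuracy', .78)
exp.status()                      # => 'STARTED'
exp.stop()                        # log the stop event and flush pending data
exp.status()                      # => 'STOPPED'

# later: attach to the latest run without starting a new one
exp = om.runtime.experiment('myexp').use()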

class omegaml.backends.experiment.OmegaProfilingTracker(*args, **kwargs)[source]

A metric tracker that runs a system profiler while the experiment is active

Records profile events that contain cpu, memory and disk profiling data. See BackgroundProfiler.profile() for details of the profiling metrics collected.

Usage:

To log metrics and system performance data:

with om.runtime.experiment('myexp', provider='profiling') as exp:
    ...

data = exp.data(event='profile')

Properties:

exp.profiler.interval = n.m # interval of n.m seconds to profile, defaults to 3 seconds
exp.profiler.metrics = ['cpu', 'memory', 'disk'] # all or subset of metrics to collect
exp.max_buffer = n # number of items in buffer before tracking

Notes

  • the profiling data is buffered to reduce the number of database writes; by default the data is written after every 6 profiling events (default: 6 * 10 = every 60 seconds)

  • the step reported in the tracker counts profiling events since the start; it is not related to the step (epoch) reported by e.g. tensorflow

  • for every step there is an event=profile, key=profile_dt entry which you can use to relate profiling events to a specific wall-clock time

  • it is usually sufficient to report system metrics at intervals > 10 seconds since machine learning algorithms tend to use CPU and memory over longer periods of time

log_profile(data)[source]

the callback for BackgroundProfiler

class omegaml.backends.experiment.NoTrackTracker(experiment, store=None, model_store=None, autotrack=False)[source]

A default tracker that does not record anything


Runtime Integration