omegaml.runtime.experiment

Concepts

  • ExperimentBackend provides the storage layer (backend to om.models)

  • TrackingProvider provides the metrics logging API

  • TrackingProxy provides live metrics tracking in runtime tasks

Backends

class omegaml.backends.experiment.ExperimentBackend(model_store=None, data_store=None, tracking=None, **kwargs)

ExperimentBackend provides storage of tracker configurations

Usage:

To log metrics and other data:

with om.runtime.experiment('myexp') as exp:
    om.runtime.model('mymodel').fit(X, Y)
    om.runtime.model('mymodel').score(X, Y) # automatically log score result
    exp.log_metric('mymetric', value)
    exp.log_param('myparam', value)
    exp.log_artifact(X, 'X')
    exp.log_artifact(Y, 'Y')
    exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')

To log data and automatically profile system data:

with om.runtime.experiment('myexp', provider='profiling') as exp:
    om.runtime.model('mymodel').fit(X, Y)
    om.runtime.model('mymodel').score(X, Y) # automatically log score result
    exp.log_metric('mymetric', value)
    exp.log_param('myparam', value)
    exp.log_artifact(X, 'X')
    exp.log_artifact(Y, 'Y')
    exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')

# profiling data contains metrics for cpu, memory and disk use
data = exp.data(event='profile')

To get back experiment data without running an experiment:

# recommended way
exp = om.runtime.experiment('myexp').use()
exp_df = exp.data()

# experiments exist in the models store
exp = om.models.get('experiments/myexp')
exp_df = exp.data()

See also

  • omegaml.backends.experiment.OmegaSimpleTracker

  • omegaml.backends.experiment.OmegaProfilingTracker

KIND = 'experiment.tracker'
get(name, raw=False, data_store=None, **kwargs)

retrieve an experiment tracker

Parameters:
  • name – the name of the object

  • version – the version of the object (not supported)

put(obj, name, **kwargs)

store an experiment tracker

Parameters:
  • obj – the tracking provider object to be stored

  • name – the name of the object

  • attributes – attributes for meta data

classmethod supports(obj, name, **kwargs)

test if this backend supports this obj

Metrics Logging

class omegaml.backends.experiment.TrackingProvider(experiment, store=None, model_store=None, autotrack=False)

TrackingProvider implements an abstract interface to experiment tracking

Concrete implementations like MLFlow, Sacred or Neptune.ai can be implemented based on TrackingProvider. In combination with the runtime’s OmegaTrackingProxy this provides a powerful tracking interface that scales with your needs.

How it works:

  1. Experiments created using om.runtime.experiment() are stored as instances of a TrackingProvider concrete implementation

  2. Upon retrieval of an experiment, any call to its API is proxied to the actual implementation, e.g. MLFlow

  3. On calling a model method via the runtime, e.g. om.runtime.model().fit(), the TrackingProvider information is passed on to the runtime worker, and made available as the backend.tracking property. Thus within a model backend, you can always log to the tracker by using:

    with self.tracking as exp:
        exp.log_metric('mymetric', value)  # call any TrackingProvider method

  4. omega-ml provides the OmegaSimpleTracker, which implements a tracking interface similar to packages like MLFlow, Sacred. See ExperimentBackend for an example.
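As a sketch, a concrete provider subclasses TrackingProvider and overrides the logging methods it needs. The PrintTracker below is hypothetical and only echoes metrics to stdout, where a real provider would forward to a system such as MLFlow; consult the TrackingProvider source for the full interface:

from omegaml.backends.experiment import TrackingProvider

class PrintTracker(TrackingProvider):
    # hypothetical provider: echo metrics instead of storing them
    def log_metric(self, key, value, step=None, dt=None, **extra):
        print(f'metric {key}={value} step={step}')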

as_monitor(obj, alerts=None, schedule=None, store=None, provider=None, **kwargs)

Return and attach a drift monitor to this experiment

Parameters:
  • obj (str) – the name of the model

  • alerts (list) – a list of alert definitions. Each alert definition is a dict with keys ‘event’, ‘recipients’. ‘event’ is the event to get from the tracking log, ‘recipients’ is a list of recipients (e.g. email address, notification channel)

  • schedule (str) – the job scheduling interval for the monitoring job, as used in om.jobs.schedule() when the job is created

  • store (OmegaStore) – the store to use, defaults to self._model_store

  • provider (str) – the name of the monitoring provider, defaults to store.prefix

Returns:

a drift monitor for the object

Return type:

monitor (DriftMonitor)
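For illustration, a monitor could be attached from within an experiment context as follows; the alert definition and schedule text are example values to adapt (the 'drift' event name and the recipient are assumptions):

with om.runtime.experiment('myexp') as exp:
    # attach a drift monitor to 'mymodel' (example alert and schedule)
    mon = exp.as_monitor('mymodel',
                         alerts=[{'event': 'drift',
                                  'recipients': ['alerts@example.com']}],
                         schedule='daily, at 06:00')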

track(obj, store=None, label=None, monitor=False, monitor_kwargs=None, **kwargs)

attach this experiment to the named object

Usage:

# use in experiment context
with om.runtime.experiment('myexp') as exp:
    exp.track('mymodel')

# use on experiment directly
exp = om.runtime.experiment('myexp')
exp.track('mymodel')

Parameters:
  • obj (str) – the name of the object

  • store (OmegaStore) – optional, om.models, om.scripts, om.jobs. If not provided will use om.models

  • label (str) – optional, the label of the worker, default is ‘default’

  • monitor (bool|str) – optional, truthy sets up a monitor to track drift in the object, if a string is provided it is used as the monitoring provider

  • monitor_kwargs (dict) – optional, additional keyword arguments to pass to the monitor

Note

This modifies the object’s metadata.attributes:

{ 'tracking': { label: self._experiment } }

If monitor is set, a monitor definition is added to the object’s metadata:

{ 'tracking': { 'monitors': [ { 'experiment': self._experiment,
                   'provider': monitor } ] } }
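For example, after exp.track('mymodel') with the default label, the link is visible on the model's metadata:

meta = om.models.metadata('mymodel')
print(meta.attributes['tracking'])
# => {'default': 'myexp'}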
class omegaml.backends.experiment.OmegaSimpleTracker(*args, **kwargs)

A tracking provider that logs to an omegaml dataset

Usage:

with om.runtime.experiment(provider='default') as exp:
    ...
    exp.log_metric('accuracy', .78)

Changed in version 0.17: any extra

active_run(run=None)

set the latest run as the active run

Parameters:

run (int|str) – optional, the run number or unique task id; if None the latest active run is used, or a new run is created if no active run exists.

Returns:

current run (int)

clear(force=False)

clear all data

All data is removed from the experiment’s dataset. This is not recoverable.

Parameters:

force (bool) – if True, clears all data, otherwise raises an error

Caution

  • this will clear all data and is not recoverable

Raises:

AssertionError – if force is not True

Added in version 0.16.2.
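A minimal sketch; the force flag is required precisely because the operation is irreversible:

exp = om.runtime.experiment('myexp').use()
exp.clear(force=True)  # removes all logged data, not recoverable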

data(experiment=None, run=None, event=None, step=None, key=None, raw=False, lazy=False, since=None, batchsize=None, **extra)

build a dataframe of all stored data

Parameters:
  • experiment (str|list) – the name of the experiment, defaults to its current value

  • run (int|list|str) – the run(s) to return data for, defaults to the current run; use ‘all’ to get all runs. Runs are 1-indexed from the first run; negative values index back from the latest run (-1 is the latest), and both forms can be combined in a list. A negative index that would point before the first run returns run 1.

  • event (str|list) – the event(s) to include

  • step (int|list) – the step(s) to include

  • key (str|list) – the key(s) to include

  • raw (bool) – if True returns the raw data instead of a DataFrame

  • lazy (bool) – if True returns the Cursor instead of data, ignores raw

  • since (datetime) – only return data since this date. If both since and run are specified, run is ignored and all runs since the date are returned

  • batchsize (int) – if specified, returns a generator yielding data in batches of batchsize, note that raw is respected, i.e. raw=False yields a DataFrame for every batch, raw=True yields a list of dicts

Returns:

For lazy == False (the default):
  • data (DataFrame) if raw == False

  • data (list of dicts) if raw == True

  • None if no data exists

For lazy == True, without batchsize:
  • data (Cursor), regardless of raw

For lazy == True, with batchsize:
  • data (generator of DataFrames) if raw == False

  • data (generator of lists of dicts) if raw == True

Changed in version 0.16.2: run supports negative indexing.

Changed in version 0.17: added batchsize.

Changed in version 0.17: enabled the use of run='*' to retrieve all runs, equivalent to run='all'.
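The following sketch illustrates the main filter combinations described above:

exp = om.runtime.experiment('myexp').use()
df = exp.data(event='metric', run='all')  # all metric events across runs
latest = exp.data(run=-1)                 # latest run, negative indexing
for batch in exp.data(batchsize=1000, raw=True):
    ...  # each batch is a list of dicts since raw=True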

log_artifact(obj, name, step=None, dt=None, event=None, key=None, **extra)

log any object to the current run

Usage:

# log an artifact
exp.log_artifact(mydict, 'somedata')

# retrieve back
mydict_ = exp.restore_artifact('somedata')

Parameters:
  • obj (obj) – any object to log

  • name (str) – the name of artifact

  • step (int) – the step, if any

  • **extra – any extra data to log

Notes

  • bool, str, int, float, list, dict are stored as format=type

  • Metadata is stored as format=metadata

  • objects supported by om.models are stored as format=model

  • objects supported by om.datasets are stored as format=dataset

  • all other objects are pickled and stored as format=pickle

log_data(key, value, step=None, dt=None, event=None, **extra)

log x/y data for model predictions

This is syntactic sugar for log_artifact() using the 'data' event.

Parameters:
  • key (str) – the name of the artifact

  • value (any) – the x/y data

  • step (int) – the step

  • dt (datetime) – the datetime

  • event (str) – the event, defaults to ‘data’

  • **extra – any other values to store with event

Returns:

None
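For example, to capture prediction inputs and outputs for later analysis (X_test and y_pred are placeholders for your own data):

with om.runtime.experiment('myexp') as exp:
    exp.log_data('X', X_test)  # stored as event='data', key='X'
    exp.log_data('Y', y_pred)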

log_extra(remove=False, **kwargs)

add additional log information for every subsequent logging call

Parameters:
  • remove (bool) – if True, removes the extra log information

  • kwargs – any key-value pairs to log
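A short sketch of how extra fields propagate; passing the same key with remove=True to stop logging it is an assumption based on the parameter description:

exp.log_extra(userid='u1')       # subsequent log calls include userid
exp.log_metric('accuracy', .9)   # logged with userid='u1'
exp.log_extra(remove=True, userid='u1')  # assumed: stop including userid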

log_metric(key, value, step=None, dt=None, **extra)

log a metric value

Parameters:
  • key (str) – the metric name

  • value (str|float|int|bool|dict) – the metric value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=metric

log_param(key, value, step=None, dt=None, **extra)

log an experiment parameter

Parameters:
  • key (str) – the parameter name

  • value (str|float|int|bool|dict) – the parameter value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=param

log_system(key=None, value=None, step=None, dt=None, **extra)

log system data

Parameters:
  • key (str) – the key to use, defaults to ‘system’

  • value (str|float|int|bool|dict) – the parameter value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=system

  • logs platform, python version and list of installed packages

restore_artifact(*args, **kwargs)

restore a specific logged artifact

Changed in version 0.17: deprecated, use exp.restore_artifacts() instead

restore_artifacts(key=None, experiment=None, run=None, since=None, step=None, value=None, event=None, name=None)

restore logged artifacts

Parameters:
  • key (str) – the name of the artifact as provided in log_artifact

  • run (int) – the run for which to query, defaults to current run

  • since (datetime) – only return data since this date

  • step (int) – the step for which to query, defaults to all steps in run

  • value (dict|list) – dict or list of dict, this value is used instead of querying data, use to retrieve an artifact from contents of .data()

Returns:

list of restored objects

Notes

  • this will restore the artifact according to its type assigned by .log_artifact(). If the type cannot be determined, the actual data is returned

Changed in version 0.17: returns a list of objects instead of the last object.
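Paired with log_artifact(), a round trip looks like this (note the list return since 0.17):

exp.log_artifact({'a': 1}, 'somedata')
objs = exp.restore_artifacts('somedata')
mydict = objs[-1]  # the most recently logged 'somedata' artifact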

restore_data(key, run=None, event=None, since=None, concat=True, **extra)

restore x/y data for model predictions

This is syntactic sugar for restore_artifacts() using the 'data' event.

Parameters:
  • key (str) – the name of the artifact

  • run (int) – the run for which to query, defaults to current run

  • event (str) – the event, defaults to ‘data’

  • since (datetime) – only return data since this date

  • concat (bool) – if True, concatenates the data into a single object, in this case all data must be of the same type. Defaults to True.

  • **extra – any other fields to filter by

Returns:

list of restored objects
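Continuing the log_data() example above:

X = exp.restore_data('X')  # one object, concatenated (concat=True)
parts = exp.restore_data('X', concat=False)  # list of logged objects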

start(run=None)

start a new run

This starts a new run and logs the start event

status(run=None)

status of a run

Parameters:

run (int) – the run number, defaults to the currently active run

Returns:

status, one of 'STARTED', 'STOPPED'

stop()

stop the current run

This stops the current run and records the stop event
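A sketch of the explicit run lifecycle, broadly what the experiment context manager does on entry and exit:

exp = om.runtime.experiment('myexp')
exp.start()                  # starts a new run, logs the start event
exp.log_metric('loss', 0.1)
exp.stop()                   # logs the stop event
exp.status()                 # => 'STOPPED'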

use(run=None)

reuse the latest run instead of starting a new one

syntactic sugar for self.active_run()

Returns:

self

class omegaml.backends.experiment.OmegaProfilingTracker(*args, **kwargs)

A metric tracker that runs a system profiler while the experiment is active

Records profile events containing cpu, memory and disk profiling data. See BackgroundProfiler.profile() for details of the profiling metrics collected.

Usage:

To log metrics and system performance data:

with om.runtime.experiment('myexp', provider='profiling') as exp:
    ...

data = exp.data(event='profile')

Properties:

exp.profiler.interval = n.m # interval of n.m seconds to profile, defaults to 3 seconds
exp.profiler.metrics = ['cpu', 'memory', 'disk'] # all or subset of metrics to collect
exp.max_buffer = n # number of items in buffer before tracking

Notes

  • the profiling data is buffered to reduce the number of database writes; by default the data is written every 6 profiling events (e.g. at a 10-second interval, every 60 seconds)

  • the step reported in the tracker counts profiling events since the start; it is not related to the step (epoch) reported by e.g. tensorflow

  • for every step there is an event=profile, key=profile_dt entry which you can use to relate profiling events to a specific wall-clock time

  • it is usually sufficient to report system metrics at intervals > 10 seconds, since machine learning algorithms tend to use CPU and memory over longer periods of time

log_profile(data)

the callback for BackgroundProfiler

class omegaml.backends.experiment.NoTrackTracker(experiment, store=None, model_store=None, autotrack=False)

A default tracker that does not record anything

For Tensorflow

class omegaml.backends.experiment.TensorflowCallback(*args, **kwargs)
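No description is given for this class; based on its name it plugs into Keras' callback mechanism to log training metrics to the experiment. A plausible sketch, assuming the callback is constructed with the active experiment (verify the constructor against the source):

from omegaml.backends.experiment import TensorflowCallback

with om.runtime.experiment('myexp') as exp:
    # assumption: the callback logs epoch metrics to exp
    model.fit(X, Y, epochs=5, callbacks=[TensorflowCallback(exp)])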

Runtime Integration