omegaml.runtime.experiment

Concepts

  • ExperimentBackend provides the storage layer (backend to om.models)

  • TrackingProvider provides the metrics logging API

  • TrackingProxy provides live metrics tracking in runtime tasks

Backends

class omegaml.backends.experiment.ExperimentBackend(model_store=None, data_store=None, tracking=None, **kwargs)[source]

ExperimentBackend provides storage of tracker configurations

Usage:

To log metrics and other data:

with om.runtime.experiment('myexp') as exp:
    om.runtime.model('mymodel').fit(X, Y)
    om.runtime.model('mymodel').score(X, Y) # automatically log score result
    exp.log_metric('mymetric', value)
    exp.log_param('myparam', value)
    exp.log_artifact(X, 'X')
    exp.log_artifact(Y, 'Y')
    exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')

To log data and automatically profile system data:

with om.runtime.experiment('myexp', provider='profiling') as exp:
    om.runtime.model('mymodel').fit(X, Y)
    om.runtime.model('mymodel').score(X, Y) # automatically log score result
    exp.log_metric('mymetric', value)
    exp.log_param('myparam', value)
    exp.log_artifact(X, 'X')
    exp.log_artifact(Y, 'Y')
    exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')

# profiling data contains metrics for cpu, memory and disk use
data = exp.data(event='profile')

To get back experiment data without running an experiment:

# recommended way
exp = om.runtime.experiment('myexp').use()
exp_df = exp.data()

# experiments exist in the models store
exp = om.models.get('experiments/myexp')
exp_df = exp.data()

See also

  • omegaml.backends.tracking.OmegaSimpleTracker

  • omegaml.backends.tracking.OmegaProfilingTracker

KIND = 'experiment.tracker'
get(name, raw=False, data_store=None, **kwargs)[source]

retrieve a model

Parameters:
  • name – the name of the object

  • version – the version of the object (not supported)

put(obj, name, **kwargs)[source]

store a model

Parameters:
  • obj – the model object to be stored

  • name – the name of the object

  • attributes – attributes for meta data

classmethod supports(obj, name, **kwargs)[source]

test if this backend supports this obj

Metrics Logging

class omegaml.backends.experiment.TrackingProvider(experiment, store=None, model_store=None, autotrack=False)[source]

TrackingProvider implements an abstract interface to experiment tracking

Concrete integrations with tools like MLFlow, Sacred or Neptune.ai can be implemented based on TrackingProvider. In combination with the runtime’s OmegaTrackingProxy this provides a powerful tracking interface that scales with your needs.

How it works:

  1. Experiments created using om.runtime.experiment() are stored as instances of a TrackingProvider concrete implementation

  2. Upon retrieval of an experiment, any call to its API is proxied to the actual implementation, e.g. MLFlow

  3. On calling a model method via the runtime, e.g. om.runtime.model().fit(), the TrackingProvider information is passed on to the runtime worker, and made available as the backend.tracking property. Thus within a model backend, you can always log to the tracker by using:

    with self.tracking as exp:
        exp.log_metric() # call any TrackingProvider method
    
  4. omega-ml provides the OmegaSimpleTracker, which implements a tracking interface similar to packages like MLFlow, Sacred. See ExperimentBackend for an example.

as_monitor(obj, alerts=None, schedule=None, store=None, provider=None, **kwargs)[source]

Return and attach a drift monitor to this experiment

Parameters:
  • obj (str) – the name of the model

  • alerts (list) – a list of alert definitions. Each alert definition is a dict with keys ‘event’, ‘recipients’. ‘event’ is the event to get from the tracking log, ‘recipients’ is a list of recipients (e.g. email address, notification channel)

  • schedule (str) – the job scheduling interval for the monitoring job, as used in om.jobs.schedule() when the job is created

  • store (OmegaStore) – the store to use, defaults to self._model_store

  • provider (str) – the name of the monitoring provider, defaults to store.prefix

Returns:

a drift monitor for the object

Return type:

monitor (DriftMonitor)
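
For illustration, a minimal sketch of attaching a drift monitor, assuming a model named 'mymodel' is tracked by this experiment; the alert event name, recipient address and schedule string are illustrative values, not defaults:

with om.runtime.experiment('myexp') as exp:
    # alert definitions use the 'event' and 'recipients' keys described above
    mon = exp.as_monitor('mymodel',
                         alerts=[{'event': 'drift',
                                  'recipients': ['ops@example.com']}],
                         schedule='daily')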

track(obj, store=None, label=None, monitor=False, monitor_kwargs=None, **kwargs)[source]

attach this experiment to the named object

Usage:

# use in experiment context
with om.runtime.experiment('myexp') as exp:
    exp.track('mymodel')

# use on the experiment directly
exp = om.runtime.experiment('myexp')
exp.track('mymodel')

Parameters:
  • obj (str) – the name of the object

  • store (OmegaStore) – optional, om.models, om.scripts, om.jobs. If not provided will use om.models

  • label (str) – optional, the label of the worker, default is ‘default’

  • monitor (bool|str) – optional, if truthy sets up a monitor to track drift in the object; if a string is provided it is used as the monitoring provider

  • monitor_kwargs (dict) – optional, additional keyword arguments to pass to the monitor

Note

This modifies the object’s metadata.attributes:

{ 'tracking': { label: self._experiment } }

If monitor is set, a monitor definition is added to the object’s metadata:

{ 'tracking': { 'monitors': [ { 'experiment': self._experiment,
                   'provider': monitor } ] } }
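
For illustration, a sketch of attaching an experiment to a model with drift monitoring enabled, assuming 'mymodel' exists in om.models; monitor=True uses the default monitoring provider as described above:

exp = om.runtime.experiment('myexp')
# record all future runtime calls to 'mymodel' against this experiment
# and set up a drift monitor using the default provider
exp.track('mymodel', label='default', monitor=True)
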
class omegaml.backends.experiment.OmegaSimpleTracker(*args, **kwargs)[source]

A tracking provider that logs to an omegaml dataset

Usage:

with om.runtime.experiment(provider='default') as exp:
    ...
    exp.log_metric('accuracy', .78)

Changed in version 0.17: any extra

active_run(run=None)[source]

set the latest run as the active run

Parameters:

run (int|str) – optional, the run number or unique task id; if None, the latest run is set as the active run, or a new run is created if no active run exists.

Returns:

current run (int)

clear(force=False)[source]

clear all data

All data is removed from the experiment’s dataset. This is not recoverable.

Parameters:

force (bool) – if True, clears all data, otherwise raises an error

Caution

  • this will clear all data and is not recoverable

Raises:

AssertionError – if force is not True

Added in version 0.16.2.
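
For illustration, a minimal sketch of clearing an experiment’s data; force=True is required as described above:

exp = om.runtime.experiment('myexp').use()
exp.clear(force=True)   # omitting force=True raises an AssertionError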

data(experiment=None, run=None, event=None, step=None, key=None, raw=False, lazy=False, since=None, end=None, batchsize=None, slice=None, **extra)[source]

build a dataframe of all stored data

Parameters:
  • experiment (str|list) – the name of the experiment, defaults to its current value

  • run (int|list|str|slice) – the run(s) for which to return data, defaults to the current run. Use ‘all’ for all runs. Runs are 1-indexed from the first run or -1-indexed from the latest run, and both forms can be combined. If a negative index would point before the first run, run 1 is returned. A slice(start, stop) can be used to specify a range of runs.

  • event (str|list) – the event(s) to include

  • step (int|list) – the step(s) to include

  • key (str|list) – the key(s) to include

  • raw (bool) – if True returns the raw data instead of a DataFrame

  • lazy (bool) – if True returns the Cursor instead of data, ignores raw

  • since (datetime|timedelta|str) – only return data since this date. If both since and run are specified, only matches since the given date are returned. If since is a string, it must be parseable by pd.to_datetime, be given in the format ‘<n><unit:[smhdwMqy]>’ for relative times, or be a timedelta object. See dtrelative() for details on relative times.

  • end (datetime) – only return data until this date

  • batchsize (int) – if specified, returns a generator yielding data in batches of batchsize, note that raw is respected, i.e. raw=False yields a DataFrame for every batch, raw=True yields a list of dicts

  • slice (tuple) – if specified, returns a slice of the data, e.g. slice=(10, 25) returns rows 10-25, the slice is applied after all other filters

Returns:

For lazy == False:
  • data (DataFrame) if raw == False

  • data (list of dicts) if raw == True

  • None if no data exists

For lazy == True, without batchsize:
  • data (Cursor), regardless of raw

For lazy == True, with batchsize:
  • data (generator of DataFrame) if raw == False

  • data (generator of list[dict]) if raw == True

Changed in version 0.16.2: run supports negative indexing

Changed in version 0.17: added batchsize

Changed in version 0.17: enabled the use of run=’*’ to retrieve all runs, equivalent of run=’all’

Changed in version 0.17: enabled data(run=, start=, end=, since=), accepting range queries on run, dt and event
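
For illustration, a few examples of querying experiment data using the filters described above; the experiment and key names are assumptions:

exp = om.runtime.experiment('myexp').use()

# metrics of the current run as a DataFrame
df = exp.data(event='metric')

# all runs, as a raw list of dicts
records = exp.data(run='all', raw=True)

# runs 1 through 5, only the 'accuracy' key
df = exp.data(run=slice(1, 5), key='accuracy')

# data logged in the last 24 hours, yielded as DataFrames of 1000 rows each
for batch in exp.data(since='1d', batchsize=1000):
    ...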

log_artifact(obj, name, step=None, dt=None, event=None, key=None, **extra)[source]

log any object to the current run

Usage:

# log an artifact
exp.log_artifact(mydict, 'somedata')

# retrieve back
mydict_ = exp.restore_artifact('somedata')
Parameters:
  • obj (obj) – any object to log

  • name (str) – the name of artifact

  • step (int) – the step, if any

  • **extra – any extra data to log

Notes

  • bool, str, int, float, list, dict are stored as format=type

  • Metadata is stored as format=metadata

  • objects supported by om.models are stored as format=model

  • objects supported by om.datasets are stored as format=dataset

  • all other objects are pickled and stored as format=pickle

log_data(key, value, step=None, dt=None, event=None, **extra)[source]

log x/y data for model predictions

This is syntactic sugar for log_artifact() using the ‘data’ event.

Parameters:
  • key (str) – the name of the artifact

  • value (any) – the x/y data

  • step (int) – the step

  • dt (datetime) – the datetime

  • event (str) – the event, defaults to ‘data’

  • **extra – any other values to store with event

Returns:

None
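
For illustration, a minimal sketch of logging prediction inputs and outputs; X_test and model are assumed to exist:

# log the inputs and predictions of a model call for later analysis
exp.log_data('X', X_test)
exp.log_data('yhat', model.predict(X_test))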

log_events(event, key, values, step=None, dt=None, **extra)[source]

log a series of events

This is a convenience method to log multiple values for the same event. All values are logged with the same common log data, i.e. the same datetime, step, and any extra values.

Parameters:
  • event (str) – the event name

  • key (str) – the key for the event

  • values (list) – a list of values to log

  • step (int) – the step, if any

  • dt (datetime) – the datetime, defaults to now

  • **extra – any other values to store with event

Added in version NEXT.
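
For illustration, a minimal sketch of logging several values for one event in a single call; the event and key names are illustrative:

# log three loss values under the same event, datetime and step
exp.log_events('training', 'loss', [0.9, 0.5, 0.3], step=1)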

log_extra(remove=False, **kwargs)[source]

add additional log information for every subsequent logging call

Parameters:
  • remove (bool) – if True, removes the extra log information

  • kwargs – any key-value pairs to log
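
For illustration, a sketch of attaching contextual information to subsequent logging calls; the key names are illustrative, and clearing the extras with remove=True alone is an assumption based on the remove flag described above:

exp.log_extra(dataset='housing', git_commit='abc123')
exp.log_metric('accuracy', .78)   # stored together with dataset and git_commit
exp.log_extra(remove=True)        # stop attaching the extra information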

log_metric(key, value, step=None, dt=None, **extra)[source]

log a metric value

Parameters:
  • key (str) – the metric name

  • value (str|float|int|bool|dict) – the metric value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=metric

log_param(key, value, step=None, dt=None, **extra)[source]

log an experiment parameter

Parameters:
  • key (str) – the parameter name

  • value (str|float|int|bool|dict) – the parameter value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=param

log_system(key=None, value=None, step=None, dt=None, **extra)[source]

log system data

Parameters:
  • key (str) – the key to use, defaults to ‘system’

  • value (str|float|int|bool|dict) – the parameter value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=system

  • logs platform, python version and list of installed packages
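
For illustration, a minimal call that records the system information described above for the current run:

exp.log_system()   # logs platform, python version and installed packages as event=system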

restore_artifact(*args, **kwargs)[source]

restore a specific logged artifact

Changed in version 0.17: deprecated, use exp.restore_artifacts() instead

restore_artifacts(key=None, experiment=None, run=None, since=None, step=None, value=None, event=None, name=None)[source]

restore logged artifacts

Parameters:
  • key (str) – the name of the artifact as provided in log_artifact

  • run (int) – the run for which to query, defaults to current run

  • since (datetime) – only return data since this date

  • step (int) – the step for which to query, defaults to all steps in run

  • value (dict|list) – a dict or list of dicts; if provided, this value is used instead of querying data, e.g. to restore an artifact from the contents of .data()

Returns:

list of restored objects

Notes

  • this will restore the artifact according to its type assigned by .log_artifact(). If the type cannot be determined, the actual data is returned

Changed in version 0.17: returns a list of objects instead of the last object.
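
For illustration, a sketch of restoring previously logged artifacts; the key name 'X' is an assumption:

# restore all objects logged as 'X' in the current run
objs = exp.restore_artifacts(key='X')

# restore from rows previously retrieved via .data(), instead of re-querying
rows = exp.data(key='X', raw=True)
objs = exp.restore_artifacts(value=rows)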

restore_data(key, run=None, event=None, since=None, concat=True, **extra)[source]

restore x/y data for model predictions

This is syntactic sugar for restore_artifacts() using the ‘data’ event.

Parameters:
  • key (str) – the name of the artifact

  • run (int) – the run for which to query, defaults to current run

  • event (str) – the event, defaults to ‘data’

  • since (datetime) – only return data since this date

  • concat (bool) – if True, concatenates the data into a single object, in this case all data must be of the same type. Defaults to True.

  • **extra – any other values to store with event

Returns:

list of restored objects
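
For illustration, a sketch of restoring x/y data logged via log_data(); the keys match the log_data() example above:

# restore the logged inputs and predictions of the current run (concatenated by default)
X = exp.restore_data('X')
yhat = exp.restore_data('yhat')

# keep the individually logged objects instead of concatenating them
parts = exp.restore_data('X', concat=False)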

start(run=None, immediate=True)[source]

start a new run

This starts a new run and logs the start event

status(run=None)[source]

status of a run

Parameters:

run (int) – the run number, defaults to the currently active run

Returns:

status, one of ‘STARTED’, ‘STOPPED’

stop(flush=True)[source]

stop the current run

This stops the current run and records the stop event

use(run=None)[source]

reuse the latest run instead of starting a new one

syntactic sugar for self.active_run()

Returns:

self
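
For illustration, a sketch of the explicit run lifecycle when an experiment is used outside a with block (as in the track() example above); the status values follow the ‘STARTED’/‘STOPPED’ states described under status():

exp = om.runtime.experiment('myexp')
exp.start()                       # begin a new run and log the start event
exp.log_metric('accuracy', .78)
exp.status()                      # => 'STARTED'
exp.stop()                        # log the stop event and flush pending data
exp.status()                      # => 'STOPPED'

# later: attach to the latest run without starting a new one
exp = om.runtime.experiment('myexp').use()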

class omegaml.backends.experiment.OmegaProfilingTracker(*args, **kwargs)[source]

A metric tracker that runs a system profiler while the experiment is active

Records profile events that contain cpu, memory and disk profiling data. See BackgroundProfiler.profile() for details of the profiling metrics collected.

Usage:

To log metrics and system performance data:

with om.runtime.experiment('myexp', provider='profiling') as exp:
    ...

data = exp.data(event='profile')

Properties:

exp.profiler.interval = n.m # interval of n.m seconds to profile, defaults to 3 seconds
exp.profiler.metrics = ['cpu', 'memory', 'disk'] # all or subset of metrics to collect
exp.max_buffer = n # number of items in buffer before tracking

Notes

  • the profiling data is buffered to reduce the number of database writes; by default the data is written after every 6 profiling events (default: 6 * 10 = every 60 seconds)

  • the step reported in the tracker counts profiling events since the start; it is not related to the step (epoch) reported by e.g. tensorflow

  • for every step there is an event=profile, key=profile_dt entry which you can use to relate profiling events to a specific wall-clock time

  • it is usually sufficient to report system metrics at intervals > 10 seconds since machine learning algorithms tend to use CPU and memory over longer periods of time

log_profile(data)[source]

the callback for BackgroundProfiler

class omegaml.backends.experiment.NoTrackTracker(experiment, store=None, model_store=None, autotrack=False)[source]

A default tracker that does not record anything


Runtime Integration