omegaml.runtime.experiment

Concepts

  • ExperimentBackend provides the storage layer (backend to om.models)

  • TrackingProvider provides the metrics logging API

  • TrackingProxy provides live metrics tracking in runtime tasks
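
The following sketch shows how the three pieces relate in practice, assuming a configured omegaml session om (see the sections below for details):

import omegaml as om

# TrackingProxy: the runtime context returned by om.runtime.experiment()
with om.runtime.experiment('myexp') as exp:
    # TrackingProvider: the metrics logging API, available as exp
    exp.log_metric('accuracy', .9)

# ExperimentBackend: experiments are stored in om.models under experiments/
tracker = om.models.get('experiments/myexp')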

Backends

class omegaml.backends.experiment.ExperimentBackend(model_store=None, data_store=None, tracking=None, **kwargs)

ExperimentBackend provides storage of tracker configurations

Usage:

To log metrics and other data:

with om.runtime.experiment('myexp') as exp:
    om.runtime.model('mymodel').fit(X, Y)
    om.runtime.model('mymodel').score(X, Y) # automatically log score result
    exp.log_metric('mymetric', value)
    exp.log_param('myparam', value)
    exp.log_artifact(X, 'X')
    exp.log_artifact(Y, 'Y')
    exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')

To log metrics and automatically profile system resource usage:

with om.runtime.experiment('myexp', provider='profiling') as exp:
    om.runtime.model('mymodel').fit(X, Y)
    om.runtime.model('mymodel').score(X, Y) # automatically log score result
    exp.log_metric('mymetric', value)
    exp.log_param('myparam', value)
    exp.log_artifact(X, 'X')
    exp.log_artifact(Y, 'Y')
    exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')

# profiling data contains metrics for cpu, memory and disk use
data = exp.data(event='profile')

To retrieve experiment data without running an experiment:

# recommended way
exp = om.runtime.experiment('myexp').use()
exp_df = exp.data()

# experiments exist in the models store
exp = om.models.get('experiments/myexp')
exp_df = exp.data()

See also

KIND = 'experiment.tracker'

get(name, raw=False, data_store=None, **kwargs)

retrieve an experiment tracker

Parameters:
  • name – the name of the object

  • version – the version of the object (not supported)

put(obj, name, **kwargs)

store an experiment tracker

Parameters:
  • obj – the tracker object to be stored

  • name – the name of the object

  • attributes – attributes for meta data

classmethod supports(obj, name, **kwargs)

test if this backend supports this obj

Metrics Logging

class omegaml.backends.experiment.TrackingProvider(experiment, store=None, model_store=None)

TrackingProvider implements an abstract interface to experiment tracking

Concrete integrations for tools like MLFlow, Sacred or Neptune.ai can be built on top of TrackingProvider. In combination with the runtime's OmegaTrackingProxy this provides a powerful tracking interface that scales with your needs.

How it works:

  1. Experiments created using om.runtime.experiment() are stored as instances of a TrackingProvider concrete implementation

  2. Upon retrieval of an experiment, any call to its API is proxied to the actual implementation, e.g. MLFlow

  3. On calling a model method via the runtime, e.g. om.runtime.model().fit(), the TrackingProvider information is passed on to the runtime worker, and made available as the backend.tracking property. Thus within a model backend, you can always log to the tracker by using:

    with self.tracking as exp:
        exp.log_metric() # call any TrackingProvider method
    
  4. omega-ml provides the OmegaSimpleTracker, which implements a tracking interface similar to packages like MLFlow, Sacred. See ExperimentBackend for an example.
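
As an illustration, a third-party integration could be sketched by subclassing TrackingProvider. The method names follow the OmegaSimpleTracker API documented below; which methods must be overridden is an assumption here:

from omegaml.backends.experiment import TrackingProvider

class MyServiceTracker(TrackingProvider):
    # hypothetical provider that forwards tracking calls to an
    # external service; start/stop/log_* follow the OmegaSimpleTracker
    # API documented below
    def start(self):
        ...  # begin a new run in the external service

    def stop(self):
        ...  # close the current run

    def log_metric(self, key, value, step=None, **extra):
        ...  # forward the metric to the external service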

class omegaml.backends.experiment.OmegaSimpleTracker(experiment, store=None, model_store=None)

A tracking provider that logs to an omegaml dataset

Usage:

with om.runtime.experiment(provider='default') as exp:
    ...
    exp.log_metric('accuracy', .78)

active_run()

set the latest run as the active run

Returns:

current run (int)

data(experiment=None, run=None, event=None, step=None, key=None, raw=False)

build a dataframe of all stored data

Parameters:
  • experiment (str) – the name of the experiment, defaults to its current value

  • run (int|list) – the run(s) for which to return data, defaults to the current run; use 'all' for all runs

  • event (str|list) – the event(s) to include

  • step (int|list) – the step(s) to include

  • key (str|list) – the key(s) to include

  • raw (bool) – if True returns the raw data instead of a DataFrame

Returns:

  • data (DataFrame) if raw == False

  • data (list of dicts) if raw == True
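
For example, to filter the stored data by event and key (a sketch based on the parameters above):

# all 'accuracy' metrics across all runs, as a DataFrame
df = exp.data(event='metric', key='accuracy', run='all')

# the same records as a list of dicts
records = exp.data(event='metric', key='accuracy', run='all', raw=True)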

log_artifact(obj, name, step=None, **extra)

log any object to the current run

Usage:

# log an artifact
exp.log_artifact(mydict, 'somedata')

# retrieve it back
mydict_ = exp.restore_artifact('somedata')

Parameters:
  • obj (obj) – any object to log

  • name (str) – the name of artifact

  • step (int) – the step, if any

  • **extra – any extra data to log

Notes

  • bool, str, int, float, list, dict are stored as format=type

  • Metadata is stored as format=metadata

  • objects supported by om.models are stored as format=model

  • objects supported by om.datasets are stored as format=dataset

  • all other objects are pickled and stored as format=pickle

log_event(event, key, value, step=None, **extra)

log an arbitrary event

Parameters:
  • event (str) – the event name (e.g. start, stop, metric, param)

  • key (str) – a key to relate the value (e.g. metric name)

  • value (str|float|int|bool|dict) – the actual event value

  • step (int) – the step

  • **extra – any other values to store with event
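
log_metric() and log_param() record events of type metric and param respectively (see their Notes below); other events can be logged directly, as in this sketch where 'checkpoint' is an arbitrary event name:

# log a custom event
exp.log_event('checkpoint', 'model-saved', True, step=5)

# query it back
df = exp.data(event='checkpoint')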

log_metric(key, value, step=None, **extra)

log a metric value

Parameters:
  • key (str) – the metric name

  • value (str|float|int|bool|dict) – the metric value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=metric
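
For example, logging a metric once per training step and retrieving it afterwards (losses is assumed to be a list of per-epoch values):

# log the metric at every step, e.g. once per epoch
for step, loss in enumerate(losses):
    exp.log_metric('loss', loss, step=step)

# retrieve the logged values as a DataFrame
df = exp.data(event='metric', key='loss')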

log_param(key, value, step=None, **extra)

log an experiment parameter

Parameters:
  • key (str) – the parameter name

  • value (str|float|int|bool|dict) – the parameter value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=param

log_system(key=None, value=None, step=None, **extra)

log system data

Parameters:
  • key (str) – the key to use, defaults to ‘system’

  • value (str|float|int|bool|dict) – the parameter value

  • step (int) – the step

  • **extra – any other values to store with event

Notes

  • logged as event=system

  • logs platform, python version and list of installed packages
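
For example (a sketch based on the parameters above):

# record platform, python version and installed packages
exp.log_system()

# query the system information back
df = exp.data(event='system')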

restore_artifact(key=None, experiment=None, run=None, step=None, value=None)

restore a logged artifact

Parameters:
  • key (str) – the name of the artifact as provided in log_artifact

  • run (int) – the run for which to query, defaults to current run

  • step (int) – the step for which to query, defaults to all steps in run

  • value (dict) – this value is used instead of querying data; use it to retrieve an artifact from the contents of .data()

Notes

  • this will restore the artifact according to its type assigned by .log_artifact(). If the type cannot be determined, the actual data is returned
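
For example, restoring by key or from the contents of .data() (a sketch; the field layout of the raw records is an assumption):

# restore by key, from the current run
mydict_ = exp.restore_artifact('somedata')

# restore from a raw record returned by .data(); the 'value' field
# name is an assumption for illustration
rows = exp.data(key='somedata', raw=True)
obj = exp.restore_artifact(value=rows[0]['value'])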

start()

start a new run

This starts a new run and logs the start event

property status

status of a run

Parameters:

run (int) – the run number, defaults to the currently active run

Returns:

the run status, one of 'STARTED', 'STOPPED'
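
For example, using explicit start()/stop() semantics (see OmegaTrackingProxy below):

exp = om.runtime.experiment('myexp')
exp.start()
assert exp.status == 'STARTED'
exp.stop()
assert exp.status == 'STOPPED'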

stop()

stop the current run

This stops the current run and records the stop event

use()

reuse the latest run instead of starting a new one

syntactic sugar for self.active_run()

Returns:

self

class omegaml.backends.experiment.OmegaProfilingTracker(*args, **kwargs)

A metric tracker that runs a system profiler while the experiment is active

Records profile events containing cpu, memory and disk profiling metrics. See BackgroundProfiler.profile() for details of the metrics collected.

Usage:

To log metrics and system performance data:

with om.runtime.experiment('myexp', provider='profiling') as exp:
    ...

data = exp.data(event='profile')

Properties:

exp.profiler.interval = n.m # interval of n.m seconds between profiles, defaults to 3 seconds
exp.profiler.metrics = ['cpu', 'memory', 'disk'] # all or a subset of metrics to collect
exp.max_buffer = n # number of profiling events to buffer before writing

Notes

  • the profiling data is buffered to reduce the number of database writes; by default the data is written after every 6 profiling events (e.g. at a 10 second interval, 6 * 10 = every 60 seconds)

  • the step reported in the tracker counts the profiling events since the start; it is not related to the step (epoch) reported by e.g. tensorflow

  • For every step there is an event=profile, key=profile_dt entry that you can use to relate profiling events to a specific wall-clock time (see the example after this list).

  • It is usually sufficient to report system metrics at intervals > 10 seconds, since machine learning algorithms tend to use CPU and memory over longer periods of time.
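
For example, to retrieve the profiling events and their wall-clock timestamps:

# all profiling metrics for the current run
profiles = exp.data(event='profile')

# the wall-clock timestamp logged for each profiling step
times = exp.data(event='profile', key='profile_dt')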

log_profile(data)

the callback for BackgroundProfiler

start()

start a new run

This starts a new run and logs the start event

stop()

stop the current run

This stops the current run and records the stop event

class omegaml.backends.experiment.NoTrackTracker(experiment, store=None, model_store=None)

A default tracker that does not record anything

Tensorflow Integration

class omegaml.backends.experiment.TensorflowCallback(*args, **kwargs)

A callback for Tensorflow Keras models

Implements the callback protocol according to Tensorflow Keras semantics, linking to an omegaml.backends.experiment.TrackingProvider
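
A sketch of the intended usage, where model, X, Y are an assumed Keras model and training data; constructing the callback directly from the active tracker is also an assumption:

from omegaml.backends.experiment import TensorflowCallback

with om.runtime.experiment('myexp') as exp:
    # passing the tracker to the callback is an assumption
    cb = TensorflowCallback(exp)
    model.fit(X, Y, epochs=5, callbacks=[cb])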

Runtime Integration

class omegaml.runtimes.trackingproxy.OmegaTrackingProxy(experiment=None, provider=None, runtime=None, implied_run=True)

OmegaTrackingProxy provides the runtime context for experiment tracking

Usage:

Using implied start()/stop() semantics, creating experiment runs:

with om.runtime.experiment('myexp') as exp:
    ...
    exp.log_metric('accuracy', score)

Using explicit start()/stop() semantics:

exp = om.runtime.experiment('myexp')
exp.start()
...
exp.stop()

See also

  • OmegaSimpleTracker

  • ExperimentBackend