omegaml.runtime.experiment¶
Concepts¶
ExperimentBackend
provides the storage layer (backend to om.models)
TrackingProvider
provides the metrics logging API
TrackingProxy
provides live metrics tracking in runtime tasks
Backends¶
- class omegaml.backends.experiment.ExperimentBackend(model_store=None, data_store=None, tracking=None, **kwargs)¶
ExperimentBackend provides storage of tracker configurations
Usage:
To log metrics and other data:
    import omegaml as om

    with om.runtime.experiment('myexp') as exp:
        om.runtime.model('mymodel').fit(X, Y)
        om.runtime.model('mymodel').score(X, Y)  # automatically log score result
        exp.log_metric('mymetric', value)
        exp.log_param('myparam', value)
        exp.log_artifact(X, 'X')
        exp.log_artifact(Y, 'Y')
        exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')
To log data and automatically profile system data:
    with om.runtime.experiment('myexp', provider='profiling') as exp:
        om.runtime.model('mymodel').fit(X, Y)
        om.runtime.model('mymodel').score(X, Y)  # automatically log score result
        exp.log_metric('mymetric', value)
        exp.log_param('myparam', value)
        exp.log_artifact(X, 'X')
        exp.log_artifact(Y, 'Y')
        exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')
        # profiling data contains metrics for cpu, memory and disk use
        data = exp.data(event='profile')
To get back experiment data without running an experiment:
    # recommended way
    exp = om.runtime.experiment('myexp').use()
    exp_df = exp.data()
    # experiments exist in the models store
    exp = om.models.get('experiments/myexp')
    exp_df = exp.data()
- KIND = 'experiment.tracker'¶
- get(name, raw=False, data_store=None, **kwargs)¶
retrieve a model
- Parameters:
name – the name of the object
version – the version of the object (not supported)
- put(obj, name, **kwargs)¶
store a model
- Parameters:
obj – the model object to be stored
name – the name of the object
attributes – attributes for meta data
- classmethod supports(obj, name, **kwargs)¶
test if this backend supports this obj
Metrics Logging¶
- class omegaml.backends.experiment.TrackingProvider(experiment, store=None, model_store=None)¶
TrackingProvider implements an abstract interface to experiment tracking
Concrete implementations like MLFlow, Sacred or Neptune.ai can be implemented based on TrackingProvider. In combination with the runtime’s OmegaTrackingProxy this provides a powerful tracking interface that scales with your needs.
How it works:
- Experiments created using om.runtime.experiment() are stored as instances of a concrete TrackingProvider implementation.
- Upon retrieval of an experiment, any call to its API is proxied to the actual implementation, e.g. MLFlow.
- On calling a model method via the runtime, e.g. om.runtime.model().fit(), the TrackingProvider information is passed on to the runtime worker and made available as the backend.tracking property. Thus, within a model backend, you can always log to the tracker by using:

      with self.tracking as exp:
          exp.log_metric()  # call any TrackingProvider method
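The proxying described above can be pictured with a minimal, self-contained sketch. All class names here (TrackingProviderSketch, InMemoryProvider, TrackingProxySketch, ModelBackendSketch) are hypothetical and not part of omega-ml; the sketch only assumes that a proxy forwards attribute access to a concrete provider, and that a model backend reaches that proxy through a tracking attribute used as a context manager.

```python
from abc import ABC, abstractmethod


class TrackingProviderSketch(ABC):
    """Hypothetical stand-in for the abstract TrackingProvider interface."""

    @abstractmethod
    def log_metric(self, key, value, step=None, **extra):
        """Concrete providers decide where the metric goes."""


class InMemoryProvider(TrackingProviderSketch):
    """Hypothetical concrete provider; a real one might wrap MLFlow instead."""

    def __init__(self):
        self.records = []

    def log_metric(self, key, value, step=None, **extra):
        self.records.append({'event': 'metric', 'key': key, 'value': value, 'step': step, **extra})


class TrackingProxySketch:
    """Hypothetical proxy: forwards every attribute lookup to the wrapped provider."""

    def __init__(self, provider):
        self._provider = provider

    def __getattr__(self, name):
        # only called when normal attribute lookup on the proxy fails
        return getattr(self._provider, name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions


class ModelBackendSketch:
    """Hypothetical model backend; in omega-ml the runtime sets .tracking."""

    def __init__(self, tracking):
        self.tracking = tracking

    def fit(self, X, Y):
        with self.tracking as exp:
            exp.log_metric('n_samples', len(X))
```

In omega-ml itself the runtime worker provides backend.tracking; here it is passed to the constructor only so that the sketch runs standalone.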
omega-ml provides the OmegaSimpleTracker, which implements a tracking interface similar to packages like MLFlow and Sacred. See ExperimentBackend for an example.
- class omegaml.backends.experiment.OmegaSimpleTracker(experiment, store=None, model_store=None)¶
A tracking provider that logs to an omegaml dataset
Usage:
    with om.runtime.experiment(provider='default') as exp:
        ...
        exp.log_metric('accuracy', .78)
- active_run()¶
set the latest run as the active run
- Returns:
current run (int)
- data(experiment=None, run=None, event=None, step=None, key=None, raw=False)¶
build a dataframe of all stored data
- Parameters:
experiment (str) – the name of the experiment, defaults to its current value
run (int|list) – the run(s) to get data back, defaults to current run, use ‘all’ for all
event (str|list) – the event(s) to include
step (int|list) – the step(s) to include
key (str|list) – the key(s) to include
raw (bool) – if True returns the raw data instead of a DataFrame
- Returns:
data (DataFrame) if raw == False
data (list of dicts) if raw == True
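A rough sketch of how the run, event, step and key filters of data() could combine, assuming each logged record is a dict with run, event, key, value and step fields (an assumption; the actual storage layout is not documented here). filter_records and its current_run argument are hypothetical.

```python
def filter_records(records, run=None, event=None, step=None, key=None, current_run=1):
    """Hypothetical selection logic modelled on the data() parameters.

    Each filter accepts a single value or a list; None means "no filter",
    except for run, which defaults to current_run ('all' selects every run).
    """

    def as_set(value):
        # accept one value or a list/tuple/set of values
        return set(value) if isinstance(value, (list, tuple, set)) else {value}

    if run is None:
        run = current_run
    selected = []
    for rec in records:
        if run != 'all' and rec['run'] not in as_set(run):
            continue
        if event is not None and rec['event'] not in as_set(event):
            continue
        if step is not None and rec.get('step') not in as_set(step):
            continue
        if key is not None and rec['key'] not in as_set(key):
            continue
        selected.append(rec)
    return selected
```

A list of dicts like this corresponds to the raw=True result; passing it to pandas.DataFrame would give the tabular form.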
- log_artifact(obj, name, step=None, **extra)¶
log any object to the current run
Usage:
    # log an artifact
    exp.log_artifact(mydict, 'somedata')
    # retrieve back
    mydict_ = exp.restore_artifact('somedata')
- Parameters:
obj (obj) – any object to log
name (str) – the name of artifact
step (int) – the step, if any
**extra – any extra data to log
Notes
- bool, str, int, float, list, dict are stored as format=type
- Metadata is stored as format=metadata
- objects supported by om.models are stored as format=model
- objects supported by om.datasets are stored as format=dataset
- all other objects are pickled and stored as format=pickle
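The format rules above amount to a type dispatch. The following sketch is hypothetical: choose_format and encode_artifact are not omega-ml functions, the order of checks is assumed to follow the notes, and the metadata, model and dataset predicates are placeholders for whatever omega-ml uses to detect objects its stores support.

```python
import pickle


def choose_format(obj, is_metadata=None, is_model=None, is_dataset=None):
    """Hypothetical dispatch following the log_artifact() notes.

    is_metadata / is_model / is_dataset are placeholder predicates; this
    sketch does not model the stores' actual backend detection.
    """
    if isinstance(obj, (bool, str, int, float, list, dict)):
        return 'type'
    for fmt, check in (('metadata', is_metadata), ('model', is_model), ('dataset', is_dataset)):
        if check is not None and check(obj):
            return fmt
    return 'pickle'  # fallback for everything else


def encode_artifact(obj, **predicates):
    """Return a record; only the pickle fallback is actually serialized here."""
    fmt = choose_format(obj, **predicates)
    data = pickle.dumps(obj) if fmt == 'pickle' else obj
    return {'format': fmt, 'data': data}
```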
- log_event(event, key, value, step=None, **extra)¶
log some event
- Parameters:
event (str) – the event name (e.g. start, stop, metric, param)
key (str) – a key to relate the value (e.g. metric name)
value (str|float|int|bool|dict) – the actual event value
step (int) – the step
**extra – any other values to store with event
- log_metric(key, value, step=None, **extra)¶
log a metric value
- Parameters:
key (str) – the metric name
value (str|float|int|bool|dict) – the metric value
step (int) – the step
**extra – any other values to store with event
Notes
logged as event=metric
- log_param(key, value, step=None, **extra)¶
log an experiment parameter
- Parameters:
key (str) – the parameter name
value (str|float|int|bool|dict) – the parameter value
step (int) – the step
**extra – any other values to store with event
Notes
logged as event=param
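log_metric() and log_param() are documented as logging event=metric and event=param, i.e. as specializations of log_event(). A minimal in-memory sketch of that relationship, assuming a flat dict per event; EventLogSketch is hypothetical and keeps everything in a list, unlike OmegaSimpleTracker, which logs to an omegaml dataset.

```python
class EventLogSketch:
    """Hypothetical in-memory tracker; OmegaSimpleTracker writes to a dataset instead."""

    def __init__(self, run=1):
        self.run = run
        self.records = []

    def log_event(self, event, key, value, step=None, **extra):
        # one record per call; extra keyword arguments are stored alongside
        self.records.append({
            'run': self.run,
            'event': event,
            'key': key,
            'value': value,
            'step': step,
            **extra,
        })

    def log_metric(self, key, value, step=None, **extra):
        self.log_event('metric', key, value, step=step, **extra)

    def log_param(self, key, value, step=None, **extra):
        self.log_event('param', key, value, step=step, **extra)
```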
- log_system(key=None, value=None, step=None, **extra)¶
log system data
- Parameters:
key (str) – the key to use, defaults to ‘system’
value (str|float|int|bool|dict) – the parameter value
step (int) – the step
**extra – any other values to store with event
Notes
logged as event=system
logs the platform, the Python version and the list of installed packages
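The system information mentioned above can be approximated with the standard library alone. This sketch is an assumption about what is collected (the exact fields log_system() stores are not specified here); collect_system_info and installed_packages are hypothetical helpers, and the record layout at the end is illustrative only.

```python
import platform
import sys
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings for the current environment."""
    packages = []
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:  # skip distributions without readable metadata
            continue
        name = meta.get('Name')
        if name:
            packages.append(f"{name}=={meta.get('Version')}")
    return sorted(packages)


def collect_system_info():
    """Hypothetical approximation of what log_system() records."""
    return {
        'platform': platform.platform(),
        'python': sys.version,
        'packages': installed_packages(),
    }


# hypothetical record layout: one event=system entry, key defaulting to 'system'
record = {'event': 'system', 'key': 'system', 'value': collect_system_info()}
```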
- restore_artifact(key=None, experiment=None, run=None, step=None, value=None)¶
restore a logged artifact
- Parameters:
key (str) – the name of the artifact as provided in log_artifact
run (int) – the run for which to query, defaults to current run
step (int) – the step for which to query, defaults to all steps in run
value (dict) – this value is used instead of querying data; use it to retrieve an artifact from the contents of .data()
Notes
this will restore the artifact according to its type assigned by .log_artifact(). If the type cannot be determined, the actual data is returned
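The restore side can be sketched as the inverse of the format dispatch in log_artifact(). Assumed for illustration only: artifacts are logged as records with event='artifact' and a value dict of the form {'format': ..., 'data': ...}. restore_value and restore_artifact_sketch are hypothetical names, not omega-ml functions.

```python
import pickle


def restore_value(value):
    """Hypothetical inverse of format-based storage.

    value is assumed to be {'format': ..., 'data': ...}; anything without
    a recognizable format is returned unchanged.
    """
    if not isinstance(value, dict) or 'format' not in value:
        return value  # type cannot be determined: return the actual data
    fmt, data = value['format'], value.get('data')
    if fmt == 'pickle':
        return pickle.loads(data)
    # 'type' and 'metadata' are stored as-is; 'model' and 'dataset' would be
    # loaded from om.models / om.datasets, which this sketch does not model
    return data


def restore_artifact_sketch(records, key, run, step=None):
    """Pick the most recent matching artifact record and decode it."""
    matches = [
        r for r in records
        if r['event'] == 'artifact' and r['key'] == key and r['run'] == run
        and (step is None or r.get('step') == step)
    ]
    if not matches:
        raise KeyError(key)
    return restore_value(matches[-1]['value'])
```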
- start()¶
start a new run
This starts a new run and logs the start event
- property status¶
status of a run
- Parameters:
run (int) – the run number, defaults to the currently active run
- Returns:
status, either ‘STARTED’ or ‘STOPPED’
- stop()¶
stop the current run
This stops the current run and records the stop event
- use()¶
reuse the latest run instead of starting a new one
syntactic sugar for self.active_run()
- Returns:
self
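The run lifecycle documented for start(), stop(), status, active_run() and use() can be modelled over a plain list of events. This is a hypothetical sketch: numbering a new run one past the latest run and deriving the status from the last start/stop event are assumptions, and status is written as a method with an optional run argument rather than as a property.

```python
class RunLifecycleSketch:
    """Hypothetical model of start()/stop()/status/use() over an event list."""

    def __init__(self, records=None):
        self.records = list(records or [])
        self._run = None

    def _latest_run(self):
        runs = [r['run'] for r in self.records]
        return max(runs) if runs else 0

    def active_run(self):
        # set the latest run as the active run
        self._run = self._latest_run()
        return self._run

    def start(self):
        # a new run number is one past the latest recorded run
        self._run = self._latest_run() + 1
        self.records.append({'run': self._run, 'event': 'start'})
        return self._run

    def stop(self):
        self.records.append({'run': self._run, 'event': 'stop'})

    def use(self):
        # reuse the latest run instead of starting a new one
        self.active_run()
        return self

    def status(self, run=None):
        # the last start/stop event of the run decides its status
        run = self._run if run is None else run
        events = [r['event'] for r in self.records
                  if r['run'] == run and r['event'] in ('start', 'stop')]
        if not events:
            return None
        return 'STOPPED' if events[-1] == 'stop' else 'STARTED'
```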
- class omegaml.backends.experiment.OmegaProfilingTracker(*args, **kwargs)¶
A metric tracker that runs a system profiler while the experiment is active
Will record profile events that contain cpu, memory and disk profiling data. See BackgroundProfiler.profile() for details of the profiling metrics collected.
Usage:
To log metrics and system performance data:
    with om.runtime.experiment('myexp', provider='profiling') as exp:
        ...
        data = exp.data(event='profile')
Properties:
    exp.profiler.interval = n.m  # interval of n.m seconds to profile, defaults to 3 seconds
    exp.profiler.metrics = ['cpu', 'memory', 'disk']  # all or subset of metrics to collect
    exp.max_buffer = n  # number of items in buffer before tracking
Notes
- The profiling data is buffered to reduce the number of database writes. By default, the data is written after every 6 profiling events (default: 6 * 10 = every 60 seconds).
- The step reported in the tracker counts the profiling events since the start; it is not related to the step (epoch) reported by e.g. tensorflow.
- For every step there is an event=profile, key=profile_dt entry, which you can use to relate profiling events to a specific wall-clock time.
- It is usually sufficient to report system metrics in intervals > 10 seconds, since machine learning algorithms tend to use CPU and memory over longer periods of time.
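A sketch of the buffering described in the notes, assuming one profile_dt record plus one record per collected metric for each profiling event, and a flush after every max_buffer profiling events. BufferedProfileLogger and its write callable are hypothetical; the real OmegaProfilingTracker is driven by BackgroundProfiler and may count buffered items differently.

```python
import datetime


class BufferedProfileLogger:
    """Hypothetical sketch of buffering profile events before writing them."""

    def __init__(self, write, max_buffer=6):
        self.write = write            # callable that persists a list of records
        self.max_buffer = max_buffer  # profiling events per write
        self.buffer = []
        self.step = 0                 # counts profiling events since the start

    def log_profile(self, data):
        # data is assumed to be a dict of metrics, e.g. {'cpu': 12.5, 'memory': 40.1}
        now = datetime.datetime.now(datetime.timezone.utc)
        self.buffer.append({'event': 'profile', 'key': 'profile_dt', 'value': now, 'step': self.step})
        for key, value in data.items():
            self.buffer.append({'event': 'profile', 'key': key, 'value': value, 'step': self.step})
        self.step += 1
        if self.step % self.max_buffer == 0:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write(self.buffer)
            self.buffer = []
```

A stop() implementation would presumably also call flush(), so that a partially filled buffer is not lost when the run ends.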
- log_profile(data)¶
the callback for BackgroundProfiler
- start()¶
start a new run
This starts a new run and logs the start event
- stop()¶
stop the current run
This stops the current run and records the stop event
- class omegaml.backends.experiment.NoTrackTracker(experiment, store=None, model_store=None)¶
A default tracker that does not record anything
for tensorflow
- class omegaml.backends.experiment.TensorflowCallback(*args, **kwargs)¶
A callback for Tensorflow Keras models
Implements the callback protocol according to Tensorflow Keras semantics, linking to an omegaml.backends.experiment.TrackingProvider
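A minimal sketch of such a callback, assuming it forwards each epoch's Keras logs to the tracker's log_metric() with the epoch number as the step. KerasTrackingCallbackSketch is hypothetical; it mirrors the on_epoch_end(epoch, logs=None) hook of tf.keras.callbacks.Callback without importing TensorFlow, so it is not a drop-in replacement for TensorflowCallback.

```python
class KerasTrackingCallbackSketch:
    """Hypothetical: forwards Keras per-epoch logs to a tracker.

    A real Keras callback would subclass tf.keras.callbacks.Callback; this
    class only mirrors its on_epoch_end(epoch, logs=None) hook so that it
    runs without TensorFlow installed.
    """

    def __init__(self, tracker):
        self.tracker = tracker

    def on_epoch_end(self, epoch, logs=None):
        # Keras passes the epoch's metrics as a dict, e.g. {'loss': 0.3}
        for key, value in (logs or {}).items():
            self.tracker.log_metric(key, value, step=epoch)
```

With TensorFlow installed, a real callback subclassing tf.keras.callbacks.Callback would be passed to training as model.fit(X, Y, callbacks=[...]).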
Runtime Integration¶
- class omegaml.runtimes.trackingproxy.OmegaTrackingProxy(experiment=None, provider=None, runtime=None, implied_run=True)¶
OmegaTrackingProxy provides the runtime context for experiment tracking
Usage:
Using implied start()/stop() semantics, creating experiment runs:
    with om.runtime.experiment('myexp') as exp:
        ...
        exp.log_metric('accuracy', score)
Using explicit start()/stop() semantics:
    exp = om.runtime.experiment('myexp')
    exp.start()
    ...
    exp.stop()
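The implied start()/stop() semantics map naturally onto Python's context manager protocol. A hypothetical sketch, assuming the proxy starts a run on entering the with block and stops it on exit, even when the block raises; ImpliedRunProxySketch is not the actual OmegaTrackingProxy implementation. With implied_run=False, start() and stop() are left to the caller, as in the explicit example above.

```python
class ImpliedRunProxySketch:
    """Hypothetical sketch of implied start()/stop() around a with-block."""

    def __init__(self, tracker, implied_run=True):
        self.tracker = tracker
        self.implied_run = implied_run

    def __getattr__(self, name):
        # forward log_metric(), start(), stop(), ... to the tracker
        return getattr(self.tracker, name)

    def __enter__(self):
        if self.implied_run:
            self.tracker.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.implied_run:
            self.tracker.stop()  # runs even if the block raised
        return False  # let exceptions propagate
```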
See also
OmegaSimpleTracker
ExperimentBackend