omegaml.runtime.experiment¶
Concepts¶
ExperimentBackend – provides the storage layer (backend to om.models)
TrackingProvider – provides the metrics logging API
TrackingProxy – provides live metrics tracking in runtime tasks
Backends¶
- class omegaml.backends.experiment.ExperimentBackend(model_store=None, data_store=None, tracking=None, **kwargs)¶
ExperimentBackend provides storage of tracker configurations
Usage:
To log metrics and other data:
    with om.runtime.experiment('myexp') as exp:
        om.runtime.model('mymodel').fit(X, Y)
        om.runtime.model('mymodel').score(X, Y)  # automatically log score result
        exp.log_metric('mymetric', value)
        exp.log_param('myparam', value)
        exp.log_artifact(X, 'X')
        exp.log_artifact(Y, 'Y')
        exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')
To log data and automatically profile system data:
    with om.runtime.experiment('myexp', provider='profiling') as exp:
        om.runtime.model('mymodel').fit(X, Y)
        om.runtime.model('mymodel').score(X, Y)  # automatically log score result
        exp.log_metric('mymetric', value)
        exp.log_param('myparam', value)
        exp.log_artifact(X, 'X')
        exp.log_artifact(Y, 'Y')
        exp.log_artifact(om.models.metadata('mymodel'), 'mymodel')

    # profiling data contains metrics for cpu, memory and disk use
    data = exp.data(event='profile')
To get back experiment data without running an experiment:
    # recommended way
    exp = om.runtime.experiment('myexp').use()
    exp_df = exp.data()

    # experiments exist in the models store
    exp = om.models.get('experiments/myexp')
    exp_df = exp.data()
See also
omegaml.backends.tracking.OmegaSimpleTracker
omegaml.backends.tracking.OmegaProfilingTracker
- KIND = 'experiment.tracker'¶
- get(name, raw=False, data_store=None, **kwargs)¶
retrieve an experiment tracker
- Parameters:
name – the name of the object
version – the version of the object (not supported)
- put(obj, name, **kwargs)¶
store an experiment tracker
- Parameters:
obj – the tracker object to be stored
name – the name of the object
attributes – attributes for meta data
- classmethod supports(obj, name, **kwargs)¶
test if this backend supports this obj
Metrics Logging¶
- class omegaml.backends.experiment.TrackingProvider(experiment, store=None, model_store=None, autotrack=False)¶
TrackingProvider implements an abstract interface to experiment tracking
Concrete implementations like MLFlow, Sacred or Neptune.ai can be implemented based on TrackingProvider. In combination with the runtime’s OmegaTrackingProxy this provides a powerful tracking interface that scales with your needs.
How it works:
- Experiments created using om.runtime.experiment() are stored as instances of a TrackingProvider concrete implementation.
- Upon retrieval of an experiment, any call to its API is proxied to the actual implementation, e.g. MLFlow.
- On calling a model method via the runtime, e.g. om.runtime.model().fit(), the TrackingProvider information is passed on to the runtime worker and made available as the backend.tracking property. Thus within a model backend, you can always log to the tracker by using:

    with self.tracking as exp:
        exp.log_metric()  # call any TrackingProvider method
omega-ml provides the OmegaSimpleTracker, which implements a tracking interface similar to packages like MLFlow and Sacred. See ExperimentBackend for an example.
- as_monitor(obj, alerts=None, schedule=None, store=None, provider=None, **kwargs)¶
Return and attach a drift monitor to this experiment
- Parameters:
obj (str) – the name of the model
alerts (list) – a list of alert definitions. Each alert definition is a dict with keys ‘event’, ‘recipients’. ‘event’ is the event to get from the tracking log, ‘recipients’ is a list of recipients (e.g. email address, notification channel)
schedule (str) – the job scheduling interval for the monitoring job, as used in om.jobs.schedule() when the job is created
store (OmegaStore) – the store to use, defaults to self._model_store
provider (str) – the name of the monitoring provider, defaults to store.prefix
- Returns:
a drift monitor for the object
- Return type:
monitor (DriftMonitor)
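For illustration, a minimal sketch of attaching a drift monitor; the model name, alert event, recipient and schedule values are assumptions for this example:

    # illustrative values: 'mymodel', the alert definition and the schedule
    exp = om.runtime.experiment('myexp')
    mon = exp.as_monitor('mymodel',
                         alerts=[{'event': 'drift', 'recipients': ['ops@example.com']}],
                         schedule='daily')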
- track(obj, store=None, label=None, monitor=False, monitor_kwargs=None, **kwargs)¶
attach this experiment to the named object
Usage:
    # use in experiment context
    with om.runtime.experiment('myexp') as exp:
        exp.track('mymodel')

    # use on experiment directly
    exp = om.runtime.experiment('myexp')
    exp.track('mymodel')
- Parameters:
obj (str) – the name of the object
store (OmegaStore) – optional, om.models, om.scripts, om.jobs. If not provided will use om.models
label (str) – optional, the label of the worker, default is ‘default’
monitor (bool|str) – optional, truthy sets up a monitor to track drift in the object, if a string is provided it is used as the monitoring provider
monitor_kwargs (dict) – optional, additional keyword arguments to pass to the monitor
Note
This modifies the object’s metadata.attributes:
{ 'tracking': { label: self._experiment } }
If monitor is set, a monitor definition is added to the object’s metadata:
{ 'tracking': { 'monitors': [ { 'experiment': self._experiment, 'provider': monitor } ] } }
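Putting this together, a short sketch of attaching an experiment to a model, with and without drift monitoring (the model name is illustrative):

    exp = om.runtime.experiment('myexp')
    exp.track('mymodel')                # updates metadata.attributes['tracking']
    exp.track('mymodel', monitor=True)  # additionally adds a monitor definition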
- class omegaml.backends.experiment.OmegaSimpleTracker(*args, **kwargs)¶
A tracking provider that logs to an omegaml dataset
Usage:
    with om.runtime.experiment(provider='default') as exp:
        ...
        exp.log_metric('accuracy', .78)
Changed in version 0.17: any extra
- active_run(run=None)¶
set the latest run as the active run
- Parameters:
run (int|str) – optional, the run number or unique task id; if None the latest run is set as the active run, or a new run is created if no active run exists.
- Returns:
current run (int)
- clear(force=False)¶
clear all data
All data is removed from the experiment’s dataset. This is not recoverable.
- Parameters:
force (bool) – if True, clears all data, otherwise raises an error
Caution
this will clear all data and is not recoverable
- Raises:
AssertionError – if force is not True
Added in version 0.16.2.
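For example (force=True is required, otherwise an AssertionError is raised):

    exp = om.runtime.experiment('myexp').use()
    exp.clear(force=True)  # removes all of the experiment's data, not recoverable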
- data(experiment=None, run=None, event=None, step=None, key=None, raw=False, lazy=False, since=None, batchsize=None, **extra)¶
build a dataframe of all stored data
- Parameters:
experiment (str|list) – the name of the experiment, defaults to its current value
run (int|list|str) – the run(s) for which to get data, defaults to the current run; use 'all' for all runs. Runs are 1-indexed from the first run, or negatively indexed from the latest run (-1 is the latest), and both forms can be combined in a list. If a negative index would go before the first run, run 1 is returned.
event (str|list) – the event(s) to include
step (int|list) – the step(s) to include
key (str|list) – the key(s) to include
raw (bool) – if True returns the raw data instead of a DataFrame
lazy (bool) – if True returns the Cursor instead of data, ignores raw
since (datetime) – only return data since this date. If both since and run are specified, run is ignored and all runs since the date are returned
batchsize (int) – if specified, returns a generator yielding data in batches of batchsize, note that raw is respected, i.e. raw=False yields a DataFrame for every batch, raw=True yields a list of dicts
- Returns:
For lazy == False:
data (DataFrame) if raw == False
data (list of dicts) if raw == True
None if no data exists
For lazy == True, no batchsize:
data (Cursor) for any value of raw
For lazy == True, with batchsize:
data (generator of DataFrame) if raw == False
data (generator of list[dict]) if raw == True
Changed in version 0.16.2: run supports negative indexing
Changed in version 0.17: added batchsize
Changed in version 0.17: enabled the use of run='*' to retrieve all runs, equivalent of run='all'
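A few illustrative queries, assuming an experiment that has logged several runs:

    exp = om.runtime.experiment('myexp').use()
    df = exp.data()                           # current run, as a DataFrame
    df = exp.data(run='all', event='metric')  # all metric events across all runs
    df = exp.data(run=-1)                     # the latest run, by negative indexing
    raw = exp.data(raw=True)                  # list of dicts instead of a DataFrame
    for batch in exp.data(run='all', batchsize=1000):
        ...                                   # each batch is a DataFrame since raw=False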
- log_artifact(obj, name, step=None, dt=None, event=None, key=None, **extra)¶
log any object to the current run
Usage:
    # log an artifact
    exp.log_artifact(mydict, 'somedata')

    # retrieve back
    mydict_ = exp.restore_artifact('somedata')
- Parameters:
obj (obj) – any object to log
name (str) – the name of the artifact
step (int) – the step, if any
**extra – any extra data to log
Notes
- bool, str, int, float, list, dict are stored as format=type
- Metadata is stored as format=metadata
- objects supported by om.models are stored as format=model
- objects supported by om.datasets are stored as format=dataset
- all other objects are pickled and stored as format=pickle
- log_data(key, value, step=None, dt=None, event=None, **extra)¶
log x/y data for model predictions
This is syntactic sugar for log_artifact() using the 'data' event.
- Parameters:
key (str) – the name of the artifact
value (any) – the x/y data
step (int) – the step
dt (datetime) – the datetime
event (str) – the event, defaults to ‘data’
**extra – any other values to store with event
- Returns:
None
- log_extra(remove=False, **kwargs)¶
add additional log information for every subsequent logging call
- Parameters:
remove (bool) – if True, removes the extra log information
kwargs – any key-value pairs to log
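For example, to attach a common field to all subsequent log entries (the key and value are arbitrary examples):

    exp.log_extra(client='acme')     # subsequent entries include client='acme'
    exp.log_metric('accuracy', .78)  # stored together with the extra information
    exp.log_extra(remove=True)       # stop adding the extra information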
- log_metric(key, value, step=None, dt=None, **extra)¶
log a metric value
- Parameters:
key (str) – the metric name
value (str|float|int|bool|dict) – the metric value
step (int) – the step
**extra – any other values to store with event
Notes
logged as
event=metric
- log_param(key, value, step=None, dt=None, **extra)¶
log an experiment parameter
- Parameters:
key (str) – the parameter name
value (str|float|int|bool|dict) – the parameter value
step (int) – the step
**extra – any other values to store with event
Notes
logged as
event=param
- log_system(key=None, value=None, step=None, dt=None, **extra)¶
log system data
- Parameters:
key (str) – the key to use, defaults to ‘system’
value (str|float|int|bool|dict) – the parameter value
step (int) – the step
**extra – any other values to store with event
Notes
logged as
event=system
logs platform, python version and list of installed packages
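For example:

    exp.log_system()                      # records platform, python version, packages
    system_df = exp.data(event='system')  # query the logged system information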
- restore_artifact(*args, **kwargs)¶
restore a specific logged artifact
Changed in version 0.17: deprecated, use exp.restore_artifacts() instead
- restore_artifacts(key=None, experiment=None, run=None, since=None, step=None, value=None, event=None, name=None)¶
restore logged artifacts
- Parameters:
key (str) – the name of the artifact as provided in log_artifact
run (int) – the run for which to query, defaults to current run
since (datetime) – only return data since this date
step (int) – the step for which to query, defaults to all steps in run
value (dict|list) – dict or list of dicts; this value is used instead of querying data, use it to retrieve an artifact from the contents of .data()
- Returns:
list of restored objects
Notes
this will restore the artifact according to its type as assigned by .log_artifact(). If the type cannot be determined, the actual data is returned
Changed in version 0.17: returns a list of objects instead of the last object
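For example, restoring previously logged artifacts (the names and values are illustrative):

    exp.log_artifact({'alpha': .1}, 'somedata')
    objs = exp.restore_artifacts('somedata')  # a list of restored objects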
- restore_data(key, run=None, event=None, since=None, concat=True, **extra)¶
restore x/y data for model predictions
This is syntactic sugar for restore_artifacts() using the 'data' event.
- Parameters:
key (str) – the name of the artifact
run (int) – the run for which to query, defaults to current run
event (str) – the event, defaults to ‘data’
since (datetime) – only return data since this date
concat (bool) – if True, concatenates the data into a single object, in this case all data must be of the same type. Defaults to True.
**extra – any other values to filter the data by
- Returns:
list of restored objects
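A minimal sketch pairing log_data() and restore_data(), assuming X is e.g. a DataFrame:

    with om.runtime.experiment('myexp') as exp:
        exp.log_data('X', X)    # logged using event='data'
    ...
    exp = om.runtime.experiment('myexp').use()
    X_ = exp.restore_data('X')  # concatenated into a single object (concat=True)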
- start(run=None)¶
start a new run
This starts a new run and logs the start event
- status(run=None)¶
status of a run
- Parameters:
run (int) – the run number, defaults to the currently active run
- Returns:
the run status, one of 'STARTED' or 'STOPPED'
- stop()¶
stop the current run
This stops the current run and records the stop event
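Runs can also be controlled explicitly instead of using the context manager, e.g.:

    exp = om.runtime.experiment('myexp')
    exp.start()                  # starts a new run, logs the start event
    exp.log_metric('loss', .05)
    exp.stop()                   # stops the run, records the stop event
    exp.status()                 # => 'STOPPED'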
- use(run=None)¶
reuse the latest run instead of starting a new one
syntactic sugar for self.active_run()
- Returns:
self
- class omegaml.backends.experiment.OmegaProfilingTracker(*args, **kwargs)¶
A metric tracker that runs a system profiler while the experiment is active
Will record profile events that contain cpu, memory and disk profiling data. See BackgroundProfiler.profile() for details of the profiling metrics collected.
Usage:
To log metrics and system performance data:
    with om.runtime.experiment('myexp', provider='profiling') as exp:
        ...

    data = exp.data(event='profile')
Properties:
    exp.profiler.interval = n.m  # interval of n.m seconds to profile, defaults to 3 seconds
    exp.profiler.metrics = ['cpu', 'memory', 'disk']  # all or a subset of metrics to collect
    exp.max_buffer = n  # number of items in the buffer before tracking
Notes
- the profiling data is buffered to reduce the number of database writes; by default the data is written on every 6 profiling events (default: 6 * 10 = every 60 seconds)
- the step reported in the tracker counts the profiling events since the start; it is not related to the step (epoch) reported by e.g. tensorflow
- for every step there is an event=profile, key=profile_dt entry which you can use to relate profiling events to a specific wall-clock time
- it is usually sufficient to report system metrics at intervals > 10 seconds, since machine learning algorithms tend to use CPU and memory over longer periods of time
- log_profile(data)¶
the callback for BackgroundProfiler
- class omegaml.backends.experiment.NoTrackTracker(experiment, store=None, model_store=None, autotrack=False)¶
A default tracker that does not record anything
for tensorflow
- class omegaml.backends.experiment.TensorflowCallback(*args, **kwargs)¶
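The signature does not show its usage; as a hypothetical sketch, assuming the callback follows the Keras callback protocol and wraps the active experiment (the constructor arguments are an assumption):

    # hypothetical sketch -- constructor arguments and Keras integration are assumptions
    from omegaml.backends.experiment import TensorflowCallback

    with om.runtime.experiment('myexp') as exp:
        cb = TensorflowCallback(exp)     # assumed: wraps the experiment for logging
        model.fit(X, Y, callbacks=[cb])  # assumed: logs per-epoch metrics to exp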