Capturing model metrics
=======================

.. contents::


omega-ml provides experiment and model tracking for all models using its
built-in metrics store.

Running an experiment
---------------------

Collecting metrics as part of an experiment is straightforward:

.. code:: python

    import omegaml as om
    from sklearn.linear_model import LogisticRegression

    lr = LogisticRegression()
    X, Y = ...

    with om.runtime.experiment('myexp') as exp:
        lr.fit(X, Y)
        score = lr.score(X, Y)
        exp.log_metric('accuracy', score)
        exp.log_param('penalty', 'L2')
        exp.log_artifact(lr, 'mymodel')

We can get back the data collected in an experiment using :code:`exp.data()`,
as a DataFrame:

.. code:: python

        In [1]: exp.data()
        Out[1]:
              experiment  run     event                          dt   node  step       key                                              value
        0      myexp    1     start  2021-11-22T15:49:51.893920  eowyn   NaN       NaN                                                NaN
        1      myexp    1    system  2021-11-22T15:49:51.927902  eowyn   NaN    system  {'platform': {'system': 'Linux', 'node': 'eowy...
        2      myexp    1    metric  2021-11-22T15:49:51.938601  eowyn   NaN  accuracy                                                  1
        3      myexp    1     param  2021-11-22T15:49:51.950340  eowyn   NaN   penalty                                                 L2
        4      myexp    1  artifact  2021-11-22T15:49:51.984030  eowyn   NaN   mymodel  {'name': 'mymodel', 'data': 'experiments/.arte...
        5      myexp    1      stop  2021-11-22T15:49:51.994113  eowyn   NaN       NaN                                                NaN

Note that the :code:`run` column records how many times the above :code:`with`
block has been run. Running it again adds a second set of metrics. To get back
all runs, use :code:`exp.data(run='all')`; to get a specific set of runs, pass
a list or a tuple, as in :code:`exp.data(run=(1,3))`. Additional filters are
available for the :code:`event`, :code:`node`, :code:`step` and :code:`key` fields:

.. code:: python

    In [3]: exp.data(run='all', key='accuracy')
    Out[3]:
      experiment  run  step   event       key  value                          dt   node
    0      myexp    1  None  metric  accuracy      1  2021-11-22T15:49:51.938601  eowyn
    1      myexp    2  None  metric  accuracy      1  2021-11-22T16:02:08.579077  eowyn
    2      myexp    3  None  metric  accuracy      1  2021-11-22T16:02:13.048647  eowyn
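
Because :code:`exp.data()` returns a DataFrame, the usual pandas operations
apply to the collected metrics. The following is a minimal sketch and not
omega-ml output: it builds a small stand-in frame with made-up accuracy values
and only a subset of the columns shown above, then compares accuracy across
runs. Only pandas is assumed.

.. code:: python

    import pandas as pd

    # Stand-in for the frame returned by exp.data(run='all');
    # the values are illustrative, not real results.
    records = [
        {"experiment": "myexp", "run": 1, "event": "metric", "key": "accuracy", "value": 0.90},
        {"experiment": "myexp", "run": 1, "event": "param", "key": "penalty", "value": "L2"},
        {"experiment": "myexp", "run": 2, "event": "metric", "key": "accuracy", "value": 0.95},
        {"experiment": "myexp", "run": 2, "event": "param", "key": "penalty", "value": "L2"},
    ]
    df = pd.DataFrame(records)

    # Keep only the accuracy metric, indexed by run number.
    is_accuracy = (df["event"] == "metric") & (df["key"] == "accuracy")
    accuracy_by_run = df[is_accuracy].set_index("run")["value"].astype(float)

    # Run number with the highest accuracy.
    best_run = accuracy_by_run.idxmax()
    print(accuracy_by_run)
    print("best run:", best_run)

The :code:`value` column holds mixed types (numbers, strings, dicts), so the
sketch filters to a single key before converting to :code:`float`.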


Tracking experiments with multiple steps
----------------------------------------

To run an experiment that uses a series of parameters or a k-fold split of the
input data, such as in cross-validation, we can track each step separately. In
this case the data is recorded for one run (e.g. :code:`run=1`) with many steps.

.. code:: python

    from sklearn.model_selection import ShuffleSplit

    shuffle = ShuffleSplit(n_splits=5)
    with om.runtime.experiment('myexp') as exp:
        # each split yields a pair of (train indices, test indices)
        for step, (train, test) in enumerate(shuffle.split(X)):
            lr.fit(X[train], Y[train])
            # score on the held-out part of this split
            score = lr.score(X[test], Y[test])
            exp.log_metric('accuracy', score, step=step)
            exp.log_param('penalty', 'L2', step=step)
            exp.log_artifact(lr, 'mymodel', step=step)
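
Once each step has logged its own metric, the per-step values can be summarized
for the run as a whole. The sketch below assumes the metrics come back as a
DataFrame with :code:`run`, :code:`step`, :code:`key` and :code:`value`
columns, as in the outputs shown earlier. The frame is a hand-built stand-in
with made-up scores, not data produced by omega-ml.

.. code:: python

    import pandas as pd

    # Stand-in for the accuracy rows of a single run with five steps;
    # the scores are illustrative only.
    steps = pd.DataFrame({
        "run": [1] * 5,
        "step": [0, 1, 2, 3, 4],
        "key": ["accuracy"] * 5,
        "value": [0.80, 0.90, 0.85, 0.95, 0.90],
    })

    # Mean, spread and number of steps per run.
    summary = steps.groupby("run")["value"].agg(["mean", "std", "count"])
    mean_accuracy = summary.loc[1, "mean"]

    # Step with the lowest score, useful for inspecting a weak split.
    worst_step = steps.loc[steps["value"].idxmin(), "step"]
    print(summary)
    print("worst step:", worst_step)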


If the ML framework provides model callbacks, as TensorFlow does, the model can
be fit using :code:`exp.tensorflow_callback()`. In this case, the model itself
reports its metrics via the callback:

.. code:: python

    model = Sequential()
    ...
    model.compile(metrics=['accuracy'])
    with om.runtime.experiment('myexp') as exp:
        model.fit(X, Y,
                  callbacks=[exp.tensorflow_callback()])


Customizing tracking behavior
-----------------------------

Tracking behavior can be adjusted by using a different tracking provider.
For example, the :code:`OmegaSimpleTracker` logs model metrics, while the
:code:`OmegaProfilingTracker` also logs system resource usage such as
CPU and RAM while the experiment runs. You can write your own tracking provider
to forward metrics to a third-party metrics store, or to provide custom
callbacks to your machine learning framework.
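
To illustrate the idea of forwarding metrics elsewhere, here is a minimal,
framework-independent sketch. The class :code:`ForwardingTracker` and its
:code:`sink` argument are hypothetical names introduced for this example. The
class only mirrors the :code:`log_metric` and :code:`log_param` calls used in
this document; it does not subclass an omega-ml tracker, and how a custom
provider is registered with omega-ml is not shown here.

.. code:: python

    from datetime import datetime, timezone


    class ForwardingTracker:
        """Hypothetical tracker that forwards every logged value to a sink.

        The sink is any callable that accepts a dict, for example a thin
        wrapper around the client of a third-party metrics store.
        """

        def __init__(self, experiment, sink):
            self.experiment = experiment
            self.sink = sink

        def _forward(self, event, key, value, step=None):
            # Build one record per logged value and hand it to the sink.
            self.sink({
                "experiment": self.experiment,
                "event": event,
                "key": key,
                "value": value,
                "step": step,
                "dt": datetime.now(timezone.utc).isoformat(),
            })

        def log_metric(self, key, value, step=None):
            self._forward("metric", key, value, step=step)

        def log_param(self, key, value, step=None):
            self._forward("param", key, value, step=step)


    # Usage: collect the forwarded records in a list instead of a real store.
    received = []
    tracker = ForwardingTracker("myexp", received.append)
    tracker.log_metric("accuracy", 0.93, step=0)
    tracker.log_param("penalty", "L2", step=0)
    print(received)

Passing the sink as a plain callable keeps the sketch testable without any
external service.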

The tracking provider is selected with the :code:`provider=` argument when
creating the experiment. For example, the :code:`profiling` provider
tracks system metrics during execution:

.. code:: python

    In [1]: with om.runtime.experiment('myexp2',
                                        provider='profiling') as exp:
                 ...
    In [2]: exp.data()
    Out[2]:
           experiment  run     event                          dt   node  step           key                                              value
        0      myexp2    1     start  2021-11-22T16:53:28.534211  eowyn   NaN           NaN                                                NaN
        1      myexp2    1    system  2021-11-22T16:53:28.579121  eowyn   NaN        system  {'platform': {'system': 'Linux', 'node': 'eowy...
        2      myexp2    1    metric  2021-11-22T16:53:28.592081  eowyn   NaN      accuracy                                                  1
        3      myexp2    1     param  2021-11-22T16:53:28.600690  eowyn   NaN       penalty                                                 L2
        4      myexp2    1  artifact  2021-11-22T16:53:28.627970  eowyn   NaN       mymodel  {'name': 'mymodel', 'data': 'experiments/.arte...
        5      myexp2    1   profile  2021-11-22T16:53:28.635717  eowyn   0.0    profile_dt                         2021-11-22T16:53:28.531654
        6      myexp2    1   profile  2021-11-22T16:53:28.643665  eowyn   0.0   memory_load                                               22.4
        7      myexp2    1   profile  2021-11-22T16:53:28.651388  eowyn   0.0  memory_total                                        33542479872
        8      myexp2    1   profile  2021-11-22T16:53:28.658964  eowyn   0.0      cpu_load                           [25.9, 27.1, 27.6, 28.6]
        9      myexp2    1   profile  2021-11-22T16:53:28.666597  eowyn   0.0     cpu_count                                                  4
        10     myexp2    1   profile  2021-11-22T16:53:28.673986  eowyn   0.0      cpu_freq                       [0.833, 1.728, 2.228, 1.736]
        11     myexp2    1   profile  2021-11-22T16:53:28.681591  eowyn   0.0       cpu_avg                            [0.215, 0.6825, 0.6925]
        12     myexp2    1   profile  2021-11-22T16:53:28.688981  eowyn   0.0      disk_use                                               95.6
        13     myexp2    1   profile  2021-11-22T16:53:28.697768  eowyn   0.0    disk_total                                       502468108288
        14     myexp2    1      stop  2021-11-22T16:53:28.705661  eowyn   NaN           NaN                                                NaN


The following tracking providers are available:

* :code:`default` - the default tracker, :code:`OmegaSimpleTracker`
* :code:`profiling` - the profiling tracker, :code:`OmegaProfilingTracker`
* :code:`notrack` - the no-operation tracker, :code:`NoTrackTracker`. Use
  this to disable tracking.