Logging runtime execution¶
How it works¶
When running tasks in the cloud it can be difficult to get access to the technical log messages issued by the runtime or the model framework. omega-ml provides straight access to logs that are created while a task is run. This works independent of where the task is run (e.g. locally, on a CPU or a GPU node, on-premise or in a cloud-provider etc.)
In [1]: # enable logging
om.runtime.mode(logging=True)
# run a task
om.runtime.ping(fox='bar')
# access the log
om.logger.dataset.get()
Out[2]:
level levelno logger msg text hostname created
0 SYSTEM 999 system log init log init eowyn 2021-11-22 17:15:10.234
1 INFO 4 simple omega log: running ping task {'message': 'ping... 2021-11-22 17:15:10.410605 INFO omega log: run... eowyn 2021-11-22 17:15:10.410
2 INFO 20 celery python log: running ping task {'message': 'pin... 2021-11-22 18:15:10,427 - celery - INFO - pyth... eowyn 2021-11-22 17:15:10.427
3 INFO 20 omegaml package log: running ping task {'message': 'pi... 2021-11-22 18:15:10,428 - omegaml - INFO - pac... eowyn 2021-11-22 17:15:10.428
4 INFO 20 root print log: running ping task {'message': 'ping... 2021-11-22 18:15:10,428 - root - INFO - print ... eowyn 2021-11-22 17:15:10.428
Logging in scripts or jobs¶
Use logging in your own scripts or notebooks to log arbitrary information.
In [1]: # in your notebook
logger = om.logger.getLogger('mylogger')
logger.info('some message')
In [2]: # access the log on any omegaml client
om.logger.dataset.get(logger='mylogger')
Out [3]:
0 INFO 4 mylogger some message 2021-11-22 17:20:21.691137 INFO some message eowyn 2021-11-22 17:20:21.691
Profiling execution¶
We can use omega-ml tracking to measure any code execution on the runtime. Say we want to understand the CPU load the following code generates:
# notebook testcpu
import numpy as np
import omegaml as om
A = np.random.rand(10000, 10000)
B = np.random.rand(10000, 10000)
C = np.matmul(A, B)
We can profile the cpu and memory by wrapping the call to om.runtime as
an experiment using the profiling
tracking provider:
with om.runtime.experiment('perf', provider='profiling') as exp:
om.runtime.job('testcpu').run().get()
This instructs the runtime worker, locally or on a remote machine, to run the code and collect metrics on cpu, memory and disk usage. Once the job is finished, the metrics can be accessed by getting back the data as follows. Note that this tracks the data every time your code is run so it is easy to compare different runs.
exp.data()
=>
experiment run event dt node step key value
0 perf 1 start 2021-11-25T10:11:23.538108 eowyn NaN NaN NaN
1 perf 1 system 2021-11-25T10:11:23.601244 eowyn NaN system {'platform': {'system': 'Linux', 'node': 'eowy...
2 perf 1 task_call 2021-11-25T10:11:23.870269 eowyn NaN omegaml.tasks.omega_ping {'args': [], 'kwargs': {'__experiment': 'perf'...
3 perf 1 task_success 2021-11-25T10:11:23.894004 eowyn NaN omegaml.tasks.omega_ping {'result': {'message': 'ping return message', ...
4 perf 1 profile 2021-11-25T10:11:23.900073 eowyn 0.0 profile_dt 2021-11-25T10:11:23.532387
5 perf 1 profile 2021-11-25T10:11:23.910524 eowyn 0.0 memory_load 27.5
6 perf 1 profile 2021-11-25T10:11:23.918407 eowyn 0.0 memory_total 33542479872
7 perf 1 profile 2021-11-25T10:11:23.926501 eowyn 0.0 cpu_load [13.1, 11.2, 10.3, 17.3]
8 perf 1 profile 2021-11-25T10:11:23.934526 eowyn 0.0 cpu_count 4
9 perf 1 profile 2021-11-25T10:11:23.942705 eowyn 0.0 cpu_freq [1.785, 2.642, 2.414, 1.846]
10 perf 1 profile 2021-11-25T10:11:23.950802 eowyn 0.0 cpu_avg [0.1425, 0.4225, 0.33]
11 perf 1 profile 2021-11-25T10:11:23.958760 eowyn 0.0 disk_use 97.2
12 perf 1 profile 2021-11-25T10:11:23.966861 eowyn 0.0 disk_total 502468108288
13 perf 1 stop 2021-11-25T10:11:23.974942 eowyn NaN NaN NaN