Getting Started with omega|ml

omega|ml is the data science integration platform that consists of a compute cluster, a highly scalable distributed NoSQL database and a web app providing a dashboard and REST API. omega|ml enables data scientists to offload all the heavy-lifting involved with machine learning and analytics workflows, while enabling third-party apps to use machine learning models in production.

Deployment layout

The following setup is provided in docker-compose.yml with all services directed via a nginx reverse proxy (the nginx service is not shown as it is not a required component):

../_images/deployment.jpg
  • app client - some third party app that uses the omega|ml REST API

  • data science client - a fully fledged data science workstation that directly talks to the omega|ml compute & data cluster

  • omegaweb - the REST API and omega|ml web application

  • mysql - the MySQL database used by omegaweb

  • rabbitmq - the integration broker between omegaweb/compute cluster and data science clients/compute cluster

  • runtime - the compute cluster, consisting of a central scheduler (runtime), at least 1 worker and at least 1 mongodb master. workers and mongodbs can be scaled horizontally as required to meet performance requirements.

Note

A single-node deployment is possible and does not require rabbitmq nor omegaweb/mysql.

If the runtime is Dask Distributed, zeroMQ instead of rabbitmq is used.

Both Dask Distributed and Celery Workers can be deployed to an Apache Spark Master node in which case a Spark cluster is presumed; details see below.

Installation

We provide the omega|ml Dockerfile and docker-compose configuration to run omega|ml on a single node, a docker swarm cluster or kubernetes. This guide assumes a docker-compose single-node deployment.

Note

To go from docker-compose to kubernetes, consider adopting the kubernetes deployments from the omega|ml docker-compose.yml file using kompose.io

  1. make sure you have the sources to build the omega|ml docker image, typically provided as a release file, e.g. omegaml-release-0.1.zip

  2. build the docker image:

    $ mkdir -p /path/to/omegaml-release-0.1.zip
    $ cd /path/to/release/docker-staging
    $ unzip omegaml-release-<version>.zip
    $ docker build -t omega|ml .
    
  3. run docker-compose:

    $ docker-compose up
    

    This will start a series of docker containers, the microservices needed to run omega|ml:

    • omegaml - the omega|ml web server

    • omjobs - the omega|ml notebook hub server

    • worker - the omega|ml compute cluster

    • mongodb - the omega|ml data cluster

    • mysql - the webserver’s database

    • rabbitmq - the communication bus between web server, worker and clients

    • nginx - the front-end proxy to expose omegaml, rabbitmq and mongodb


Note

nginx is not technically required. It is included as a demonstration of one approach to exposing rabbitmq and mongodb to data science clients hosted outside of the omega|ml compute & data cluster.

Exposure of rabbitmq and mongodb is not a pre-requiste to using omega|ml as data scientists can work on the cluster directly using the notebook service.

  1. secure mongodb:

    $ cat scripts/mongoinit.js | docker exec -i omegaml_mongodb_1 mongo
    MongoDB shell version v3.4.5
    connecting to: mongodb://127.0.0.1:27017
    MongoDB server version: 3.4.5
    { "ok" : 1 }
    bye
    

    Note

    You can verify this was successful by running the same command again. It will respond with code 13, unauthorized

  2. initialize & secure omegaweb

    $ docker exec -i build_omegaml_1 python manage.py loaddata landingpage.json
    Installed 1 object(s) from 1 fixture(s)
    
    $ docker exec -ti build_omegaml_1 python manage.py createsuperuser
      Username (leave blank to use 'root'): admin
      Email address: admin@example.com
      Password:
      Password (again):
      Superuser created successfully.
    

    You will need the admin user to access the admin UI at http://localhost:5000/admin/


  1. set data science client configuration (optional)

    Data science clients need direct access to rabbitmq and mongodb. To this end omega|ml needs to know the externally accessible host name so that it can provide to clients the client-specific, password-protected URLs (see Client Configuration).

    The parameters to be set are in the admin UI at http://localhost:5000/admin/constance/config:

    • BROKER_URL - this is the rabbitmq broker used by the Celery cluster. Set as ampq://public-omegaml-hostname:port/<vhost>/. Set vhost depending on your rabbitmq configuration. By default the vhost is an empty string

    • MONGO_HOST - set as public-mongodb-hostname:port


Note

If you run the omega|ml docker image using docker-compose locally, set BROKER_URL=ampq://localhost// and MONGO_HOST=localhost. The docker-compose configuration already exposes the rabbitmq and mongodb containers at their default ports, served through nginx.

Warning

The default configuration does not provide network-level security as it exposes omegaweb, mongodb and rabbitmq over their native, non-encrypted tcp transports and thus is not fit for enterprise production deployment.

However, mongodb, mysql and omegaweb as well as tasks executed on the Celery cluster are protected via userid/password and userid/apikey authentication thus there is no unauthorized exposure of data or models even in the default configuration.

  1. access dashboard and Jupyter notebook

    # dashboard
    open http://localhost:5000/
    
    # notebook
    open http://localhost:8888/
    

Client Configuration

omega|ml supports two types of clients:

  1. Data Science workstation - a local workstation / PC / laptop with a full-scale data science setup, ready for a Data Scientist to work locally. When ready she will deploy data and models onto the runtime (the omega|ml compute and data cluster), run models and jobs on the cluster or provide datasets for access by her colleagues. This configuration requires a local installation of omegaml, including machine learning libraries and client-side distribution components.

  2. Application clients - some third-party application that access omega|ml datasets, models or jobs using omegaml’s REST API. This configuration has no specific requirements other than access to the REST API and the ability to send and receive JSON documents via HTTP.

Data Science workstation

  1. Setup a conda environment including omegaml:

    $ conda create -n myomegaml python=3.6
    $ source activate myomega|ml
    $ conda install --file conda-requirements.txt
    $ pip install -r requirements.txt
    $ pip install omegaml.whl
    
  2. Create an account with omegaml:

    1. open http://public-omegaml-hostname:port

    2. sign up

    3. on your account profile get the userid and apikey

  3. Create a configuration file:

    $ python -m omegacli init --userid <userid> --apikey <key> --url http://omegamlhost:port
    

    This will create the $HOME/.omegaml/config.yml file set up to work with your omega|ml account created above.

  1. Launch Jupyter notebook

    1. create a notebook

    2. load omegaml

      import omegaml as om
      om.datasets.list()
      

Application client

  1. Create an account with omegaml:

    1. open http://omegamlhost:port

    2. sign up

    3. on your account profile get the userid and apikey

  2. On every request to omegaml’s REST API, provide the userid and apikey as the Authorization header, as follows

    Authorization: userid:apikey