Getting Started with omega|ml
=============================

omega|ml is the data science integration platform that consists of a compute
cluster, a highly scalable distributed NoSQL database and a web app providing
a dashboard and REST API. omega|ml enables data scientists to offload all the
heavy lifting involved with machine learning and analytics workflows, while
enabling third-party apps to use machine learning models in production.

Deployment layout
-----------------

The following setup is provided in :code:`docker-compose.yml` with all
services routed through an nginx reverse proxy (the nginx service is not shown
as it is not a required component):

.. image:: /images/deployment.jpg

* *app client* - some third-party app that uses the omega|ml REST API
* *data science client* - a fully fledged data science workstation that
  talks directly to the omega|ml compute & data cluster
* *omegaweb* - the REST API and omega|ml web application
* *mysql* - the MySQL database used by omegaweb
* *rabbitmq* - the integration broker between omegaweb/compute cluster and
  data science clients/compute cluster
* *runtime* - the compute cluster, consisting of a central scheduler
  (runtime), at least 1 worker and at least 1 mongodb master. Workers and
  mongodb instances can be scaled horizontally as required to meet
  performance requirements.

.. note::

   A single-node deployment is possible and requires neither rabbitmq nor
   omegaweb/mysql. If the runtime is Dask Distributed, ZeroMQ is used instead
   of rabbitmq. Both Dask Distributed and Celery workers can be deployed to
   an Apache Spark master node, in which case a Spark cluster is presumed;
   see below for details.

Installation
------------

.. _kompose.io: http://kompose.io/getting-started/

We provide the omega|ml Dockerfile and docker-compose configuration to run
omega|ml on a single node, a docker swarm cluster or kubernetes. This guide
assumes a docker-compose single-node deployment.

.. note::

   To go from docker-compose to kubernetes, consider deriving the kubernetes
   deployments from the omega|ml :code:`docker-compose.yml` file using
   kompose.io_

1. make sure you have the sources to build the omega|ml docker image,
   typically provided as a release file, e.g.
   :code:`omegaml-release-0.1.zip`

2. build the docker image::

      $ mkdir -p /path/to/release/docker-staging
      $ cd /path/to/release/docker-staging
      $ unzip /path/to/omegaml-release-<version>.zip
      $ docker build -t omegaml .

3. run docker-compose::

      $ docker-compose up

   This will start a series of docker containers, the microservices needed
   to run omega|ml:

   * omegaml - the omega|ml web server
   * omjobs - the omega|ml notebook hub server
   * worker - the omega|ml compute cluster
   * mongodb - the omega|ml data cluster
   * mysql - the web server's database
   * rabbitmq - the communication bus between web server, worker and clients
   * nginx - the front-end proxy to expose omegaml, rabbitmq and mongodb

   |

   .. note::

      nginx is not technically required. It is included as a demonstration
      of one approach to exposing rabbitmq and mongodb to data science
      clients hosted outside of the omega|ml compute & data cluster.
      Exposing rabbitmq and mongodb is not a prerequisite to using omega|ml,
      as data scientists can work on the cluster directly using the notebook
      service.

4. secure mongodb::

      $ cat scripts/mongoinit.js | docker exec -i omegaml_mongodb_1 mongo
      MongoDB shell version v3.4.5
      connecting to: mongodb://127.0.0.1:27017
      MongoDB server version: 3.4.5
      { "ok" : 1 }
      bye

   .. note::

      You can verify this was successful by running the same command again.
      It will respond with code 13, *unauthorized*

5. initialize & secure omegaweb

   .. code::

      $ docker exec -i build_omegaml_1 python manage.py loaddata landingpage.json
      Installed 1 object(s) from 1 fixture(s)
      $ docker exec -ti build_omegaml_1 python manage.py createsuperuser
      Username (leave blank to use 'root'): admin
      Email address: admin@example.com
      Password:
      Password (again):
      Superuser created successfully.

   You will need the admin user to access the admin UI at
   http://localhost:5000/admin/

   |

6. set data science client configuration (optional)

   Data science clients need direct access to rabbitmq and mongodb. To this
   end omega|ml needs to know the externally accessible host name so that it
   can provide clients with the client-specific, password-protected URLs (see
   `Client Configuration`_). The parameters to be set are in the admin UI at
   http://localhost:5000/admin/constance/config:

   * :code:`BROKER_URL` - this is the rabbitmq broker used by the Celery
     cluster. Set as :code:`amqp://public-omegaml-hostname:port//`. Set the
     vhost depending on your rabbitmq configuration. By default the vhost is
     an empty string
   * :code:`MONGO_HOST` - set as :code:`public-mongodb-hostname:port`

   |

   .. note::

      If you run the omega|ml docker image using docker-compose locally, set
      :code:`BROKER_URL=amqp://localhost//` and
      :code:`MONGO_HOST=localhost`. The docker-compose configuration already
      exposes the rabbitmq and mongodb containers at their default ports,
      served through nginx.

   .. warning::

      The default configuration does not provide network-level security: it
      exposes omegaweb, mongodb and rabbitmq over their native, non-encrypted
      tcp transports and is thus not fit for enterprise production
      deployment. However, mongodb, mysql and omegaweb, as well as tasks
      executed on the Celery cluster, are protected via userid/password and
      userid/apikey authentication, so there is no unauthorized exposure of
      data or models even in the default configuration.

7. access dashboard and Jupyter notebook

   .. code::

      # dashboard
      open http://localhost:5000/
      # notebook
      open http://localhost:8888/

Client Configuration
--------------------

omega|ml supports two types of clients:

1. Data Science workstation - a local workstation / PC / laptop with a
   full-scale data science setup, ready for a Data Scientist to work locally.
   When ready, she will deploy data and models onto the runtime (the omega|ml
   compute and data cluster), run models and jobs on the cluster or provide
   datasets for access by her colleagues. This configuration requires a local
   installation of omegaml, including machine learning libraries and
   client-side distribution components.

2. Application clients - some third-party application that accesses omega|ml
   datasets, models or jobs using omegaml's REST API. This configuration has
   no specific requirements other than access to the REST API and the ability
   to send and receive JSON documents via HTTP.

Data Science workstation
++++++++++++++++++++++++

1. Set up a conda environment including omegaml::

      $ conda create -n myomegaml python=3.6
      $ source activate myomegaml
      $ conda install --file conda-requirements.txt
      $ pip install -r requirements.txt
      $ pip install omegaml.whl

2. Create an account with omegaml::

      1. open http://public-omegaml-hostname:port
      2. sign up
      3. on your account profile get the userid and apikey

3. Create a configuration file::

      $ python -m omegacli init --userid <userid> --apikey <apikey> --url http://omegamlhost:port

   This will create the :code:`$HOME/.omegaml/config.yml` file set up to work
   with your omega|ml account created above.

4. Launch Jupyter notebook

   1. create a notebook
   2. load omegaml

   .. code::

      import omegaml as om
      om.datasets.list()

Application client
++++++++++++++++++

1. Create an account with omegaml::

      1. open http://omegamlhost:port
      2. sign up
      3. on your account profile get the userid and apikey

2. On every request to omegaml's REST API, provide the userid and apikey as
   the :code:`Authorization` header, as follows

   .. code::

      Authorization: userid:apikey
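
As a minimal sketch of the header format described above, the following Python snippet prepares an authenticated JSON request using only the standard library. The helper names (:code:`auth_headers`, :code:`build_request`), the host, the resource path and the credentials are hypothetical placeholders for illustration, not documented omega|ml endpoints. The request is only built, not sent.

```python
import json
import urllib.request


def auth_headers(userid, apikey):
    """Build the Authorization header in the userid:apikey form."""
    return {
        "Authorization": f"{userid}:{apikey}",
        "Accept": "application/json",
    }


def build_request(base_url, path, userid, apikey, payload=None):
    """Prepare (but do not send) an authenticated request.

    If a payload is given it is encoded as a JSON body, which makes
    urllib issue a POST; without a payload the request is a GET.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = auth_headers(userid, apikey)
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers)


if __name__ == "__main__":
    # Assumption: host, path and credentials below are placeholders;
    # substitute the values from your own account profile and deployment.
    req = build_request(
        "http://omegamlhost:5000",
        "/path/to/resource/",
        userid="jane",
        apikey="secret-apikey",
    )
    print(req.get_method(), req.full_url)
    print("Authorization:", req.get_header("Authorization"))
```

To actually send the request, pass the prepared object to :code:`urllib.request.urlopen` and decode the JSON response body; any HTTP client that can set custom headers works the same way.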