Concepts in omega-ml Generative AI ================================== omega-ml's promise is to deliver complex AI workflows with minimal code. This is achieved by providing a set of components that can be easily combined to create powerful AI applications. The components are designed to be modular and reusable, allowing users to mix and match according to requirements. .. contents:: :local: :depth: 1 Generative AI model ------------------- A generative AI model is a type of AI model that can generate new content based on a given input, also known as a prompt. In omega-ml we can define a generative model by specifying the URL to a model provider, the model name, and the model type (embedding model, text or multi-modal model), and give these specifications a name to store in the model repository. While models can be used for generation of content or responses to user input, other models are used to create embeddings. Embeddings are numerical representations of data that can be used to compare and retrieve similar data. For example, a text embedding model can convert a piece of text into a numeric representation (a vector), which can then be used to find similar pieces of text. In omega-ml both types of models are defined in the same way, namely by specifying the URL to a model provider, the model name, and the model type. Model Provider -------------- A model provider is a service that hosts and serves AI models. omega-ml provides a transparent interface to various model providers, allowing users to easily switch between them without changing their code. This enables users to leverage the best models available for their specific use cases. Currently omega-ml supports the following model providers out of the box: *Open Source* * vLLM - a high-performance, open-source model serving framework that supports multiple backends, including Hugging Face and OpenAI models. vLLM is designed for low-latency and high-throughput inference, making it ideal for real-time applications. * LocalAI - a local model serving framework that allows users to run models on their own hardware. LocalAI is designed for users who want to have full control over their models and data, and it supports a wide range of models, including those from Hugging Face and OpenAI. * AnythingLLM - a local model serving framework that allows users to run models on their own hardware. AnythingLLM is designed for users who want to have full control over their models and data, and it supports a wide range of models, including those from Hugging Face and OpenAI. * GPT4All - a local model serving framework that allows users to run models on their own hardware. GPT4All is designed for users who want to have full control over their models and data, and it supports a wide range of models, including those from Hugging Face and OpenAI. *Commercial* * OpenAI - a commercial model provider that offers a wide range of models for various tasks, including text generation, image generation, and more. OpenAI is known for its high-quality models and ease of use. * OpenRouter - a commercial model provider that offers a wide range of models for various tasks, including text generation, image generation, and more. OpenRouter is known for its high-quality models and ease of use. * Infomaniak - a Swiss commercial model provider that offers a wide range of models for various tasks, including text generation, image generation, and more. Infomaniak is known for its high-quality models and ease of use with Swiss Hosting. * Any provider offering a OpenAI-compatible set of APIs, specifically /completions, /chat/completions and /embeddings. Pipelines and Guardrails ------------------------ Every generative AI model in omega-ml is part of a pipeline, a sequence of steps that are executed in order to process some part of the completion process. For example, the steps include prompt preprocessing, model inference, and postprocessing. Each step in the pipeline can adjust the input to the model, how the model is called, and process or modify the output after inference. A pipeline in omega-ml is simply a callable object attached to a model, that takes inputs and returns an output. For example, the pipeline implement guardrails (content or security checks) by checking the input and output to the model, and modifying it if necessary. Documents storage ----------------- Document storage is a key component of generative AI workflows, as it allows users to store and retrieve documents which provide additional context to a model when completing a user's input. Similarly to other transparent data access in omega-ml, we can define a a document storage by providing the URL to a supported database, and storing this definition in the dataset repository. For a document storage to be useful, we also need an embedding model, which is a type of a generative AI model that can convert documents into embeddings. These embeddings are then stored in the document storage, allowing for efficient retrieval and comparison of documents. Conversation history -------------------- Conversation history is a key component of generative AI workflows, as it allows users to store and retrieve the history of a conversation with a model. This is useful for maintaining context and providing a more coherent experience for the user. In omega-ml, conversation history is automatically stored in its database, and and can be retrieved and used to inform the model's responses. This is particularly useful for chatbots and other conversational agents, where maintaining context is crucial for providing relevant responses across a longer conversation. Tools ----- Tools are a key component of generative AI workflows, as they allow users to extend the functionality of their models and pipelines. A tool is some external functionality that can be provided to a model, such as a database lookup, a custom function to calculate a result, or a call to an external API. Tools can be used to augment the capabilities of a model, whereby the model can call the tool to perform some action, and then use the result of that action to inform its response to the user. In omega-ml, tools are defined as a callable object that can be attached to a model and can be called by the model during inference. Tool calls are processed as part of the pipeline in order to modify the input to the tool, replace the tool call, or modify its output. Generative AI runtime and REST services --------------------------------------- omegaml provides a customizable runtime to meet the specific resource and distribution requirements of generative AI models. For example, in a corporate environment the runtime can be configured to use a local model provider, such as vLLM or LocalAI, to ensure that data is not sent to a third-party provider. The runtime can also be configured to use autoscaling for scalability and high availability, and to automatically distribute inference requests across multiple nodes. Multi-model Repository ---------------------- It is often useful to configure multiple models, templates and pipelines to implement different use case scenarios. For example, one generative AI model can be configured to answer questions about a company's products, while another model can be configured to provide guidances on human resources policies. In omega-ml we can define multiple models, templates and pipelines in a single repository, and then use these models in different workflows. This allows users to easily switch between different models and pipelines, and to reuse them in different workflows. Monitoring ---------- omega-ml provides a built-in model tracking and monitoring system. This works the same way for all models, including generative AI models. The tracking system automatically logs all interactions with a model, including the input, output, and any metadata associated with the interaction. All interactions are stored to the model repository for later query and analysis.