Manually deploy a serverless inference endpoint

Scalable AI model inference is handled by the AI4OS Inference platform, powered by the OSCAR open-source serverless platform.

An OSCAR cluster consists of, among other components:

  • a Kubernetes cluster that can optionally auto-scale the number of nodes within certain boundaries.

  • MinIO, a high-performance object storage system, configured so that file uploads to a MinIO bucket can trigger the invocation of an OSCAR service to perform AI model inference.

  • Knative, a FaaS platform, configured so that synchronous requests to an OSCAR service are handled by dynamically provisioned pods (containers) in the Kubernetes cluster.

The AI4OS Inference platform consists of a pre-deployed OSCAR cluster, accessible exclusively to fully authenticated users.

Different OSCAR clusters are available depending on the project you belong to.

You can also launch services via the command-line interface (CLI).

Warning

This cluster is provided for testing purposes and OSCAR services may be removed at any time depending on the underlying infrastructure capacity and usage rates. Should this happen, you can easily re-deploy the services from the corresponding FDL file.

1. Configuring an OSCAR service

The cluster is used to deploy OSCAR services, each described by a Functions Definition Language (FDL) file that specifies, among other features (a minimal sketch follows this list):

  • The Docker image, which includes the AI model supporting the DEEPaaS API, together with all the libraries and data required to perform the inference.

  • The computing requirements (CPUs, RAM, GPUs, etc.).

  • The shell script executed, for each service invocation, inside the container created from the Docker image.

  • (Optional) The link to a MinIO bucket and an input folder.
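
Below is a minimal sketch of how such an FDL file could be generated with Python and PyYAML. The field names follow the OSCAR FDL schema, but the cluster identifier, service name, Docker image, script and bucket paths are illustrative placeholders rather than values from the AI4OS platform; adapt them to your deployment and check them against the current FDL documentation.

    import yaml  # pip install pyyaml

    # Illustrative FDL definition: the cluster id, service name, image, script
    # and bucket paths below are placeholders, not real AI4OS values.
    service = {
        "functions": {
            "oscar": [
                {
                    "my-cluster": {                     # placeholder cluster identifier
                        "name": "plants-classification",  # placeholder service name
                        "memory": "2Gi",
                        "cpu": "1.0",
                        # Docker image bundling the model, the DEEPaaS API and its dependencies
                        "image": "registry.example.org/ai4os/plants-classification:latest",
                        # shell script executed inside the container on each invocation
                        "script": "script.sh",
                        # optional: MinIO folders linked to the service
                        "input": [
                            {"storage_provider": "minio.default",
                             "path": "plants-classification/input"}
                        ],
                        "output": [
                            {"storage_provider": "minio.default",
                             "path": "plants-classification/output"}
                        ],
                    }
                }
            ]
        }
    }

    # Write the FDL file, ready to be deployed on the cluster.
    with open("service.yaml", "w") as fdl:
        yaml.safe_dump(service, fdl, sort_keys=False)

Keeping the resulting service.yaml under version control also makes it easy to re-deploy the service if it is removed (see the warning above).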

2. Invoking an OSCAR service

OSCAR services can be invoked in several ways (see Invoking services for further details; sketches of the first two modes follow this list):

  • Asynchronously, by uploading files to a MinIO bucket; each upload triggers the OSCAR service.

  • Synchronously, by invoking the service from the OSCAR CLI or via the OSCAR Manager’s REST API. A certain number of pre-deployed containers can be kept up and running to mitigate the cold start problem (the initial delay of the first invocations of the service).

  • Through Exposed Services, intended for stateless services created from large containers that take too long to start for each individual invocation. This is the case when supporting fast inference of pre-trained AI models that require close to real-time processing with high throughput. In a traditional serverless approach, the AI model weights would be loaded into memory for each service invocation (each one creating a new container). With exposed services, the AI model weights can be loaded just once and the service performs the AI model inference for every subsequent request. An auto-scaled, load-balanced approach is supported for these stateless services.
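
The following sketch illustrates the first two invocation modes from Python, assuming a service named plants-classification (a placeholder) has already been deployed. The OSCAR endpoint, MinIO endpoint, credentials, bucket and file names are placeholders; the /run/<service> path and the base64-encoded payload follow OSCAR's synchronous-invocation documentation, but verify both against the OSCAR Manager version running on your cluster.

    import base64
    import requests          # pip install requests
    from minio import Minio  # pip install minio

    OSCAR_ENDPOINT = "https://inference.example.org"  # placeholder OSCAR Manager endpoint
    SERVICE = "plants-classification"                 # placeholder service name
    TOKEN = "<access-token>"                          # placeholder OIDC / service token

    # Synchronous invocation via the OSCAR Manager REST API: the input file is
    # sent base64-encoded in the request body and the response carries the result.
    with open("leaf.jpg", "rb") as image:
        payload = base64.b64encode(image.read())

    response = requests.post(
        f"{OSCAR_ENDPOINT}/run/{SERVICE}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        data=payload,
        timeout=300,
    )
    response.raise_for_status()
    print("synchronous result:", response.text)

    # Asynchronous invocation: uploading a file to the MinIO input folder linked
    # to the service triggers a new job; the result appears later in the
    # configured output folder.
    minio_client = Minio(
        "minio.example.org",        # placeholder MinIO endpoint of the cluster
        access_key="<access-key>",
        secret_key="<secret-key>",
        secure=True,
    )
    minio_client.fput_object(
        bucket_name="plants-classification",   # bucket linked to the service
        object_name="input/leaf.jpg",          # input folder configured in the FDL
        file_path="leaf.jpg",
    )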

3. More info and examples