Onboarding Algorithms to the Information Factory

The Information Factory Algorithm Hosting Capability enables value-adders to onboard custom algorithms that integrate seamlessly with services like the Data Lake, Dynamic Data Cube services, and other auxiliary resources.

This document provides guidance for algorithm owners to onboard their algorithms for on-demand execution and deliver results to end-users via the Information Factory platform.

Overview

The Information Factory supports both long-running applications and on-demand jobs that process environmental data (EO, meteorological, etc.). Algorithms can be executed with user-defined input parameters.

As an algorithm owner, you can:

  1. Manage results independently, e.g. upload execution results to the Data Lake.

  2. Delegate result handling to the platform by writing outputs to the result-data directory for automatic transfer.

Algorithms must run in Docker containers, with all customization (e.g., AOI, time range, thresholds) passed via environment variables. The Kubernetes runtime enforces best practices, such as non-root execution, which will be clarified during onboarding.
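Since all customization arrives through environment variables, the container entry point typically begins by reading and validating them. A minimal sketch, assuming hypothetical variable names (AOI, TIME_RANGE, MAX_CC) — the actual names are whatever you define when registering your parameters:

```python
import os

def parse_params(env):
    """Read execution parameters from environment variables.

    AOI, TIME_RANGE, and MAX_CC are illustrative names, not a fixed
    platform contract.
    """
    params = {
        "aoi": env["AOI"],                # mandatory: KeyError if missing
        "time_range": env["TIME_RANGE"],  # e.g. "2023-01-01/2023-06-30"
        "max_cc": float(env.get("MAX_CC", "100")),  # optional, defaulted
    }
    if not 0 <= params["max_cc"] <= 100:
        raise ValueError(f"MAX_CC out of range: {params['max_cc']}")
    return params

if __name__ == "__main__":
    print(parse_params(os.environ))
```

Failing fast on a missing mandatory variable surfaces configuration errors in the execution logs rather than producing silently wrong results.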

Architecture

Your algorithm executes in an isolated environment with access to:

  • User-provided execution parameters.

  • Platform and custom credentials injected into your Docker process.

You may use additional libraries or third-party APIs, adhering to relevant terms of service.

Result Management Options

  • Provider-Managed Results: Handle results independently and provide user access.

  • Platform-Managed Results: Delegate upload and transfer to the platform.

Algorithm Registry

The Information Factory maintains a private Docker registry for secure storage of all algorithm versions. This internal registry ensures fast execution without external dependencies or throttling.

Execution Parameters

Define execution parameters (e.g., AOI, time range) with:

  • Data type (e.g., integer, float, string).

  • Value restrictions (e.g., range limits).

  • Optional or mandatory flags.

Example Parameter:

{
  "name": "Max. cloud coverage",
  "id": "maxCC",
  "type": "float",
  "description": "Maximum cloud coverage as a percentage",
  "optional": false,
  "restriction": {
    "type": "range",
    "value": [0, 100]
  }
}
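The parameter definition above is machine-checkable. As a sketch, a validator for the "range" restriction type might look like this (other restriction types would need their own branches; this is not the platform's actual validation code):

```python
def validate_param(spec, value):
    """Check a user-supplied value against a parameter definition
    such as the maxCC example, covering only 'range' restrictions."""
    if spec["type"] == "float":
        value = float(value)
    elif spec["type"] == "integer":
        value = int(value)
    restriction = spec.get("restriction")
    if restriction and restriction["type"] == "range":
        lo, hi = restriction["value"]
        if not lo <= value <= hi:
            raise ValueError(f"{spec['id']}={value} outside [{lo}, {hi}]")
    return value

max_cc_spec = {
    "name": "Max. cloud coverage",
    "id": "maxCC",
    "type": "float",
    "optional": False,
    "restriction": {"type": "range", "value": [0, 100]},
}
```

Validating before execution starts saves a failed run when a user supplies, say, a cloud-coverage value of 150.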

Execution Lineage and Logging

Each execution is tracked with parameters, runtime, and logs for troubleshooting. Only the result-data directory and logs are retained; temporary files are discarded.

Steps to Publish Your Algorithm

  1. Review the agreement and provide a narrative description of the algorithm and goals.

  2. Request access to the Docker registry.

  3. Define input parameters and result handling preferences.

  4. Adapt your code and publish the Docker image.

  5. Request review and onboarding.

  6. Test your algorithm in the Information Factory workspace.

Steps to Update Your Algorithm

Algorithm hosting follows GitOps principles. After onboarding, you will access a GitOps repository to manage configurations, environment variables, secrets, and Docker image versions. Updates involve standard Git flow practices, such as updating the image tag to a newer version.
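In practice, a release under this model is a small configuration change merged into the GitOps repository. The fragment below is purely illustrative — the file layout, key names, and registry hostname are assumptions; the actual repository structure is provided during onboarding:

```
# values.yaml (hypothetical path and keys)
image:
  repository: registry.information-factory.internal/my-algorithm
  tag: "1.3.0"   # bump this tag and merge to roll out a new version
env:
  MAX_CC: "20"
```

Because the desired state lives in Git, every rollout is reviewable and revertible with ordinary Git operations.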