Data transfer and storage

Learn how output data is stored and transferred.


Overview

A running workflow handles data transfer between added blocks. You don't need to store and retrieve data for the next block in the workflow yourself.

  • All input data of a running block will be placed in the /tmp/input directory of the file system.
  • All output data of a running block should be saved in the /tmp/output directory of the file system.

The output of the last block in a workflow is the job output.
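
As a sketch of this convention, a block's run logic reads from /tmp/input and writes to /tmp/output. The processing step is illustrative, and it assumes the previous block produced a data.json in the format described in the sections below:

import json
from pathlib import Path

INPUT_DIR = Path("/tmp/input")    # populated by the workflow before the block starts
OUTPUT_DIR = Path("/tmp/output")  # collected by the workflow after the block exits

def run():
    # Read the metadata written by the previous block.
    metadata = json.loads((INPUT_DIR / "data.json").read_text())

    # Make sure the output directory exists before writing to it.
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

    # ... process the input features and write result images here ...

    # Write the metadata for the next block (or the final job output).
    (OUTPUT_DIR / "data.json").write_text(json.dumps(metadata))

if __name__ == "__main__":
    run()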

[Diagram: data transfer between blocks in a workflow]

Output folder structure

The /tmp/output/ directory can contain the following:

  • A data.json file with GeoJSON metadata and links to output images. This file is required.
  • Any number of images linked from the data.json file. Images are optional.

For example:

/tmp/output/data.json
/tmp/output/be051fa1/block_output.TIF

GeoJSON metadata

The metadata in the data.json file should be a GeoJSON FeatureCollection. The features array can contain zero or more Feature objects.

Specify a relative path in the up42.data_path property. It should point to the corresponding file or directory inside the /tmp/output folder.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "up42.data_path": "be051fa1/block_output.TIF"
      },
      "bbox": [],
      "geometry": {
        "coordinates": [
          [
            [-10.0, -10.0],
            [10.0, -10.0],
            [10.0, 10.0],
            [-10.0, 10.0],
            [-10.0, -10.0]
          ]
        ],
        "type": "Polygon"
      }
    }
  ]
}
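
For example, a block could assemble this file with Python's standard json module. The path and geometry below mirror the example above and are illustrative:

import json
from pathlib import Path

feature = {
    "type": "Feature",
    "properties": {
        # Relative path to the image this feature describes
        "up42.data_path": "be051fa1/block_output.TIF"
    },
    "bbox": [-10.0, -10.0, 10.0, 10.0],
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [-10.0, -10.0],
                [10.0, -10.0],
                [10.0, 10.0],
                [-10.0, 10.0],
                [-10.0, -10.0],
            ]
        ],
    },
}

collection = {"type": "FeatureCollection", "features": [feature]}
Path("/tmp/output/data.json").write_text(json.dumps(collection))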

Storing data

The default storage mechanism for UP42 is stateless. A job doesn't have access to the results of previous jobs.

A block can only access the output data of the block immediately before it. To combine those results with output from a block several steps back in the workflow, do one of the following:

  • Use the propagation operator to forward data.

  • Store data in a bucket for later access as follows:

    1. Set up Amazon S3 or Google Cloud Storage buckets.

    2. In your Dockerfile, add the following build argument and environment variable below the FROM instruction that sets the base image:

      # Build-time argument supplied with --build-arg
      ARG your_key

      # Expose the value to the running container as an environment variable
      ENV YOUR_KEY=$your_key
      
    3. Pass the key value when building the image:

      docker build --build-arg your_key="ADD-YOUR-VALUE" .
      
    4. Access the key value within your block code (a bucket-upload sketch follows after these steps):

      import os

      # Read the key that was baked into the image at build time
      YOUR_KEY = os.environ.get("YOUR_KEY")
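
For instance, the block can then use that key to push results to the bucket. Below is a minimal sketch for Amazon S3 using boto3; the bucket name, object key, and the second YOUR_KEY_ID variable are illustrative assumptions, and Google Cloud Storage works analogously with its google-cloud-storage client:

import os

import boto3

# Credentials baked in at build time. YOUR_KEY_ID is a hypothetical second
# build argument holding the access key ID that pairs with YOUR_KEY.
s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ.get("YOUR_KEY_ID"),
    aws_secret_access_key=os.environ.get("YOUR_KEY"),
)

# Upload a result so a block in a later job can fetch it again.
s3.upload_file(
    Filename="/tmp/output/be051fa1/block_output.TIF",
    Bucket="your-bucket-name",
    Key="intermediate/block_output.TIF",
)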