Introduction to MaaS

Workflows

A MaaS workflow is a text file in YAML format that describes how to launch your testsuites on the intranet.

MaaS workflows are organized into stages, which are divided into tasks.

Similar to GitLab CI, stages are run sequentially while the tasks within each stage are run in parallel.
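The execution model can be sketched in a few lines of Python. This is only an illustration of "stages in sequence, tasks in parallel", not how MaaS is implemented: the function name run_workflow and the representation of tasks as plain callables are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(stages):
    """Run stages one after another; run the tasks of each stage concurrently.

    `stages` is a list of (stage_name, tasks) pairs, where `tasks` is a list
    of zero-argument callables (a stand-in for real MaaS tasks).
    Returns a list of (stage_name, [task results]) pairs.
    """
    results = []
    for stage_name, tasks in stages:
        with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
            futures = [pool.submit(task) for task in tasks]
            # Collecting every result blocks until all tasks of this stage
            # are done, so the next stage cannot start early.
            stage_results = [future.result() for future in futures]
        results.append((stage_name, stage_results))
    return results
```

With two stages, every task of the first stage finishes before any task of the second stage starts, which is why merging stages (when dependencies allow it) shortens the overall run.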

A few remarks and rules about organizing a workflow:

  • Having a final stage with multiple tasks does not make sense, because you would not be able to retrieve the output of any task other than the one in charge of outputting the trace.

  • You need at least one stage with at least one task.

  • Try to organize your workflow so that it has as few stages as possible. Having fewer stages means that more tasks are run in parallel, which reduces the overall run time of your workflow.

Here's an overview of a real-world pipeline:

In this example, we're running functional tests on the student code:

  • "Untar" just runs tar xvf and exposes the different files (see the file management section for information on how this can be done).
  • "Build" compiles the student code.
  • "Execution" sets up pre-defined inputs, runs the student's binary and forwards stderr and stdout as outputs in files.
  • "Post-process" compares the student outputs of all of the tests and creates a trace file.

Tasks

Each task represents a docker run command. Tasks within the same stage can be executed in parallel. A task definition includes the command to execute, optional environment variables, CPU/RAM limits, a timeout, and the mounted I/O paths (bundle and trace).

Refer to the workflow documentation to check which docker run flag corresponds to which field of the task.

A workflow is defined in maasWorkflow.yml, as in the example below:

cpu_count: 1
memory_mb: 1024
timeout: 300
stages:
  - name: moulinette
    tasks:
      - name: testsuite
        cpus: 1
        memory_mb: 1024
        timeout: 300
        input_file_path: /student
        output_file_path: /output
        # FIXME
        image: ...
        # FIXME
        commands: ["..."]
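
To make the "a task is a docker run command" idea concrete, here is a rough Python sketch that turns a task description into a docker run argument list. The mapping is an assumption for illustration only: the workflow documentation remains the reference for which field maps to which flag. The env key, the example image name, and the treatment of commands as an argument vector are all hypothetical. The timeout and the input/output paths are deliberately left out, since this page does not say how MaaS applies them.

```python
def docker_run_args(task):
    """Build a `docker run` argument list from a task-like dict.

    Assumed mapping (not confirmed by the MaaS documentation):
      cpus      -> --cpus
      memory_mb -> --memory (in megabytes)
      env       -> -e KEY=VALUE   (`env` is a hypothetical field name)
    """
    args = [
        "docker", "run", "--rm",
        "--cpus", str(task["cpus"]),
        "--memory", f"{task['memory_mb']}m",
    ]
    # Sort the variables so the generated command is deterministic.
    for key, value in sorted(task.get("env", {}).items()):
        args += ["-e", f"{key}={value}"]
    args.append(task["image"])
    # Assumption: `commands` is passed as the container's argument vector.
    args += task["commands"]
    return args


# Hypothetical task; the image name is a placeholder, not a real MaaS image.
example_task = {
    "cpus": 1,
    "memory_mb": 1024,
    "image": "example/moulinette:latest",
    "commands": ["make", "check"],
    "env": {"LANG": "C"},
}
print(" ".join(docker_run_args(example_task)))
```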

Sharing files among stages

The only way to share files among stages and tasks is through mounts.

For each pipeline, MaaS creates a directory outside of the container that can be mounted in various ways. The most common pattern is to mount a directory or a file in a preliminary task, write to it, then read it from a subsequent task.

A mount is a string of the form PATH_ON_MAAS:PATH_IN_CONTAINER[:rw|:ro], where:

  • PATH_ON_MAAS is the path within MaaS' temporary directory.
  • PATH_IN_CONTAINER is the path within the container where the file will be placed.
  • rw or ro defines the read/write mode. rw means read+write, ro means read-only. If not specified, rw is assumed.

An example of a mount would be my/file.txt:/file.txt:ro, which mounts my/file.txt under /file.txt in the container in read-only mode.

Under the hood, MaaS uses the -v option in docker run. The actual path on the worker will be similar to /tmp/maas_NAME_UUID/PATH_ON_MAAS.
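
The mount format and its translation to a -v option can be sketched as follows. This is a minimal illustration under assumptions, not MaaS code: the names Mount, parse_mount and docker_volume_flag are invented for the example, and the worker directory is passed in explicitly because its real name (NAME and UUID) is chosen by MaaS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    path_on_maas: str
    path_in_container: str
    mode: str  # "rw" or "ro"


def parse_mount(spec: str) -> Mount:
    """Parse PATH_ON_MAAS:PATH_IN_CONTAINER[:rw|:ro]; the mode defaults to rw."""
    parts = spec.split(":")
    if len(parts) == 2:
        parts.append("rw")
    if len(parts) != 3 or not parts[0] or not parts[1] or parts[2] not in ("rw", "ro"):
        raise ValueError(f"invalid mount: {spec!r}")
    return Mount(parts[0], parts[1], parts[2])


def docker_volume_flag(mount: Mount, worker_dir: str) -> list[str]:
    """Build the -v option, resolving PATH_ON_MAAS inside the worker directory."""
    host_path = f"{worker_dir.rstrip('/')}/{mount.path_on_maas.lstrip('/')}"
    return ["-v", f"{host_path}:{mount.path_in_container}:{mount.mode}"]


mount = parse_mount("my/file.txt:/file.txt:ro")
print(docker_volume_flag(mount, "/tmp/maas_NAME_UUID"))
```

Omitting the mode, as in out:/output, yields a read/write mount, matching the default described above.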

Conflicting mounts

Make sure that mounted paths do not conflict among tasks in a stage. This could lead to MaaS telling you there's a problem with your workflow (at best) or corrupted files and undefined behavior™️ (at worst).
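
You can catch such problems before submitting a workflow with a small check like the one below. The exact rule MaaS applies is not described here, so this sketch uses an assumed definition of "conflict": two different tasks of the same stage mount the same PATH_ON_MAAS (or one path nested inside the other) and at least one of them mounts it read/write. The function name find_conflicts and the input layout are hypothetical.

```python
from itertools import combinations
from pathlib import PurePosixPath


def _overlaps(a: str, b: str) -> bool:
    """True if the two paths are identical or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def find_conflicts(stage_mounts):
    """Return (task_a, task_b, path_a, path_b) for each suspicious pair.

    `stage_mounts` maps a task name to its list of mount strings
    (PATH_ON_MAAS:PATH_IN_CONTAINER[:rw|:ro]) for a single stage.
    """
    parsed = []
    for task, specs in stage_mounts.items():
        for spec in specs:
            parts = spec.split(":")
            mode = parts[2] if len(parts) == 3 else "rw"  # rw is the default
            parsed.append((task, parts[0], mode))

    conflicts = []
    for (t1, p1, m1), (t2, p2, m2) in combinations(parsed, 2):
        # Assumption: read-only sharing between tasks is considered safe.
        if t1 != t2 and "rw" in (m1, m2) and _overlaps(p1, p2):
            conflicts.append((t1, t2, p1, p2))
    return conflicts
```

For instance, a task mounting out:/output:rw and another task in the same stage mounting out/log.txt:/log.txt:ro would be reported, while two tasks that both mount src read-only would not.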

Managing CPU and RAM

Why limit CPU and RAM?

CPU and RAM are limited to avoid overloading runners. There is a built-in, hard-coded limit, so even if you ask for egregious amounts of resources, MaaS will not allow it and will restrict you to a smaller amount.

This also means that you can define a large number of tasks and MaaS will be able to run them all, even if running them at the same time would overload the MaaS runner.

Because MaaS is designed to run multiple workflows in parallel on the same host, you must limit the amount of CPU and RAM that your workflow can use. This ensures the pipeline does not accidentally use extreme amounts of resources and gives MaaS a hint on how to schedule workloads.

You must define CPU and RAM limits at two levels:

  • At the workflow level. This determines the total amount of resources your workflow can use at any time.
  • At the task level. Tasks use a share of the resources allocated at the workflow level. As a consequence, tasks can run in parallel as long as the workflow has enough free resources for them.

This means your tasks may not run in parallel if your workflow did not request enough resources. Let's take this workflow as an example:

# This is pseudo-code to illustrate the concept, not a real workflow
workflow:
  cpus: 6
  memoryMb: 4096
  stages:
    - tasks:
        - cpus: 2
          memoryMb: 512
          # ...
        - cpus: 1
          memoryMb: 512
          # ...
        - cpus: 2
          memoryMb: 1024

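Here, the three tasks need 5 CPUs and 2048 MB in total, which fits within the 6 CPUs and 4096 MB requested by the workflow, so MaaS can run them all in parallel. However, if the workflow had requested fewer resources, MaaS would not be able to run everything in parallel because doing so would exceed the resources allocated to the workflow: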

# This is pseudo-code to illustrate the concept, not a real workflow
workflow:
  cpus: 4
  memoryMb: 1024
  stages:
    - tasks:
        - cpus: 2
          memoryMb: 512
          # ...
        - cpus: 1
          memoryMb: 512
          # ...
        - cpus: 2
          memoryMb: 1024
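
The arithmetic behind the two examples can be checked with a short Python snippet. It is only a sketch of the reasoning: the function name fits_in_parallel is invented for the example, and the dictionaries mirror the pseudo-code fields above rather than a real workflow file.

```python
def fits_in_parallel(workflow, tasks):
    """True if all tasks of a stage fit in the workflow's limits at once."""
    total_cpus = sum(task["cpus"] for task in tasks)
    total_memory = sum(task["memoryMb"] for task in tasks)
    return total_cpus <= workflow["cpus"] and total_memory <= workflow["memoryMb"]


# The three tasks shared by both examples: 5 CPUs and 2048 MB in total.
tasks = [
    {"cpus": 2, "memoryMb": 512},
    {"cpus": 1, "memoryMb": 512},
    {"cpus": 2, "memoryMb": 1024},
]

print(fits_in_parallel({"cpus": 6, "memoryMb": 4096}, tasks))  # True
print(fits_in_parallel({"cpus": 4, "memoryMb": 1024}, tasks))  # False
```

In the second workflow both limits are exceeded: the tasks ask for 5 CPUs against 4 available, and 2048 MB against 1024 MB.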

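For this second workflow, one scheduling strategy MaaS could use is to start the first two tasks together (3 CPUs and 1024 MB in total), then start the third task once both have finished and their memory has been released.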

Note

MaaS is able to reschedule tasks whenever resources are freed up. In other words, whenever a task finishes, the scheduler will try to run another task in its place (if possible).
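
The rescheduling behaviour can be illustrated with a small simulation. This is a sketch of one possible greedy strategy, not the actual MaaS scheduler: the function name simulate, the task names, and the duration field (in arbitrary time units) are all assumptions added for the example.

```python
import heapq


def simulate(workflow, tasks):
    """Greedy scheduling sketch: start every task that fits, then wait for
    the next task to finish and try again. Returns {task name: start time}.
    """
    free_cpus, free_memory = workflow["cpus"], workflow["memoryMb"]
    pending = list(tasks)
    running = []  # heap of (end_time, tie_breaker, task)
    starts = {}
    now = 0

    while pending or running:
        still_pending = []
        for task in pending:
            if task["cpus"] <= free_cpus and task["memoryMb"] <= free_memory:
                free_cpus -= task["cpus"]
                free_memory -= task["memoryMb"]
                starts[task["name"]] = now
                heapq.heappush(running, (now + task["duration"], len(starts), task))
            else:
                still_pending.append(task)
        pending = still_pending

        if not running:
            # Nothing is running and nothing fits: a task asks for more
            # than the whole workflow allows. This sketch simply gives up.
            raise ValueError("a task requests more than the workflow limits")

        # Jump to the next completion and release that task's resources.
        now, _, finished = heapq.heappop(running)
        free_cpus += finished["cpus"]
        free_memory += finished["memoryMb"]

    return starts


tasks = [
    {"name": "a", "cpus": 2, "memoryMb": 512, "duration": 10},
    {"name": "b", "cpus": 1, "memoryMb": 512, "duration": 5},
    {"name": "c", "cpus": 2, "memoryMb": 1024, "duration": 8},
]
print(simulate({"cpus": 4, "memoryMb": 1024}, tasks))
```

With 4 CPUs and 1024 MB, tasks a and b start at time 0 and use all of the memory. Task c needs the full 1024 MB, so it only starts at time 10, once both a and b have finished. With 6 CPUs and 4096 MB, all three tasks start at time 0.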