Introduction

The core of this project is made up of three important components:

The DataSample class, which holds all data relevant to exactly one simulation.
A number of PipelineBlock classes, each performing a single task for a simulation (such as a necessary preprocessing step or a metric calculation).
The Pipeline class, which is responsible for handling all PipelineBlocks and running them on each DataSample.

The pipeline is parallelized using the parsl library (see the parsl documentation).

While following this documentation, we strongly suggest opening src/run_sample.py as a simple first example. Once some of the basic functionality is clear, you can look at src/run_nonrigid.py or src/run_nonrigid_with_us_and_rendering for more complete examples.

This documentation is still incomplete. If you find errors or are missing important information, please report it as an issue or, even better, create a pull request!

The DataSample

The DataSample class represents a single simulation. It stores all of its information in a single folder and gives read and write access to these files. A DataSample is considered “valid” as long as none of the PipelineBlocks have reported an issue with it. As soon as an issue is encountered (such as a simulation that doesn’t converge, intersecting triangles, etc.), the PipelineBlock should raise an exception, which is stored with the DataSample. In this case, subsequent PipelineBlocks in the Pipeline will not be called on this DataSample.
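
The validity rule described above can be illustrated with a minimal sketch. Note that SketchSample, run_blocks and the three block functions are hypothetical names invented for this example; they are not the project's DataSample or Pipeline API, and the real classes may store errors and logs differently.

```python
class SketchSample:
    """Hypothetical stand-in for DataSample; not the project's real class."""

    def __init__(self, sample_id):
        self.sample_id = sample_id
        self.error = None  # first issue reported by a block, if any
        self.log = []      # per-sample log messages

    @property
    def valid(self):
        # A sample stays valid until a block has reported an issue.
        return self.error is None


def run_blocks(sample, blocks):
    """Call each block in order and stop at the first one that raises."""
    for block in blocks:
        if not sample.valid:
            break  # later blocks are skipped for an invalid sample
        try:
            block(sample)
            sample.log.append(f"{block.__name__}: ok")
        except Exception as exc:
            # The exception is stored with the sample instead of propagating.
            sample.error = f"{block.__name__}: {exc}"
    return sample


def generate_mesh(sample):
    pass  # placeholder for real work


def simulate(sample):
    raise RuntimeError("simulation did not converge")


def extract_surface(sample):
    pass  # never reached in this example


sample = run_blocks(SketchSample(0), [generate_mesh, simulate, extract_surface])
print(sample.valid, sample.error, sample.log)
```

In this sketch the failing block marks the sample as invalid, and extract_surface is never called on it.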

The DataSample class also stores logs for the given sample, as well as meta-info such as statistics for easier access.

Note that the DataSample is the only thing that is passed into the PipelineBlock::run() functions. This means it is passed between Python processes by the parallelization library parsl. Usually, this does not change much for you as the user, but there may be edge cases where it matters. In python_apps, you can write things to the DataSample class (for example by adding a new file list). By returning the changed sample from these functions, we make sure that subsequent blocks can read the changes that you’ve made. See Scene Objects for more details.
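
Why the changed sample has to be returned can be shown with a small sketch. It assumes that crossing a process boundary behaves like a pickle round trip, which is used here as a stand-in for what parsl does. The dictionary layout, the file names and all function names are hypothetical.

```python
import pickle


def send_to_worker(obj):
    """Mimic crossing a process boundary: the receiver gets a copy."""
    return pickle.loads(pickle.dumps(obj))


def add_file_without_return(sample):
    # The change is made on the worker's copy only and is then lost.
    sample["files"].append("surface.stl")


def add_file_and_return(sample):
    # Returning the sample sends the modified copy back to the caller.
    sample["files"].append("surface.stl")
    return sample


sample = {"id": 0, "files": ["mesh.vtk"]}  # hypothetical sample contents

# Case 1: the worker mutates its copy but returns nothing.
add_file_without_return(send_to_worker(sample))
files_after_lost_update = list(sample["files"])

# Case 2: the worker returns the changed sample, and the caller keeps it.
sample = send_to_worker(add_file_and_return(send_to_worker(sample)))
files_after_returned_update = list(sample["files"])

print(files_after_lost_update)
print(files_after_returned_update)
```

Only in the second case does the caller see the new file list entry, which is the behaviour that returning the sample is meant to guarantee.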

The PipelineBlock

Each functionality of the Pipeline is implemented in a subclass of the PipelineBlock class. Examples could be:

generating a mesh
extracting a surface
adding noise

The PipelineBlock may run Python code or arbitrary bash code. Because of parallelization, each PipelineBlock should act independently (i.e., without referencing other PipelineBlocks). When adding your own functionality, you should do this by subclassing the PipelineBlock. See Building your own PipelineBlocks for more details.
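
As a rough illustration of the subclassing pattern, here is a minimal sketch. SketchBlock and AddNoiseBlock are hypothetical classes, and the assumed run(sample) signature and the dictionary-based sample may differ from the project's actual PipelineBlock interface.

```python
import random


class SketchBlock:
    """Hypothetical base class; the project's PipelineBlock may differ."""

    def run(self, sample):
        raise NotImplementedError


class AddNoiseBlock(SketchBlock):
    """Adds Gaussian noise to a list of values stored in the sample."""

    def __init__(self, sigma, seed=0):
        # All configuration lives on the block itself, so the block does
        # not need to reference any other block.
        self.sigma = sigma
        self.seed = seed

    def run(self, sample):
        # Seed per sample so that results do not depend on run order.
        rng = random.Random(self.seed + sample["id"])
        sample["noisy_points"] = [
            p + rng.gauss(0.0, self.sigma) for p in sample["points"]
        ]
        return sample  # return the changed sample for subsequent blocks


block = AddNoiseBlock(sigma=0.1)
first = block.run({"id": 1, "points": [1.0, 2.0, 3.0]})
second = block.run({"id": 1, "points": [1.0, 2.0, 3.0]})
print(first["noisy_points"] == second["noisy_points"])
```

Seeding the random generator from the sample id is a design choice of this sketch: it keeps the block reproducible regardless of which worker processes which sample.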

The Pipeline

The Pipeline class controls the process of calling each PipelineBlock on each DataSample. Depending on how the Pipeline is configured, this workload may be spread out across multiple threads, processes or computers. For a single DataSample, we guarantee that the blocks are called in order, but there is no guarantee that sample i will be finished before sample i+1 is finished. For debugging purposes, the --run_sequential flag processes the samples in order.
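
The ordering guarantee can be sketched as follows. This example uses Python's concurrent.futures thread pool as a stand-in for parsl, so it only illustrates the scheduling idea and not the project's implementation. The block names and the run_sequential parameter (modelled loosely on the --run_sequential flag) are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical block names, listed in the order they must run per sample.
BLOCKS = ["generate_mesh", "extract_surface", "add_noise"]


def process_sample(sample_id):
    """Run all blocks for one sample, strictly in order."""
    history = []
    for name in BLOCKS:
        history.append(name)  # a real block would do its work here
    return sample_id, history


def run_all(sample_ids, run_sequential=False):
    if run_sequential:
        # Samples are processed one after another, in order.
        return [process_sample(i) for i in sample_ids]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(process_sample, i) for i in sample_ids]
        # Results arrive in completion order, which may differ between runs.
        return [future.result() for future in as_completed(futures)]


parallel_results = run_all(range(5))
sequential_results = run_all(range(5), run_sequential=True)
print([sample_id for sample_id, _ in parallel_results])
print([sample_id for sample_id, _ in sequential_results])
```

Within each sample the block order is fixed, while the order in which samples finish in the parallel case is left to the scheduler.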

The pipeline also aggregates the statistics from all samples into a single file for easier overview and, if you configure it to do so, generates plots of these statistics.
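
To give an idea of what such an aggregation step could look like, here is a minimal sketch that merges per-sample statistics into one CSV table. The CSV format, the function name aggregate_statistics, and the statistic names and values are made up for illustration; the file the pipeline actually writes may look different.

```python
import csv
import io


def aggregate_statistics(per_sample_stats):
    """Merge {sample_id: {statistic: value}} into one CSV-formatted string."""
    # Collect every statistic name that appears in any sample.
    names = sorted({key for stats in per_sample_stats.values() for key in stats})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["sample_id"] + names, restval="")
    writer.writeheader()
    for sample_id in sorted(per_sample_stats):
        # Statistics missing for a sample are left empty (restval).
        writer.writerow({"sample_id": sample_id, **per_sample_stats[sample_id]})
    return buffer.getvalue()


# Illustrative values only, not real results.
stats = {
    0: {"num_vertices": 120, "max_displacement": 0.5},
    1: {"num_vertices": 98},
}
table = aggregate_statistics(stats)
print(table)
```

Writing the result to disk would then be a single call such as pathlib.Path("statistics.csv").write_text(table), where the file name is again only an example.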