Troubleshooting

The pipeline has grown into a large project with multiple dependencies and parallelism, and debugging it is not always easy. Here are a few pointers to get you started:

Note: If the pipeline ends with a message like:

[OK|Pipeline] Pipeline finished. 4 of 10 (40.0%) samples processed successfully

this is usually not considered an error. It simply means that some of the samples could not be processed, for whatever reason. This may be due to unstable simulations (which often occur due to the random nature of the simulation space), invalid meshing (which often occurs due to the random mesh setup) or similar. Still, you may be able to track down where the issue lies and figure out how to make more samples process correctly.

Analyze

The first thing to do after encountering issues is to analyze the resulting folder:

python3 src/analyze.py --data_path [YOUR_DATA_PATH]

This will list the issues encountered for each sample. Each issue also mentions the block in which it occurred.

Log files

There is a lot of log output for each sample. Usually, the logs from multiple data samples are mixed together due to the parallel processing of the samples, and some logs are hidden to avoid cluttering the console too much. To dive deeper into the issues for a single sample, you can inspect the log.log and issues.log files, which are created in each of the data sample folders. It usually makes sense to look at the end of these files first, to see where the processing of the sample was aborted (and why).
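If you want to scan many samples at once, a small helper script can print the tail of every log file under the data path. The following is a minimal sketch, not part of the pipeline; it only assumes that log.log and issues.log sit somewhere inside the per-sample folders, as described above, and the script and function names are placeholders:

# Minimal sketch: print the last lines of each sample's issues.log and log.log
# under a data path. The recursive search does not assume a fixed folder layout,
# only that these files exist somewhere inside the per-sample folders.
from pathlib import Path
import sys

def tail_sample_logs(data_path, num_lines=20):
    for name in ("issues.log", "log.log"):
        for log_file in sorted(Path(data_path).rglob(name)):
            lines = log_file.read_text(errors="replace").splitlines()
            if not lines:
                continue  # empty file, nothing to show
            print(f"--- {log_file} (last {min(num_lines, len(lines))} lines) ---")
            print("\n".join(lines[-num_lines:]))

if __name__ == "__main__":
    tail_sample_logs(sys.argv[1])

Saved as, say, tail_logs.py, it can be run as python3 tail_logs.py [YOUR_DATA_PATH].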

Running sequentially

It can help to run the pipeline sequentially in order to get clearer error messages, using the --run_sequential argument. This will process the samples one by one instead of handling multiple samples at the same time, making the output easier to follow:

python3 src/run_[...].py --data_path [YOUR_DATA_PATH] --num_samples 10 --run_sequential