Queueing more cleverly
Often in scientific analyses, you will have a large number of differently named output files and want to analyze them by crunching through one file per job. Up to now, we have only used a basic Queue command, queueing multiple jobs that differ only in their process IDs.
But HTCondor can also help out in the common use case where file names or more complex configuration sets need to be queued.
In our simple example, we will render some 3D images: We have two rather complex scenes whose files live in separate directories.
For the rendering, we will use the open source renderer POV-Ray (Persistence of Vision Raytracer). Since good 3D objects are better made by people with more artistic sense than the regular HTCondor user, I have used existing artwork available under the Creative Commons Attribution-Share Alike 3.0 Unported license.
Hence, we start with an attribution to the artists:
The artwork available as “dice” in this course was made available under Creative Commons Attribution-Share Alike 3.0 Unported. The artwork is called “PNG transparency demonstration” and has been shared by user ed_g2s on Wikimedia at https://commons.wikimedia.org/wiki/File:PNG_transparency_demonstration_1.png.
For the course, I have added an additional file dice_movie.pov and a render_movie.ini which chooses some more light settings for creation of a movie. In addition, a render.ini file has been added to render a simple frame.
The artwork available as “mini_demo” in this course, including the scene and all materials, was made available under Creative Commons Attribution-Share Alike 3.0 Unported. The artwork is called “Mini Cooper and Building” and has been shared by © 2004 Gilles Tran http://www.oyonale.com.
For the course, I have added an additional file demo_mini_movie.pov and a render_movie.ini which chooses some more light settings for creation of a movie. In addition, a render.ini file has been added to render a simple frame.
Again, we need a file to describe our job, and an actual job payload. We will use a flexible job payload (a shell script taking parameters) and use a single job description file for all scenes.
Save the following into a file of your choosing or use the file Debian12_render_scenes.jdl from the repository.
Save the following into a file of your choosing or use the file render_pov_single.sh from the repository.
Please check that the shell script is executable; if not, run chmod +x render_pov_single.sh.
First, take a look at the job description file. Can you understand how it works? Some helpful pointers follow.
In general, if the syntax is unclear, you may want to check out the HTCondor documentation.
You can check the HTCondor version used on your submission machine with condor_q -version.
For example, to get an explanation of what the strange magic line Scene = $Fdb(ScenePath) is doing, it is best to start from the HTCondor web page, since links to the HTCondor documentation are sadly not stable yet [1].
As you might guess, $Something() is the syntax of a built-in function. You will find it explained in chapter 3.3.10.
Can you find out what it does, and why we might need it? To answer this question, you should also understand the Queue command. If in doubt, this is the right point in time to ask!
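If you want to check your answer afterwards, the effect of $Fdb() can be approximated in plain shell. This is only a sketch under assumptions: the value of ScenePath below is hypothetical (the real one is set by the Queue command in the job description file), and the approximation only holds when the path ends in a file name.

```shell
# Hypothetical value; the real ScenePath comes from the Queue command in the JDL.
ScenePath="scenes/dice/render.ini"

# Rough shell equivalent of $Fdb(ScenePath):
#   d = last directory component of the path
#   b = drop the trailing slash of that component
Scene=$(basename "$(dirname "$ScenePath")")

echo "$Scene"
```

With the value assumed here, this prints dice, which is the kind of short scene name that is handy for building output file names.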
As soon as everything is understood and you know what to expect, it is time to submit the jobs:
These jobs may run for a little while, so let’s take the time to check on them! [2] POV-Ray produces some progress output on STDERR. You can access it live from your submit machine using (with 98.0 being the first job’s id):
You can also ask for more output with:
You can also check the log file of the job, and use condor_q to check resource usage:
Check out the status and resource consumption of those jobs. Do they match the requests formulated in the job description? What about the units?
If a job does not start, you may also want to check out (for job id 98.0):
Check out your results
As soon as the jobs have finished, you should find two new image files in your submit directory.
The best way to look at them is to copy them to your local machine (on Linux or macOS, use scp or rsync; on Windows, either use the same commands in Windows Subsystem for Linux (WSL) or use e.g. WinSCP). Once they have arrived, use a normal image viewer.
Queueing with a complex set of parameters
Finally, you may encounter very complex analysis tools in your scientific career which need a lot of configuration parameters. We don’t provide a hands-on example here, since the possibilities are endless. Instead, we present an example snippet of a JDL file and a configuration file to queue a complex set of jobs. The important point is to remember the possibilities HTCondor offers; an actual implementation will always be specific to the analysis tool you are using.
Consider the following lines from a JDL:
and the following lists_of_tasks.txt accompanying it:
The individual “columns” (separated by spaces) are assigned to the variable names passed to the Queue command.
Note that the DATASETS column contains a list of datasets separated by ; which may for example be parsed by the wrapper script or the analysis software.
Bundling several datasets into one job like this can be helpful if the job runtime would otherwise be very short and the actual setup / teardown phase would take long compared to the job runtime.
Examples of necessary but heavy setup / teardown could be:
- Condor file transfer of large input files or a huge number of input files
- Extraction of the actual software (for example, it might be stored as a tarball on a cluster file system, and be extracted on scratch space for actual use, since cluster file systems scale badly with many small files)
- Cleanup of the job scratch directory (this also takes time!)
- Necessary cache filling, software startup time etc.
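As a sketch of how a wrapper script might handle such a bundled column, the snippet below takes a semicolon-separated DATASETS value and loops over the individual datasets, so that any expensive setup runs only once per job. The dataset names and the analyze_one function are hypothetical placeholders and not part of the course material.

```shell
#!/bin/sh
# Hypothetical value; in a real job this would arrive as a command-line
# argument of the wrapper script.
DATASETS="run2016A;run2016B;run2017C"

# Placeholder for the real per-dataset work.
analyze_one() {
    echo "analyzing $1"
}

# Expensive setup (unpacking software, filling caches) would go here, once.

count=0
old_ifs=$IFS
IFS=';'          # split on semicolons instead of whitespace
set -f           # disable glob expansion while word splitting
for dataset in $DATASETS; do
    analyze_one "$dataset"
    count=$((count + 1))
done
set +f
IFS=$old_ifs     # restore normal word splitting

echo "processed $count datasets"
```

The loop uses only POSIX shell features, so it should behave the same under sh and bash.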
Can you follow the example and understand all parts of it? For example, what would happen if you named the full JDL file analysis.jdl and submitted it as follows?
Do you have an example use case in mind? Feel free to ask questions!
Another important feature of the job description file is the ability to remap the file names of input and output files. Imagine the program you use expects a specific input file name, and produces a hardcoded output file name. A simple workaround would be to write a job wrapper script which renames the files accordingly. However, you can also do this:
This would move the file output.root, which the job is expected to produce in its working directory on the execute machine, to the subdirectory output/ on the submit machine when the job has finished, and give it a unique name by using the process ID. The same is possible for input files.
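A remap of this kind could look like the following sketch. It writes a hypothetical job description file remap_example.jdl; the executable name my_analysis.sh is a placeholder, and only the two transfer_output lines matter here. The sketch creates output/ beforehand on the assumption that HTCondor does not create the target directory of a remap itself.

```shell
# The target directory of the remap should exist on the submit machine.
mkdir -p output

# Quoted heredoc delimiter: the shell leaves $(Process) alone,
# so HTCondor expands it per job at submit time.
cat > remap_example.jdl <<'EOF'
Executable              = my_analysis.sh
should_transfer_files   = YES
transfer_output_files   = output.root
transfer_output_remaps  = "output.root = output/output_$(Process).root"
Queue 3
EOF

grep remaps remap_example.jdl
```

With Queue 3, the three jobs would deliver output/output_0.root, output/output_1.root and output/output_2.root instead of overwriting a single output.root.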
Related to this, the initialdir setting effectively changes the directory before each individual job is submitted. This allows you to prepare multiple sets of input files in different subdirectories on the submit machine, and to collect the logs and outputs in those subdirectories.
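As a sketch of the initialdir approach, assume two hypothetical subdirectories run_0 and run_1, each holding its own input.dat. The job description written below (file and executable names are placeholders) uses the process ID to pick the directory, so each job takes its input from its own subdirectory and leaves its logs and outputs there.

```shell
# Prepare one subdirectory per job, each with its own (dummy) input file.
for i in 0 1; do
    mkdir -p "run_$i"
    echo "parameters for job $i" > "run_$i/input.dat"
done

# Relative paths for input, output, error and log are resolved against
# initialdir; the executable is still looked up relative to the directory
# in which condor_submit is called.
cat > initialdir_example.jdl <<'EOF'
Executable            = my_analysis.sh
initialdir            = run_$(Process)
transfer_input_files  = input.dat
Output                = job.out
Error                 = job.err
Log                   = job.log
Queue 2
EOF

ls run_0 run_1
```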
Do you have example use cases in mind? Again, feel free to ask questions!
[1] Much improved online documentation is part of the HTCondor 8.8 series.
[2] If they finish too fast, you will also find a Debian12_render_scenes_hq.jdl in the repository. Note that this requires significantly more resources, so please only use it if the normal jobs are too short for investigating their behaviour.