A Simple Job Script
In the directory htcondor_workshop/01_simple_submit you find a bash script job_setup.sh. This script should build a program on the batch system: it does some checks and compiles the generate_number.cpp file. The compiled program should then be copied to a shared filesystem; at EKP this could be the home directory on the portal machines. After you have edited this script, submit it with grid-control or the JDL-Creator.
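A minimal sketch of what job_setup.sh could look like. The compile flags and the shared-filesystem target are assumptions; adapt them to your setup:

```shell
#!/bin/bash
# job_setup.sh -- illustrative sketch; flags and target paths are assumptions

# check_and_compile: verify the prerequisites, then compile src to bin
check_and_compile() {
    local src=$1 bin=$2
    if ! command -v g++ >/dev/null 2>&1; then
        echo "missing-compiler" >&2; return 1
    fi
    if [ ! -f "$src" ]; then
        echo "missing-source" >&2; return 1
    fi
    g++ -O2 -o "$bin" "$src"
}

if check_and_compile generate_number.cpp generate_number; then
    # copy the binary to a shared filesystem,
    # e.g. the home directory on the portal machines
    cp generate_number "$HOME/"
fi
```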
JDL-Creator
Have a look at the Python script create_jdl.py. This script creates a directory setup_jobs with the subdirectories error, out and log, the JDL file, and a copy of the bash script. The warnings from the Python script can be ignored here; we discuss them in the next exercise. You can submit your job with condor_submit <JDL file>.
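The resulting layout looks roughly like this (the directory names follow the description above; the name of the JDL file itself is an assumption and depends on the JDL-Creator version):

```
setup_jobs/
├── error/          # stderr of the jobs
├── out/            # stdout of the jobs
├── log/            # HTCondor log files
├── job.jdl         # generated submit file (name is an assumption)
└── job_setup.sh    # copy of the bash script
```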
Grid-Control
Have a look at the grid-control submit config file gc_submit.cfg. This is a minimal config file for a user-defined job that should run on an HTCondor batch system. You can submit this job with <path to grid-control>/go.py gc_submit.cfg; in this setup the path would be ../../grid-control/go.py. Grid-control creates a directory work.gc_submit. This directory contains, amongst others, the directories sandbox and output. The sandbox includes all files which are needed by the jobs. The output directory contains for each job the stdout and stderr of the job. Have a look at the stdout file: you see that grid-control does some checks and prints, amongst other information, in which directory the job starts and how much space is left on it.
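The work directory then looks roughly like this (the subdirectory and file names are assumptions based on grid-control's usual layout and may differ, e.g. the log files may be gzipped):

```
work.gc_submit/
├── sandbox/          # all files needed by the jobs
└── output/
    └── job_1/        # one subdirectory per job
        ├── gc.stdout
        └── gc.stderr
```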
Job Monitoring in HTCondor
You can show the status of your jobs with condor_q. If a job is finished, it is removed from that list automatically. With condor_q -analyze <job-id> or condor_q -better-analyze <job-id> you can check how many machines are able to run your job.
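Typical monitoring calls look like this (the job id is a placeholder):

```
condor_q                           # list your idle/running jobs
condor_q -analyze 1234.0           # short summary: why does the job (not) match?
condor_q -better-analyze 1234.0    # detailed analysis of the matching machines
```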
Special Job Submit
Now we want to run our compiled program with different parameters. First, change the job_setup.sh script so that it runs your program with an argument.
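One way to sketch this change (assuming generate_number takes a single numeric argument; the parameter meaning is hypothetical):

```shell
#!/bin/bash
# job_setup.sh (sketch): forward the first job argument to the program

run_job() {
    # the batch system passes the per-job argument as $1; default to 0
    local seed=${1:-0}
    echo "running with seed=$seed"
    # ./generate_number "$seed"   # actual program call (compiled earlier)
}

run_job "$1"
```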
Program Arguments in Jobs
For the JDL-Creator you have to create a list of strings. Each element of the list corresponds to one job and contains the arguments for that job. The argument list is written to the file arguments.txt in the submit folder.
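With one argument per job, arguments.txt could then look like this (the values are hypothetical; the exact layout depends on the JDL-Creator version):

```
10
20
30
```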
For grid-control you have to add the [parameter] section and define there the set of your variables. This set of variables is then used in the [UserTask] section to set the arguments per job.
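A sketch of the two sections (the variable name SEED and its values are hypothetical; check the exact section and option names against your grid-control version):

```
[parameter]
parameters = SEED
SEED = 10 20 30

[UserTask]
executable = job_setup.sh
arguments = @SEED@
```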
You will see that some of these jobs go into the hold state. The reason can be found with condor_q -hold. The message should say that the job needs more memory than requested; the default value at EKP is about 2100 MB. So that our jobs run to completion, we have to request more memory: for these jobs about 4500 MB should be requested. Afterwards, have a look at the batch system log file. It reports the maximum disk space and memory used per job; check whether you can reduce the requested memory.
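In the JDL (HTCondor submit) file, the memory request is set with the standard request_memory command (the value is in MB):

```
request_memory = 4500
```

Grid-control has a memory option with a similar meaning; check the grid-control documentation for the exact option name and section.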
In addition to that information, we should add some other attributes to the submit file, for example the accounting group and the walltime. The accounting group depends on your working group. At the EKP we have the following accounting groups:
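In the submit file this could look as follows. accounting_group is a standard HTCondor submit command; the walltime attribute below is a site-specific ClassAd, so its exact name at EKP may differ:

```
accounting_group = <your accounting group>
+RequestWalltime = 3600    # requested walltime in seconds (site-specific attribute)
```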
Job Information
You can look at all attributes of a job with condor_q -l <job-id>. For jobs which are done, use the condor_history command. Both condor_q and condor_history can also print selected values with the autoformat argument (-af). To see the walltime of the last 10 finished jobs of the user mschnepf, run condor_history -limit 10 -af RemoteWallClockTime mschnepf. Other useful job values (ClassAds) are:
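The autoformat argument also accepts several ClassAds at once, for example:

```
condor_q -af ClusterId ProcId RequestMemory
condor_history -limit 10 -af RemoteWallClockTime RequestMemory mschnepf
```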
Job Resubmission
With the JDL-Creator you can simply resubmit your jobs with the new JDL file. The old jobs will stay in the queue for a while.
Resubmission with grid-control is more complex. Grid-control knows the old submit state and checks the state of those jobs when it runs. You can remove the old jobs from the batch system with go.py -d all; now you can submit again. Grid-control does not allow changing some variables after the submit. If this is the case, you can remove the work.* directory and resubmit your jobs.
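The full resubmission cycle could then look like this (the relative go.py path follows this workshop's layout; delete the work directory only if you changed variables that grid-control does not allow to change):

```
../../grid-control/go.py -d all gc_submit.cfg   # remove the old jobs
rm -rf work.gc_submit                           # only if parameters changed
../../grid-control/go.py gc_submit.cfg          # submit again
```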
Select Resources
At the EKP we have different resources. These resources can be categorised as CPU or I/O resources. You can set requirements that a resource has to provide; these requirements are ProvidesCPU and ProvidesIO. Please set the requirements so that your jobs run on resources (Target) which are specialised for that kind of job.
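In the JDL file such a requirement could be expressed as follows (assuming the EKP machines advertise these ClassAds, as described above):

```
requirements = (Target.ProvidesCPU == True)
```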
The example program generate_numbers is a CPU-intensive program, so set the requirement for the job to Target.ProvidesCPU. Submit the jobs with that requirement and look with condor_q -better-analyze how many machines can run your jobs.