Explore condor_q
The goal of this exercise is to try out some of the most common options to the condor_q command, so that you can view jobs effectively.
The main part of this exercise should take just a few minutes, but if you have more time later, come back and work on the extension ideas at the end to become a
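Selecting Jobs
The condor_q program has many options for selecting which jobs are listed. You have already seen that the default mode (as of version 8.5) is to show only your jobs in "batch" mode:
$ condor_q
To view the jobs of all users in the queue, add the -all option:
$ condor_q -all
To list each job on its own line instead of in batches, add the -nobatch option, alone or combined with -all:
$ condor_q -nobatch
$ condor_q -all -nobatch
You can also name one or more users on the command line, in which case the jobs of all the named users are listed:
$ condor_q username1 username2 username3
To list just the jobs in a single cluster:
$ condor_q <cluster>
For example, to see the jobs in cluster 5678 (i.e., 5678.0, 5678.1, etc.), you use condor_q 5678. To list a specific job (i.e., cluster.process, as in 5678.0):
$ condor_q <job id>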
Note: You can name more than one cluster, job ID, or combination thereof on the command line, in which case jobs for all of the named clusters and/or job IDs are listed.
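Since a job ID is just a cluster number and a process number joined by a dot, a script can tell the two numeric argument forms apart with plain shell string handling. The sketch below is only an illustration: the function name `describe_id` is hypothetical, it assumes its argument is a cluster number or job ID (not a user name), and it does not call HTCondor at all.

```shell
#!/bin/sh
# Hypothetical helper: classify a numeric condor_q argument
# as either a whole cluster or a single job ID.
describe_id() {
    case "$1" in
        *.*) # cluster.process form, e.g. 5678.1
            echo "job: cluster=${1%%.*} process=${1#*.}" ;;
        *)   # bare cluster number, e.g. 5678
            echo "cluster: $1" ;;
    esac
}

describe_id 5678
describe_id 5678.1
```

The two calls print `cluster: 5678` and `job: cluster=5678 process=1`.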
Let’s get some practice using condor_q.
- Using a previous exercise, submit several jobs
- List all jobs in the queue. Are there others besides your own?
- Practice using all forms of condor_q that you have learned:
  - List just your jobs, with and without batching
  - List a specific cluster
  - List a specific job ID
  - Try listing several users at once
  - Try listing several clusters and job IDs at once
- When there are a variety of jobs in the queue, try combining a user ID and a different user's cluster or job ID in the same command. What happens?
Viewing a Job ClassAd
You may have wondered why it is useful to be able to list a single job ID using condor_q. By itself, it may not be that useful. But, in combination with another option, it is very useful!
If you add the -long option to condor_q (or its short form, -l), it will show the complete ClassAd for each selected job, instead of the one-line summary that you have seen so far. Because job ClassAds may have 80–90 attributes (or more!), it probably makes the most sense to show the ClassAd for a single job at a time. And you know how to show just one job! Here is what the command looks like:
$ condor_q -long <job id>
Here are some examples of attributes taken from condor_q output (except with some line breaks added to the Requirements attribute):
MyType = "Job"
Err = "sleep.err"
UserLog = "/home/cat/1-monday-2.1-queue/sleep.log"
JobUniverse = 5
Requirements = ( IsOSGSchoolSlot =?= true ) &&
        ( TARGET.Arch == "X86_64" ) &&
        ( TARGET.OpSys == "LINUX" ) &&
        ( TARGET.Disk >= RequestDisk ) &&
        ( TARGET.Memory >= RequestMemory ) &&
        ( TARGET.HasFileTransfer )
ClusterId = 2420
WhenToTransferOutput = "ON_EXIT"
Owner = "cat"
CondorVersion = "$CondorVersion: 8.5.5 May 03 2016 BuildID: 366162 $"
Out = "sleep.out"
Cmd = "/bin/sleep"
Arguments = "120"
Note: Attributes are listed in no particular order and may change from time to time. Do not assume anything about the order of attributes in condor_q output.
See what you can find in a job ClassAd from your own job.
- Using a previous exercise, submit a job
- Before the job executes, capture its ClassAd and save it to a file:
$ condor_q -l job-id > classad-1.txt
- Once the job is running, capture its ClassAd again and save it to a second file:
$ condor_q -l job-id > classad-2.txt
- Can you find attributes that came from your submit file? (E.g., JobUniverse, Cmd, Arguments, Out, Err, UserLog, and so forth)
- Can you find attributes that could have come from your submit file, but that HTCondor added for you? (E.g., Requirements)
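Because attribute order is not guaranteed, comparing the two saved files line by line can be noisy. One possible approach, sketched here, is to sort each file first and then diff the results. The two small files created at the top are made-up stand-ins so that the sketch runs on its own; with real data, skip that part and pass your own classad-1.txt and classad-2.txt. The function name `classad_diff` is hypothetical.

```shell
#!/bin/sh
# Made-up sample data standing in for real "condor_q -l" output.
dir=$(mktemp -d)
printf '%s\n' 'Owner = "cat"' 'JobStatus = 1' 'Cmd = "/bin/sleep"' \
    > "$dir/classad-1.txt"
printf '%s\n' 'Cmd = "/bin/sleep"' 'JobStatus = 2' 'Owner = "cat"' \
    'JobStartDate = 1504006089' > "$dir/classad-2.txt"

# Hypothetical helper: show only the attributes that differ between two files.
classad_diff() {
    # Sort first so that attribute order does not produce false differences.
    sort "$1" > "$1.sorted"
    sort "$2" > "$2.sorted"
    # Keep only removed ("< ") and added ("> ") lines from the diff.
    # grep exits non-zero when nothing differs; "|| true" hides that.
    diff "$1.sorted" "$2.sorted" | grep '^[<>]' || true
}

classad_diff "$dir/classad-1.txt" "$dir/classad-2.txt"
```

Lines starting with `<` appear only in the first file and lines starting with `>` only in the second, so unchanged attributes such as Owner and Cmd drop out of the listing.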
How many of the following attributes can you guess the meaning of?
- JobStartDate — what format is the value in? (Hint: The HTCondor developers are primarily trained in the Unix way of doing things.)
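If the value you find is a large integer, one reading worth testing is a Unix timestamp (seconds since 1970-01-01 UTC). The sketch below converts such a number into a readable date. It assumes GNU date, as found on most Linux systems; the function name `epoch_to_utc` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a Unix timestamp as a UTC date.
# Assumes GNU date (the -d @SECONDS form); BSD/macOS date uses -r SECONDS.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 0   # the start of the Unix epoch
```

Try it on the number from your own ClassAd and check whether the result matches when your job started.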
Why Is My Job Not Running?
Sometimes, you submit a job and it just sits in the queue in Idle state, never running. It can be difficult to figure out why a job never matches and runs. Fortunately, HTCondor can give you some help.
To ask HTCondor why your job is not running, add the -better-analyze option to condor_q for the specific job. For example, for job ID 2423.0, the command is:
$ condor_q -better-analyze 2423.0
To see this in action, submit a job that cannot run. Here is a submit file to use:
universe = vanilla
executable = /bin/hostname
output = norun.out
error = norun.err
log = norun.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
request_memory = 2TB
queue
- Save and submit this file
- Run condor_q -better-analyze on the job ID
-- Schedd: htcondor-0.os-internal : <10.10.0.120:9618?...
The Requirements expression for job 15.000 is

    ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) &&
    ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer )

Job 15.000 defines the following attributes:

    DiskUsage = 17
    RequestDisk = DiskUsage
    RequestMemory = 2097152

The Requirements expression for job 15.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
             16  TARGET.Arch == "X86_64"
             16  TARGET.OpSys == "LINUX"
             16  TARGET.Disk >= RequestDisk
              0  TARGET.Memory >= RequestMemory

No successful match recorded.
Last failed match: Tue Aug 29 07:28:09 2017

Reason for last match failure: no match found

015.000:  Run analysis summary ignoring user priority.  Of 16 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
     16 are exhausted partitionable slots
      0 match and are already running your jobs
      0 match but are serving other users
      0 are available to run your job

WARNING:  Be advised:
   No machines matched the jobs's constraints
At the end of the output, condor_q said that it considered 16 “machines” (really, slots) and that all 16 of them were rejected because the partitionable slots were exhausted. In other words, I am asking for something that is not available. But what? If you look at the list of requirements, you will see that the requirement matched by zero slots was TARGET.Memory >= RequestMemory. TARGET here is the namespace of the target machine in the attempted matchmaking. So for none of the target machines was the available memory greater than or equal to the requested 2TB.
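The RequestMemory value of 2097152 in the analysis output is the same 2TB from the submit file, expressed in megabytes. A quick check of that arithmetic, assuming binary units (1 TB counted as 1024 × 1024 MB), which is what the numbers in this output imply; the function name `tb_to_mb` is hypothetical:

```shell
#!/bin/sh
# Convert terabytes to megabytes, assuming binary units
# (1 TB = 1024 * 1024 MB), to match the RequestMemory value above.
tb_to_mb() {
    echo $(( $1 * 1024 * 1024 ))
}

tb_to_mb 2   # prints 2097152, the request_memory = 2TB from the submit file
```

Comparing that figure with the Memory attribute of the slots in your pool shows how far out of reach the request is.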
Automatic Formatting Output (Optional)
There is a way to format output from condor_q with the -af option. In this case, HTCondor decides for you how to format the data you ask for from job ClassAd(s). (To tell HTCondor how to format this information yourself, you could use the -format option, which we're not covering.)
To use automatic formatting, give the -af option followed by the attribute name, for each attribute that you want to output:
$ condor_q -af Owner -af ClusterId -af Cmd
bejones 16 /share/test.sh
cat 17 /bin/sleep
cat 18 /bin/sleep
If you wanted to see the Requirements expression of a job, how would you do that with -af? Is the output what you expected? (HINT: for ClassAd attributes like "Requirements" that are long expressions, instead of simple values, you can use -af:r to view the expression itself, instead of its current evaluation.)
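Because -af output is plain whitespace-separated columns, it is easy to feed into standard text tools. As a sketch, the following counts jobs per owner with awk. The sample lines are copied from the example output in this section so that the snippet runs without HTCondor; in practice you would pipe the output of your condor_q -af command into the function instead. The name `count_by_owner` is hypothetical, and the function assumes Owner is the first column.

```shell
#!/bin/sh
# Hypothetical helper: count jobs per owner from lines whose
# first column is the Owner attribute.
count_by_owner() {
    awk '{ jobs[$1]++ } END { for (o in jobs) print o, jobs[o] }' | sort
}

# Sample lines taken from the example output in this section.
count_by_owner <<'EOF'
bejones 16 /share/test.sh
cat 17 /bin/sleep
cat 18 /bin/sleep
EOF
```

For the sample lines this prints `bejones 1` and `cat 2`, one owner per line.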