Basic HTCondor commands

Experiment with basic HTCondor commands.

We are going to look at two fundamental HTCondor commands, "condor_q" and "condor_status". They are used to monitor your jobs and the pool's slots, respectively.

Viewing slots

This command can be very simple:

$ condor_status
Run against the CERN pool, this command would produce a lot of output, since that pool has around 100,000 slots. In this lab the output should be much shorter.
[gks@htcondor-t-0 ~]$ condor_status
Name                           OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@htcondor-t-0.os-internal LINUX      X86_64 Unclaimed Idle      0.000 3790  5+18:35:27
slot1@htcondor-t-1.os-internal LINUX      X86_64 Unclaimed Idle      0.050 3790  5+18:35:20

                     Machines Owner Claimed Unclaimed Matched Preempting  Drain

        X86_64/LINUX        2     0       0         2       0          0      0

               Total        2     0       0         2       0          0      0
The output consists of 8 columns:
Col         Example                          Meaning
Name        slot1@htcondor-t-0.os-internal   Slot name and hostname
OpSys       LINUX                            Operating system
Arch        X86_64                           Machine architecture
State       Unclaimed                        State of the slot (Unclaimed: available; Owner: in use by the machine's owner; Claimed: matched to a job)
Activity    Idle                             What the slot is currently doing (for example Idle or Busy)
LoadAv      0.050                            Load average, a measure of CPU activity on the slot
Mem         3790                             Memory available to the slot, in MB
ActvtyTime  5+18:35:27                       Time spent in the current activity (days+hours:minutes:seconds)

After the slot data, you can see summary information about the whole pool. There is one row of summary for each machine architecture/operating system combination. The columns are the different states that a slot can be in. The final row gives a summary of slot states for the whole pool.
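If you are only interested in the summary, or in a few particular attributes, condor_status can restrict its output. A minimal sketch, assuming the standard machine ClassAd attribute names (Name, State, Activity, Memory); the exact output layout can vary between HTCondor versions:

$ condor_status -total                                    # print only the summary totals
$ condor_status -autoformat Name State Activity Memory    # one line per slot, chosen attributes only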

Now run:

$ condor_status

...yourself and compare the result to the output shown above. How does it differ?
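If you want more detail on a single slot, condor_status can print that slot's full machine ClassAd. A sketch, using one of the slot names from the example output above; substitute a slot name from your own pool:

$ condor_status -long slot1@htcondor-t-0.os-internal                     # every attribute in that slot's ClassAd
$ condor_status -long slot1@htcondor-t-0.os-internal | grep -i memory    # pick out the memory-related attributes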

Viewing whole machines only

Run:

$ condor_status -compact

Note how the output compares to the full summary.
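condor_status can also filter slots with a ClassAd constraint expression, which is often more useful than scrolling through a large pool. A sketch, assuming the standard State attribute values shown earlier:

$ condor_status -constraint 'State == "Unclaimed"'    # only slots that are currently unclaimed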

Viewing Jobs

The condor_q command lists jobs that are on this submit machine and that are running or waiting to run. The _q part of the name is meant to suggest the word “queue”, or list of jobs waiting to finish.

The simplest version of this command shows only your jobs:

$ condor_q 
The main part of the output (which for you will be empty, as you haven't submitted any jobs yet) looks like this:
-- Schedd: bigbird02.cern.ch : <128.142.196.38:9618?... @ 08/28/17 13:07:42
OWNER   BATCH_NAME       SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
bejones CMD: hello.sh   8/28 13:07      _      _      1      1 459934.0
The output consists of the following columns:
Col         Example      Meaning
OWNER       bejones      The user ID of the user who submitted the job
BATCH_NAME  hello.sh     The executable, or the batch name specified in the submit file
SUBMITTED   8/28 13:07   The date and time the job was submitted
DONE        _            Number of jobs in this batch that have completed
RUN         _            Number of jobs in this batch that are currently running
IDLE        1            Number of jobs in this batch that are idle, waiting for a match
HOLD        _            Shown only if jobs are on hold because something about the submission needs to be corrected by the user
TOTAL       1            Total number of jobs in this batch
JOB_IDS     459934.0     Job ID, or range of job IDs, in this batch
After this there is again a summary line, for example:
1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
This shows the job counts in all possible states.
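Recent HTCondor versions group jobs into batches by default. If you would rather see one line per job, or find out why jobs are held, the following variants should work, although flag availability depends on your HTCondor version:

$ condor_q -nobatch    # one line per job instead of one line per batch
$ condor_q -hold       # only held jobs, together with the hold reason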

Viewing everyone's jobs

In the lab, where you have your own schedd, this may not show much. In other environments, to show all users' jobs, run:

$ condor_q -all
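You can also restrict the listing to a single user by giving the username as an argument; bejones here is just the example user from the output above, so substitute your own username:

$ condor_q bejones    # only jobs owned by the user bejones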