The qsub command is used to submit jobs to the queue. A job, as
previously mentioned, is a program or task that may be assigned to run on a
cluster system. The qsub command is itself simple; however, getting
it to actually run your desired program may be a bit tricky. This is because
qsub, when used as designed, will only run scripts. A script
is a text file containing a series of instructions or commands that are carried
out in sequence when run on a computer. Scripts can vary a great deal in
complexity: a simple script may just change your directory and run a program,
while a complex script may change your directory, run multiple programs, sort
the output, compress it, and then copy it to a specific location.
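As a rough illustration of the "complex script" described above, here is a minimal sketch. The directories are temporary placeholders and the printf line stands in for a real program; none of these names come from an actual cluster setup.

```shell
#!/bin/bash
# Sketch of a multi-step script. All paths and file names are placeholders.
set -e                                 # stop at the first failing step

workdir=$(mktemp -d)                   # hypothetical working directory
dest=$(mktemp -d)                      # hypothetical destination for results

cd "$workdir"                          # 1. change directory
printf '%s\n' 3 1 2 > raw.txt          # 2. stand-in for "run a program"
sort -n raw.txt > sorted.txt           # 3. sort the output
gzip sorted.txt                        # 4. compress it (creates sorted.txt.gz)
cp sorted.txt.gz "$dest"/              # 5. copy it to a specific location
```

Each numbered comment corresponds to one of the steps named in the paragraph; a real script would replace the stand-in with the programs you actually want to run.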
Note: For advanced users, although you can technically submit any type of
script to qsub, as a best practice you should only submit shell scripts (e.g.
bash), even if this script only invokes another script written in another
scripting language. This is because the data the scheduling system provides to
a job during runtime is set as environment variables in the shell. Also, in
general, to simplify debugging, it is prudent to keep the actual job submission
scripting separate from the program or task you want to run.
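A thin shell wrapper of the kind this note recommends might look like the sketch below. PBS_O_WORKDIR and PBS_JOBID are variables set by the scheduler; the fallback values after ":-" are assumptions added only so the sketch runs outside a job, and awk stands in for whatever other scripting language your task uses.

```shell
#!/bin/bash
# Thin wrapper: read scheduler-provided variables in the shell, then hand off.
# The defaults after ":-" exist only so this sketch runs outside a PBS job.
workdir="${PBS_O_WORKDIR:-$PWD}"
jobid="${PBS_JOBID:-no-job-id}"

cd "$workdir"

# Stand-in for the real task (for example, a script in another language).
# The job ID is passed in explicitly, so the task itself stays scheduler-agnostic.
msg=$(awk -v jobid="$jobid" 'BEGIN { print "task started for job " jobid }')
echo "$msg"
```

Keeping the scheduler-specific lines in the wrapper means the task itself can be run and debugged by hand without qsub.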
Let's say I have a program that calculates an arbitrary
Fibonacci number, and I want to run this program on the cluster. I have
written a script file ``script.sh'' which
runs this program. To submit this job I can use the following method.
Example:
    [jdpoisso@axiom ~]$ cd qsub
    [jdpoisso@axiom qsub]$ ls
    fib  script.sh
    [jdpoisso@axiom qsub]$ qsub script.sh
    1056.axiom.localdomain
    [jdpoisso@axiom qsub]$ qstat
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    1056.axiom                script.sh        jdpoisso        00:00:38 R first
    [jdpoisso@axiom qsub]$
Here the qsub command was given the script as an argument. This tells the
qsub command to connect to the scheduling system and submit the
script ``script.sh'' to the
cluster. If the submission is accepted, you are given a job number, in this case
1056, which you can use to monitor the job. Once resources are available to run
the program, the scheduling system will send the script to one of the cluster
nodes and follow its instructions to run the program. As shown in the example
by the qstat command, the script started running soon after
submission. The job will continue to run until it completes, at which point it
will copy back the results and disappear from the job listing provided by
qstat.
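If you want to keep the job identifier for later monitoring, one possible approach is to capture what qsub prints. The sketch below hard-codes the identifier from the example session so that it runs without a scheduler; the commented lines show where the real commands would go.

```shell
#!/bin/bash
# In a real session this would be:  job=$(qsub script.sh)
job="1056.axiom.localdomain"    # identifier copied from the example session

# Strip everything from the first dot onward to get the bare job number.
jobnum="${job%%.*}"
echo "$jobnum"

# With a scheduler available, this would list only that job:
#   qstat "$job"
```

The `${job%%.*}` expansion is plain shell string handling, so it does not depend on any particular scheduler version.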
Example:
    [jdpoisso@axiom qsub]$ qstat
    [jdpoisso@axiom qsub]$ ls
    fib  script.sh  script.sh.o1056
    [jdpoisso@axiom qsub]$
By default the only results copied back are the standard output and
standard error (script.sh.o1056). These are Unix/Linux terms for
anything that would be printed by the program being run. Many programs write
their results to a separate file, rather than printing them. On most cluster
configurations, the copying of these files (if necessary) is the
responsibility of the user. The instructions to do so may be placed at the end
of your submitted script. More will be said about this in the sections on
scripting.
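A minimal sketch of copying a result file back at the end of a script follows. It assumes the program runs in a scratch directory on the node and that the results belong in the directory the job was submitted from (PBS_O_WORKDIR); the scratch location and the file name results.dat are hypothetical.

```shell
#!/bin/bash
# The default after ":-" exists only so this sketch runs outside a PBS job.
submit_dir="${PBS_O_WORKDIR:-$PWD}"
scratch=$(mktemp -d)                  # hypothetical node-local scratch directory

cd "$scratch"
echo "result data" > results.dat      # stand-in for a program that writes a file

# Last step of the script: copy the result file back yourself.
cp results.dat "$submit_dir"/
```

Whether a copy like this is needed at all depends on how your cluster's file systems are set up, so treat it as one possible pattern rather than a requirement.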
Note: You may or may not be notified by email when your job completes,
depending on the cluster configuration. If you wish to receive an email when
your job completes (or fails), instructions for doing so may be found in
the Scripting Details section under PBS directives.
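As a brief preview, the mail-related directives usually look something like the sketch below. The address is a placeholder, and whether mail is actually delivered depends on the cluster configuration.

```shell
#!/bin/bash
# Send mail when the job aborts (a) or ends (e).
#PBS -m ae
# Hypothetical address; replace it with your own.
#PBS -M user@example.com
#PBS -j oe

# The default after ":-" exists only so this sketch runs outside a PBS job.
cd "${PBS_O_WORKDIR:-$PWD}"
echo "job body goes here"
```

Outside the scheduler the #PBS lines are ordinary comments, so the script can still be run directly in a shell for testing.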
Example Script -
The script used in the previous qsub example and its output are listed
below. To learn about scripting, see the next section and the section on
Scripting Details, or refer to other documentation on scripting in Unix/Linux.
script.sh :
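    #!/bin/bash
    # Merge standard error into the standard output file
    #PBS -j oe
    # Report which node(s) the scheduler assigned to this job
    echo "Running on: "
    cat ${PBS_NODEFILE}
    echo
    echo "Program Output begins: "
    # Return to the directory the job was submitted from, then run the program
    cd ${PBS_O_WORKDIR}
    ./fib 46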
script.sh.o1056 :
    Running on:
    compute-0-19

    Program Output begins:
    1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903,
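The fib program itself is not listed in this section. If you want something to experiment with, the shell function below is a hypothetical stand-in that prints the first N Fibonacci numbers in a similar comma-separated format; it is not the program used in the example.

```shell
#!/bin/bash
# Hypothetical stand-in for ./fib: print the first N Fibonacci numbers.
fib_list() {
    n=$1
    a=1
    b=1
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s, ' "$a"       # current term, followed by a comma
        next=$((a + b))          # compute the following term
        a=$b
        b=$next
        i=$((i + 1))
    done
    printf '\n'
}

fib_list 10
```

Saved as an executable script that calls `fib_list "$1"`, it could take the place of ./fib in a test submission.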