Starting in Fall 2003, you will be able to run
your SimpleScalar simulation jobs (or any other jobs) on the machines in
Moore 207 in batch cycle-stealing mode: you submit the jobs to a
central scheduler, which runs them on whatever idle machines it can find.
The job scheduler we will use is Condor,
which was developed at the University of Wisconsin-Madison. Condor is a
general-purpose "grid computing" system, but we will use only its most basic
features. Condor is available on the machines in Moore 207 and on halfdome.cis.upenn.edu.
The Makefile rules below show the regular sim-func target and a sim-func.condor target that wraps the same link command in condor_compile.
sim-func: sysprobe sim-func.$(OEXT) $(FUNC_OBJS) $(EXOLIB)
	$(CC) -o sim-func $(CFLAGS) sim-func.$(OEXT) $(FUNC_OBJS) $(EXOLIB) $(ZLIB) $(MLIBS)
sim-func.condor: sysprobe sim-func.$(OEXT) $(FUNC_OBJS) $(EXOLIB)
	condor_compile $(CC) -o sim-func.condor $(CFLAGS) sim-func.$(OEXT) $(FUNC_OBJS) $(EXOLIB) $(ZLIB) $(MLIBS)
INITIALDIR: the directory in which you want the job to execute
EXECUTABLE: the program you want to run (one of the simulators)
ARGUMENTS: the arguments to the program
INPUT: a file which will be used as the program's standard input
OUTPUT: a file which will be used as the program's standard output
ERROR: a file which will be used as the program's standard error
An important keyword in a Condor job script is QUEUE. Whenever Condor encounters QUEUE in the script, it evaluates the current values of the variables and creates a job to that specification. That's handy, because you can issue QUEUE repeatedly, changing only the variables you want in between. You can define your own variables as well. Here is the file jobs.condor, which is in the directory /home8/a/amir/cis501/simplescalar/condor_example/:
## MYROOT is my own variable
## PROG is my own variable, too
MYROOT=/home8/a/amir/cis501/simplescalar
INITIALDIR=$(MYROOT)/condor_example
EXECUTABLE=$(MYROOT)/simulators/sim-func.condor
ARGUMENTS=-insn:limit 10000000 $(MYROOT)/traces/specint2000/test/$(PROG).eio
INPUT=
OUTPUT=
ERROR=$(PROG).ssout
PROG=gcc
QUEUE
PROG=eon.kajiya
QUEUE
This file specifies two Condor jobs: one runs
gcc, the other eon.kajiya. Notice that because we are using eio traces,
the simulations have neither input nor output. The SimpleScalar output
(i.e., the statistics) is written to standard error.
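To make the QUEUE behavior concrete, here is a small Python sketch that mimics it on the jobs.condor file above. It is only a toy model of what the text describes, not Condor's actual parser: the function names expand and jobs_from_script are made up for illustration, and the sketch assumes that $(NAME) references are resolved only when QUEUE is reached, which is why PROG can be set after the lines that use it.

```python
import re

# Matches a $(NAME) reference inside a value.
MACRO = re.compile(r"\$\((\w+)\)")

def expand(value, variables, max_passes=10):
    """Replace $(NAME) references with current values (hypothetical helper)."""
    for _ in range(max_passes):  # bounded, so a self-reference cannot loop forever
        expanded = MACRO.sub(lambda m: variables.get(m.group(1), ""), value)
        if expanded == value:
            break
        value = expanded
    return value

def jobs_from_script(text):
    """Return one dict of fully expanded variables per QUEUE line (toy model)."""
    variables, jobs = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.upper() == "QUEUE":
            # Snapshot the variables as they stand right now.
            jobs.append({k: expand(v, variables) for k, v in variables.items()})
        elif "=" in line:
            name, value = line.split("=", 1)
            variables[name.strip()] = value.strip()
    return jobs

SCRIPT = """\
## MYROOT is my own variable
## PROG is my own variable, too
MYROOT=/home8/a/amir/cis501/simplescalar
INITIALDIR=$(MYROOT)/condor_example
EXECUTABLE=$(MYROOT)/simulators/sim-func.condor
ARGUMENTS=-insn:limit 10000000 $(MYROOT)/traces/specint2000/test/$(PROG).eio
INPUT=
OUTPUT=
ERROR=$(PROG).ssout
PROG=gcc
QUEUE
PROG=eon.kajiya
QUEUE
"""

jobs = jobs_from_script(SCRIPT)
for job in jobs:
    print(job["ARGUMENTS"], "->", job["ERROR"])
```

In this model each QUEUE yields its own trace path and its own .ssout file, even though only PROG changed between the two calls.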
condor_submit <script_name>
This submits your jobs to Condor. All jobs submitted by a single script are assigned to a new "cluster" number. Remember that number; you will need it if you want to remove the jobs later.
bash-2.05$ condor_submit jobs.condor
Submitting job(s)..
2 job(s) submitted to cluster 12.
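If you submit from a script and do not want to copy the cluster number by hand, it can be pulled out of the submit message. The Python sketch below assumes the message always has the wording shown in the example above; cluster_number is a hypothetical helper name, and the sample text is pasted in rather than captured from a real condor_submit run.

```python
import re

def cluster_number(submit_output):
    """Return the cluster number found in condor_submit's message, or None."""
    # \s+ tolerates the message being wrapped across lines.
    match = re.search(r"submitted\s+to\s+cluster\s+(\d+)", submit_output)
    return int(match.group(1)) if match else None

# Sample text copied from the example session above.
SAMPLE = "Submitting job(s)..\n2 job(s) submitted to cluster 12.\n"

print(cluster_number(SAMPLE))
```

The function returns None instead of raising when the message is missing, so the caller can decide how to handle a failed submission.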
Condor will send you email when your jobs are done. If you want to check on your jobs in the meantime, use:
condor_q
For example:
bash-2.05$ condor_q
-- Submitter: canfield.cis.upenn.edu : <158.130.68.19:55913> : canfield.cis.upenn.edu
 ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 12.0    amir    9/15 10:49  0+00:00:22 R  0   3.8  sim-func.condor -i
 12.1    amir    9/15 10:49  0+00:00:00 I  0   3.8  sim-func.condor -i

2 jobs; 1 idle, 1 running, 0 held
This tells you that you have two jobs in the queue, 12.0 and 12.1. Job 12.0 has been running for 22 seconds. Job 12.1 is idle; it will start running when Condor finds a machine for it.
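The same reading of the listing can be done mechanically. The Python sketch below is only an illustration: it assumes the column order shown in the example (ID, OWNER, a two-field SUBMITTED date and time, RUN_TIME, ST), job_states is a hypothetical helper name, and the sample text is pasted in rather than captured from a live condor_q.

```python
import re

# One job row: ID, OWNER, submit date, submit time, RUN_TIME, one-letter state.
JOB_LINE = re.compile(
    r"^\s*(\d+\.\d+)\s+(\S+)\s+\S+\s+\S+\s+(\S+)\s+([A-Z])\s"
)

def job_states(listing):
    """Map each job ID to (state, run_time); non-job lines are ignored."""
    states = {}
    for line in listing.splitlines():
        match = JOB_LINE.match(line)
        if match:
            job_id, _owner, run_time, state = match.groups()
            states[job_id] = (state, run_time)
    return states

# Sample listing copied from the example session above.
SAMPLE = """\
 ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 12.0    amir    9/15 10:49  0+00:00:22 R  0   3.8  sim-func.condor -i
 12.1    amir    9/15 10:49  0+00:00:00 I  0   3.8  sim-func.condor -i

2 jobs; 1 idle, 1 running, 0 held
"""

states = job_states(SAMPLE)
for job_id, (state, run_time) in states.items():
    print(job_id, state, run_time)
```

Job IDs are kept as strings so that "12.0" and "12.1" survive unchanged and can be passed straight to other commands.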
If you want to remove a job, use:
condor_rm cluster_number[.job_number]
You can remove individual jobs or entire clusters of jobs.
bash-2.05$ condor_rm 12
Cluster 12 has been marked for removal.
That's about it. If you have any questions, ask the TAs or look on the Condor homepage for documentation.