Rosalind - Getting Started

Important Notices
This is not a HIPAA-compliant system. Please do not use it to store or process PHI.

Currently, nothing is backed up. Please make sure that you have your own backup of any data you keep on the HPC.

We are targeting both HIPAA compliance and backup of selected data within the next few months.

Cluster Specifications
32 Linux compute nodes, each with 24 CPU cores and 128 GB of RAM

FDR InfiniBand high speed networking for parallel computing and storage interconnect

3.7 PB usable shared storage (DDN GS14Ke with approx. 650 hard drives)

Red Hat Enterprise Linux 6.7 operating system, SLURM job scheduler

Logging In
The cluster can only be reached through campus networks or through the campus VPN.

To log in to the cluster, you will need an SSH client. If you have a Linux or Mac computer, open a terminal and log in using:

ssh username@cubipmlgn01.ucdenver.pvt

Where username is the cluster username that has been assigned to you.
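
Optionally, if you log in often from Linux or Mac, you can add an entry to your SSH configuration file, ~/.ssh/config, so that a short alias works. A minimal sketch; the alias name rosalind is just an example:

Host rosalind
    HostName cubipmlgn01.ucdenver.pvt
    User username

After that, typing ssh rosalind is enough to log in.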

If you have a Windows computer, you will have to download an SSH client first. We recommend PuTTY, downloadable here.

If you are using PuTTY on Windows, enter Rosalind's address, cubipmlgn01.ucdenver.pvt, into the Host Name field and press Open to begin.

Home and Scratch
By default, the shell variables $HOME and $SCRATCH refer to your home directory and your scratch area. You can use these just like normal file
paths. For example:
mkdir $HOME/newdir

echo "test" > $SCRATCH/testfile

It is always a good idea to refer to your home directory using $HOME instead of hard coding its path into your code. This way, if your home directory is moved, your code will still work, and it will also work for other people you give it to.

$SCRATCH points to storage that is much faster than the local disks on the compute nodes. Instead of writing temporary data to /tmp on a node,
write it to $SCRATCH for better performance.

$SCRATCH is temporary storage. It should never be used to store data you intend to keep beyond the runtime of the job that's using the data. In the future, we may automatically delete files from $SCRATCH that haven't been accessed within a week or so. $SCRATCH will never be backed up.
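
For example, a batch job (see Running Jobs below) might do its heavy temporary I/O in a per-job directory under $SCRATCH and copy only what it wants to keep back to $HOME before it exits. A minimal sketch, with example file names ($SLURM_JOB_ID is set by SLURM inside a job):

# Work in a per-job directory on the fast shared scratch storage
workdir=$SCRATCH/myjob-$SLURM_JOB_ID
mkdir -p "$workdir"
cd "$workdir"

# ... write temporary and intermediate files here ...

# Keep only the final results, since $SCRATCH is temporary
cp results.txt $HOME/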

Running Jobs
In order to run a job, you must make a submit script for the SLURM job queueing software.

The purpose of the submit script is to tell SLURM how to run your job, including what resources it requires.

Below is a minimal submit script:

#!/bin/bash

#SBATCH --time=60
#SBATCH --ntasks=1
#SBATCH --mem=1000
#SBATCH --job-name=MyJob
#SBATCH --output=MyJob.log

date
sleep 60

To submit the script above as a job, first save it to a text file on Rosalind. For example, you could save it to MyJob.sh. Then, while logged in, submit it with sbatch:

[you@cubipmlgn01 ~]$ sbatch ./MyJob.sh

To check whether it is running, you can use squeue, and to check on its status, you can use sacct.
[you@cubipmlgn01 ~]$ squeue
  JOBID PARTITION   NAME  USER ST  TIME NODES NODELIST(REASON)
  15852      defq  MyJob   you PD  0:00     1 (Priority)     <--- your job is pending

[you@cubipmlgn01 ~]$ squeue
  JOBID PARTITION   NAME  USER ST  TIME NODES NODELIST(REASON)
  15852      defq  MyJob   you  R  0:48     1 cubipmcmp028   <--- your job is running

[you@cubipmlgn01 ~]$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
15852             MyJob       defq        you          1  COMPLETED      0:0
15852.batch       batch                   you          1  COMPLETED      0:0

[you@cubipmlgn01 ~]$ sacct --format=jobid,elapsed,ncpus,ntasks,state
       JobID    Elapsed      NCPUS   NTasks      State
------------ ---------- ---------- -------- ----------
15852          00:01:01          1           COMPLETED
15852.batch    00:01:01          1        1  COMPLETED

The manual pages on Rosalind for sacct, squeue, and sbatch contain a lot of useful information. Don't forget to try typing man sbatch, man squeue, and man sacct.

Explaining the above submit script one line at a time:

#!/bin/bash

Your submit script is a Linux shell script that is put together in a specific way that SLURM understands. Since it's a regular shell script, it begins with a shebang line naming the script interpreter, in this case bash.

#SBATCH --time=60

This line tells SLURM to give a time limit of 60 minutes to your job. Setting time limits is important because jobs with time limits will be allowed to
run sooner than jobs with the default limit, which is 36 hours.
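
A bare number like 60 is read as minutes, but sbatch also accepts other time formats, for example:

#SBATCH --time=90            # 90 minutes
#SBATCH --time=2:30:00       # 2 hours, 30 minutes
#SBATCH --time=1-12:00:00    # 1 day, 12 hours (days-hours:minutes:seconds)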

In a shell script, # denotes a comment that is normally not executed. However, SLURM recognizes lines that begin with #SBATCH as instructions to SLURM. In general, if you would pass an option like --time to sbatch on the command line, you can put it in your submit script on a line beginning with #SBATCH and it will have the same effect.
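
For example, the same job could be submitted with the options given directly to sbatch on the command line; options given this way override the ones in the script:

[you@cubipmlgn01 ~]$ sbatch --time=60 --ntasks=1 --mem=1000 ./MyJob.sh
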
#SBATCH --ntasks=1

This line tells SLURM that you are requesting one task for your job, which on our system means requesting one CPU core. No matter how many
threads your job has, it will only be allowed to run on the number of cores you request. You may request fewer cores than your job has threads,
with no ill effect aside from the possibility of your job running more slowly.
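
If your program is multithreaded rather than made up of separate tasks, one common approach (see man sbatch for details) is to request a single task with several CPUs for its threads:

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8    # one task, 8 CPU cores for its threads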

#SBATCH --mem=1000

This line tells SLURM that you are requesting 1000 MB of memory for your job. The default, if you do not specify, is about 4 GB, but requesting
smaller amounts can cause your job to be scheduled to run sooner. If you need more than 4 GB, you must explicitly request it. The maximum you
can request on one node is about 100 GB. Be sure to request enough memory for your job, since a job will fail when it tries to use more memory
than you requested.
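
If you're not sure how much memory to request, one way to calibrate is to run the job once with a generous limit and then check its memory high-water mark (MaxRSS) with sacct. For the example job above, that would be:

[you@cubipmlgn01 ~]$ sacct -j 15852 --format=jobid,maxrss,reqmem,state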

#SBATCH --job-name=MyJob
#SBATCH --output=MyJob.log

These lines give your job a name and a file to send (by default) the job's stdout and stderr output streams to. If you don't give your job a name or an output file, SLURM will pick generic defaults, which may make it inconvenient to track the progress of your job.

date

This line begins the part of the submit script which contains the actual Linux programs SLURM will run as part of your job. In this case, the job will simply run the date command, sleep for 60 seconds, and exit. The output of date will be written to MyJob.log.

Using Modules
Some basic UNIX programs are part of your default environment when you log in. For instance, one particular version of Python:

[you@cubipmlgn01 ~]$ python
Python 2.6.6 (r266:84292, May 22 2015, 08:34:51)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-15)] on linux2

But it is likely that you'll want to use programs that are not part of the default environment, or whose default version isn't the one you need. This is what modules are for.

Essentially, the "module system" is just a small program that, when you run it, changes your environment to include a particular package. This is a temporary change; by default, it won't persist if you log out and log back in.

Let's try it:

[you@cubipmlgn01 ~]$ module avail 2>&1 | grep python
fftw2/openmpi/open64/64/float/2.1.5     python/2.7.11
python-2.7.12-gcc-6.1.0-wsnpieodpk2dbwgsfsqhfgsrwf5t775u

Typing "module avail" will show a long list of every available package. The command above searches that output using "grep" to find just pythons.
We see that we have two available. Let's try one:
[you@cubipmlgn01 ~]$ module load
python-2.7.12-gcc-6.1.0-wsnpieodpk2dbwgsfsqhfgsrwf5t775u
[you@cubipmlgn01 ~]$ python
Python 2.7.12 (default, Jul 18 2016, 11:20:12)
[GCC 6.1.0] on linux2

As you can see, loading the python module changed the version of python you get when you type python. "module unload" undoes the change.

[you@cubipmlgn01 ~]$ module unload python-2.7.12-gcc-6.1.0-wsnpieodpk2dbwgsfsqhfgsrwf5t775u
[you@cubipmlgn01 ~]$ python
Python 2.6.6 (r266:84292, May 22 2015, 08:34:51)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-15)] on linux2
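
Two related commands are often handy: "module list" shows which modules are currently loaded, and "module purge" unloads all of them at once.

[you@cubipmlgn01 ~]$ module list
[you@cubipmlgn01 ~]$ module purge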

When you make a submit script for SLURM, it's important that it loads the modules that it needs. For example:

#!/bin/bash

#SBATCH --time=1
#SBATCH --job-name=TryPython
#SBATCH --output=TryPython-%j.log

module load python-2.7.12-gcc-6.1.0-wsnpieodpk2dbwgsfsqhfgsrwf5t775u

python --version
module unload python-2.7.12-gcc-6.1.0-wsnpieodpk2dbwgsfsqhfgsrwf5t775u
python --version

Try it! Note that we've introduced a small trick in the #SBATCH directives: %j will be substituted with the job number of your job. This is useful for making sure that your log files don't get overwritten if you run the same job twice.

Running R
If you need to run R, a recent version will be available as a module:

[you@cubipmlgn01 ~]$ module avail 2>&1 | grep R-3
R-3.3.1-gcc-6.1.0-tpkh25pakjkfxwqth7kp3gxl4grlf6c2

[you@cubipmlgn01 ~]$ module load R-3.3.1-gcc-6.1.0-tpkh25pakjkfxwqth7kp3gxl4grlf6c2

[you@cubipmlgn01 ~]$ R

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

You may install CRAN packages into a personal library as needed: just call install.packages inside R and say "yes" to using a personal library.
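
On the cluster, R is most often run non-interactively inside a batch job. Below is a minimal sketch of a submit script that runs an R script with Rscript; myanalysis.R is a hypothetical file name:

#!/bin/bash

#SBATCH --time=60
#SBATCH --ntasks=1
#SBATCH --mem=2000
#SBATCH --job-name=MyRJob
#SBATCH --output=MyRJob-%j.log

module load R-3.3.1-gcc-6.1.0-tpkh25pakjkfxwqth7kp3gxl4grlf6c2

# Run the analysis script non-interactively
Rscript myanalysis.R
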
Running MPI Jobs
In order to run MPI jobs, you only need to load an MPI module and specify the desired number of nodes and cores in your submit script. Example submit script:

#!/bin/bash

#SBATCH --ntasks=48
#SBATCH --nodes=2
#SBATCH --mem=1000
#SBATCH -J mpitest
#SBATCH --output=/home/you/mpitest.log
#SBATCH --open-mode=append

module load openmpi/gcc/64/1.8.5

mpirun hostname

Specifying --open-mode=append means that your output log file will not be overwritten by subsequent runs of the same job. Instead, new results
will be added to it.

Best Practices
The ideal batch job is longer than a few minutes, but shorter than a day. Jobs in this range will be scheduled more efficiently by SLURM, so you will get more work through the system and, on average, experience less queue wait time if you keep your jobs within these limits.

Ideally, batch jobs should be able to be killed and restarted without losing too much progress. You can accomplish this by writing your code to checkpoint (see the sketch below), or simply by breaking up a long-running job into multiple shorter jobs.

Do not hard code the UNIX path of your home directory into your SLURM scripts. If you need to reference a file in your scripts, use the shell's $HOME variable. Try it: type echo $HOME. If you have a file called myfile in the top level of your home directory, you can reference it as $HOME/myfile.
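
As a minimal sketch of the checkpointing idea mentioned above (the file name and step count are just examples), a job can record its progress in a small state file in $HOME and resume from there when it is resubmitted:

#!/bin/bash

#SBATCH --time=60
#SBATCH --ntasks=1
#SBATCH --job-name=Checkpointed
#SBATCH --output=Checkpointed-%j.log

STATE=$HOME/checkpoint.state

# Read the next step to run from the state file, defaulting to 1
step=$(cat "$STATE" 2>/dev/null || echo 1)

while [ "$step" -le 100 ]; do
    echo "working on step $step"
    sleep 10                   # stand-in for real work
    step=$((step + 1))
    echo "$step" > "$STATE"    # record the next step to run
done

If the job is killed partway through, resubmitting the same script picks up where it left off.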
