Users Guide


Tutorials and Presentations

Internal Courses and Material

The following list includes presentations and tutorials that cover material regarding the SCC. Use right-click and "Save as" to download these materials in any browser:

External Courses and Material

Primers on Linux/Unix, clustered environments, and parallel job submission are available online. A list of resources is provided below and will be updated periodically.

Login

First, log in via ssh to the SCC login node with your SCC account:

$ ssh -X <username>@login.scc.camh.net

Access from Windows Machines

MobaXterm

MobaXterm is a compact and flexible tool for accessing the SCC. It provides a clean interface for logging into the SCC and transferring data, and allows easy launching of graphical applications such as MATLAB.


To install MobaXterm, download the installer edition from http://mobaxterm.mobatek.net/download-home-edition.html and double-click the file to install it.

If you do not have permission to install MobaXterm on your machine, you can download the portable edition from the same download page and extract it to a convenient location, or contact scc.support@camh.ca or ResearchIT@camh.ca to request assistance from an administrator.


Putty

Putty is a free implementation of Telnet and SSH for Windows and Unix platforms. One common use is to ssh into a Linux-based machine from a Windows operating system.

To get Putty on your machine, first download the appropriate version from the Putty downloads page.

1) Click on and accept the download of the Putty executable for your platform. For most users this will be Windows on Intel x86. For stability, it is advised that you download the latest stable release rather than the most recent beta.

2) Click on and accept the download of the PSCP (Putty Secure Copy) executable which you will need for data transfer as outlined below.

3) Double click on the executables, and follow the installation instructions.

Once installation is complete, open Putty and test an ssh login to the SCC.

Cygwin

Cygwin provides a Linux-like environment on Windows. When the OpenSSH package is installed in Cygwin, one can ssh into the SCC as one would from a standard Linux machine.

To get Cygwin on your machine, first download the newest version from the Cygwin downloads page.

1) Click and accept the download for "Setup.exe"

2) Follow the instructions in the installation wizard.

3) When you are given the option to select packages, click the radio button for OpenSSH.

4) Continue installation and finish.

5) If there are additional packages that you require you can update Cygwin at any time by running Setup.exe.

Xming and X11 Forwarding

If a graphical user interface is required to manage a cluster pipeline, Xming and X11 forwarding with either Cygwin or Putty can be used to interact with the cluster outside of the command line.

Xming acts as the local Windows X server. Download and install Xming from the following repository: http://sourceforge.net/projects/xming/. Once active, the Xming server will allow visualization of X11-forwarded X applications.

Cygwin can forward X11 information by including the "-X" or "-Y" flag in the ssh login command used to access the server.
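For example, a Cygwin login to the SCC with X11 forwarding enabled might look like the following (using the same login address as above):

$ ssh -Y <username>@login.scc.camh.net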

Putty can be set to forward X11 content by ticking "Enable X11 forwarding" in the "SSH > X11" panel of the Putty interface. It is necessary to specify the display setting in the adjacent field as "localhost:0.0". Then log in as usual using the address and credentials provided.

Transferring Data

SCC FTP Server

Data should be transferred via the SCC FTP server: ftp.scc.camh.net

The FTP server supports SFTP, FTP, rsync and scp.
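As an illustrative sketch, an interactive SFTP session to the FTP server might look like this (file names are placeholders):

$ sftp <username>@ftp.scc.camh.net
sftp> put local_data.tar.gz
sftp> get results.tar.gz
sftp> quit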

Linux: scp and rsync

Data should be transferred to the SCC using ‘scp’ (secure copy).

Secure copy local file(s) from a remote host (your local machine) to a directory on the SCC with the following syntax: scp <username>@<local_machine>:/<local_file> <SCC_directory>

scc$ scp david@kimsrv:/home/david/file.c /imaging/home/kimel/david/
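rsync can be used in much the same way; a sketch of the equivalent transfer with rsync (same example paths as above) would be:

scc$ rsync -av david@kimsrv:/home/david/file.c /imaging/home/kimel/david/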

Make certain that you have read permission on the data you are copying and write permission in the destination directory. Note that the speed of transfer will vary depending on daily traffic; remember that we are limited to 1 Gb Ethernet, over which many users may be transferring substantial datasets.

Windows: WinScp

Transferring data using 'scp' or 'rsync' works well in Linux environments, but on Windows machines it is often more convenient to transfer data using WinScp.

WinScp can be downloaded for free at:

http://winscp.net/eng/download.php

Follow the installation instructions to install it on your local machine. WinScp has a user interface that allows you to drag and drop files between your Windows machine and your directories on the SCC.

Modules and Environment Variables

To use most packages on the SCC you will have to use the module command. The command module load some-package will set your environment variables (PATH, LD_LIBRARY_PATH, etc.) to include the default version of that package; module load some-package/specific-version will load a specific version of that package. This makes it very easy for different users to use different versions of software.

A list of software installed and maintained on the SCC can be found on our Software page. A list of available modules can be seen on the system by typing

$ module avail

To load a module (for example, R for statistical computing)

$ module load R/2.13.2

To list the modules currently loaded

$ module list

To unload a module

$ module unload R/2.13.2

To unload all modules

$ module purge

These commands can go in your .bashrc files and/or in your submission scripts to make sure you are using the correct packages.

Note that a module load command only sets the environment variables in your current shell (and any subprocesses that the shell launches). It does not affect other shell environments; in particular, a queued job that is already running is unaffected by you interactively loading a module, and conversely, loading a module at the prompt and then submitting a job does not ensure that the module is loaded when the job runs.

If you always require the same modules, it is easiest to load those modules in your .bashrc and then they will always be present in your environment; if you routinely have to flip back and forth between modules, it is easiest to have almost no modules loaded in your .bashrc and simply load them as you need them (and have the required module load commands in your job submission scripts).
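As a minimal sketch, assuming you always want R available, the relevant line in your ~/.bashrc (or near the top of a submission script) would simply be:

module load R/2.13.2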


Running Jobs on SCC Clusters

Submitting Jobs to the Queue

The SCC is a shared system, and jobs that run on it are submitted to a queue; the scheduler then orders the jobs to make the best use of the machine and launches them when resources become available. The intervention of the scheduler can mean that jobs aren't run in a strictly first-in, first-out order.

It is important to note that on compute nodes, your home directory is read-only. You have to run your jobs from the /scratch directory instead. See Data Management for more details on the file systems at the SCC.

The maximum wallclock time for a job in the queue is 48 hours; computations that will take longer than this must be broken into 48-hour chunks and run as several jobs. The usual way to do this is with checkpoints, writing out the complete state of the computation every so often in such a way that a job can be restarted from this state information and continue on from where it left off. Generating checkpoints is a good idea anyway, as in the unlikely event of a hardware failure during your run, it allows you to restart without having lost much work.

There are limits to how many jobs you can submit. If your group has a default account, up to 32 nodes at a time for 48 hours per job on the SCC are allowed to be queued. This is a total limit, e.g., you could request 64 nodes for 24 hours. Jobs of users with special allocation will run at a higher priority than others while their resources last. Because of the group-based allocation, it is conceivable that your jobs won't run if your colleagues have already exhausted your group's limits.

Note that scheduling big jobs greatly affects the queue and other users, so you have to talk to us first before running massively parallel jobs. We will help make sure that your jobs start and run efficiently.

If your job will run in fewer than 48 hours, specify that in your script -- your job will start sooner. (It is easier for the scheduler to fit in a short job than a long one.) On the downside, the job will be killed automatically by the queue manager at the end of the specified wallclock time, so if you guess wrong you might lose some work. The standard procedure is therefore to estimate how long your job will take and add 10% or so.
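For example, for a job you estimate will take about 5.5 hours, adding roughly 10% suggests a request along these lines in your submission script:

#PBS -l walltime=06:00:00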

Batch Submission Script

You interact with the queuing system through the queue/resource manager, Maui/Torque. To submit a job, you must write a script which describes the job and how it is to be run, and submit it to the queue using the qsub command. A sample submission script is shown below, with the #PBS directives at the top and the remainder being what will be executed on the compute node.

#!/bin/bash -l
#
#PBS -l nodes=2:ppn=12,walltime=1:00:00
#PBS -N test
#PBS -V
#
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
cd $PBS_O_WORKDIR
./job.exe data_1 > output

The lines that begin #PBS are commands that are parsed and interpreted by qsub at submission time, and control administrative things about your job. In this example, the script above requests two nodes, using 12 processors per node, for a wallclock time of one hour. (The resources required by the job are listed on the #PBS -l line.) Other options can be given in other #PBS lines, such as #PBS -N, which sets the name of the job.

PBS Directives

#PBS -l nodes=1:ppn=1
    Specifies a PBS resource requirement of 1 compute node and 1 processor per node.

#PBS -l walltime=03:00:00
    Specifies a PBS resource requirement of 3 hours of wall clock time to run the job.

#PBS -o output_filename
    Specifies the name of the file where job output is to be saved. May be omitted to generate a filename appended with the job ID number.

#PBS -j oe
    Specifies that job output and error messages are to be joined in one file.

#PBS -m bea
    Specifies that PBS send email notification when the job begins (b), ends (e), or aborts (a).

#PBS -V
    Specifies that all environment variables are to be exported to the batch job.


PBS Environment Variables

PBS_O_WORKDIR: Directory where the qsub command was executed

PBS_NODEFILE: Name of the file that contains a list of the HOSTS provided for the job

PBS_JOBID: Job ID number given to this job

PBS_QUEUE: Queue job is running in

PBS_WALLTIME: Walltime in secs requested

PBS_JOBNAME: Name of the job. This can be set using the -N option in the PBS script

PBS_ENVIRONMENT: Indicates the job type, PBS_BATCH or PBS_INTERACTIVE

PBS_O_SHELL: Value of the SHELL variable in the environment in which qsub was executed

PBS_O_HOME: Home directory of the user running qsub
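As a quick sketch, a minimal job script that simply echoes some of these variables (handy for checking your environment) might look like this:

#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=0:10:00
#PBS -N env-check
echo "Submitted from: $PBS_O_WORKDIR"
echo "Job ID:         $PBS_JOBID"
echo "Job name:       $PBS_JOBNAME"
echo "Queue:          $PBS_QUEUE"
echo "Walltime (s):   $PBS_WALLTIME"
cat "$PBS_NODEFILE"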

Job Submission

$ qsub [SCRIPT-FILE-NAME]

where you replace [SCRIPT-FILE-NAME] with the file containing the submission script. This returns a job ID, for example 51923, which is used to identify the job. Information about a queued job can be found using the commands described under Monitoring Jobs below.
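For illustration, a submission and the returned job ID might look like the following (the ID shown is hypothetical; yours will differ):

$ qsub job_script.sh
51923.mgmt2.scc.camh.net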

Qbatch

qbatch is a tool for executing commands in parallel across a compute cluster. It takes as input a list of commands (shell command lines or executable scripts) in a file or piped to qbatch. The list of commands is divided into arbitrarily sized chunks which are submitted as jobs to the cluster, either as individual submissions or as an array. Each job runs the commands in its chunk in parallel. Commands can also be run locally on systems with no cluster capability. You can get help with qbatch -h.

Environment variable defaults

qbatch supports several environment variables to customize defaults for your local system:

$ export QBATCH_PPJ=12                  # requested processors per job
$ export QBATCH_CHUNKSIZE=$QBATCH_PPJ   # commands to run per job
$ export QBATCH_CORES=$QBATCH_PPJ       # commands to run in parallel per job
$ export QBATCH_MEM="0"                 # requested memory per job
$ export QBATCH_MEMVARS="mem"           # memory request variable to set
$ export QBATCH_SYSTEM="pbs"            # queuing system to use
$ export QBATCH_NODES=1                 # (PBS-only) nodes to request per job
$ export QBATCH_PE="smp"                # (SGE-only) parallel environment name

These correspond to the same-named options in the qbatch help output. Some examples:

# Submit an array job from a list of commands (one per line)
# Generates a job script in ./.scripts/ and job logs appear in ./logs/

$ qbatch commands.txt

# Set the walltime for each job

$ qbatch -w 3:00:00 commands.txt

# Run 24 commands per job

$ qbatch -c24 commands.txt

# Run 24 commands per job, but run 12 in parallel at a time

$ qbatch -c24 -j12 commands.txt

# Start jobs after successful completion of existing jobs with names starting with "stage1_"

$ qbatch --afterok 'stage1_*' commands.txt

# Pipe a list of commands to qbatch

$ parallel echo process.sh {} ::: *.dat | qbatch -

# Run jobs locally with GNU Parallel, 12 commands in parallel

$ qbatch -b local -j12 commands.txt

# Many options don't make sense locally: chunking, individual vs array, nodes,
# ppj, highmem, and afterok are ignored

Running Interactive Jobs

Users may sometimes need to run jobs interactively. Such jobs should not be run on the SCC login node.

Instead, allocate an interactive node as described below, and run the interactive job there.

[user@scc] $ qsub -q intq -I -l nodes=1
qsub: waiting for job 2236960 to start
qsub: job 2236960 ready
[user@node02]$ cd /genome/scratch/group/user/myruns
[user@node02]$ module load BEDTOOLS2
[user@node02]$ cd /genome/scratch/group/user/myruns/run1
[user@node02]$ bamToBed -i input.bam > output.bed
[user@node02]$ ...........
[user@node02]$ exit
qsub: job 2236960 completed
[user@scc]$

The qsub command above will allocate a node with at least 1 GB of memory. If you need more than that, you can specify the memory requirement on the qsub command line; for example, the following will allocate a node with 14 GB of memory:


[user@scc]$ qsub -I -l nodes=1:g14:c10

There is a global alias for easy access to the above command:

[user@scc] $ qsubi -l nodes=1

Monitoring Jobs

To see all the jobs in the queue use

$ showq

Individual job status can be queried using the checkjob command, followed by the JobID:

$ checkjob [JOB-ID]

Jobs can be cancelled with the canceljob command

$ canceljob [JOB-ID]

Again, these commands have many options, which can be read about on their man pages.

Much more information on the queuing system is available on our Scheduler page.

Checking standard output of PBS jobs in real time

Currently, PBS is configured so that the standard output/error of a job is redirected to temporary files and copied back to the final destination after the job finishes. Hence, users can normally only access the standard output after their jobs finish. However, PBS does provide another method for users to check standard output/error in real time, i.e.

qsub -k oe pbs_script


The "-k oe" option at the qsub command line specifies that standard output or standard error streams will be retained on the execution host. The stream will be placed in the home directory of the user under whose user id the job executed. The file name will be the default file name given by: where is the name specified for the job, and is the sequence number component of the job identifier. For example, if a user submits a job to SCC with job name "test" and job id '1223', then the standard output/error will be test.o1223/test.e1223 in the user's home directory. This allows users to check their stdout/stderr while their jobs are running.

Batch Job Example

Below is an example outlining the submission of a batch job, run from the scratch directory. In this example a text file containing a list of datasets is read, and the executable mycode is run on each. This example makes use of gnu-parallel to group together sets of 12 jobs to make use of the full suite of processors on each node.

scc$ module load gcc
scc$ gcc code.c -o mycode
scc$ mkdir scratch/example2
scc$ cp mycode scratch/example2
scc$ cd scratch/example2
scc$ cat > joblist.txt
mkdir run1; cd run1; ../mycode 1 > out
mkdir run2; cd run2; ../mycode 2 > out
mkdir run3; cd run3; ../mycode 3 > out
. . .
mkdir run30; cd run30; ../mycode 30 > out
scc$ cat > job_script.sh
#!/bin/bash
#PBS -l nodes=1:ppn=12,walltime=24:00:00
#PBS -N JobName_2
cd $PBS_O_WORKDIR
module load gnu-parallel
parallel -j 12 < joblist.txt
scc$ qsub job_script.sh
301.scc
scc$ ls
JobName_2.e301 JobName_2.o301 joblist.txt mycode code.c
job_script.sh run1/ run2/ run3/...

R Batch Example

Submitting R scripts often requires a few additional settings to operate correctly in the queue. The following is an example that covers a typical R job script.

#!/bin/bash -l
#
#PBS -r n

##Job settings
#PBS -N SCC_RTest
#PBS -o SCC_RTest.out
#PBS -e SCC_RTest.err
#PBS -j oe
#PBS -m abe

##Job configuration

##Job resources
#PBS -l nodes=1:ppn=12
#PBS -l walltime=01:00:00
#PBS -l pmem=100mb
#PBS -V

EXE=/usr/bin/R

SCRDIR=/genome/scratch/$USERGROUP/$USER/$PBS_JOBNAME

if [ ! -d $SCRDIR ]; then
  mkdir -p $SCRDIR
fi

cd $SCRDIR
echo running R on `hostname -s` in `pwd`
$EXE  --vanilla  < SCC_test.r > Test.out
echo Done

You can also use the built-in R batch mode:

R CMD BATCH --vanilla SCC_test.r

MATLAB on the SCC

There are 25 concurrent MATLAB licenses available on the SCC, which can be used from the compute nodes. MATLAB can be used graphically by logging into a compute node.

For parallel computing, it is best to call 'matlab' from within a queue submission script without the graphical user interface (i.e. matlab -nodesktop -nojvm -nosplash) and direct your MATLAB script to the main program with "<" or with the '-r' flag.
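As a sketch of the '-r' form (assuming your script is matlab_script.m in the current working directory; note the .m extension is omitted):

matlab -nodesktop -nojvm -nosplash -r "matlab_script; exit"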

cat submit_matlab.sh
#!/bin/bash -l
#
#PBS -r n

##Job settings
#PBS -N SCC_MATLAB
#PBS -o SCC_MATLAB.out
#PBS -e SCC_MATLAB.err
#PBS -j oe
#PBS -m abe

##Job configuration
##Job resources
#PBS -l nodes=1:ppn=8
#PBS -l walltime=04:00:00
#PBS -l pmem=2gb
#PBS -V
cd $PBS_O_WORKDIR
matlab -nodesktop -nojvm -nosplash < matlab_script.m

Alternatively, the contents of the MATLAB script can be written in-line within the submission script itself, as demonstrated in the following script. MATLAB on the SCC includes the Parallel Computing Toolbox, which can be called from the MATLAB script (or written inline, as in the following example) to utilize all processors available on the host compute node.

#!/bin/bash -l
#
#PBS -r n

##Job settings
#PBS -N SCC_MATLAB-PARALLEL
#PBS -o SCC_MATLAB-PARALLEL.out
#PBS -e SCC_MATLAB-PARALLEL.err
#PBS -j oe
#PBS -m abe

##Job configuration
##Job resources
#PBS -l nodes=1:ppn=8
#PBS -l walltime=04:00:00
#PBS -l pmem=2gb
#PBS -V

cd $PBS_O_WORKDIR
# Direct input from start-end of contents of 'EOF' to be interpreted by MATLAB
matlab -nosplash -nodisplay <<EOF

% the size of the pool should equal the PBS ppn (8 in this script)
matlabpool open local 8

% call the function
myParforLab

% close the pool
matlabpool close

% End the input stream to MATLAB
EOF

echo ""
echo "Done at " `date`

Running Matlab

Introduction

Matlab is a numerical computing and programming environment with a broad range of functionality (matrix manipulation, numerical linear algebra, general-purpose graphics, etc.). Additionally, specialised application areas (e.g. bioinformatics or financial derivatives) are served by a large number of optional toolboxes. Matlab is installed on the SCC clusters along with all the toolboxes covered by the following licenses:

Package Name                  License count
MATLAB                        25
Distrib_Computing_Toolbox     25
Curve_Fitting_Toolbox         5
GADS_Toolbox                  5
Image_Toolbox                 5
Neural_Network_Toolbox        5
Optimization_Toolbox          5
PDE_Toolbox                   5
Signal_Toolbox                5
Statistics_Toolbox            5
Wavelet_Toolbox               5

The Matlab Module

Like all applications, Matlab has to be loaded using the module utility prior to running it. This ensures that Matlab is included in the path. Get the list of available versions of Matlab using module avail MATLAB. Load the module for the default Matlab version with module load MATLAB, or load a particular version by specifying it in the command, e.g. module load MATLAB/R2016a. After loading the module, you can simply type matlab to run Matlab in interactive GUI mode. Loading the module gives access to the main Matlab product as well as to all the installed toolboxes, including the Parallel Computing Toolbox. To check toolbox availability (and the Matlab version), type ver at the prompt of an interactive Matlab session.
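A typical interactive session might therefore start with the following sketch:

$ module avail MATLAB
$ module load MATLAB/R2016a
$ matlab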

Non-interactive Matlab Sessions

During code development on standard machines Matlab is usually run in interactive mode, in this way making full use of its integrated environment. By contrast, given the "batch processing" nature of supercomputing resources, the preferred mode of operation for Matlab on our systems is non-interactive. One way to run Matlab non-interactively is through

• re-directing the standard input and output when invoking Matlab and

• invoking Matlab from a submission script, submitted to the queue via the PBS scheduler.

Input and output re-direction is arguably the easiest way of running Matlab non-interactively. It is achieved using the Linux operators < and >, with Matlab taking a code file as input and writing the output to a file, e.g.

matlab -nodesktop -nosplash < myScript.m > myOutput.txt

The main function/program (e.g. myScript.m) should have the exit command at the end in order to force Matlab to quit after finishing the execution of the code. A simple example illustrating non-interactive Matlab use is found in /quarantine/Workshops/Matlab/test-matlab.m. In this example, the MATLAB program test-matlab.m displays a simple test. A Matlab job is sent to the queue and executed on a backend node using the job scheduler. A sample submission script, /quarantine/Workshops/Matlab/submit_test.sh, contains the following line to run the Matlab script:

matlab -nodisplay -nosplash < test-matlab.m > run.log

The flag -nodisplay instructs Matlab to run without the GUI, while -nosplash prevents the display of the Matlab logo. The < redirection operator ensures that Matlab runs the script test-matlab.m, while the > operator re-directs the standard output (normally sent to the terminal) to the run.log file. Job submission is done with the command qsub submit_test.sh.

Matlab license issue

The SCC has more compute nodes than MATLAB licenses. To make sure your Torque scripts run correctly, add the following parameter to your scripts (a sample script is located at /quarantine/Workshops/Matlab/submit_test.sh):

#PBS -l nodes=1:ppn=1:matlab,walltime=1:00:00

The matlab property instructs the scheduler to run the script only on compute nodes with the matlab property. For other packages, like Image_Toolbox, there are two ways to make sure you have a license to run your scripts:

1. Add your job after a license-check job using PBS job dependencies. We have written sample scripts under /quarantine/Workshops/Matlab/; copy the files submit_test_Neural_Network_Toolbox.pbs and submit_depend.sh to your directory and modify them to suit your program. In submit_depend.sh we submit the job submit_test.pbs so that it depends on submit_test_Neural_Network_Toolbox.pbs finishing successfully:

#!/bin/bash
FIRST=$(qsub check_Neural_Network_Toolbox.pbs)
echo $FIRST
SECOND=$(qsub -W depend=afterok:$FIRST submit_test.pbs)

2. Add a check_license.sh call before your matlab command. There is a sample script that checks for the Neural_Network_Toolbox at /quarantine/Workshops/Matlab/submit_test_Neural_Network_Toolbox.pbs; you can replace Neural_Network_Toolbox with Image_Toolbox to check the Image_Toolbox package.

#!/bin/bash --login
#PBS -l nodes=1:ppn=1:matlab,walltime=1:00:00
#PBS -N matlab-test
#PBS -V
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
cd $PBS_O_WORKDIR
module load MATLAB/R2016a
/usr/local/bin/check_license.sh Neural_Network_Toolbox
matlab -nodisplay -nosplash < test-matlab.m

• Note: when using this method, consider setting a longer walltime to account for the time spent acquiring the license.

Using local/scratch

The compute nodes access the user's data files via NFS over the InfiniBand network. When a job running on the compute nodes reads or writes that data, the I/O is not only slower and less reliable than local disk, it can also slow down or bring down other critical resources when used heavily. For this reason, it is recommended (and strongly recommended for heavy-I/O runs) that you use local disk for I/O instead. Furthermore, we provide a tmpfs local scratch folder which stores data in a portion of memory. Generally, I/O-intensive tasks and programs that perform frequent read/write operations can benefit from using this tmpfs folder.

Running on local disk is not as straightforward as running under NFS scratch and requires some care. At the least, you will need to copy the input files from NFS disk to the local disk, cd there, run the job, and finally copy the output files back.

Job scratch directories

The local disk on all compute nodes is called /export/ramdisk. The /export/ramdisk on each compute node is a separate tmpfs disk. For example, files written to /export/ramdisk on node node03 will not be visible on node node20.

The following table shows the tmpfs mount point and size on the SCC cluster:

Node              Mount point        tmpfs size
node01-node22     /export/ramdisk    100G
node23-node32     /export/ramdisk    256G
gpu01             /export/ramdisk    256G

Once a job is started in the queue, a scratch directory is created for the job on the local disk of the node on which the job is running. The name of this directory contains the queue job id, i.e., the number by which a job is identified in the queue (and seen in the output of qstat). For example, if a job has the job id 150719.mgmt2.scc.camh.net, then a directory will be created on the compute node called:

/export/ramdisk/150719.mgmt2.scc.camh.net

This directory can be referred to as $TMPDIR in job submission scripts. The owner of the job has write permission in $TMPDIR to write their temporary files.

Job scratch directory policies

The scratch area is intended for the writing/reading of temporary files during the course of a job. At the termination of the job, temporary files are removed to prevent the scratch area from being over-utilised and thus preventing other jobs from running on that node. The scratch directory (hereafter called $TMPDIR) is created when a job initiates within the queue. The directory exists for the duration of the job and is then removed when the job exits the queue. It is therefore important that users copy any output files that they require from $TMPDIR to a subdirectory within their $HOME directory before the job completes. The following sections contain guidelines to assist users to this end.

Using a job scratch directory

$TMPDIR is used within a PBS job submission script. In the script, the necessary input files should be copied into $TMPDIR first, $TMPDIR should then become the current directory, and the job run from $TMPDIR. For example, if a job has a single input file called input.data, an executable called run.x, and the job produces a single output file called output.log, then the necessary command lines to include in the PBS job submission script (ignoring the PBS directives) might be something like:

cp input.data run.x $TMPDIR
cd $TMPDIR
./run.x > output.log
cp output.log $PBS_O_WORKDIR

where $PBS_O_WORKDIR is a variable which contains the directory from which the job was originally submitted (the original working directory, for example /genome/home/pat/work).

Requesting scratch directory resources

$TMPDIR has limited space that differs between nodes. To make sure your data will fit in the tmpfs folder, you should use the file parameter in your PBS script, something like:

#PBS -l file=30gb

Or define it on the command line:

qsub -l file=30gb test.pbs

Using a shell trap to prevent loss of files

Sometimes a job can die unexpectedly, before it can terminate successfully. In this scenario, the job might not be able to copy required output from $TMPDIR back to $PBS_O_WORKDIR, and the job would then have to be restarted from the beginning. To prevent this from happening, users should include a shell trap to catch a terminating signal and copy the desired files back to $PBS_O_WORKDIR. For example, when considering the commands given in the section Using a job scratch directory above, the trap line below should be added (before the executable runs) to prevent the loss of the output file output.log in the event of a disaster:

cp input.data run.x $TMPDIR
cd $TMPDIR
trap "cp output.log $PBS_O_WORKDIR" EXIT SIGTERM
./run.x > output.log
cp output.log $PBS_O_WORKDIR

trap "cp output.log $PBS_O_WORKDIR" EXIT SIGTERM Monitoring important output whilst job is running When a job is running in $TMPDIR, the output files cannot be viewed unless the user logs into the compute node on which they are actually running on. To permit the viewing of an output file, it is suggested to use the tee command to copy the job output to an additional file that is viewable from the login node (i.e., is in the users $HOME area). When considering the example in the above Sections, the commands would now be: cp input.data run.x $TMPDIR cd $TMPDIR run.x | tee $PBS_O_WORKDIR/current_output.log > output.log cp output.log $PBS_O_WORKDIR

trap "cp output.log $PBS_O_WORKDIR" EXIT SIGTERM


• Note: a sample script can be found at /quarantine/Workshops/Ramdisk/ramdisk.pbs