On a sunny afternoon at the UA,
I was threatened: “Explain Seastar !!! Okay ?!?”
So I started writing documentation,
with everything but its location,
’cause that is TOP-secret, so I’ll never say !
This manual is distributed in the hope that it will be useful, but without any warranty, without even the implied warranty of not setting the printer on fire by printing it out, destroying your computer by reading it with a pdf-reader, ...
This is still an alpha version of the manual, so it will most likely contain a lot of grammatical and spelling errors. There might also be errors in the code fragments which could cause serious damage.
It’s probably best to skip this section if you’ve never used Seastar before; this is more of a technical reference.
Seastar is a Beowulf cluster with the following specifications:
The properties above make /scratch very useful to save intermediate results of jobs before saving the final results on /l.
In order to get started you’ll need an account, so contact me and I’ll create it for you. Most of the account names on Seastar start with the first character of the first name of the user followed by the last name (e.g. someone named Kurt Cobain would receive the username “kcobain”).
I’ll also ask you to choose a password. I might forget to tell you, but make sure that this is a strong password. You can do this by following the usual rules that most sites demand (9 chars or longer, mix in special chars, don’t base it on recognizable words, ...) but if you don’t mind typing in long passwords then I would recommend XKCD-style passwords. Whatever password you choose, make sure you don’t use it anywhere else.
In this document I’m assuming that you are using a POSIX-compliant operating system. To be honest I have to admit that it’s also possible to connect to the cluster from other operating systems, so why am I doing this?
I also have to assume that you already have some basic experience with GNU/Linux. Otherwise I would have to explain every tiniest step and this document would become unreadably long.
So now your account is created. You should now be able to log in if you start an SSH client and connect to the server seastar-64.cmi.uantwerpen.be with your username and password, using the TCP port that’s registered for SSH. For our example user “Kurt Cobain” this would work like this:
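As a rough sketch of that login (assuming the standard OpenSSH command-line client; the variable names are just for illustration), the command can be put together like this. The block only prints the command, so you can check it before actually connecting:

```shell
user="kcobain"                          # replace with your own username
host="seastar-64.cmi.uantwerpen.be"     # server name as given above
login_cmd="ssh ${user}@${host}"         # ssh uses TCP port 22 unless told otherwise
echo "$login_cmd"
```

Typing the printed command in a terminal should then ask for your password.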
The screen will now show some uninteresting information about the cluster
and on the last line you will have a prompt that looks like this:
0 kcobain@seastar-64 .../ $
In case you are wondering what the number at the front is, it’s the return code of the last command you executed. When you want to log out, just type the command exit and you’ll be disconnected.
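If return codes are new to you, this small sketch shows where that number comes from. It only uses the standard commands true and false; the variable names are made up for the example:

```shell
true                      # a command that always succeeds
rc_ok=$?                  # $? holds the return code of the last command
false || rc_fail=$?       # a command that always fails; "||" keeps the script going
echo "true returned $rc_ok, false returned $rc_fail"
```

A 0 in front of the prompt therefore means that your last command succeeded; anything else means that something went wrong.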
0 kcobain@seastar-64 .../ $ qsub nevermind.sh
314159.beosrv-c
0 kcobain@seastar-64 .../ $ cat nevermind.sh.o314159
Here we are now, entertain us...
0 kcobain@seastar-64 .../ $ cat nevermind.sh.e314159
Oops, there was a suicide
In section 3 I glossed over how you should submit scripts. I’ll now look at how you should write scripts and give some more info about submitting them. The most important thing that you should know is: qsub submits the scripts and transforms them into jobs. All the rest are just details.
I even have some good news: if you don’t want to do anything “fancy” and just want to run a simple program, you don’t even need a script. You can just pipe the commands to qsub:
0 kcobain@seastar-64 .../ $ echo "sing 'Smells Like Teen Spirit'" | qsub
314160.beosrv-c
What’s now happening is the following:
Most likely you won’t like the default options for the jobs so let’s see how we can change them. I’ll show you how to change one option now and I’ll give you a list with the other possible options later:
0 kcobain@seastar-64 .../ $ echo "sing 'Come as you are'" | qsub -N singasong
314161.beosrv-c
This is pretty much the same job as the previous one, but by using the -N option we gave it the name singasong, and the 2 files that will be created after the job is finished will be singasong.o314161 and singasong.e314161.
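To make the naming rule explicit, here is a small sketch that predicts both file names from the job name and the job id that qsub printed. The function name job_output_files is made up for this example; it is not a command on Seastar:

```shell
job_output_files() {
    # $1 = job name (from -N, or the script name when -N is not used)
    # $2 = job id as printed by qsub, e.g. 314161.beosrv-c
    name=$1
    num=${2%%.*}              # keep only the number in front of the first dot
    echo "${name}.o${num}"    # file that receives STDOUT
    echo "${name}.e${num}"    # file that receives STDERR
}

job_output_files singasong 314161.beosrv-c
```

Without -N the job name is the name of the script, which is why the first example produced nevermind.sh.o314159 and nevermind.sh.e314159.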
The previous way of creating jobs works, but usually it’s better to write a script. When you write a script you don’t have to pipe anything to qsub, instead you just run qsub (with the arguments you want) and behind the last argument you place the path to the script:
0 kcobain@seastar-64 .../ $ qsub -N singasong teenspirit.sh
314162.beosrv-c
As you can see, the script name ends with .sh; the reason for this is that the script is a shell script. You don’t have to end your scripts with .sh but it’s a convention that most people follow, so I would recommend doing the same thing if you don’t want to complicate things. Although the submit scripts don’t have to be shell scripts (you can also write submit scripts in Perl, Python, ...) I would recommend using regular shell scripts written in (ba)sh. Other scripts can cause strange problems.
Covering shell scripts completely would lead us too far. For now I’ll have to be very brief about it:
For example, this could be the submit script of the last job (teenspirit.sh):
The following is an example of a submit script that contains some interesting features of bash:
Arguments that you pass to qsub can also be placed in the submit script by adding lines in this format right behind #!/bin/bash:
So if we change teenspirit.sh like this:
then we can submit it like this and it would still have the name “singasong”:
0 kcobain@seastar-64 .../ $ qsub teenspirit.sh
314163.beosrv-c
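As a minimal sketch of such a script, assuming the usual TORQUE convention that these lines start with #PBS followed by a qsub option: the block below writes a stand-in script to a temporary file, checks its syntax and reads the job name back. The script body (a plain echo) and the variable names are placeholders, not the real teenspirit.sh:

```shell
script=$(mktemp)                      # temporary stand-in for teenspirit.sh
cat > "$script" <<'EOF'
#!/bin/bash
#PBS -N singasong
echo "Here we are now, entertain us..."
EOF

sh -n "$script" && echo "syntax OK"   # the shell treats #PBS lines as comments
job_name=$(sed -n 's/^#PBS -N //p' "$script")
echo "job name: $job_name"
# On Seastar you would now submit it with: qsub "$script"
```

Because the shell sees the #PBS lines as ordinary comments, the same file still runs fine outside the cluster.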
I’m only mentioning the options here that I think are actually useful for you.
For a whole list, see man qsub .
Option | Arguments | Explanation
-a | [[[[CC]YY]MM]DD]hhmm[.SS] | Do not launch the job before this time. (e.g. -a 2150 will make sure the job is not launched before 9:50pm today (or tomorrow if it’s already after 21:50) and with -a 11250345.07 it won’t start before 03:45:07 on the 25th of November)
-c | enabled | Enables checkpointing of the job. The job will run a tiny bit slower (nanoseconds, not noticeable) but it has the advantage that if for some reason the cluster needs to be rebooted the job can be saved first. It’s always a good idea to use this option
-I | | Makes the job “interactive”: when it starts running, the 3 standard streams will be connected to the shell where you ran qsub; this makes it look as if the job is running in that shell
-j | oe or eo | STDOUT and STDERR of the job will be joined together; with oe they both end up in STDOUT, with eo they land in STDERR
-l | resourcelist | A comma-separated list of resources and their values that the job will need and that are different from the default values. (e.g. -l walltime=50,nodes=3 to request that the job can run on 3 nodes for 50 seconds). See also the resources-table in section 4.3.1.
-m | abe | You will receive a mail when the job is aborted, when it begins or when it ends. (You’re not obliged to use all 3 of these chars)
-M | kurt.cobain@nirvana.com | Mail address used by -m. If you want multiple people to receive the mail, separate the addresses with commas
-N | name | Gives the job a name (max. 15 chars long and the first char should be alphabetic).
-q | node, queue or node@queue | Tells the cluster where the job should be executed. The node@queue option is handy because a node can be part of multiple queues and in that case it has multiple versions of default resources
-t | n | The job will be submitted n times.
-V | | The job will receive all environment variables available in the shell where you ran qsub (seastar-64’s bash). If you want to know what these vars are, run set
-v | var1=value1[,var2=value2...] | A list of environment variables and their values that the job will need
-w | /some/where | Set the default working directory to /some/where
-W | attributelist | A comma-separated list of attributes and their values that the job will need. (e.g. -W depend=afterok:314164.beosrv-c to request that this job gets scheduled when 314164.beosrv-c finishes without any errors) Be careful if you start combining multiple attributes, you can get some strange effects! See also the attributes-table in section 4.3.2
-X | | Enables X-forwarding
-z | | The job id will not be sent to STDOUT when you submit the job.
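The time format of -a is the one people get wrong most often, so here is a small sketch that builds the MMDDhhmm.SS form from separate numbers. The helper name start_time is made up for this example; pass the numbers without leading zeros:

```shell
start_time() {
    # $1 = month, $2 = day, $3 = hour, $4 = minute, $5 = second
    # Every field is padded to 2 digits, as the -a format expects.
    printf '%02d%02d%02d%02d.%02d\n' "$1" "$2" "$3" "$4" "$5"
}

start_time 11 25 3 45 7    # prints 11250345.07
```

The result can be passed straight to qsub, e.g. qsub -a "$(start_time 11 25 3 45 7)" teenspirit.sh.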
There are a lot of different resources that you can request, for a full list see
http://docs.adaptivecomputing.com/torque/4-1-3/help.htm#topics/2-jobs/requestingRes.htm.
If you are overwhelmed by the options, then sticking to the defaults instead of choosing values yourself is usually a good choice. This is a list of the most popular resources:
Option | Arguments | Explanation
cput | [[HH:]MM:]SS | Maximum amount of CPU-time used by all processes in the job
walltime | [[HH:]MM:]SS | Maximum amount of real time during which the job can be in the running state
nodes | options | Reserve a list of nodes, see the examples on the website mentioned above
mem | size | Maximum amount of memory the job can use
pmem | size | Maximum amount of memory any single process can use
pvmem | size | Maximum amount of virtual memory any single process can use
vmem | size | Maximum amount of virtual memory
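Since cput and walltime also accept the HH:MM:SS form, a small helper can turn a number of seconds into something more readable. This is only a sketch; the function name to_walltime is made up for the example:

```shell
to_walltime() {
    # $1 = duration in whole seconds
    printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

to_walltime 50      # prints 00:00:50
to_walltime 2000    # prints 00:33:20
```

You could then request the time with something like qsub -l walltime="$(to_walltime 2000)" teenspirit.sh.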
Attributes are useful if you want to arrange the order of your own jobs. To see a full list, run man qsub and scroll to the -W additional_attributes option. The most important attributes are the following:
Option | Arguments | Explanation
depend=afterok: | jobid | This job can only be scheduled after the job jobid has terminated without errors
depend=beforeok: | jobid | If this job has terminated without errors, then job jobid can begin
depend=afterany: | jobid | The same thing as afterok but you can ignore errors
depend=beforeany: | jobid | The same thing as beforeok but you can ignore errors
depend=afternotok: | jobid | The same thing as afterok but replace “without” by “with”
depend=beforenotok: | jobid | The same thing as beforeok but replace “without” by “with”
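Chaining two jobs comes down to feeding the job id that qsub prints for the first job into the -W option of the second one. Here is a sketch of the string handling; the helper name depend_attr and the script names step1.sh and step2.sh are made up for this example:

```shell
depend_attr() {
    # $1 = dependency type (afterok, afterany, afternotok, ...)
    # $2 = job id as printed by qsub
    echo "depend=$1:$2"
}

depend_attr afterok 314164.beosrv-c    # prints depend=afterok:314164.beosrv-c

# On Seastar the chain itself would then look roughly like this:
#   first=$(qsub step1.sh)
#   qsub -W "$(depend_attr afterok "$first")" step2.sh
```

The second job only becomes eligible to run once the first one has finished without errors.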
Now that you know everything that you should know to create and submit jobs, you’ll probably also want to be able to monitor them. Seastar has multiple ways to follow them, of which there are 3 that regular users should know about:
0 kcobain@seastar-64 .../ $ qstat -a -u kcobain
Most of the info shown is not really important, but what is important is the difference between “Elap Time” and “Req’d Time”. This difference is how long your job can still run. Also important is the character below S, which has the following meaning:
beosrv-c:
                                                                                  Req'd  Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
314165.beosrv-c         kcobain     infiniba ComeAsYouAre     27183  1     1      1gb    300:00:00 R 299:59:24
314166.beosrv-c         kcobain     infiniba Bleach           27184  1     1      2gb    300:00:00 R 299:49:24
314167.beosrv-c         kcobain     infiniba Nevermind        27185  1     1      3gb    300:00:00 R 299:39:24
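If you don’t feel like subtracting the “Elap Time” from the “Req’d Time” in your head, the calculation can be sketched like this. The function names to_seconds and remaining_time are made up for the example, and it relies on awk being installed:

```shell
to_seconds() {
    # $1 = time as HH:MM:SS; awk reads "08" as decimal 8, not as octal
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

remaining_time() {
    # $1 = Req'd Time, $2 = Elap Time, both as HH:MM:SS
    left=$(( $(to_seconds "$1") - $(to_seconds "$2") ))
    printf '%02d:%02d:%02d\n' $(( left / 3600 )) $(( left % 3600 / 60 )) $(( left % 60 ))
}

remaining_time 300:00:00 299:59:24    # prints 00:00:36
```

For the first job in the listing this gives 36 seconds, so that job is about to hit its limit.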
0 kcobain@seastar-64 .../ $ showstart 314168.beosrv-c
job 314168.beosrv-c requires 2 procs for 0:33:20
Estimated Rsv based start in 1:04:55 on Fri Jul 15 12:53:40
Estimated Rsv based completion in 2:44:55 on Fri Jul 15 14:33:40
Estimated Priority based start in 5:14:55 on Fri Jul 15 17:03:40
Estimated Priority based completion in 6:54:55 on Fri Jul 15 18:43:40
Estimated Historical based start in 00:00:00 on Fri Jul 15 11:48:45
Estimated Historical based completion in 1:40:00 on Fri Jul 15 13:28:45
Best Partition: fast
0 kcobain@seastar-64 .../ $ qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue fast
#
create queue fast
set queue fast queue_type = Execution
set queue fast Priority = 10
set queue fast resources_max.nodect = 22
set queue fast resources_max.pmem = 1gb
set queue fast resources_max.walltime = 48:00:00
set queue fast resources_default.nodect = 1
set queue fast resources_default.nodes = 1
set queue fast resources_default.pmem = 500mb
set queue fast resources_default.walltime = 48:00:00
set queue fast resources_available.nodect = 26
set queue fast enabled = True
set queue fast started = True
#
# Create and define queue stress
... and so on ...
This command does not show which queues contain which nodes, so here
is a small list:
fast | stress | Qgpu | uhimem | himem | dque | infiniband | interactive | |
beo-01 | X | X | X | X | ||||
beo-02 | X | X | X | X | ||||
beo-03 | X | X | X | X | ||||
beo-04 | X | X | X | X | ||||
beo-16 | X | X | X | X | ||||
beo-17 | X | X | X | X | ||||
beo-18 | X | X | X | X | ||||
beo-19 | X | X | X | X | ||||
beo-20 | X | X | X | X | ||||
beo-21 | X | X | X | X | ||||
beo-22 | X | X | X | X | ||||
beo-23 | X | X | X | X | ||||
beo-24 | X | X | X | X | ||||
beo-25 | X | X | X | X | ||||
beo-26 | X | X | X | X | ||||
beo-27 | X | X | X | X | ||||
beo-28 | X | X | X | X | ||||
beo-29 | X | X | X | |||||
beo-30 | X | X | X | |||||
beo-31 | X | X | X | |||||
beo-32 | X | X | X | |||||
beo-33 | X | X | X | X | ||||
beo-34 | X | X | X | |||||
beo-35 | X | X | X | |||||
beo-36 | X | X | X | X | ||||
beo-37 | X | X | X | X | ||||
beo-38 | X | X | X | X | ||||
beo-80 | X | |||||||
beo-81 | X | |||||||
beo-82 | X | |||||||
If you are planning to run interactive programs, make sure you know about X-forwarding and showstart. The most interesting script here is runmatlabXXX. It works like this: you replace the XXX by a version number (most likely this will be 714 for 7.14). As the first argument you give the number of hours (as an integer) you want matlab to run; all the other arguments are passed on to matlab itself. So Kurt Cobain would run something like this if he wanted to run matlab -desktop for almost 3 hours:
0 kcobain@seastar-64 .../ $ runmatlab714 3 -desktop
Many of you will also have an account on the VSC systems (also known as Calcua or Turing and Hopper). These are the most important differences:
Obviously there are also a lot of smaller differences hidden away. If you are planning to work with the VSC systems I would recommend that you start here: https://www.uantwerpen.be/en/research-and-innovation/research-at-uantwerp/core-facilities/core-facilities/calcua/support/
Seamouse does a lot of things of which the following might be interesting for you:
... I am waiting for input from the reader here ...
You just reached the end of the documentation.
Contact me for even more information,
Also, report what’s unclear,
I will fix it, don’t fear!
And when bored: Feel free to write a translation !