By
Al-Baraa Bahgat Ezzat
Ayman Aboulmagd Ahmed
Medhat Hamdy Mohamed
Mohamed Ibrahim Abd El-khalik
Mohamed Mounir Mahmoud
Under the Supervision of
Dr. Ahmed T. Sayed
Table of Contents

List of Tables
List of Figures
List of Symbols and Abbreviations
Acknowledgments
Abstract

Chapter 1: Introduction to clusters
1.1.1 Uniprocessor Computers
1.1.2 Multiple Processors
1.1.2.1 Centralized multiprocessors
1.1.2.2 Multicomputers (Clusters)

Chapter 2: Getting started with the cluster
2.2 Linux
2.2.3.3 Kernel
2.2.3.4 Shell
2.2.3.7 ext3
2.2.3.8 swap
2.2.3.9 Grub
2.2.3.11 KDE & GNOME
2.2.3.12 Linux distributions
2.2.4.1 ls
2.2.4.2 cd
2.2.4.3 cp
2.2.4.4 mkdir dirname
2.2.4.5 halt
2.2.4.6 reboot
2.2.4.7 whoami
2.2.4.8 which
2.2.4.9 cat filename
2.2.4.10 head
2.2.4.11 tail
2.2.4.12 touch
2.2.4.13 rm
2.2.4.14 mv
2.2.5 Permissions
2.2.6 Links
2.2.8.1 Pipes
2.2.8.2 Searching
2.2.9.1 RPM
2.2.11 Mounting
2.2.12 On the job
2.2.12.1 Shortcuts
2.3 OSCAR
2.3.2 Installation steps
2.4.1 ROCKS
2.4.2 ROLLS
2.4.2.2 Base
2.4.2.3 Area51
2.4.2.4 HPC
2.4.2.5 Ganglia
2.4.3 Installation
2.4.4.2 RAMs problem
2.4.5 Useful Commands
2.4.5.1 ADD interface
2.5 Cbench

Chapter 3: Parallel programming
3.1.1 Goal
3.1.3.5 Cluster
3.1.4.1 Goal
3.1.4.2 Timing
3.1.4.3 Profiling
3.2 MPI
3.2.1.1 Rank
3.2.1.2 Communicator
3.2.1.3 Group
3.2.1.4 MPI message
3.4 PVM
3.4.1 Introduction
3.4.4.1 Setup Summary
List of Tables
Table 2-1: Linux vs. Windows
Table 2-2: The runlevels
Table 2-3: Colours on the terminal
Table 2-4: vi options
Table 3-1: Distributed memory vs. shared memory
Table 3-2: Blocking and non-blocking communication
Table 3-3: MPI reduction operations
Table 3-4: Results of running the satisfiability problem code
List of Figures
Figure 1-1: UMA architecture
Figure 1-2: NUMA architecture
Figure 1-3: Symmetric cluster
Figure 1-4: Asymmetric cluster
Figure 1-5: Expanded cluster
Figure 2-1: Windows file system
Figure 2-2: Linux file system
Figure 2-3: The kernel
Figure 2-4: Printing the home directory path
Figure 2-5: Adding a user using the GUI
Figure 2-6: The super user command
Figure 2-7: ls help command
Figure 2-8: Listing the contents of the current directory
Figure 2-9: The long list command
Figure 2-10: Changing the current directory
Figure 2-11: pwd command
Figure 2-12: Making a new directory
Figure 2-13: Printing the current user name
Figure 2-14: which command
Figure 2-15: Removing a file using the rm command
Figure 2-16: Moving a file to the parent directory
Figure 2-17: The permissions
Figure 2-18: Setting permissions using chmod
Figure 2-19: Setting permissions using chmod
Figure 2-20: First Rocks screen
Figure 2-21: Selecting Rocks Rolls 1
Figure 2-22: Selecting Rocks Rolls 2
Figure 2-23: Rocks cluster information screen
Figure 2-24: Eth0 IP
Figure 2-25: Eth1 IP
Figure 2-26: Gateway and DNS IPs
Figure 2-27: Setting root password
List of Symbols and Abbreviations
GPL: GNU General Public License
EULA: End User License Agreement
Grub: Grand Unified Bootloader
EP: Embarrassingly Parallel
OSCAR: Open Source Cluster Application Resources
HPC: High Performance Computing
SSH: Secure Shell
API: Application Programming Interface
MPI: Message Passing Interface
Acknowledgments
We would like to express our thanks, sincere gratitude and appreciation to Dr. Ahmed Tarek Sayed and Dr. Hosam Ali Fahmi for suggesting the idea of the project, and for their continuous advice, valuable guidance and constructive instructions during the whole year.
We would also like to thank Eng. Amgad, Eng. Amaal, and our colleague Ahmed Abu El-Fotoh.
Last but not least, we thank Dr. Yahya Bahnas and the communications lab staff for their help throughout the year.
Abstract
The goal of the project is to provide a high-performance computer at a low price. By combining many commodity computers into one supercomputer, the cluster provides the high performance needed in compute-intensive calculations.
The cluster is managed by software called Rocks, an open-source cluster management package based on a RedHat Linux distribution (CentOS 5).
Currently the cluster has a master node and 7 heterogeneous compute nodes.
Chapter 1: Introduction to clusters
A very common analogy is that of a horse-drawn cart: when the cart gets too heavy for one horse, you can either get a stronger horse or harness several horses together. The following techniques mirror this choice:
Get a stronger horse (but we will never find the horse of TARWADA):
- Uniprocessor Computers.
Harness several horses together:
- Multiple Processors:
o Centralized multiprocessors.
o Multicomputers (Clusters).
1.1.1 Uniprocessor Computers
A memory hierarchy is used to maximize overall performance while minimizing cost: frequently used data is placed in very fast cache memory, while less frequently used data is placed in slower but cheaper memory.
Eventually we reach a limit: the fastest supercomputers have been heavily pipelined single-CPU machines. This is the "big iron" of the past, often requiring "forklift upgrades" and air conditioners to prevent them from melting from the heat they generate.
Despite the high performance of supercomputers, we always need to increase it. But manufacturing technologies have now reached physical and quantum limits on increasing transistor performance, and since the transistor is the main building unit of any electrical system, it is impossible to keep increasing the speed and performance of our systems this way. (We will never find a 10 GHz processor.)
So the approach of getting a stronger horse, the TARWADA horse, is limited.
1.1.2.1.1 Uniform memory access (UMA)
With UMA machines there is a common shared memory. Identical memory addresses map, regardless of the CPU, to the same location in physical memory. Main memory is equally accessible to all CPUs. To improve memory performance, each processor has its own cache.
But what happens when one CPU writes to a location that another CPU holds in its cache; how is the cache entry of the other CPU updated? While several techniques are available, the most common is snooping. With snooping, each cache listens to all memory accesses. If a cache contains a memory address that is being written to in main memory, the cache updates its copy of the data to remain consistent with main memory.
1.1.2.1.2 Non-uniform memory access (NUMA)
In this architecture, each CPU maintains its own piece of memory. Effectively, memory is divided among the processors, but each processor has access to all the memory. Each individual memory address, regardless of the processor, still references the same location in memory (i.e. it is as if memory is divided into parts).
But there is a disadvantage in NUMA: memory access is non-uniform, in the sense that some parts of memory will appear to be much slower than other parts, since the bank of memory "closest" to a processor can be accessed more quickly by that processor. While this memory arrangement can simplify synchronization, the problem of memory coherency increases.
In this type the synchronization problem still exists, but the cache consistency problem does not, because the address space is divided among the processors.
1.1.2.2 Multicomputers (Clusters).
1.1.2.2.1 A brief history of Clusters
The basis of cluster computing as a means of doing parallel work of any sort was arguably
invented by Gene Amdahl of IBM, who in 1967 published what has come to be regarded as
the seminal paper on parallel processing: Amdahl's Law. Amdahl's Law describes
mathematically the speedup one can expect from parallelizing any given otherwise serially
performed task on a parallel architecture. This article defined the engineering basis for both
multiprocessor computing and cluster computing, where the primary differentiator is whether
or not the interprocessor communications are supported "inside" the computer (on for
example a customized internal communications bus or network) or "outside" the computer on
a commodity network.
The history of clusters is directly tied to the history of networking, since the main goal of networking is to link computing resources. After the appearance of the packet-switching concept, the ARPANET project began its development and succeeded in making the first cluster by linking four different computers. The ARPANET grew into the Internet, which now provides the linking between clusters around the world.
The first commercial cluster made was ARCnet. ARCnet wasn't a commercial success and
clustering didn't really take off until DEC released their VAXcluster product in the 1980s for
the VAX/VMS operating system. The ARCnet and VAXcluster products not only supported
parallel computing, but also shared file systems and peripheral devices. They were supposed
to give you the advantage of parallel processing, while maintaining data reliability and
uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS systems from HP
running on Alpha and Itanium systems.
1.1.2.2.2 What is a cluster?
A cluster is a network of computers that work together so that they can be viewed as though they are a single computer. A cluster has three basic elements:
- A collection of individual computers (nodes).
- A network connecting those computers.
- Software that enables a computer to share work among the other computers via the network.
Clusters come in three basic architectures:
1. Symmetric clusters.
2. Asymmetric clusters.
3. Expanded clusters.
1.1.2.2.3.1 Symmetric clusters
In this architecture each node is used independently; the user uses each node separately (e.g. the user logs into the first node and runs a certain task, then logs into the second node and runs another task).
There are two main disadvantages to a symmetric cluster. Cluster management and security can be more difficult, and workload distribution can become a problem, making it more difficult to achieve optimal performance.
1.1.2.2.3.2 Asymmetric clusters
In asymmetric clusters one computer is the head node or frontend. It serves as a gateway
between the remaining nodes and the users. The remaining nodes often have very minimal
operating systems.
Since all traffic must pass through the head, asymmetric clusters tend to provide a high level
of security. If the remaining nodes are physically secure and your users are trusted, you'll
only need to harden the head node.
So the head often acts as a primary server for the remainder of the cluster. Since it will be
configured differently from the remaining nodes, it may be easier to keep all customizations
on that single machine. This simplifies the installation of the remaining machines.
The disadvantage of this architecture comes from the performance limitations imposed by the cluster head, and this becomes significant in large clusters.
1.1.2.2.3.3 Expanded cluster
In this architecture additional servers are used beside the head node (i.e. distributing the work
of the head nodes on many computers).
For example, one of the nodes might function as an NFS server, a second as a management
station that monitors the health of the clusters, and so on.
Chapter 2: Getting started with the cluster
It's a very fascinating thing to work on something that you like. The first step in our project was learning: we began to read about clusters, OSCAR and parallelization.
Then we began to define the way we would take in our journey (OSCAR vs. Rocks). At first we chose Rocks as the software we would use to set up the cluster. But unfortunately Rocks had a lot of problems in its installation, which made us divide the group into two teams working on OSCAR and Rocks in parallel. The amazing thing is that we were able to install both packages. But finally we decided to use Rocks, as it was easier to install and we also managed to solve a lot of its problems, which made us familiar with it.
After that, we began to study MPI, besides working on benchmarking, installing Scilab and writing the documentation.
Along the way we faced a lot of problems related to the hardware; we will mention these problems in this chapter and how we solved them.
2.2 Linux
2.2.1 Linux history
To understand why Linus Torvalds made it free, we must first introduce the GNU project, not UNIX. In the early 1980s, Richard Stallman at the Massachusetts Institute of Technology proposed an alternative to the standard corporate software development model. In 1983, Stallman launched the GNU Project, which is centered around the idea that the source code for applications and operating systems should be freely distributable to anyone who wants it, so that source code is free for programmers to copy, modify and redistribute. This results in high-quality software.
Linux is licensed under the GNU General Public License (GPL). It requires that the source code remain freely available to anyone who wants it: anyone can download the Linux kernel's source code, modify it, recompile it, and run it.
On the other hand, most operating systems follow an End User License Agreement (EULA) that prevents the user from reverse-compiling the operating system.
In the early 1990s there were three main operating systems: DOS, Mac OS and UNIX. Windows was just getting started at this time; it was simply a shell that ran on DOS. It really wasn't a true operating system yet.
Note: the above operating systems were commercially developed and their source code was protected by copyrights.
UNIX source code used to be free for educational purposes; when that ended, Prof. Andrew S. Tanenbaum had a problem teaching the inner core of operating systems, so he decided to build a clone of UNIX to use in his class, which he called Minix. He also put the source code in his book Operating Systems: Design and Implementation (Prentice Hall, 2006).
From this point a graduate student at the University of Helsinki in Finland, Linus Torvalds, initially developed the Linux kernel. It was not a full operating system; Linux version 0.02, released on October 5, 1991, consisted of little more than the kernel and could run only a few GNU utilities such as bash and gcc.
After that, he put his Linux source code on the internet, and access to it was open to anyone who wanted it. Torvalds then focused on improving his Linux, and it grew into a worldwide effort.
[Table 2-1: Linux vs. Windows — the two systems are compared topic by topic: Price, Ease, Reliability, Software, Software Cost, Hardware, Security, Open Source, and Support]
Linux: there is only one root, one tree, and all devices are mounted within that tree (see Figure 2-2).
2.2.3.3 Kernel
It's the core of the operating system. It manages the communication between the hardware and the software; simply, we can say it is responsible for everything the OS must do.
[Figure 2-3: The kernel sits between the hardware and the software]
2.2.3.4 Shell:
It's impossible for the end user to deal directly with the kernel, so the need for the shell arises. The shell is a program that enables the end user to deal with the kernel.
There are different shells, like:
sh: Bourne shell, the earliest shell.
bash: Bourne-again shell, an improved version of sh and the default for Linux.
csh: uses syntax very similar to the C language.
zsh: an improved version of bash.
tcsh: an improved version of csh; it's the default shell for FreeBSD.
There are different ways to access the shell:
1- CLI (command line interface): communicate with the shell by writing commands.
2- GUI (graphical user interface): will be explained in detail later.
3- Terminal, which is the command line interface inside the GUI.
2.2.3.5 Virtual consoles:
A virtual console refers to the combination of one input device and one output device, which enables you to interact with your computer by communicating with the shell.
We have seven consoles in Linux; to change between them press Ctrl+Alt+F1 through Ctrl+Alt+F7.
So it is now clear that Linux is multi-user and multitasking, as we can open more than one terminal and work in different consoles at the same time.
2.2.3.6 Run levels:
Run levels represent the several modes that Linux can run in. We have seven run levels, as follows:

Run level | Description
0 | Halt the system
1 | Single-user mode
2 | Multi-user mode without networking
3 | Full multi-user mode
4 | Unused
5 | Multi-user mode with GUI (X11)
6 | Reboot the system
Table 2-2: The runlevels

To change the run level you can type the following command in the Terminal:
$ init 0
In this command, you are telling the system to change the run level to run level 0, which halts the system.
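As a quick check (a generic SysV-init example, as on CentOS 5; not taken from the original figures), you can print the current run level before switching:
$ runlevel
N 3
# init 5
Here "N 3" means there was no previous run level and the current one is 3; init 5 then switches to the graphical multi-user mode (run as root).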
2.2.3.7 ext3
It's a journaling file system used by the Linux kernel to manage files and folders. The main advantage over ext2 is fast recovery after an unclean shutdown. There are other file systems, like ReiserFS, FAT, NTFS and ext4.
2.2.3.8 swap
This partition is used for virtual memory by the Linux operating system: Linux uses it as an extension of system RAM.
2.2.3.9 Grub
GRUB (the Grand Unified Bootloader) starts the Linux kernel. Most Linux distributions that use GRUB come with it installed and ready to use, and many of the distributions that do not have GRUB installed by default have it available in their package systems; check there first before doing a manual installation. If something goes wrong during an attempted GRUB install you can leave your computer unable to boot, so don't do anything if you don't know exactly what you are doing.
2.2.3.10 GUI (graphical user interface)
It's the best solution for the end user to deal with Linux: the user clicks on a visual screen that has icons, windows and menus, using a pointing device such as a mouse. GUI is pronounced like "gooey".
The GUI consists of the X Window System, a window manager and a desktop environment (KDE, GNOME). The GUI was created using the X Window System software; X Window is the engine, which we refer to as X11 or simply X. X Window allows programmers to run applications in windows, but on its own it is never enough: we also need a window manager, GUI toolkits and desktop environments.
The window manager handles window placement and movement. It allows you to maximize and minimize windows.
2.2.3.11 KDE & GNOME
Both are complete desktop environments. GNOME uses a window manager called Metacity and KDE uses KWin, though either may use any other window manager. There is no best desktop; there is only the one that is best for you. Both can be customized to behave the way you want.
2.2.3.12 Linux distributions
Popular Linux distributions include:
Debian
Ubuntu
Yellow Dog
Mandriva
Gentoo
DON'T USE root IN YOUR DAILY ROUTINE, because it may destroy your system.
Adding users:
Using root we can add users through the terminal by writing:
# adduser username
# passwd username
****** (type the new password; it is not echoed)
Super user (su): this command lets you live in the root environment and do what you want, even though you are not actually logged in as root. See Figure 2-6 and you will notice the change from the $ user sign to the # root sign.
Notes:
1- Each user has their own directory and own desktop, and this also applies to root.
2- root and users must have different passwords.
3- When you are root the terminal displays #, and when you are a user it displays $.
2.2.3.14 Parent directory and child directory:
We represent the parent directory by ".."; it is the folder which contains the child directory. It may contain more than one child directory, and a child directory may itself be a parent of other directories, so parent and child is a relative relation between directories.
To go to the parent directory:
$ cd ..
To go to a child directory:
$ cd ./childdirectoryname
2.2.4.1 ls
Lists the contents of the current directory, as shown in Figure 2-8.
One of the ls options is ls -l, which lists the contents in long-list mode, displaying the permissions and information about the contents, as shown in Figure 2-9.
Notes:
1- You may have noticed that ls displays the contents in different colours; these colours represent the type of the contents:

Directory | blue
Compressed files | red
Executables | green
Links | light blue
Sockets | pink
Special devices | yellow
Table 2-3: Colours on the terminal

2- You can use ls -a to also see the configuration files in the home directory.
2.2.4.2 cd
It's used to move between directories.
To explain the cd command, we move to the Desktop directory; after we execute the command, the new directory is displayed before the $ sign.
So it is now clear that before the $ sign the prompt shows your computer name, the user name and the current directory.
Current directory:
The shell keeps track of your current directory, because commands perform their jobs on the current directory. For example, the ls command displays the current directory's contents, and the touch command creates an empty file in the current directory. Also, to install a package we must have the package in the current directory.
To print the current directory:
$ pwd
It will print it as shown in Figure 2-11.
2.2.4.3 cp
It's used to copy files and directories:
$ cp file1 file2
$ cp -r dir1 dir2
The -r option is used to copy a directory and all its contents.
2.2.4.4 mkdir dirname
Makes a directory. In Figure 2-12 we make a directory called moh, displaying the contents before and after creating it.
2.2.4.5halt
Shut down (Root only)
2.2.4.6reboot
Restart (Root only)
2.2.4.7 whoami
Prints the current user name, as shown in Figure 2-13.
2.2.4.8 which
Displays the path of the file that runs when the given name is entered as a command. The next figure shows two trials with no response: moh cannot be given to the shell because it's a directory, and vlc was not installed.
2.2.4.9 cat filename
Displays the contents of a file:
$ cat filename
2.2.4.10 head
Displays only the first lines of a file:
$ head filename
2.2.4.11 tail
Displays only the last lines of a file:
$ tail filename
2.2.4.12 touch
Creates an empty file of any format:
$ touch filename.xxx
2.2.4.13 rm
It's used to remove files. In the figure we create a new file, display the contents of the current directory, then remove the file using the rm command:
$ rm filename
To force removal of a file, or a directory and its contents, we use the -rf option (see the figure):
$ rm -rf filename
2.2.4.14 mv
It's used to move files between folders.
2.2.5 Permissions:
If you execute the ls -l command you will notice a combination of letters before the name of each file. These letters represent the file type and the permissions for the owner, the group and others. See Figure 2-17.
The owner permissions are for the file's creator (the user who created the file), the group permissions are for the users in the same group as the owner, and the others permissions refer to users who are not in the same group.
2.2.5.1 Setting permissions:
Each permission has a numeric value:
Read = 4
Write = 2
Execute = 1
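For example (a typical numeric chmod invocation; the exact values shown in the original figures may differ): to give the owner read + write + execute (4+2+1 = 7) and the group and others read + execute (4+1 = 5):
$ chmod 755 filename
$ ls -l filename
The long listing will now show -rwxr-xr-x for that file.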
2.2.6 Links
One of the Linux features is links, which can be hard links (the pointer and the pointee use the same inode) or symbolic links (the link has its own inode).
Note: to create links we use the ln command.
Creating a hard link:
$ ln pointeefile pointerfile
Creating a symbolic link:
$ ln -s pointee pointer
:w | Save the file
:q | Quit
:q! | Force quit without saving
Table 2-4: vi options
$ rpm -i packagename.rpm    (install a package from the current directory)
$ rpm -q packagename        (query whether a package is installed)
Suppose you wanted to install the freshrpms repository: go to http://www.ayo.freshrpms.net, download the package that fits your system, and double-click on it to install it. After that, you can go to /etc/yum.repos.d and you will find a .repo file that corresponds to the freshrpms repository. But what is really happening?
2.2.9.2.3 Installing using yum
When you are about to install a specific piece of software, you are really ordering your system to search in your repositories for all the packages that this software needs to be installed on your system; it then downloads the packages and installs them. Let's continue the previous example: you want to install the VLC media player, so you type the following command in the Terminal:
$ yum install vlc
Your system will search in the repositories (suppose it finds it in the freshrpms repository), download all the packages that VLC needs from this repository and install them.
Note: rpm installs packages from your current directory; yum uses the repositories to download the required packages and then installs them.
2.2.9.3 Installing from source code:
These are the steps to install from source code:
1- Download the source code archive, which has the extension (.tar.gz) or (.tar.bz2).
2- For (.tar.gz), use the following command to extract:
$ tar -zxvf ./filename
For (.tar.bz2), use the following command to extract:
$ tar -jxvf ./filename
-z: uses gzip to decompress.
-x: extracts the files from the compressed archive file.
-v: displays each file as it's processed.
-f: the name of the file to extract.
3- The files will be extracted into a directory inside the current directory, which will contain the source code files.
4- Go inside the directory and use the configure command to prepare the installation files to be compiled:
# ./configure
This command will verify the environment (e.g. the C++ compiler) and create the makefile. The makefile contains the instructions for how the executable should be compiled to run on your platform.
5- Now it's time to convert the text-based source code into a binary executable file, using the make command:
# make
6- Finally, install the compiled binaries (as root):
# make install
Notes:
1- Before running configure, read the README or INSTALL file to see if there's anything special you have to do before configuring the files.
2- Generally, it will be installed in /usr/local/XXX.
3- To clean up, go to the directory where it was installed and use the rm command to remove it.
2.2.11 Mounting:
2.2.11.1 Mounting a device using the mount command
This command is rarely needed, because auto-mounting is one of the features of the new versions of the distributions.
$ mount -t filesystemtype devicename mountpoint
(e.g. devicename can be the floppy device /dev/fd0)
To mount everything listed in /etc/fstab at once:
$ mount -a
2.2.12.1 Shortcuts
1- Type ls m and push the Tab key. Linux is going to beep a couple of times, but keep pushing; you will then see every file in the directory that begins with the letter 'm'.
2- To use a file you don't have to write the whole name: you can write the first letters only and press Tab, and Linux will complete it for you.
3- The commands you execute are saved in a history file (.bash_history) in your home directory. You can type history in the terminal and it will display the commands.
2.2.12.2 General notes:
1- We can change a password by logging in as root:
# passwd username
Then it will ask for the new password.
2- To know a file's type you can use this command:
$ file filename
3- To print the paths where the system automatically looks for its executables:
$ echo $PATH
echo is a command to display something on screen.
PATH is an environment variable where the executable paths are stored. To display all the environment variables we can use this command:
$ env
To display all the variables (user and environment variables) use this command; we pipe through less because there are a lot of variables:
$ set | less
4- To learn about your system, go to /proc and execute these commands:
$ cat devices
$ cat cpuinfo
$ cat version
2.3 OSCAR
2.3.1 What is OSCAR?
OSCAR is a snapshot of the best known methods for building, programming, and using clusters. It consists of a fully integrated and easy-to-install software bundle designed for high performance computing (HPC) clusters. Everything needed to install, build, maintain, and use a Linux cluster is included in the suite.
OSCAR is the primary project of the Open Cluster Group. For more information on the group
and its projects, visit its website http://www.openclustergroup.org/.
2.3.2 Installation steps
Errors:
Error 1: ssh-check failed.
Error 2: yum-check failed.
You should uninstall SSH using yum and everything will be fine, as the installer will install SSH for you.
Also make sure that there is no error while opening the Package Manager; we discovered that yum was not working properly because of changed data in the repo files.
After these steps we could open the GUI of OSCAR but it wasn't complete as some of the
buttons were not active, and that is normal because you must go through the steps one by one.
2.3.2.1 OSCAR GUI installation steps:
Solution: We tried to install them using yum, but they weren't found on the installation DVD or in the Fedora repos, so we installed many repositories for yum, like rpmfusion, rpmforge, livna and kwizart, but couldn't find those packages in those repos either.
So we removed the packages that needed those 4 packages; then it worked and the OSCAR server was installed.
Step 4: Build OSCAR client image,
error: many packages all starting with opkg-*-client (OSCAR package) are not installed,
e.g:opkg-sis-client.
Solution: Use package manager to install them from your hard disk, by make a new yum
repo file pointing to local file on hard disk that contains the Oscar packages that was
extracted:
oscar-base-5.1rc1.tar
oscar-repo-common-rpms-5.1rc1.tar
oscar-repo-fc-8-i386-5.1rc1.tar
Then Package manager will find them and install them
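A minimal local .repo file for this purpose might look like the following; the repository name and path are illustrative assumptions, and the directory must first be indexed with createrepo:
[oscar-local]
name=Local OSCAR packages
baseurl=file:///root/oscar-packages
enabled=1
gpgcheck=0
Save it under /etc/yum.repos.d/ and yum will then treat that local directory as a repository.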
2.4.1 ROCKS
Rocks is built on top of any distribution of GNU/Linux (CentOS 5 in our case, which is the default). From the installation description this was clear, as the installed packages were: ROLLS.
In simple words:
ROCKS = (Kernel + OS + Rolls) = a new distribution of GNU/Linux.
2.4.2 ROLLS
Rolls in our cluster:
1. Base. (required)
2. Area51. (required)
3. HPC. (required)
4. Web-server. (required)
5. Ganglia.
6. SGE.
7. Xen.
8. Java.
Other rolls can be added from www.rocksclusters.org in the download section.
2.4.2.1 Installing Rolls
On a new cluster:
The roll should be installed during the initial installation of your cluster, as introduced before.
On an existing cluster:
After installing the cluster, any roll can be added on the frontend. Assume that you have an ISO image of the roll, called area51.iso; a sketch of the commands follows below.
Very important note: the roll that will be installed must be a real roll (i.e. don't try to use this method to install anything except rolls).
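Assuming the standard Rocks 5 command set (a sketch, not necessarily the exact listing the authors used), adding such a roll on a running frontend looks like this, run as root:
# rocks add roll area51.iso
# rocks enable roll area51
# cd /export/rocks/install
# rocks create distro
After the distribution is rebuilt, newly installed nodes will pick up the roll.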
2.4.2.2 Base
The Base roll contains the core Rocks structure that is used to install, configure and maintain clusters. The Base roll thus makes the connection between the kernel and operating system on one side and the other rolls on the other side.
2.4.2.3 Area51
Contains two packages:
Tripwire: Tripwire is configured to automatically scan the files on your frontend daily. Open Tripwire from the home page (localhost).
Chkrootkit: to see if your frontend has been infected by a rootkit, execute:
# /opt/chkrootkit/bin/chkrootkit
2.4.2.4 HPC:
The primary purpose of the HPC roll is to provide configured software tools that can be used to run parallel applications on your cluster.
The following software packages are included in the HPC roll:
PVM
Very important note: we will not use these benchmarks, as we will use Cbench, which contains these benchmarks and others.
2.4.2.5 Ganglia:
Monitors the activity of each node. Open Ganglia from the home page (localhost).
Disadvantage: you might feel that Ganglia is slow in monitoring.
2.4.2.6 SGE (Sun Grid Engine scheduler)
SGE is distributed resource management software; it allows the resources within the cluster (CPU time, software licenses, etc.) to be utilized effectively.
Another thing the roll does is that generic queues are set up automatically the moment new nodes are integrated into the Rocks cluster and booted up.
2.4.3 Installation:
Source of Rocks: www.rocksclusters.org
You have two options to download from the site: the first is a DVD, the other is several CDs. Either way, there are required materials that you have to download and optional materials you may or may not download.
Kernel/Boot Roll CD
Base Roll CD
Web Server Roll CD
OS Roll CD - Disk 1
OS Roll CD - Disk 2
Minimum requirements for the frontend: disk capacity 30 GB, memory 1 GB.
Minimum requirements for a compute node: disk capacity 30 GB, memory 1 GB.
In our installation we didn't use the optional materials; we used only the required-materials DVD copy. Installation consists of three steps:
1- Installing the frontend.
2- Installing the compute nodes.
3- Checking the network connectivity between the nodes.
2.4.3.3 Installing the frontend:
First connect the node to a network that has a DHCP server, using the first network card (the public card). Insert the DVD and power on the node.
1- Now type the word frontend to enter the frontend installation. The node will take its public IP from the DHCP server, and the private network defaults to 10.1.1.0/24. The next screen will appear:
2- Be sure that the DVD is in the drive and press "CD/DVD-based Roll". Another message will appear.
3- The cluster information screen will appear. You can fill it with any data you want.
4- Two messages will appear containing the public and private IPs. Be sure to note these IPs.
Note: you can change the private IP or the netmask to anything you want; it should be in the range of private network IPs.
Note that eth0 is for the private address and eth1 is for the public address.
The Gateway and DNS server fields will be filled in by the DHCP server; all you have to do is press Next.
First delete all existing partitions and begin to create new ones. Specify at least 16 GB for the root partition, under the mount point /.
The frontend will format its disk, and then it will ask for the DVD to copy the rolls from it.
5-
6- The first time you open the terminal, the system will ask you to configure SSH in two prompts: in the first, just confirm the key file (/root/.ssh/identity); in the second, enter the passphrase you want for SSH.
2.4.3.4 Installing the compute nodes:
1- Connect the compute node to the frontend (for a single node this can be done with a cross-over cable).
2- On the frontend, run insert-ethers; a menu will appear.
3- Choose "compute" from this menu. You may have to choose "Ethernet switches" if you are connecting the node to the frontend through a programmable switch; note that if your switch is not programmable you should also choose "compute".
4-
5- Switch on the compute node and boot it from the Rocks CD. Here do nothing at the boot prompt, so the node enters the compute installation automatically.
6- Now the frontend has discovered the compute node with this MAC address.
insert-ethers has discovered the node, but the node hasn't requested a kickstart file yet.
Figure 2-37: The node has successfully received the kickstart file
The node has successfully requested the kickstart file, and the installation will begin installing the files on the compute node.
Note: a message may appear telling you that there is a missing file, with two options, "reboot or retry". Just insert the CD again and press retry.
2.4.3.5 Checking the connection between the frontend and any compute node:
If there is a problem in the connection between the frontend and a compute node, the installation will not complete. After you finish the installation of the compute node, you have to check the connection between it and the frontend in either of these ways:
2.4.3.5.1 By SSH:
On the compute node write this command:
$ ssh frontend    (or ssh followed by the IP of the frontend)
Then enter the SSH passphrase. If you reach the frontend, the connection is OK; if not, you have to investigate the problem.
2.4.3.5.2 By ping:
On the compute node write this command:
$ ping frontend    (or ping followed by the IP of the frontend)
If it replies, the connection is OK. If not, the connection is lost.
2.4.4.2 RAMs problem
At first we had nodes with 256 and 512 MB of RAM. Every time we tried to install, we faced these errors:
Solution: Finally, after a lot of trials and searching, we found that the minimum requirement for the frontend or the compute nodes is 1 GB of RAM.
Explanation: the cluster couldn't cache all its data in less than 1 GB of RAM, so errors appeared.
Explanation: installing a compute node doesn't depend on the CD only, but also on some files on the frontend, which the compute node gets through the network.
3. After installing the two packages, the device manager will keep the status of the video card as "unknown device", but it is successfully installed.
2.4.5.2 Disable rolls:
# rocks disable roll ganglia version=5.0 arch=i386
Here we disable the Ganglia roll, version 5.0, with arch=i386.
2.4.5.3 Enable rolls:
# rocks enable roll ganglia version=5.0 arch=i386
Each host is listed with its appliance type: the frontend appears as "frontend", and a compute node such as compute-0-1 as "compute".
2.5 Cbench
2.5.1 What is Benchmarking?
The term "benchmarking" is a general term; it exists in many fields. Here we are interested in the definition of benchmarking for a computer cluster.
2.5.2
Benchmarking is the process of comparing your resources (cost, cycle time, productivity, quality, ...) to a specific standard or a best practice.
Cbench is a toolkit for testing and benchmarking Linux compute clusters. Cbench is used to facilitate scalable testing, benchmarking, and analysis of a Linux parallel compute cluster.
Testing levels:
1. Node Level.
2.
3.
2.5.4.1 NODE LEVEL:
In this testing level we are only concerned with testing each node separately, without worrying about high-speed interconnects, system MPI libraries, etc. The goal of this test is to determine whether a node has acceptable performance.
Examples:
Choosing LAPACK as the BLAS library.
Make and install the standard collection of tests; the environment this assumes is sketched below.
Note: while installing, try to be connected to the internet, because some packages will be downloaded.
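Cbench builds are driven by a few environment variables. A typical setup, assuming the /opt/cbench paths used in this chapter and LAPACK as the BLAS library (the MPI path and the exact linker flags are assumptions), is:
$ export CBENCHOME=/opt/cbench
$ export CBENCHTEST=/opt/cbench/cbench_tests
$ export MPIHOME=/opt/openmpi
$ export COMPILERCOLLECTION=gcc
$ export BLASLIB="-L/usr/lib -llapack"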
Execute:
$ make install
This compiles and installs the standard collection of test binaries in $CBENCHOME/bin, so you will find the binaries in /opt/cbench/bin, which were not present before installation.
Install the Cbench test set tree.
Execute:
$ make installtests
This will install all test sets that are currently packaged in Cbench into the tree specified by
the CBENCHTEST environment variable.
So you will find the installed packages in /opt/cbench/cbench_tests.
Compiling non-default parts of Cbench.
Some code within Cbench is not compiled by default. Example: HPC Challenge and NAS
Parallel Benchmarks.
Building HPCC with the system MPI
Execute:
$ make -C opensource/hpcc distclean
$ make -C opensource/hpcc
$ make -C opensource/hpcc install
$ make itests (to update the CBENCHTEST tree)
Building NPB
Execute:
$ make -C opensource/NPB
To make sure your NPB binaries get updated in the CBENCHTEST tree, run from the top level of the Cbench tree:
$ sbin/install_npb
Building the self-contained MPICH for Cbench.
Execute:
$ make -C opensource/mpich
Building only what is required for node-level hardware testing.
Execute:
$ make nodehwtest
ERROR: cannot find -llapack
This error means that the LAPACK library, which we chose as the BLAS library, cannot be found.
The solution: install LAPACK from the package manager: type lapack in the search and install lapack-devel.
Then execute:
$ make nodehwtest
Building HPCC with the Cbench MPICH
Execute:
$ make -C opensource/hpcc distclean
$ make -C opensource/hpcc local
$ make -C opensource/hpcc install
$ make itests (to update the CBENCHTEST tree)
Chapter 3: Parallel programming
Then we distribute the code and coordinate the communication and data transfer between the head node and the computing nodes.
SISD machines do not contain any parallelism; there is only one processor.
The above figure shows that SISD performs one instruction per cycle.
The solution for this problem is pipelining, with the advantage that all the functional units are kept busy, so one result is produced per cycle.
Example:
Adding two matrices A + B = C.
Say we have two matrices A and B of order 2 and we have 4 processors.
A11 + B11 = C11 ... A12 + B12 = C12
A21 + B21 = C21 ... A22 + B22 = C22
The same instruction is issued to all 4 processors (add the two numbers), and all processors execute the instruction simultaneously.
A huge disadvantage is that all processors have to share the bandwidth provided by the bus.
This type of shared memory may be found in desktop systems and small servers. To overcome the bandwidth problem, a direct connection from memory to each CPU is desired.
Advantage:
The big advantage of shared memory systems is that all processors can make use of the whole
memory.
Disadvantage:
Shared memory solves the inter-processor communication problem but introduces the problem of simultaneous access to the same location in memory.
The limiting factor for their performance is the number of processors and memory modules that can be connected to each other.
3.1.3.3.2 Distributed memory
Each CPU has its own local memory, and a CPU can only access its own memory directly.
The importance of the network connections is lower than in the case of a shared memory system, so distributed memory systems can be hugely expanded.
3.1.3.4 Gathering up:

 | Distributed Memory | Shared Memory
CPUs | 100's - 1000's | 10's - 100's
Power | High power | Modest power
Expansion | Unlimited expansion | Limited expansion
Programming | Revolutionary parallel programming | Evolutionary parallel programming
Table 3-1: Distributed memory vs. shared memory
3.1.3.5 Cluster
A cluster consists of several cheap computers (nodes) linked together. The simplest case is the combination of several desktop computers.
Advantages:
Clusters offer lots of computing power for little money; they are ideally suited to problems with a high degree of parallelism. It's also easy to upgrade a cluster.
Disadvantages:
32-bit systems cannot address more than 4 GB of RAM, and x86-64 systems are limited by the number of memory slots, the size of the available memory modules, and the chipsets.
A virtual machine monitor (VMM) is platform virtualization software that allows multiple operating systems to run concurrently on a host computer.
Assuming we have:
N = number of CPUs
t = time to perform an addition
q = number of stages
C = time to collect results from the sub-sums
To reduce the evaluation of the dot product to a summation of two numbers on each processor, we want to have N = 2^q.
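A reconstruction of the timing model these definitions suggest (a sketch under the stated assumptions, with n the length of the vectors; not necessarily the author's exact formula):
T_serial = (n − 1)·t
T_parallel = (n/N − 1)·t + q·(t + C)
Speedup = T_serial / T_parallel
Each of the N = 2^q CPUs first adds up its local slice of n/N products; the q combining stages then each cost one addition plus one collection step.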
In our case, even if we ignore the communication time, the efficiency is still less than perfect.
So we can see that there are factors limiting the degree of parallelism: e.g. the number of CPUs doesn't match the problem exactly, the latency may slow the data transfer, and memory limits mean the problem may not fit in memory.
Figure 3-12: Speedup vs. number of processors for three different degrees of parallelization
We can see from Figure 3-12 that it's not only about the number of processors: we may have only five cores and have the job done exactly like 100 cores. This depends directly on the degree of parallelization.
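For reference, Amdahl's law in its standard form (s denotes the serial fraction of the work and N the number of CPUs; the symbols are our own notation):
Speedup(N) = 1 / (s + (1 − s)/N)
As N grows, the speedup approaches the upper bound 1/s.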
Notes:
1- Amdahl's law relies on the assumption that the serial work does not depend on the size of the problem.
2- In practice, the serial fraction decreases as a function of problem size, and the upper bound of the speedup factor usually increases as a function of the problem size.
3- Amdahl's law represents the upper limit of the speedup, but it ignores the processors' efficiency, as it represents the speedup as a function of the number of CPUs only.
a(n) + b(n) = 1    (Equation 2: the execution of a program under Gustafson's law)
where a(n) is the serial fraction, b(n) is the parallel fraction, and P is the number of CPUs.
Then:
Speedup = a(n) + P·b(n)    (Equation 3)
From Equations 2 and 3:
Speedup = a(n) + P·(1 − a(n))    (Equation 4: speedup under Gustafson's law)
Clearly, increasing the number of CPUs does not affect the serial part.
3.2 MPI
Message Passing Interface (MPI) is a portable standard API for programming parallel
computers. It is a language-independent communications protocol used to program parallel
computers.
The basic idea behind MPI is that multiple parallel processes work concurrently towards a common goal, using "messages" as their means of communicating with each other.
MPI is a library linked with your C code that enables you to parallelize your code by sending messages to other processes. See the following example:
helloworld.c
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank, size;
    MPI_Init (&argc, &argv);                 /* starts MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);   /* get current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size);   /* get number of processes */
    printf( "Hello world from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}
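To compile and run it with four processes, the standard MPI tool chain commands are as follows (the wrapper names are the usual ones; the exact paths on the cluster may differ):
$ mpicc helloworld.c -o helloworld
$ mpirun -np 4 ./helloworld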
The previous code is a simple C file that makes every process print its hello-world message in the terminal. Don't worry about understanding this code for now; instead we will discuss the MPI subroutines called in the code. The important thing we want to mention is that this file is distributed to each node (process), and each of them executes it according to its rank.
In the coming part we will discuss the very important subroutines in MPI which you will
need to begin your first step in parallel programming.
From the previous figure you can see that each process has its own rank, and they all belong to the default communicator (MPI_COMM_WORLD).
Again, you can see from the previous figure that each process has its own rank inside the default communicator and also has its own rank inside another communicator (BLACK, WHITE, BLUE).
3.2.1.3 Group
A group is a collection of processes from which a communicator can be created; such a communicator resides inside the default communicator (MPI_COMM_WORLD).
3.2.1.4 MPI message
An MPI message consists of two parts: data and envelope. Data is the portion that holds information about your message (e.g. the buffer which holds the data, the length of the data, and the data type). The envelope is the portion that holds information about the sending and receiving operation (e.g. the destination and the communicator of the process).
[Figure: the user buffer and the system buffer in the send path]
When you are sending data, the data is first put in a user buffer, then goes to a system send buffer which sends it to the connection medium. In a blocking communication like MPI_Send, the routine doesn't return until the user buffer is empty, which makes the operation safe. But in a non-blocking communication like MPI_Isend, the routine returns immediately without checking the user buffer.
Blocking send | MPI_Send(buffer,count,type,dest,tag,comm)
Non-blocking send | MPI_Isend(buffer,count,type,dest,tag,comm,request)
Blocking receive | MPI_Recv(buffer,count,type,source,tag,comm,status)
Non-blocking receive | MPI_Irecv(buffer,count,type,source,tag,comm,request)
Table 3-2: Blocking and non-blocking communication
3.2.2.1.2 MPI_Comm_size
Syntax: MPI_Comm_size (comm, &size)
This instruction gets you the number of processes in the specified Communicator
comm: the process Communicator.
&size: integer variable to store the number of processes (the & sign means pass by reference)
3.2.2.1.3 MPI_Comm_rank
Syntax: MPI_Comm_rank (comm,&rank)
This instruction is used to get the number (rank) of the process in the specified
Communicator.
&rank: integer variable to store the rank of the process.
3.2.2.1.4 MPI_Finalize
Syntax: MPI_Finalize()
Terminates the MPI execution environment. This function should be the last MPI routine
called in every MPI program - no other MPI routines may be called after it.
3.2.2.2 Other Subroutines
3.2.2.2.1 Point-to-point communication
MPI point-to-point operations typically involve message passing between two, and only two, different MPI tasks: one task performs a send operation and the other performs the matching receive operation. You can refer to Table 3-2 for the basic subroutines of point-to-point communication.
Let's discuss each of these subroutines:
3.2.2.2.1.1 MPI_Send
Syntax: MPI_Send(&buffer,count,type,dest,tag,comm)
This is a blocking send, which means that after calling the MPI_Send subroutine it won't return until the user buffer is empty, guaranteeing that the buffer can be safely reused for other data.
buffer: variable holding the data which will be sent.
count: the length of the data.
type: the data type (int, char, etc.).
dest: the rank of the destination process.
tag: an arbitrary integer assigned by the programmer to uniquely identify the message; a send and its matching receive must have the same tag.
comm: the communicator used in the communication process.
As we mentioned previously, the first three arguments are called the data and the others are called the envelope.
3.2.2.2.1.2 MPI_Recv
Syntax: MPI_Recv (&buffer,count,datatype,source,tag,comm,status)
This subroutine is a blocking receive; blocking has the same meaning discussed before for the send operation.
The arguments here are the same as for the MPI_Send subroutine, except for the source argument, which defines the rank of the process from which the data will be received.
Note:
We have two other subroutines, MPI_Isend and MPI_Irecv. The only difference is that MPI_Isend means immediate send: it starts the send but returns immediately without waiting for the user buffer to be empty. (The same concept applies to MPI_Irecv.)
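As a minimal sketch of these two routines (it assumes at least two processes; the value sent is arbitrary):

#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank, value;
    MPI_Status status;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42;
        /* blocking send of one int to rank 1, tag 0 */
        MPI_Send (&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* blocking receive of one int from rank 0, tag 0 */
        MPI_Recv (&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf ("process 1 received %d\n", value);
    }
    MPI_Finalize ();
    return 0;
}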
3.2.2.2.2.1 MPI_Bcast
Syntax: MPI_Bcast(&buffer,count,datatype,root,comm)
This subroutine broadcasts the message from the process with rank "root" to all the processes in the communicator.
buffer: variable to store the message
count: the length of the message
datatype: the type of the message.
root: the rank of the process which broadcasts the message.
comm: the communicator.
3.2.2.2.2.2 MPI_Scatter
Syntax: MPI_Scatter(&sendbuf,sendcnt,sendtype,&recvbuf,recvcnt,recvtype,root,comm)
This subroutine distributes distinct portions of a single message from the root process to each process in the communicator: process i receives the i-th chunk of the send buffer.
3.2.2.2.2.3 MPI_Gather
Syntax: MPI_Gather(&sendbuf,sendcnt,sendtype,&recvbuf,recvcount,recvtype,root,comm)
This subroutine is the reverse of MPI_Scatter: it collects a message from every process in the communicator into the receive buffer of the process with rank "root".
3.2.2.2.2.4 MPI_Reduce
Syntax: MPI_Reduce(&sendbuf,&recvbuf,count,datatype,operation,root,comm)
This subroutine applies a reduction operation to the messages from all processes and puts the result in the process with rank "root".
operation: an MPI constant that defines the operation to be applied to the messages; see the following table:
Operation     Description
MPI_MAX       maximum
MPI_MIN       minimum
MPI_SUM       summation
MPI_PROD      product
MPI_LAND      logical AND
MPI_BAND      bit-wise AND
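As a short sketch of these collective routines (our example, not from the original text): the root broadcasts a value to every process, and MPI_SUM then reduces each process's rank back onto the root.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, n = 0, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        n = 10;                 /* only the root has the value initially */
    /* every process receives n from root 0 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    /* MPI_SUM combines each process's rank into sum on root 0 */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("n = %d, sum of ranks = %d\n", n, sum);
    MPI_Finalize();
    return 0;
}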
The command (rocks sync users) synchronizes the new user on all nodes (i.e. it creates the user's directory (commfe) in home).
But sometimes this command does not work, so here is how to make the synchronization manually:
1. First of all, after you create the new user (commfe), change to this user by executing:
$ su commfe (the OS will not ask for a password since you are root)
2. When you change to this user for the first time, the OS will ask you to enter the SSH passphrase (we set it to sshcommfe).
3. As we said, creating a new user creates a new directory with the user's name in home (/home). But if you open /home with the graphical interface you will not find it; you will find it in /export/home. To make this directory appear, move to /home while you are the commfe user and execute:
$ cd /home
$ ls
You will find commfe.
4. You must repeat the previous steps on every node:
A. Change user to commfe:
$ su commfe
B. Move to the node:
$ ssh compute-0-0 (for example)
C. Enter the SSH passphrase (sshcommfe).
D. Move to /home:
$ cd /home
E. Make sure the user directory has been created:
$ ls
F. Repeat these steps on all nodes.
With these steps we create a new user and synchronize it on all nodes.
To run MPI code on the cluster, create a file (we call it machines) and write in it the names of the nodes you want to run the code on, one per line:
compute-0-0
compute-0-1
...
- Make sure that the machines file has all permissions (read, write, execute).
- Move to your home directory (you must run the command while standing in home):
$ cd
- Run the test code:
$ /opt/openmpi/bin/mpirun -machinefile machines -np 2 /opt/mpi-tests/bin/mpi-ring
mpirun is our command, so we give its full path, /opt/openmpi/bin.
-machinefile is the option that lets us specify the nodes.
machines is the file that contains the names of the nodes.
mpi-ring is the code that will run, so we give its full path, /opt/mpi-tests/bin/mpi-ring.
-np 2 sets the number of processes (NOT PROCESSORS).
For example, if you set -np 2 and run on 3 nodes, the cluster will use 2 nodes and 1 node will be idle, since 1 process runs on each node.
Measured run times as the number of used nodes increases:
used nodes: Time in sec = 58.11, 29.10, 19.40, 14.53, 12.64, 10.76
We can also achieve better times by increasing the number of processes, since there are still idle cores.
[Figure: run time in seconds versus number of nodes]
3.4 PVM
3.4.1 Introduction
Parallel processing, the method of having many small tasks solve one large problem, has emerged as a key enabling technology in modern computing. The past several years have witnessed an ever-increasing acceptance and adoption of parallel processing, both for high-performance scientific computing and for more ``general-purpose'' applications, as a result of the demand for higher performance, lower cost, and sustained productivity. This acceptance has been facilitated by two major developments: massively parallel processors (MPPs) and the widespread use of distributed computing.
The Parallel Virtual Machine (PVM) system uses the message-passing model to allow
programmers to exploit distributed computing across a wide variety of computer types,
including MPPs. A key concept in PVM is that it makes a collection of computers appear as
one large virtual machine, hence its name.
Target applications are written in C and Fortran, with an emerging trend of experimenting with object-based languages and methodologies.
The C and C++ language bindings for the PVM user interface library are implemented as
functions, following the general conventions used by most C systems, including Unix-like
operating systems. To elaborate, function arguments are a combination of value parameters
and pointers as appropriate, and function result values indicate the outcome of the call. In
addition, macro definitions are used for system constants, and global variables such as errno
and pvm_errno are the mechanism for discriminating between multiple possible outcomes.
Application programs written in C and C++ access PVM library functions by linking against
an archival library (libpvm3.a) that is part of the standard distribution.
Below is a simple PVM example:
#include <stdio.h>
#include "pvm3.h"

main()
{
    int cc, tid, msgtag;
    char buf[100];

    printf("i'm t%x\n", pvm_mytid());
    cc = pvm_spawn("hello_other", (char**)0, 0, "", 1, &tid);
    if (cc == 1) {
        msgtag = 1;
        pvm_recv(tid, msgtag);
        pvm_upkstr(buf);
        printf("from t%x: %s\n", tid, buf);
    } else
        printf("can't start hello_other\n");
    pvm_exit();
}
Figure 3-19: PVM program hello.c
Shown in Figure 3-19 is the body of the PVM program hello, a simple example that
illustrates the basic concepts of PVM programming. This program is intended to be invoked
manually; after printing its task id (obtained with pvm_mytid()), it initiates a copy of another
program called hello_other using the pvm_spawn() function. A successful spawn causes the
program to execute a blocking receive using pvm_recv. After receiving the message, the
program prints the message sent by its counterpart, as well as its task id; the buffer is extracted
from the message using pvm_upkstr. The final pvm_exit call dissociates the program from
the PVM system.
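To try the example yourself, a typical build looks like the following (the exact paths depend on your installation, so treat them as assumptions):
% cc -I$PVM_ROOT/include hello.c -L$PVM_ROOT/lib/$PVM_ARCH -lpvm3 -o hello
Place the compiled hello_other in $HOME/pvm3/bin/$PVM_ARCH, the default location where pvm_spawn looks for executables, start PVM, and then run hello from a Unix prompt.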
#include <string.h>
#include <unistd.h>
#include "pvm3.h"

main()
{
    int ptid, msgtag;
    char buf[100];

    ptid = pvm_parent();
    strcpy(buf, "hello, world from ");
    gethostname(buf + strlen(buf), 64);
    msgtag = 1;
    pvm_initsend(PvmDataDefault);
    pvm_pkstr(buf);
    pvm_send(ptid, msgtag);
    pvm_exit();
}
Figure 3-20: PVM program hello_other.c
Figure 3-20 is a listing of the ``slave'' or spawned program; its first PVM action is to obtain the task id of the ``master'' using the pvm_parent call. This program then obtains its hostname and transmits it to the master using the three-call sequence: pvm_initsend to initialize the send buffer; pvm_pkstr to place a string, in a strongly typed and architecture-independent manner, into the send buffer; and pvm_send to transmit it to the destination process specified by ptid, ``tagging'' the message with the number 1.
Xnetlib is an X-Window interface that allows a user to browse or query netlib for available software and to automatically transfer the selected software to the user's computer. To get xnetlib, send email to netlib@netlib.org with the message: send xnetlib.shar from xnetlib.
The PVM software can be requested by email. To receive this software send email to
netlib@netlib.org
with the message: send index from pvm3. An automatic mail handler
will return a list of available files and further instructions by email. The advantage of this method is that anyone with email access to the Internet can obtain the software.
The PVM software is distributed as a uuencoded, compressed, tar file.
Your .cshrc file should include the content of the file $PVM_ROOT/lib/cshrc.stub. The stub should be placed after PATH and PVM_ROOT are defined. This stub automatically determines the PVM_ARCH for this host and is particularly useful when the user shares a common file system (such as NFS) across several different architectures.
The PVM source comes with directories and makefiles for most architectures you are likely
to have. Building for each architecture type is done automatically by logging on to a host,
going into the PVM_ROOT directory, and typing make. The makefile will automatically
determine which architecture it is being executed on, create appropriate subdirectories, and
build pvm, pvmd3, libpvm3.a, libfpvm3.a, pvmgs, and libgpvm3.a, and place them all in $PVM_ROOT/lib/PVM_ARCH.
Create a .rhosts file on each host listing all the hosts you wish to use
Create a $HOME/.xpvm_hosts file listing all the hosts you wish to use prepended by
an ``&''.
Start PVM by typing
% pvm
and you should get back a PVM console prompt signifying that PVM is now running on this host. You can add hosts to your virtual machine by typing at the console prompt
pvm> add hostname
And you can delete hosts (except the one you are on) from your virtual machine by typing
pvm> delete hostname
If you get the message ``Can't Start pvmd,'' then check the common startup problems section
and try again.
To see what the present virtual machine looks like, you can type
pvm> conf
To see what PVM tasks are running on the virtual machine, you type
pvm> ps -a
Of course you don't have any tasks running yet; that's in the next section. If you type ``quit"
at the console prompt, the console will quit but your virtual machine and tasks will continue
to run. At any Unix prompt on any host in the virtual machine, you can type
% pvm
and you will get the message ``pvm already running" and the console prompt. When you are
finished with the virtual machine, you should type
pvm> halt
This command kills any PVM tasks, shuts down the virtual machine, and exits the console.
This is the recommended method to stop PVM because it makes sure that the virtual machine
shuts down cleanly.
You should practice starting and stopping and adding hosts to PVM until you are comfortable
with the PVM console. A full description of the PVM console and its many command options
is given at the end of this chapter.
If you don't want to type in a bunch of host names each time, there is a hostfile option. You
can list the hostnames in a file one per line and then type
% pvm hostfile
PVM will then add all the listed hosts simultaneously before the console prompt appears.
Several options can be specified on a per-host basis in the hostfile. These are described at the
end of this chapter for the user who wishes to customize his virtual machine for a particular
application or environment.
There are other ways to start up PVM. The functions of the console and a performance
monitor have been combined in a graphical user interface called XPVM, which is available
precompiled on netlib (see Chapter 8 for XPVM details). If XPVM has been installed at your
site, then it can be used to start PVM. To start PVM with this X window interface, type
% xpvm
The menu button labeled ``hosts'' will pull down a list of hosts you can add. If you click on a hostname, it is added, and an icon of the machine appears in an animation of the virtual machine. A host is deleted if you click on a hostname that is already in the virtual machine. On startup XPVM reads the file $HOME/.xpvm_hosts, which is a list of hosts to display in this menu. Hosts without a leading ``&'' are added all at once at startup.
The quit and halt buttons work just like the PVM console. If you quit XPVM and then restart
it, XPVM will automatically display what the running virtual machine looks like. Practice
starting and stopping and adding hosts with XPVM. If there are errors, they should appear in
the window where you started XPVM.
If you get the message ``Can't start pvmd,'' first check that your .rhosts file on the remote host contains the name of the host from which you are starting PVM. An external check that your .rhosts file is set correctly is to type
% rsh remote_host ls
If your .rhosts is set up correctly, then you will see a listing of your files on the remote host.
Other reasons to get this message include not having PVM installed on a host or not having
PVM_ROOT set correctly on some host. You can check these by typing
% rsh remote_host $PVM_ROOT/lib/pvmd
Some Unix shells, for example ksh, do not set environment variables on remote hosts when using rsh. In PVM 3.3 there are two workarounds for such shells. First, if you set the environment variable PVM_DPATH on the master host to pvm3/lib/pvmd, then this will override the default dx path. The second method is to tell PVM explicitly where to find the remote pvmd executable by using the dx= option in the hostfile.
If PVM is manually killed, or stopped abnormally (e.g., by a system crash), then check for
the existence of the file /tmp/pvmd.<uid>. This file is used for authentication and should
exist only while PVM is running. If this file is left behind, it prevents PVM from starting.
Simply delete this file.
If the message says
[t80040000] Login incorrect
it probably means that no account is on the remote machine with your login name. If your
login name is different on the remote machine, then you must use the lo= option in the
hostfile.
If you get any other strange messages, then check your .cshrc file. It is important that you
not have any I/O in the .cshrc file because this will interfere with the startup of PVM. If you
wish to print out information (such as who or uptime) when you log in, you should do it in
your .login script, not when you're running a csh command script.
The examples directory contains a Makefile.aimk and Readme file that describe how to build
the examples. PVM supplies an architecture-independent make, aimk, that automatically
determines PVM_ARCH and links any operating system specific libraries to your
application. aimk was automatically added to your $PATH when you placed the cshrc.stub
in your .cshrc file. Using aimk allows you to leave the source code and makefile unchanged
as you compile across different architectures.
The master/slave programming model is the most popular model used in distributed
computing. (In the general parallel programming arena, the SPMD model is more popular.)
To compile the master/slave C example, type
% aimk master slave
If you prefer to work with Fortran, compile the Fortran version with
% aimk fmaster fslave
Depending on the location of PVM_ROOT, the INCLUDE statement at the top of the Fortran
examples may need to be changed. If PVM_ROOT is not $HOME/pvm3, then change the
include to point to $PVM_ROOT/include/fpvm3.h. Note that PVM_ROOT is not expanded
inside the Fortran, so you must insert the actual path.
The makefile moves the executables to $HOME/pvm3/bin/PVM_ARCH, which is the default
location PVM will look for them on all hosts. If your file system is not common across all
your PVM hosts, then you will have to build or copy (depending on the architectures) these
executables on all your PVM hosts.
Now, from one window, start PVM and configure some hosts. These examples are designed
to run on any number of hosts, including one. In another window cd to
$HOME/pvm3/bin/PVM_ARCH
and type
% master
The program will ask how many tasks. The number of tasks does not have to match the
number of hosts in these examples. Try several combinations.
The first example illustrates the ability to run a PVM program from a Unix prompt on any
host in the virtual machine. This is just like the way you would run a serial a.out program on
a workstation. In the next example, which is also a master/slave model called hitc, you will
see how to spawn PVM jobs from the PVM console and also from XPVM.
hitc illustrates dynamic load balancing using the pool-of-tasks paradigm. In the pool-of-tasks
paradigm, the master program manages a large queue of tasks, always sending idle slave
programs more work to do until the queue is empty. This paradigm is effective in situations
where the hosts have very different computational powers, because the least loaded or more
powerful hosts do more of the work and all the hosts stay busy until the end of the problem.
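Before compiling, here is a minimal sketch of how a pool-of-tasks master can be structured with PVM calls (our illustration, not the actual hitc source; the slave program name "worker" and the message tags are hypothetical):

#include <stdio.h>
#include "pvm3.h"

#define NTASKS  100   /* size of the task queue           */
#define NSLAVES 4     /* number of slave processes to use */

int main(void)
{
    int tids[NSLAVES], bytes, tag, who;
    int next = 0, done = 0, result, i;

    /* start the slaves ("worker" is a hypothetical slave program) */
    pvm_spawn("worker", (char **)0, 0, "", NSLAVES, tids);

    /* hand every slave its first task (tag 1 = work) */
    for (i = 0; i < NSLAVES; i++) {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&next, 1, 1);
        pvm_send(tids[i], 1);
        next++;
    }
    /* as each result returns (tag 2), refill the now-idle slave */
    while (done < NTASKS) {
        int bufid = pvm_recv(-1, 2);             /* -1 = from any slave */
        pvm_upkint(&result, 1, 1);
        pvm_bufinfo(bufid, &bytes, &tag, &who);  /* which slave sent it */
        done++;
        if (next < NTASKS) {
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&next, 1, 1);
            pvm_send(who, 1);
            next++;
        }
    }
    /* a real program would now send each slave a termination message */
    pvm_exit();
    return 0;
}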
To compile hitc, type
% aimk hitc hitc_slave
Since hitc does not require any user input, it can be spawned directly from the PVM
console. Start up the PVM console and add a few hosts. At the PVM console prompt type
pvm> spawn -> hitc
The ``->" spawn option causes all the print statements in hitc and in the slaves to appear in
the console window. This feature can be useful when debugging your first few PVM
programs. You may wish to experiment with this option by placing print statements in hitc.f
and hitc_slave.f and recompiling.
hitc can be used to illustrate XPVM's real-time animation capabilities. Start up XPVM and
build a virtual machine with four hosts. Click on the ``tasks" button and select ``spawn" from
the menu. Type ``hitc" where XPVM asks for the command, and click on ``start". You will
see the host icons light up as the machines become busy. You will see the hitc_slave tasks get
spawned and see all the messages that travel between the tasks in the Space Time display.
Several other views are selectable from the XPVM ``views" menu. The ``task output" view is
equivalent to the ``->" option in the PVM console. It causes the standard output from all tasks
to appear in the window that pops up.
There is one restriction on programs that are spawned from XPVM (and the PVM console).
The programs must not contain any interactive input, such as asking for how many slaves to
start up or how big a problem to solve. This type of information can be read from a file or put
on the command line as arguments, but there is nothing in place to get user input from the
keyboard to a potentially remote task.
The -n option is useful for specifying an alternative name for the master pvmd (in case
hostname doesn't match the IP address you want). Once PVM is started, the console prints the
prompt
pvm>
and accepts commands from standard input. The available commands are
add: followed by one or more host names, adds these hosts to the virtual machine.
alias: defines or lists command aliases.
conf: lists the configuration of the virtual machine including hostname, pvmd task ID, architecture type, and a relative speed rating.
delete: followed by one or more host names, deletes these hosts from the virtual machine. PVM processes still running on these hosts are lost.
PVM supports the use of multiple consoles. It is possible to run a console on any host in an
existing virtual machine and even multiple consoles on the same machine. It is also possible
to start up a console in the middle of a PVM application and check on its progress.
Compute-0-0...
Verifying Local Path to "rsh"...
Rsh found in /usr/bin/rsh - O.K.
Testing Rsh/Rhosts Access to Host "compute-0-0"...
Rsh/Rhosts Access FAILED - "compute-0-0: Connection refused"
Connect to compute-0-0 port 543: Connection refused
Trying krb4 rlogin
Connect to 10.255.255.254 port 543: Connection refused
Trying normal rlogin
(/usr/bin/rlogin)
The connection fails. By searching in /etc/xinetd.d you will find the rsh and rlogin files; when we opened them we found disabled = yes, so we changed it to disabled = no in both files to enable the two services.
Trying to connect again, we got the same error message.
When logging into compute-0-0 we found that rsh and rlogin were not installed, so we installed rsh-server and xinetd (which is needed by rsh) on the compute node; unfortunately, the connection failed again.
At last we found out that Rocks disables the rsh and rlogin services by default; they are replaced by ssh, which is safer.
So to enable rsh do the following steps:
1-
# cp /state/partition1/home/install/rocksdist/lan/i386/build/graphs/default/base-rsh.xml \
/state/partition1/home/install/site-profiles/4.3/graphs/default/
2- When you open the base-rsh.xml file from the new location, you will find:
<!-- Uncomment to enable RSH on your cluster
<edge from="client">
<to>xinetd</to>
<to>rsh</to>
</edge>
-->
3- Follow the instruction and uncomment this block. This will force all appliance
types that reference the client class (compute nodes, nas nodes, ...) to enable an rsh
service that trusts all hosts on the private side network. This uncommented block should
look like this:
<edge from="client">
<to>xinetd</to>
<to>rsh</to>
</edge>
The hostfile defines the initial configuration of hosts that PVM combines into a virtual machine. It also contains information about hosts that you may wish to add to the configuration later.
The hostfile in its simplest form is just a list of hostnames one to a line. Blank lines are
ignored, and lines that begin with a # are comment lines. This allows you to document the
hostfile and also provides a handy way to modify the initial configuration by commenting out
various hostnames (see Figure 3-21).
# configuration used for my run
sparky
azure.epm.ornl.gov
thud.cs.utk.edu
sun4
Figure 3-21: Simple hostfile listing virtual machine configuration
Several options can be specified on each line after the hostname. The options are separated by
white space.
lo= userid
allows you to specify an alternative login name for this host; otherwise, your login
name on the start-up machine is used.
so=pw
will cause PVM to prompt you for a password on this host. This is useful in the cases
where you have a different userid and password on a remote system. PVM uses rsh by
default to start up remote pvmd's, but when pw is specified, PVM will use rexec()
instead.
dx= location of pvmd
allows you to specify a location of pvmd other than the default for this host. This is useful if you want to use your own personal copy of pvmd.
ep= paths to user executables
allows you to specify a series of paths to search down to find the requested files to
spawn on this host. Multiple paths are separated by a colon. If ep= is not specified,
then PVM looks in $HOME/pvm3/bin/PVM_ARCH for the application tasks.
sp= value
specifies the relative computational speed of the host compared with other hosts in the
configuration. The range of possible values is 1 to 1000000 with 1000 as the default.
bx= location of debugger
specifies which debugger script to invoke on this host if debugging is requested in the
spawn
routine.
Note: The environment variable PVM_DEBUGGER can also be set. The default
debugger is pvm3/lib/debugger.
wd= working_directory
specifies a working directory in which all spawned tasks on this host will execute.
The default is $HOME.
ip= hostname
specifies an alternate name to resolve to the IP address of the host.
so=ms
specifies that a slave pvmd will be started manually on this host. The master pvmd prints a startup command for you to type on the remote host; the remote pvmd then prints a response line, which you should relay back to the master pvmd. At that point, you will see
Thanks
and the two pvmds will be able to communicate.
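For example, a hostfile combining several of these options might look like the following (the hosts are those of Figure 3-21; the option values are illustrative assumptions, not a real configuration):

# hostfile with per-host options
sparky
azure.epm.ornl.gov  lo=myname  so=pw
thud.cs.utk.edu     dx=/usr/local/pvm3/lib/pvmd  sp=2000
sun4                ep=$HOME/pvm3/bin/PVM_ARCH  wd=/tmp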
Command                      Job
halt, init 0                 shut down the system
reboot                       restart the system
locate missingfilename       find a file anywhere on the system
which missingfilename        show the full path of a command
ls -l                        long listing of a directory's contents
rm name                      remove (delete) a file
rm -rf name                  remove a directory and its contents without prompting
cat filetoview               display the contents of a file
cp filename /home/dirname    copy a file to the directory /home/dirname
mv filename /home/dirname    move a file to the directory /home/dirname
tail filetoview              display the end of a file
head filetoview              display the beginning of a file
./configure                  configure a source package before compiling it
adduser accountname          create a new user account
passwd accountname           set the password of an account
su                           switch to the superuser (root)
exit                         leave the current shell or user session
chmod 755 filename           set permissions: owner read/write/execute, others read/execute
* mpi_latency
* mpi_overhead - simple MPI memory overhead measurer
* mpi_slowcpu
* mpi_tokensmash
* word9
* LAMMPS
* MPQC
To find the locations where you can get the source code of each package, check the documentation downloaded with Cbench.
pvm_addhosts             adds one or more hosts to the virtual machine.
pvm_barrier              blocks the calling process until all processes in a group have called it.
pvm_bcast                broadcasts a message to all members of a group.
pvm_bufinfo              returns information about a message buffer.
pvm_exit                 tells the local pvmd that this process is leaving PVM.
pvm_f772sci
pvm_gettid               returns the tid of the process identified by a group name and instance number.
pvm_gsize                returns the number of members in a group.
pvm_halt                 shuts down the entire PVM system.
pvm_joingroup            enrolls the calling process in a named group.
pvm_kill                 terminates a specified PVM process.
pvm_lvgroup              removes the calling process from a named group.
pvm_mytid                returns the tid of the calling process.
pvm_parent               returns the tid of the process that spawned the calling process.
pvm_recv                 receives a message.
pvm_spawn_independent
pvm_start
pvm_tasks                returns information about the tasks running on the virtual machine.
pvm_tidtohost            returns the host on which the specified PVM process is running.
pvmd3                    PVM daemon.
Item                                                          # of units   Unit price   Total
mouse                                                         2            15           30
keyboard                                                      2            15           30
mouse USB                                                     1            20           20
keyboard USB                                                  1            30           30

Master node:
CPU 2.6 Core 2 Duo                                            1            645          645
RAM DDR2 2 GB                                                 1            105          105
Case DR66                                                     1            110          110
Motherboard MSI                                               1            275          275
DVD-ROM                                                       1            90           90
LAN card                                                      2            20           40
TOTAL MASTER NODE                                                                       1265

Computing nodes:
TYP1: COMPAQ (CPU 2.4, cache 512, H.D 40 GB, RAM 256 DDR1)    2            410          820
TYP2: DELL (CPU 2.8, cache 1 M, H.D 40 GB, RAM 1 GB DDR2)     2            625          1250
TYP3: DELL (CPU 2.8, cache 512, H.D 40 GB, RAM 1 GB DDR1)     3            500          1500
COMPUTING NODES TOTAL                                                                   3570

RAM 1 GB DDR1                                                 1            130          130
RAM 1 GB DDR1                                                 1            60           60
cables (8 m)                                                  8            1.3          10.4

TOTAL ALL                                                                               5145.4
I/P (available funds)                                                                   7000
BUDGET (remaining)                                                                      1854.6