
Rocks Linux Cluster

By
Al-Baraa Bahgat Ezzat
Ayman Aboulmagd Ahmed
Medhat Hamdy Mohamed
Mohamed Ibrahim Abd El-khalik
Mohamed Mounir Mahmoud
Under the Supervision of
Dr. Ahmed T. Sayed

Dr. Hosam A. Fahmy

A Graduation Project Report Submitted to


the Faculty of Engineering at Cairo University
in Partial Fulfillment of the Requirements for the
Degree of Bachelor of Science
in
Electronics and Communications Engineering
Faculty of Engineering, Cairo University
Giza, Egypt
July 2009

Table of Contents
List of Tables ................................................................................................................. x
List of Figures ............................................................................................................... xi
List of Symbols and Abbreviations............................................................................ xiii
Acknowledgments....................................................................................................... xiv
Abstract ........................................................................................................................ xv
Chapter 1: Introduction to clusters ............................................................ 1
1.1 Types of computers ......................................................................... 1
1.1.1 Uniprocessor Computers ................................................................... 1
1.1.2 Multiple Processors ...................................................................... 2
1.1.2.1 Centralized multiprocessors ............................................................ 2
1.1.2.2 Multicomputers (Clusters) .............................................................. 3
Chapter 2: Getting started with the cluster .................................................... 9
2.1 Choices made ............................................................................... 9
2.2 Linux ..................................................................................... 10
2.2.1 Linux history ........................................................................... 10
2.2.2 Linux vs. Windows ....................................................................... 11
2.2.2.1 Linux vs. Windows file system approach ................................................ 12
2.2.3 Important definitions ................................................................... 13
2.2.3.1 Free software ......................................................................... 13
2.2.3.2 Open source ........................................................................... 13
2.2.3.3 Kernel ................................................................................ 14
2.2.3.4 Shell ................................................................................. 14
2.2.3.5 Virtual consoles ...................................................................... 15
2.2.3.6 Run levels ............................................................................ 15
2.2.3.7 ext3 .................................................................................. 16
2.2.3.8 swap .................................................................................. 16
2.2.3.9 Grub .................................................................................. 16
2.2.3.10 GUI (graphical user interface) ....................................................... 16
2.2.3.11 KDE & GNOME .......................................................................... 16
2.2.3.12 Linux distributions .................................................................. 17
2.2.3.13 Root and User ........................................................................ 18
2.2.3.14 Parent directory and child directory ................................................. 19
2.2.4 The most popular commands ............................................................... 20
2.2.4.1 ls .................................................................................... 20
2.2.4.2 cd .................................................................................... 21
2.2.4.3 cp .................................................................................... 22
2.2.4.4 mkdir dirname ......................................................................... 23
2.2.4.5 halt .................................................................................. 23
2.2.4.6 reboot ................................................................................ 23
2.2.4.7 whoami ................................................................................ 23
2.2.4.8 which ................................................................................. 24
2.2.4.9 cat filename .......................................................................... 24
2.2.4.10 head ................................................................................. 24
2.2.4.11 tail ................................................................................. 24
2.2.4.12 touch ................................................................................ 24
2.2.4.13 rm ................................................................................... 25
2.2.4.14 mv ................................................................................... 25
2.2.4.15 more, less ........................................................................... 26
2.2.5 Permissions ............................................................................. 26
2.2.5.1 Setting permissions ................................................................... 27
2.2.6 Links ................................................................................... 28
2.2.7 Text editor ............................................................................. 28
2.2.8 Pipes and Searching ..................................................................... 29
2.2.8.1 Pipes ................................................................................. 29
2.2.8.2 Searching ............................................................................. 29
2.2.9 Installing packages ..................................................................... 30
2.2.9.1 RPM ................................................................................... 30
2.2.9.2 Repositories and yum utility .......................................................... 31
2.2.9.3 Installing from source code ........................................................... 32
2.2.10 Linux file system ...................................................................... 33
2.2.11 Mounting ............................................................................... 34
2.2.11.1 Mounting a device using the mount command ............................................ 34
2.2.11.2 Unmounting devices ................................................................... 34
2.2.12 On the job ............................................................................. 34
2.2.12.1 Shortcuts ............................................................................ 35
2.2.12.2 General notes ........................................................................ 35
2.3 OSCAR ..................................................................................... 36
2.3.1 What is OSCAR? .......................................................................... 36
2.3.2 Installation steps ...................................................................... 36
Errors ........................................................................................ 36
2.3.2.1 OSCAR GUI installation steps .......................................................... 37
2.4 Rocks cluster ............................................................................. 39
2.4.1 ROCKS ................................................................................... 39
2.4.2 ROLLS ................................................................................... 39
2.4.2.1 Installing Rolls ...................................................................... 39
2.4.2.2 Base .................................................................................. 40
2.4.2.3 Area51 ................................................................................ 40
2.4.2.4 HPC ................................................................................... 40
2.4.2.5 Ganglia ............................................................................... 41
2.4.2.6 SGE (SUN Grid Engine scheduler) ....................................................... 41
2.4.3 Installation ............................................................................ 41
2.4.3.1 The minimum rolls required to bring up a frontend ..................................... 41
2.4.3.2 The minimum hardware requirements ..................................................... 41
2.4.3.3 Installing the frontend ............................................................... 42
2.4.3.4 Installing the compute nodes .......................................................... 49
2.4.3.5 Check connection between the frontend and any compute node ........................... 51
2.4.4 Problems we faced during installation ................................................... 52
2.4.4.1 IPs problem ........................................................................... 52
2.4.4.2 RAMs problem .......................................................................... 52
2.4.4.3 IP conflict problem ................................................................... 53
2.4.4.4 On-board LAN card on the frontend ..................................................... 53
2.4.4.5 Insert-ethers and switch .............................................................. 53
2.4.4.6 Installing the compute nodes and network .............................................. 53
2.4.4.7 Scilab does not open the plotting window .............................................. 53
2.4.5 Useful commands ......................................................................... 54
2.4.5.1 Add interface ......................................................................... 54
2.4.5.2 Disable rolls ......................................................................... 54
2.4.5.3 Enable rolls .......................................................................... 54
2.4.5.4 Host interface ........................................................................ 55
2.4.5.5 Host interface configuration .......................................................... 55
2.4.5.6 Dump specific host .................................................................... 55
2.4.5.7 Dump all hosts ........................................................................ 55
2.4.5.8 Host appliance ........................................................................ 55
2.4.5.9 List partitions ....................................................................... 56
2.4.5.10 List rolls ........................................................................... 56
2.4.5.11 List networks ........................................................................ 56
2.4.5.12 List roll commands ................................................................... 56
2.4.5.13 Remove hosts ......................................................................... 56
2.4.5.14 Remove interfaces .................................................................... 56
2.4.5.15 Remove rolls ......................................................................... 56
2.4.5.16 Remove networks ...................................................................... 56
2.4.5.17 Set IPs for interfaces ............................................................... 56
2.4.5.18 Set password ......................................................................... 57
2.4.6 Notes for running clusters .............................................................. 57
2.5 Cbench .................................................................................... 57
2.5.1 What is benchmarking .................................................................... 57
2.5.2 General definition of benchmarking ...................................................... 57
2.5.3 Benchmarking for a cluster .............................................................. 57
2.5.4 Levels of cluster testing ............................................................... 58
2.5.4.1 Node level ............................................................................ 58
2.5.4.2 Point-to-point system level ........................................................... 58
2.5.4.3 MPI system level ...................................................................... 58
2.5.5 Installing Cbench ....................................................................... 58
Chapter 3: Parallel programming ............................................................... 63
3.1 Introduction to parallel programming ...................................................... 63
3.1.1 Goal .................................................................................... 63
3.1.2 Types of parallelism .................................................................... 63
3.1.3 System architectures .................................................................... 63
3.1.3.1 Single instruction single data (SISD) ................................................. 63
3.1.3.2 Single instruction multiple data (SIMD) ............................................... 64
3.1.3.3 Multiple instruction multiple data (MIMD) ............................................. 65
3.1.3.4 Gathering up .......................................................................... 68
3.1.3.5 Cluster ............................................................................... 69
Advantages .................................................................................... 69
Disadvantages ................................................................................. 69
3.1.4 Performance analysis .................................................................... 69
3.1.4.1 Goal .................................................................................. 69
3.1.4.2 Timing ................................................................................ 69
3.1.4.3 Profiling ............................................................................. 70
3.1.4.4 Measuring performance ................................................................. 70
3.1.5 Example to illustrate parallel programming .............................................. 71
3.1.6 Amdahl's law ............................................................................ 72
3.1.7 Gustafson's law ......................................................................... 73
3.2 MPI ....................................................................................... 75
3.2.1 Basic MPI definitions ................................................................... 76
3.2.1.1 Rank .................................................................................. 76
3.2.1.2 Communicator .......................................................................... 76
3.2.1.3 Group ................................................................................. 77
3.2.1.4 MPI message ........................................................................... 77
3.2.1.5 Blocking vs. non-blocking communication ............................................... 78
3.2.2 MPI subroutines ......................................................................... 78
3.2.2.1 Basic subroutines ..................................................................... 78
3.2.2.2 Other subroutines ..................................................................... 79
3.3 Running a code using MPI & testing the cluster ............................................ 83
3.3.1 Create a new user ....................................................................... 83
3.3.2 Running the testing code ................................................................ 84
3.3.3 Running a satisfiability code ........................................................... 85
3.3.4 Result of running the satisfiability code ............................................... 87
3.4 PVM ....................................................................................... 88
3.4.1 Introduction ............................................................................ 88
3.4.2 PVM overview ............................................................................ 88
3.4.3 How to obtain the PVM software .......................................................... 90
3.4.4 Setup to use PVM ........................................................................ 91
3.4.4.1 Setup summary ......................................................................... 92
3.4.5 Starting PVM ............................................................................ 92
3.4.6 Common startup problems ................................................................. 94
3.4.7 Running PVM programs .................................................................... 95
3.4.8 PVM console details ..................................................................... 97
3.4.9 Errors we faced ......................................................................... 99
3.4.10 Host file options ..................................................................... 101
References ................................................................................... 105
Appendix A: Linux commands ................................................................... 106
Appendix B: List of the open source tools in Cbench .......................................... 108
Appendix C: PVM functions in Scilab .......................................................... 110
Appendix D: The H.W and the budget ........................................................... 112


List of Tables
Table 2-1: Linux vs. Windows ..................................................................... 12
Table 2-2: The runlevels ......................................................................... 15
Table 2-3: Colours on terminal ................................................................... 21
Table 2-4: vi options ............................................................................ 28
Table 3-1: Distributed memory vs. shared memory .................................................. 68
Table 3-2: Blocking and non-blocking communication ............................................... 78
Table 3-3: MPI reduction operation ............................................................... 82
Table 3-4: Results of running the satisfiability problem code .................................... 87

List of Figures
Figure 1-1: UMA architecture ....................................................................................... 2
Figure 1-2: NUMA architecture..................................................................................... 3
Figure 1-3: Symmetric cluster ..................................................................... 5
Figure 1-4: Asymmetric cluster .................................................................... 6
Figure 1-5: Expanded cluster ......................................................................................... 6
Figure 2-1: Windows file system ................................................................................. 12
Figure 2-2: Linux file system ....................................................................................... 13
Figure 2-3:The kernel .................................................................................................. 14
Figure 2-4:printing the home directory path ................................................................ 18
Figure 2-5:add user using GUI .................................................................................... 19
Figure 2-6:the super user command ............................................................................. 19
Figure 2-7: ls help command ....................................................................................... 20
Figure 2-8:listing the contents of a current directory................................................... 20
Figure 2-9:the long list command ................................................................................ 21
Figure 2-10:changing the current directory ................................................................. 22
Figure 2-11: pwd command ......................................................................................... 22
Figure 2-12:making a new directory. ........................................................................... 23
Figure 2-13: printing the current user name ................................................................ 23
Figure 2-14:which command ....................................................................................... 24
Figure 2-15: removing a file using rm command ........................................................ 25
Figure 2-16 moving a file to the parent directory ........................................................ 25
Figure 2-17:the permissions ......................................................................................... 26
Figure 2-18:setting permissions using chmod ............................................................. 27
Figure 2-19: setting permissions using chmod ............................................................ 37
Figure 2-20: First Rocks screen ................................................................................... 42
Figure 2-21: Selecting Rocks Rolls 1 .......................................................................... 43
Figure 2-22: Selecting Rocks Rolls 2 .......................................................................... 43
Figure 2-23: Rocks cluster information screen ............................................................ 44
Figure 2-24: Eth0 IP..................................................................................................... 44
Figure 2-25: Eth1 IP..................................................................................................... 45
Figure 2-26: Gateway and DNS IPs............................................................................. 45
Figure 2-27: Setting root password .............................................................................. 46

Figure 2-28: Disk partitioning ..................................................................................... 46


Figure 2-29: Manual Partitioning................................................................................. 47
Figure 2-30: Kernel - Disk1 inquiry ............................................................................ 47
Figure 2-31: Rocks frontend installing status .............................................................. 48
Figure 2-32: Insert-ethers window ............................................................................... 49
Figure 2-33: Insert-ethers waiting for new compute node ........................................... 49
Figure 2-34: Rocks's first screen .................................................................................. 50
Figure 2-35: Insert-ethers discovering compute node ................................................. 50
Figure 2-36: Insert-ethers didn't send a kickstart file yet ...................................... 51
Figure 2-37: The node has successfully received the kick start file ............................ 51
Figure 2-38: Nvidia driver readme file ........................................................................ 54
Figure 3-1: SISD performs one instruction per cycle .................................................. 63
Figure 3-2: The advantage of pipelining ...................................................................... 64
Figure 3-3: The SIMD ................................................................................................. 64
Figure 3-4: the MIMD ................................................................................................. 65
Figure 3-5: shared memory MIMD.............................................................................. 66
Figure 3-6: shared memory with a bus ........................................................................ 66
Figure 3-7: Shared memory using crossbar switch ...................................................... 67
Figure 3-8: Distributed memory .................................................................................. 67
Figure 3-9: Flynn's taxonomy for architecture .................................................... 68
Figure 3-10: cluster architecture .................................................................................. 69
Figure 3-11:Inner product following the divide and conquer algorithm ..................... 71
Figure 3-12: Speedup vs. number of processors for three different degrees of parallelization .... 73
Figure 3-13: MPI_COMM_WORLD .......................................................................... 76
Figure 3-14: Communicator & group .......................................................................... 77
Figure 3-15: Send operation block diagram................................................................. 78
Figure 3-16: MPI_scatter ............................................................................................. 81
Figure 3-17: mpi_gather .............................................................................................. 82
Figure 3-18: Results of running satisfiability problem code ....................................... 87
Figure 3-19: PVM program hello.c.............................................................................. 89
Figure 3-20: PVM program hello_other.c ................................................................... 90
Figure 3-21: Simple hostfile listing virtual machine configuration ........................... 101
Figure 3-22: PVM hostfile illustrating customizing options ..................................... 104

List of Symbols and Abbreviations

GPL     General Public License
EULA    End User License Agreement
Grub    Grand Unified Boot loader
EP      Embarrassingly Parallel
OSCAR   Open Source Cluster Application Resources
HPC     High Performance Computer
SSH     Secure Shell
API     Application Programming Interface
MPI     Message Passing Interface


Acknowledgments
We would like to express our thanks, sincere gratitude and appreciation to Dr. Ahmed Tarek Sayed and Dr. Hosam Ali Fahmy for suggesting the idea of the project and for their continuous advice, valuable guidance and constructive instructions during the whole year.
We would also like to thank Eng. Amgad, Eng. Amaal, and our colleague Ahmed Abu El-Fotoh.
Last but not least, we thank Dr. Yahya Bahnas and the communications lab staff for their help throughout the year.


Abstract
The goal of the project is to provide a high performance computer at a low price.
By combining many commodity computers into one supercomputer, the cluster provides the high performance needed for compute-intensive calculations.
The cluster is managed by software called Rocks, an open source cluster management package based on a Red Hat Linux distribution (CentOS 5).
Currently the cluster has a master node and 7 heterogeneous compute nodes.


Chapter 1:
Introduction to clusters


Chapter 1:

Introduction to clusters

When computing, there are three basic approaches to improving performance:

Use a better algorithm.

Use a faster computer.

Divide the calculation among multiple computers.

A very common analogy is that of a horse-drawn cart, where the same three techniques can be followed:

Lighten the load.

Get a stronger horse (but we will never find the horse of TARWADA).

Get a team of horses and distribute the load.

1.1 Types of computers:

Uniprocessor Computers.

Multiple Processors.
o Centralized multiprocessors.
o Multicomputers (Clusters).

1.1.1 Uniprocessor Computers:


The basic structure of a uniprocessor computer is a CPU connected to memory by a
communications channel or bus. Instructions and data are stored in memory and are moved to
and from the CPU across the bus. The overall speed of a computer depends on both the speed
at which its CPU can execute individual instructions and the overhead involved in moving
instructions and data between memory and the CPU. This architecture was suggested by the
Hungarian mathematician John von Neumann in 1945.
Several technologies are currently used to speed up the processing speed of CPUs like
pipelining. But it doesn't matter how fast the CPU is theoretically capable of running if you
can't get instructions and data into or out of the CPU fast enough to keep the CPU busy.
Consequently, memory access has created a performance bottleneck for the classical von
Neumann architecture: the von Neumann bottleneck.
Computer architects and manufacturers have developed a number of techniques to minimize
the impact of this bottleneck. Computers use a hierarchy of memory technology to improve

overall performance while minimizing cost. Frequently used data is placed in very fast cache
memory, while less frequently used data is placed in slower but cheaper memory.
Eventually we reach a limit. Heavily pipelined, single-CPU supercomputers are the "big iron" of the past, often requiring "forklift upgrades" and air conditioners to prevent them from melting from the heat they generate.
Despite the high performance of supercomputers, we always need more. Manufacturing technology has now reached physical and quantum limits on how much transistor performance can be increased, and since the transistor is the basic building block of any electronic system, it is no longer possible to keep raising the speed and performance of a single processor (we will never find a 10 GHz processor).
So the second approach, getting the TARWADA horse, is limited.

1.1.2 Multiple Processors:


1.1.2.1 Centralized multiprocessors:
With centralized multiprocessors, there are two architectural approaches based on how
memory is managed:
Uniform memory access (UMA) / symmetric multiprocessors (SMP)
Non-uniform memory access (NUMA).
1.1.2.1.1 Uniform memory access (UMA) / symmetric multiprocessors (SMP)

With UMA machines there is a common shared memory. Identical memory addresses map,
regardless of the CPU, to the same location in physical memory. Main memory is equally
accessible to all CPUs. To improve memory performance, each processor has its own cache.

Figure 1-1: UMA architecture

There are two difficulties when designing a UMA machine:

The first is synchronization: communications among processors and access to peripherals must be coordinated to avoid conflicts.
The second is cache consistency: if two different CPUs are accessing the same location in memory and one CPU changes the value stored in that location, then how is the cache entry for the other CPU updated? While several techniques are available, the most common is snooping. With snooping, each cache listens to all memory accesses. If a cache contains a memory address that is being written to in main memory, the cache updates its copy of the data to remain consistent with main memory.
1.1.2.1.2 Non-uniform memory access (NUMA)

In this architecture, each CPU maintains its own piece of memory. Effectively, memory is divided among the processors, but each processor still has access to all of the memory.
Each individual memory address, regardless of the processor, still references the same location in memory (i.e. the single address space is simply divided into parts).

Figure 1-2: NUMA architecture

The disadvantage of NUMA is that memory access is non-uniform: some parts of memory will appear to be much slower than others, since the bank of memory "closest" to a processor can be accessed more quickly by that processor.
While this memory arrangement can simplify synchronization, the problem of memory coherency increases. In this type, the synchronization problem still exists, but the cache consistency problem does not, because the address space is divided among the processors.
1.1.2.2 Multicomputers (Clusters).
1.1.2.2.1 A brief history of Clusters

A computer cluster is a group of linked commodity computers working together so closely that in many respects they form a single supercomputer. These computers are linked to each other through a local area network to provide higher performance than a single computer.

The basis of cluster computing as a means of doing parallel work of any sort was arguably
invented by Gene Amdahl of IBM, who in 1967 published what has come to be regarded as
the seminal paper on parallel processing: Amdahl's Law. Amdahl's Law describes
mathematically the speedup one can expect from parallelizing any given otherwise serially
performed task on a parallel architecture. This article defined the engineering basis for both
multiprocessor computing and cluster computing, where the primary differentiator is whether
or not the interprocessor communications are supported "inside" the computer (on for
example a customized internal communications bus or network) or "outside" the computer on
a commodity network.
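
Stated briefly (the law itself is treated in more detail in Section 3.1.6), if p is the fraction of a task that can be parallelized and n is the number of processors, the expected speedup is

    Speedup(n) = 1 / ((1 - p) + p / n)

so even with an unlimited number of processors the speedup can never exceed 1 / (1 - p).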
The history of clusters is directly tied to the history of networking, since the main goal of networking is to link computing resources. After the packet-switching concept appeared, the ARPANET project began its development and succeeded in making the first cluster by linking four different computers. The ARPANET grew into the Internet, which now provides the links between clusters around the world.

The first commercial cluster made was ARCnet. ARCnet wasn't a commercial success and
clustering didn't really take off until DEC released their VAXcluster product in the 1980s for
the VAX/VMS operating system. The ARCnet and VAXcluster products not only supported
parallel computing, but also shared file systems and peripheral devices. They were supposed
to give you the advantage of parallel processing, while maintaining data reliability and
uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS systems from HP
running on Alpha and Itanium systems.
1.1.2.2.2 What is a cluster?

A cluster is a network of computers that work together so that they can be viewed as though
they are a single computer. A cluster has three basic elements:

Collection of individual computers.

Network connecting those computers.

Software that enables a computer to share work among the other computers via the
network.

1.1.2.2.3 Cluster structures

1. Symmetric clusters.
2. Asymmetric clusters.
3. Expanded cluster.

1.1.2.2.3.1 Symmetric cluster

Figure 1-3: Symmetric cluster

In this architecture each node is used independently; the user uses each node separately (e.g. the user logs in to the first node and runs a certain task, then logs in to the second node and runs another task).
There are two main disadvantages to a symmetric cluster: cluster management and security can be more difficult, and workload distribution can become a problem, making it harder to achieve optimal performance.

1.1.2.2.3.2 Asymmetric cluster

Figure 1-4: Asymmetric cluster

In asymmetric clusters one computer is the head node or frontend. It serves as a gateway
between the remaining nodes and the users. The remaining nodes often have very minimal
operating systems.
Since all traffic must pass through the head, asymmetric clusters tend to provide a high level
of security. If the remaining nodes are physically secure and your users are trusted, you'll
only need to harden the head node.
So the head often acts as a primary server for the remainder of the clusters. Since it will be
configured differently from the remaining nodes, it may be easier to keep all customizations
on that single machine. This simplifies the installation of the remaining machines.
The disadvantage of this architecture comes from the performance limitation imposed by the cluster head, and this effect becomes significant in the case of large clusters.
1.1.2.2.3.3 Expanded cluster

Figure 1-5: Expanded cluster


This architecture is used to overcome the main disadvantage of the asymmetric cluster, namely the performance limitation of the head node.
In this architecture additional servers are used beside the head node (i.e. the work of the head node is distributed over several computers).
For example, one of the nodes might function as an NFS server, a second as a management
station that monitors the health of the clusters, and so on.

Chapter 2:
Getting started with the cluster

Chapter 2:

Getting started with the cluster

2.1 Choices made

It's a very fascinating thing to work on something that you like. The first step in our project was learning: we began to read about clusters, OSCAR and parallelization.
Then we began to define the path we would take in our journey (OSCAR vs. Rocks). At first we chose Rocks as the software we would use to set up the cluster. But, unfortunately, Rocks had a lot of installation problems, which made us divide the group into two teams working on OSCAR and Rocks in parallel. The amazing thing is that we were able to install both packages. Finally, we decided to use Rocks, as it was easier to install and we had also solved a lot of its problems, which made us familiar with it.
After that, we began to study MPI, besides working on benchmarking, installing Scilab and writing the documentation.
Along the way we faced a lot of hardware-related problems; we will mention these problems in this chapter and how we solved them.

2.2 Linux
2.2.1 Linux history
To understand why Linus Torvalds made Linux free, we must first introduce the GNU project ("GNU's Not Unix").
In the early 1980s, Richard Stallman at the Massachusetts Institute of Technology proposed an alternative to the standard corporate software development model. In 1983, Stallman launched the GNU Project, which is centered around the idea that the source code for applications and operating systems should be freely distributable to anyone who wants it, so that programmers are free to copy, modify and redistribute it. This results in high quality software.
Linux is licensed under the GNU General Public License (GPL), which requires that the source code remain freely available to anyone who wants it: anyone can download the Linux kernel's source code, modify it, recompile it, and run it.
On the other hand, most operating systems are distributed under an End User License Agreement (EULA) that prevents the user from reverse-compiling the operating system.
In the early 1990s there were three main operating systems: DOS, Mac OS and UNIX. Windows was just getting started at this time; it was simply a shell that ran on DOS and was not really a true operating system yet.

Note: the above operating systems were commercially developed and their source code was protected by copyright.

UNIX source code used to be free for educational purposes, but then that access was withdrawn, which put Prof. Andrew S. Tanenbaum in a difficult position for teaching the inner workings of operating systems. So he decided to build a clone of UNIX to use in his classes, which he called Minix. He also published the source code in his book Operating Systems: Design and Implementation (Prentice Hall, 2006).
Starting from this point, a graduate student at the University of Helsinki in Finland, Linus Torvalds, initially developed the Linux kernel. It was not a full operating system: Linux version 0.02, released on October 5, 1991, consisted only of

bash: a command-line interface

update: a utility for flushing file system buffers

gcc: a C compiler


After that, he put his Linux source code on the Internet, so access to the Linux source code was open to anyone who wanted it. Torvalds then focused on improving Linux, and it grew into a worldwide effort.

2.2.2 Linux vs. Windows


This is a comparison between Linux and Windows which shows why Linux was our choice.

Price
  Linux:   Free or at a much lower price.
  Windows: Between $50.00 and $150.00 US dollars per license copy.

Ease
  Linux:   Windows is still much easier to use for new computer users.
  Windows: Microsoft has made several advancements and changes that have made Windows a much easier operating system to use, and although arguably it may not be the easiest operating system, it is still easier than Linux.

Reliability
  Linux:   Reliable; it can often run for months and years without needing to be rebooted.
  Windows: It still cannot match the reliability of Linux.

Software
  Linux:   Linux has a large variety of available software programs, utilities, and games.
  Windows: Because of the large number of Microsoft Windows users, there is a much larger selection of available software programs, utilities, and games for Windows.

Software cost
  Linux:   Many of the software programs, utilities, and games available on Linux are freeware and/or open source. Even complex programs such as Gimp, Open Office, Star Office, and Wine are available for free or at a low cost.
  Windows: Although Windows does have free software programs, utilities, and games, the majority of the programs will cost you a lot of money.

Hardware
  Linux:   Linux companies and hardware manufacturers have made great advancements in hardware support for Linux, and today Linux will support most hardware devices. However, many companies still do not offer drivers or support for their hardware in Linux.
  Windows: Because of the number of Microsoft Windows users and the broader driver support, Windows has much larger support for hardware devices, and a good majority of hardware manufacturers will support their products in Microsoft Windows.

Security
  Linux:   Linux is and has always been a very secure operating system. Although it can still be attacked, compared to Windows it is much more secure.
  Windows: Although Microsoft has made great improvements over the years with security on their operating system, their operating system continues to be the most vulnerable to viruses and other attacks.

Open source
  Linux:   Many of the Linux variants and many Linux programs are open source and enable users to customize or modify the code however they wish.
  Windows: Microsoft Windows is not open source and the majority of Windows programs are not open source.

Support
  Linux:   Although it may be more difficult to find users familiar with all Linux variants, there are vast amounts of online documentation, help, books, and support available for Linux.
  Windows: Microsoft Windows includes its own help section, and has vast amounts of online documentation, help, and books available on each of the versions of Windows.

Table 2-1: Linux vs. Windows


2.2.2.1 Linux vs. Windows file system approach
Windows: each storage device is a root of its own, and each root can have its own structure, commonly called a tree (Figure 2-1).

Figure 2-1: Windows file system


Linux: there is only one root and one tree, and all devices are mounted somewhere in that tree (Figure 2-2).

Figure 2-2: Linux file system

Note: folders are called directories under Linux.
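
As a quick illustration of the single tree, listing the root directory of a typical Linux system shows every top-level directory hanging off /; the exact set of directories varies between distributions, so treat this output as a sketch:
$ ls /
bin  boot  dev  etc  home  lib  mnt  opt  proc  root  sbin  tmp  usr  var
A USB stick or a Windows partition does not get a drive letter of its own; it is mounted onto one of these directories (mounting is covered in section 2.2.11).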

2.2.3 Important definitions


There are some very important things you must know about Linux; we will explain them in a simple and clear way.
2.2.3.1 Free software
The Linux community supports the concept of free software, that is, software that is free
from restrictions, subject to the GNU General Public License. Although there may be a cost
involved in obtaining the software, it can thereafter be used in any way desired and is usually
distributed in source form.
2.2.3.2 Open source
Open source refers to any program whose source code is made available for use or
modification as users or other developers see fit. Open source software is usually developed
as a public collaboration and made freely available.


2.2.3.3 Kernel
It is the core of the operating system. It manages the communication between the hardware and the software; simply, we can say it is responsible for everything the O.S. must do. Figure 2-3 shows the kernel sitting between the hardware and the software.

Figure 2-3: The kernel

2.2.3.4 Shell
It is impossible for the end user to deal directly with the kernel, so the need for the shell arises. The shell is a program that enables the end user to deal with the kernel.
There are different shells, such as:
sh: the Bourne shell, the earliest shell.
bash: the Bourne-again shell, an improved sh and the default for Linux.
csh: uses syntax very similar to the C language.
zsh: an improved version of bash.
tcsh: an improved version of csh; it is the default shell for FreeBSD.
There are different ways to access the shell:
1- CLI (command line interface): communicate with the shell by writing commands.
2- GUI (graphical user interface): will be explained in detail later.
3- Terminal: the command line interface inside the GUI.
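To check which shell you are currently using, print the SHELL environment variable (the path shown here is simply the usual default):
$ echo $SHELL
/bin/bash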


2.2.3.5 Virtual consoles
A virtual console refers to the combination of one input device and one output device which enables you to interact with your computer by communicating with the shell.
We have seven consoles in Linux; to change between them press Ctrl+Alt+F1 through F7.
So it is clear now that Linux is a multi-user, multitasking system, as we can open more than one terminal and work in different consoles at the same time.
2.2.3.6 Run levels
Run levels represent the several modes that Linux can run in. We have seven run levels as follows:

Run level    Description
0            Halts the system
1            Single-user mode, CLI enabled
2            Multi-user mode, CLI enabled, no networking
3            Full multi-user mode
4            Unused
5            Multi-user mode with GUI and networking (X11)
6            Reboots the system

Table 2-2: The runlevels

To change the Run level you can type the following command in the Terminal:
$ init 0
In this command, you are telling the system to change the Run level into Run
level 0 which is halting the system.
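
Similarly, the runlevel command prints the previous and the current run level (N means there was no previous one), and init can switch to any other level:
$ runlevel
N 5
$ init 6      // reboots the machine (run level 6)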


2.2.3.7 ext3
ext3 is a journaling file system used by the Linux kernel to manage files and folders. Its main advantage over ext2 is the ability to recover quickly after an unclean shutdown. There are other file systems such as ReiserFS, FAT, NTFS and ext4.
2.2.3.8 swap
This partition is used for virtual memory by the Linux operating system. Linux uses it to extend the system RAM when physical memory runs low.
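You can see how much swap space is configured and in use with the free command; the output below is abridged and the numbers are only illustrative:
$ free -m
Mem:    2024   1530    494
Swap:   1983      0   1983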
2.2.3.9 Grub
GRUB (the Grand Unified Boot loader) starts the Linux kernel. Most Linux distributions that use GRUB come with it installed and ready to use. Many of the distributions that do not have GRUB installed by default have it available in their package systems; check there first before doing a manual installation. If something goes wrong during an attempted GRUB install you can leave your computer unable to boot, so don't do anything if you don't know exactly what you are doing.
2.2.3.10 GUI (graphical user interface)
The GUI is the easiest way for the end user to deal with Linux: the user clicks on a visual screen that has icons, windows and menus using a pointing device such as a mouse. GUI is pronounced like "gooey".
The GUI consists of the X window system, a window manager and a desktop environment (KDE, GNOME). The GUI was created using the X window system software; X window is the engine, which we refer to as (X11) or (X). X window allows programmers to run applications in windows, but on its own it is never enough: we also need a window manager, GUI toolkits and desktop environments.
The window manager handles window placement and movement; it allows you to maximize and minimize windows.
2.2.3.11 KDE & GNOME
Both are complete desktop environments. GNOME uses a window manager called Metacity and KDE uses KWin, though either may use any other window manager. There is no single best desktop, only the one that is best for you; both can be customized to behave the way you want.


2.2.3.12 Linux distributions
The Linux kernel on its own is never enough, so we have to make our own recipe to make the kernel work for us. We can take Ubuntu as an example.
Ubuntu (Debian based) is a desktop distribution, so it is the best solution for an end user who only cares about simple daily jobs; but if we tried to use it in a cluster it would be a very bad choice.
Fedora (Red Hat) is general purpose, so you can shape it the way you want, but for an end user it is not as easy as Ubuntu.
You may deal with a lot of distributions; here are some of them:

RedHat Company (RHEL, Fedora, Centos)

Novel Company (Suse, open suse)

Debian

Ubuntu

Yellow dog

Mandriva

Gentoo


2.2.3.13 Root and User
Root controls everything in Linux; it can access everything.
Root directory
The directory which contains all files and directories in the file system.
/ is the root directory shortcut in the CLI.
Home directory
It is related to the Linux user: all the user's data is stored somewhere in the home directory. It is located under /home and it is named after the user.
~ is the home directory shortcut in the CLI.
To display your home directory:
$ echo $HOME
It will print the home directory of the current user as shown in fig. (2-4):

Figure 2-4:printing the home directory path

DON'T USE root IN YOUR DAILY ROUTINE, because it may destroy your system.
Adding users:
Using root we can add users through the terminal by writing:
$ adduser username
$ passwd username
(the passwd command will then prompt for the new user's password)
Or we can add a user through the GUI as shown:


Figure 2-5:add user using GUI

Super user (su): this command lets you work in the root environment and do what root can do, although you are not actually logged in as root. See fig. (2-6); you will notice the prompt change from the $ user sign to the # root sign.

Figure 2-6: the super user command

Notes:
1- Each user has its own home directory and its own desktop, and this also applies to root.
2- root and normal users must have different passwords.
3- When you are root the terminal displays #, and when you are a normal user it displays $.
2.2.3.14 Parent directory and child directory
We represent the parent directory by ".."; it is the folder which contains the child directory. It may contain more than one child directory, and a child directory may itself be the parent of other directories, so parent and child is a relative relation between directories.
To go to the parent directory:
$ cd ..
To go to a child directory:
$ cd ./childdirectoryname
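A short worked example (the user and directory names are only placeholders):
$ pwd
/home/student
$ mkdir projects        // /home/student is now the parent of projects
$ cd ./projects         // move down into the child directory
$ pwd
/home/student/projects
$ cd ..                 // back up to the parent directory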


2.2.4 The most popular commands

Notes:
1- Linux commands are case sensitive.
2- All commands have options; to learn about a command and its options you may write in the terminal:
$ commandname --help
OR
$ man commandname   (this displays the manual pages)

Figure 2-7: ls help command

2.2.4.1 ls
Lists the contents of the current directory as shown:

Figure 2-8:listing the contents of a current directory


One of the ls options is ls -l, which lists the contents in long-list mode, displaying the permissions and information about the contents as shown in fig. (2-9).

Figure 2-9:the long list command

Note:
1- You may have noticed that ls displays the contents in different colours; the colours represent the type of the contents:

Directory            blue
Compressed files     red
Executables          green
Links                light blue
Sockets              pink
Special devices      yellow

Table 2-3: Colours on terminal

2- You can use ls -a to also see the hidden configuration files in the home directory.
2.2.4.2 cd
It is used to move between directories.
To explain the cd command, we move to the Desktop directory; after we execute the command, the new directory is displayed before the $ sign. So it is now clear that before the $ sign the prompt prints your computer name, the user name and the current directory.


Figure 2-10:changing the current directory

There are some shortcuts; try to execute them in the terminal:

$ cd ..    // go to parent directory
$ cd ~     // go to home directory
$ cd /     // go to root directory

Current directory:
The shell keeps track of your current directory, because commands perform their jobs on the current directory. For example, the ls command displays the current directory contents and the touch command creates an empty file in the current directory. Also, to install a package we must have the package in the current directory.
To print the current directory
$ pwd
It will print it as shown

Figure 2-11: pwd command

2.2.4.3 cp
$ cp file1 file2
$ cp -r dir1 dir2
The -r option is used to copy the directory & all its contents.


2.2.4.4 mkdir dirname
Makes a directory. In fig. (2-12) we made a directory called (moh) and we display the contents before and after creating it.

Figure 2-12:making a new directory.

2.2.4.5 halt
Shuts down the system (root only).
2.2.4.6 reboot
Restarts the system (root only).
2.2.4.7 whoami
Prints the current user name as shown in fig. (2-13).

Figure 2-13: printing the current user name


2.2.4.8 which
Displays the path of the file that runs when the given name is entered as a command. The next figure shows two attempts with no useful result: in the first, (moh) cannot be run by the shell because it is a directory, and in the second we entered (vlc) but it was not installed.

Figure 2-14:which command

2.2.4.9 cat filename
Displays the content of a file:
$ cat filename
2.2.4.10 head
Displays only the first lines of a file:
$ head filename
2.2.4.11 tail
Displays only the last lines of a file:
$ tail filename
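Both head and tail print 10 lines by default; the -n option changes that count. For example, using /etc/passwd simply because it always exists:
$ head -n 3 /etc/passwd     // first 3 lines
$ tail -n 3 /etc/passwd     // last 3 lines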
2.2.4.12 touch
Creates an empty file of any type.
$ touch filename.xxx


2.2.4.13 rm
It is used to remove files. Below we create a new file, display the contents of the current directory, then remove the file using the rm command:
$ rm filename

Figure 2-15: removing a file using rm command

To force removal of a file, or of a directory and its contents, we use the -rf option, see the figure:
$ rm -rf filename
2.2.4.14 mv
It is used to move (or rename) files and folders.

Figure 2-16 moving a file to the parent directory


2.2.4.15 more, less
We will take the ls command as an example: if you print the contents of a directory which contains a lot of files, or you open a text file with a lot of data, the screen is not enough to display it all; the more and less commands help you.
more: moves forward through the output using the spacebar.
less: moves forward and backward using the up and down arrows.
To exit more or less you must enter q.
$ ls | less
$ ls | more
Note: before applying these two commands choose a directory with a lot of files, just to feel the more and less effect.

2.2.5 Permissions
If you execute the ls -l command you will notice a combination of letters before the name of each file. These letters represent the file type and the permissions for the owner, the group and others. See fig. (2-17).

Figure 2-17:the permissions


The owner permissions are for the file creator (the user who created the file), the group permissions are for the users in the same group as the owner, and the others permissions are for users who are not in that group.
2.2.5.1 Setting permissions
Read = 4, Write = 2, Execute = 1
chmod is used to set permissions.

Example (illustrated in fig. (2-18)):
We will create a directory called (moh) and set the owner to read and execute, the group to write only, and others to full permissions.
read + execute = 5, write = 2, read + write + execute = 7
so it will be:
$ chmod 527 moh

Figure 2-18:setting permissions using chmod
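
chmod also accepts symbolic modes, which are often easier to read than the octal numbers; the file names below are just placeholders:
$ chmod u+x script.sh      // add execute permission for the owner (user)
$ chmod go-w notes.txt     // remove write permission from group and others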


2.2.6 Links
One of the Linux features is links, which can be hard links (the pointer and the pointee use the same inode) or symbolic links (the link has its own inode)1.
Note:
To create links we use the ln command.
Creating a hard link:
$ ln pointeefile pointerfile
Creating a symbolic link:
$ ln -s pointeefile pointerfile
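
A minimal sketch (the file names are hypothetical): create both kinds of links and compare their inode numbers with the -i option of ls.
$ touch original.txt
$ ln original.txt hardlink.txt       // hard link: shares the inode of original.txt
$ ln -s original.txt symlink.txt     // symbolic link: gets its own inode
$ ls -li                             // -i prints the inode number of each entry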

2.2.7 Text editor:

In Linux there are a lot of text editors like emacs, vi, pico, etc.
Text editors give us the ability to write something in plain text with no formatting.
We will explain the most popular text editor (vi).
To open a file (if it does not exist it will be created): $ vi filename
Then choose a mode from the following options:
i : insert mode, so you can write in the file
:w : save the file
:q : quit (the file must be saved first)
:q! : force quit
Table 2-4: vi options

1 An inode stores information about a file in the Linux file system.


2.2.8 Pipes and Searching:

2.2.8.1 Pipes
Pipes take the output of one process as an input to another process. In technical terms, the standard output (stdout) of one command is sent to the standard input (stdin) of a second command. Let's see the following example:
$ cd ~/Desktop
$ touch new.txt
$ ls > new.txt
The previous commands make a file located on the Desktop called (new.txt). We are interested in the last command: it tells the shell to list the contents of the Desktop and save their names in the file (new.txt).
The last command is actually redirection of output into a file; a pipe proper uses the sign ( | ) to send the output of one command directly into another command.
2.2.8.2 Searching
To learn about searching for files and folders, try to execute the following example:
$ find / -name new | mv new new1
Here, it will search in the root directory (/) for a file called new and then rename this file from new to new1.
$ locate new.txt | mv new new1
In this example we replaced find with locate. locate is faster than find; here, it will search for new.txt on your system and rename it to new1 (of course you will perform only one of the two commands, find or locate).
Another command that we want to mention is grep. It is used to search for a specific word; let's see an example:
# find / -print | grep grub
This command may seem strange. Here, I told the shell to search in the root for anything that has the word grub in its name and print all the paths of these files.
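
As a minimal sketch combining a pipe with grep and less (the directory is just an example), to page through only the entries of /etc whose names contain the word net:
$ ls /etc | grep net
$ ls /etc | grep net | less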


2.2.9 Installing packages:

2.2.9.1 RPM
We use the Red Hat package manager (RPM), so we must install RPM packages.
2.2.9.1.1 The rpm package title
Example: this example illustrates the meaning of the package title.
unrar-3.4.3-1.0.rh8.i386.rpm
Name of the package: unrar
Version: 3.4.3
Release: 1.0
Then it describes whether this package is for a certain Linux distribution:
fcx: Fedora Core x
rhx: Red Hat version x
Then it displays the architecture type:
i386: software runs on an i386 or later CPU
i686: software runs on an Intel Pentium II or later CPU
noarch: no specific architecture
2.2.9.1.2 Installing rpm packages:
RPM packages are installed using the rpm command:
$ rpm -i packagename     // install a package
$ rpm -q packagename     // query installed packages
$ rpm -qp packagename    // query a package file in the present directory

You can find rpm packages at: www.sourceforge.net, www.rpmfind.net/linux/rpm
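
A minimal sketch of a typical rpm session using the package named above (run as root, from the directory that contains the .rpm file):
# rpm -ivh unrar-3.4.3-1.0.rh8.i386.rpm   // install, verbose, with progress hashes
# rpm -q unrar                            // confirm it is installed and show its version
# rpm -e unrar                            // erase (uninstall) the package again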


2.2.9.2 Repositories and yum utility:

2.2.9.2.1 What are repositories
To understand the meaning of a repository let's consider an example. Assume that you are working in your supermarket and you need to buy some goods, so you have to go to a place where you can find the goods you need; you can call this place your repository.
You can apply the same concept here. Assume you want to install Octave on your Linux machine, so you have to go to a place where you can find the packages of Octave, and that place is what we call a repository.
So, we can define a repository as a place on the internet where you can find the packages you want.
Let's go into more detail. There are a lot of repositories on the web such as freshrpms, livna, rpmfusion and more. But don't forget that each distribution has its own repository (e.g.: Centos has its own repository and so on). Your system must know which repositories it will use; of course it will use its default repository in the beginning.
To see the repositories your system uses, go to /etc/yum.repos.d. In this directory you will find a file corresponding to each repository defined on your system (a .repo file for each repository).
If you want to enable another repository, you shall go to the repository website, download its package and double-click on it to install it. Let's take an example:
2.2.9.2.2 Getting repositories:
Consider you wanted to install the freshrpms repository: go to http://www.ayo.freshrpms.net, download the package that fits your system and double-click on it to install it. After that, you can go to /etc/yum.repos.d and you will find a .repo file that corresponds to the freshrpms repository. But what is really happening?
2.2.9.2.3 Installing using yum
When you are about to install specific software, you are really ordering your system to search in your repositories for all the packages that this software needs in order to be installed on your system. After that it downloads the packages and installs them. Let's continue the previous example: you want to install the VLC media player, so you type in the terminal the following command:
$ yum install vlc


Your system will search in the repositories - suppose it found it in the freshrpms repository - so it will download all the packages that VLC needs from this repository and install them.

Note: rpm installs packages from your current directory; yum uses the repositories to download the required packages and then installs them.
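
A minimal sketch of checking for and installing a package with yum (continuing the VLC example):
$ yum search vlc                   // look for the package in the enabled repositories
$ yum install vlc                  // download vlc and its dependencies, then install them
$ yum list installed | grep vlc    // confirm the installation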
2.2.9.3 Installing from source code:
These are the steps to install from source code:
1- Download the source code archive, which has the extension (.tar.gz) or (.tar.bz2).
2- For (.tar.gz) use the following command to extract:
$ tar -zxvf ./filename
For (.tar.bz2) use the following command to extract:
$ tar -jxvf ./filename
-z: uses gzip to decompress (-j uses bzip2 instead)
-x: extracts the files from the compressed archive file
-v: displays each file as it's processed
-f: the name of the file to extract
3- The files will be extracted into a directory inside the current directory, which will contain the source code files.
4- Go inside that directory and use the configure command to prepare the installation files to be compiled:
#./configure
This command will verify the environment and the C/C++ compiler and it will create the makefile. The makefile contains the instructions for how the executable should be compiled to run on your platform.
5- Now it's time to convert the text-based source code to a binary executable file using the make command:
#make


6- Finally, install it using the following command:
#make install

Notes:
1- Before configure, we may read the README file or INSTALL file to know if there's something special we have to do before configuring the files.
2- Generally it will be installed in /usr/local/XXX.
3- To clean up, go to the directory where it was installed and use the rm command to remove it.
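
A minimal sketch of a complete session following the steps above (the package name example-1.0 is hypothetical):
$ tar -zxvf ./example-1.0.tar.gz    // step 2: extract the archive
$ cd example-1.0                    // step 3: enter the extracted directory
$ ./configure                       // step 4: check the environment, create the makefile
$ make                              // step 5: compile the source code
# make install                      // step 6: install as root, usually under /usr/local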

2.2.10 Linux File system:

The basic directory structure in Linux:
/bin: contains the executable files and the shells.
/etc: houses most of the text-based configuration files for Linux; for example lilo.conf, the file that tells the boot loader which OS to boot.
/usr: contains the application files; it has several subdirectories:
  /usr/bin: most executable programs.
  /usr/lib: library files.
  /usr/sbin: system administration programs.
  /usr/share: documentation and man pages.
/dev: contains the files needed to represent the hardware devices. Here your hard drive, if you are using an IDE hard drive, will be known as /dev/hda.
/boot: boot loader files which are needed to boot your system. Usually this is the home of the Linux kernel.
/root: if you are not root you will still see this directory, but it's a restricted area.
/sbin: contains the important system management programs, e.g.: fdisk, init, mkfs.
/tmp: contains temporary files created by users or by the system; these files are deleted when they are no longer in use.
/sys: contains information about your hardware.
/var: contains a variety of variable data and files which may change in size.
/opt: contains the files of some installed programs.
/media =OR= /mnt: both are used to mount external devices; which one is used depends on the distribution.

2.2.11 Mounting:
2.2.11.1 Mounting a device using the mount command
This command is rarely used, because auto-mounting is one of the features of the new versions of the distributions.
$ mount -t filesystemtype devicename mountpoint
$ mount -a devicename mountpoint
The -a option will try to mount with all supported file systems.

mountpoint: the directory which will be used to access the device.
filesystemtype: refers to (ntfs, fat, fat32, etc.)
devicename: each device has a name under Linux, e.g.:
IDE hard disk: (/dev/hdaXX) where XX refers to the partition number.
Floppy disk: (/dev/fd0)

2.2.11.2 Un-mounting devices:

Use this command to un-mount a device:
$ umount devicename
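
A minimal sketch, assuming a FAT32 USB stick that appears as /dev/sdb1:
# mkdir /mnt/usb                     // create a mount point
# mount -t vfat /dev/sdb1 /mnt/usb   // mount the device on it
# ls /mnt/usb                        // access the files through the mount point
# umount /mnt/usb                    // un-mount it when finished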

2.2.12 On the job:

In this section we will explain some important skills you need to know about Linux and some problems that faced us in our daily routine or while working on the cluster.


2.2.12.1 Shortcuts
1- Type ls m then push the Tab key. Linux is going to beep a couple of times, but keep pushing; you will then see every file in the directory that begins with the letter 'm'.
2- To use a file you don't have to write the whole name: you can write the first letters only and press Tab, and Linux will complete it for you.
3- The commands you execute are saved in a history file in your home directory. You can write history in the terminal and it will display the commands.
2.2.12.2 General notes:
1- We can change a password by logging in as root:
#passwd username
Then it will ask for the new password.
2- To know the file type you can use this command:
$ file filename
3- To print the paths where the system looks for its executables automatically:
$ echo $PATH
echo is a command to display something on screen.
PATH is an environment variable3 where the executable paths are stored. To display all the environment variables we can use this command:
$ env
To display all the variables - user and environment variables - use this command; we use less because there are a lot of variables:
$ set | less
4- To learn about your system go to /proc and execute these commands:
$ cat devices    // displays the installed devices
$ cat cpuinfo    // information about your CPU
$ cat version    // displays your kernel version

3 Environment variables are variables used by the OS.


2.3 OSCAR
2.3.1 What is OSCAR?
OSCAR is a snapshot of the best known methods for building, programming, and using
clusters. It consists of a fully integrated and easy to install software bundle designed for high
performance computing (HPC) cluster. Everything needed to install, build, maintain, and use
a Linux cluster is included in the suite.
OSCAR is the primary project of the Open Cluster Group. For more information on the group
and its projects, visit its website http://www.openclustergroup.org/.
2.3.2 Installation steps

Note: the next steps were made on both Fedora 8 and Centos 5.

Download the 3 files of OSCAR 5.1 from http://sourceforge.net/projects/oscar :
oscar-base-5.1rc1.tar
oscar-repo-common-rpms-5.1rc1.tar
oscar-repo-fc-8-i386-5.1rc1.tar
Extract (oscar-base-5.1rc1.tar) into (/opt) then rename it to (oscar).
Make a new directory in the root (/tftpboot).
Extract the other 2 files into the new directory.
Make a new directory inside the (/tftpboot) directory and name it (distro).
Put all the rpm's of the DVD of your distribution in (/tftpboot/distro).
Open a terminal window.
Go to /opt/oscar.
Run the command (./install_oscar eth0).

Errors:
error1: ssh-check failed
error2: yum-check failed
You shall uninstall SSH using yum and everything will be fine, as the installer will install SSH for you.
Also make sure that there is no error while opening the Package Manager, as we discovered that yum was not working properly because of changed data in the repo files.


error3: "couldn't initialize the global database values table at


/opt/oscar/scripts/prepare_oda"
Check the network manager to see what the working device is. If it was peth0, you should
change the command to: ./install_oscar peth0

After these steps we could open the GUI of OSCAR but it wasn't complete as some of the
buttons were not active, and that is normal because you must go through the steps one by one.
2.3.2.1 OSCAR GUI installation steps:

Figure 2-19: the OSCAR installation GUI

Step 0: inactive button; this is normal in OSCAR version 5.1.


Step 1: Select OSCAR Packages; we didn't change it.
Step 2: Configure selected packages; we didn't change it.
Step 3: Install Server Packages; we received errors because of packages that couldn't be found, like Heartbeat-pils and Heartbeat-stonith.

Solution: We tried to install them using yum, but they weren't found on the installation DVD or in the Fedora repos, so we installed many repositories for yum like rpmfusion, rpmforge, livna and kwizart, but couldn't find those packages in those repos as well.

So we removed the packages that needed those packages; then it worked and the OSCAR server was installed.
Note: We later found them on rpmfind.net.
Step 4: Build OSCAR client image.
Error: many packages, all starting with opkg-*-client (OSCAR packages), are not installed, e.g.: opkg-sis-client.
Solution: Use the package manager to install them from your hard disk, by making a new yum repo file pointing to the local directory on the hard disk that contains the OSCAR packages that were extracted from:
oscar-base-5.1rc1.tar
oscar-repo-common-rpms-5.1rc1.tar
oscar-repo-fc-8-i386-5.1rc1.tar
Then the package manager will find them and install them.
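
A minimal sketch of such a local repo file (the file name, repo id and baseurl path are hypothetical; point baseurl at the directory actually holding the extracted rpm's):
# /etc/yum.repos.d/oscar-local.repo
[oscar-local]
name=Local OSCAR packages
baseurl=file:///tftpboot/oscar-local-rpms
enabled=1
gpgcheck=0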

Now the OSCAR client image is in /var/lib/systemimager/images/oscarimage, of about 645 MB.
Note: After installing another image we noticed that its size is only about 27 MB.
Then we stopped working on OSCAR, as Rocks was now working and the group chose Rocks to continue the project with.


2.4 Rocks cluster


2.4.1 ROCKS
Rocks is considered a distribution of GNU/Linux; it is used for managing clusters.
Rocks is built from two main units:
- Any distribution of GNU/Linux (Centos 5 in our case, and this is the default).
- Packages ((Rolls)) used for managing the cluster.
From the installation description this was clear, as the installed packages were:
- Kernel (the Linux kernel).
- OS (the operating system, Centos 5 in our cluster; any OS might be used).
- ROLLS.
In simple words:
ROCKS = (Kernel + OS + Rolls) = a new distribution of GNU/Linux.

2.4.2 ROLLS
Rolls in our cluster:
1. Base.(required)
2. Area51. (required)
3. HPC. (required)
4. Web-server. (required)
5. Ganglia.
6. Sge.
7. Xen.
8. Java.
Other rolls might be added from www.rocksclusters.org in the download section.
2.4.2.1 Installing Rolls
On a new cluster:
The roll should be installed during the initial installation of your cluster, as introduced before.
On an existing cluster:
After installing the cluster, any roll might be added on the frontend.
Assume that you have an iso image of the roll called area51.iso.

In the terminal type:

$ su - root
# rocks add roll area51.iso
# rocks enable roll area51
# rocks-dist dist
# kroll area51 | bash
# init 6

Very important note: The roll that will be installed must be a real roll (i.e. don't try to use this method for installing any packages except rolls).
2.4.2.2 Base
The Base roll contains the core Rocks structure that is used to install, configure and maintain clusters. So the Base roll makes the connection between the kernel and operating system on one side and the other rolls on the other side.
2.4.2.3 Area51
Contains two packages:
Tripwire: Tripwire is configured to automatically scan the files on your frontend daily. Open Tripwire from the home page (localhost).
Chkrootkit: To see if your frontend has been infected by a rootkit, execute:
# /opt/chkrootkit/bin/chkrootkit
2.4.2.4 HPC:
The primary purpose of the HPC roll is to provide configured software tools that can be used to run parallel applications on your cluster.
The following software packages are included in the HPC roll:
- MPI over Ethernet environments (OpenMPI, MPICH, MPICH2)
- PVM
- Benchmarks (stream, iperf, IOzone):
  stream is used to measure memory bandwidth (memory performance).
  iperf is used to measure the performance of the network.
  IOzone tests file I/O performance (disk performance).
Very important note: we will not use these benchmarks, as we will use Cbench, which contains these benchmarks and other ones.

2.4.2.5 Ganglia:
Monitors the activity of each node.
Open Ganglia from the home page (localhost).
Disadvantage: you might feel that Ganglia is slow in monitoring.
2.4.2.6 SGE (Sun Grid Engine scheduler)
SGE is distributed resource management software; it allows the resources within the cluster (CPU time, software, licenses, etc.) to be utilized effectively.
Another thing that the roll does is that generic queues are set up automatically the moment new nodes are integrated into the Rocks cluster and booted up.

2.4.3 Installation:
Source of Rocks: www.rocksclusters.org
You have two options to download from the site: the first is a DVD, the other is several CDs. In both cases there is required material that you have to download and optional material that you may or may not download.

2.4.3.1 The minimum requirement to bring up a frontend is the following rolls:
- Kernel/Boot Roll CD
- Base Roll CD
- Web Server Roll CD
- OS Roll CD - Disk 1
- OS Roll CD - Disk 2

2.4.3.2 The minimum hardware requirements

For the frontend node:
- Disk capacity: 30 GB
- Memory capacity: 1 GB
- Ethernet: 2 physical ports (e.g., "eth0" and "eth1")
- BIOS boot order: CD, Hard Disk

For a compute node:
- Disk capacity: 30 GB
- Memory capacity: 1 GB
- Ethernet: 1 physical port (e.g., "eth0")
- BIOS boot order: CD, PXE (Network Boot), Hard Disk

In our installation we didn't use the optional materials; we used only the required material on the DVD copy. The installation consists of three steps:
1- Installing the frontend.
2- Installing the compute nodes.
3- Checking the network and connectivity between nodes.
2.4.3.3 Installing the frontend:
First connect the node to a network that has a DHCP server on the first network card (the public card).
Insert the DVD, then power on the node.

Figure 2-20: First Rocks screen

1- Now type the word frontend to enter the installation of the frontend. The node will take the public IP from the DHCP server and take the private address as the default 10.1.1.0/24.
The next screen will appear to you:


Figure 2-21: Selecting Rocks Rolls 1

2- Be sure that the DVD is in the drive and press CD/DVD-based Roll. Another message will appear; mark all the options on it and press submit.

Figure 2-22: Selecting Rocks Rolls 2

3- The cluster information screen will appear. You can fill it with any data you want.


Figure 2-23: Rocks cluster information screen

4- Two messages will appear containing the public and private IPs. Be sure that you note down these IPs.

Figure 2-24: Eth0 IP

Note: you can change the private IP or the netmask to anything you want; "it should be in the range of private network IPs".


Figure 2-25: Eth1 IP

Note that eth0 is for the private address and eth1 is for the public address.

Figure 2-26: Gateway and DNS IPs

The gateway and DNS servers will be filled in by the DHCP server; all you have to do is press next.

Figure 2-27: Setting root password

You have to define the password of the cluster's root now.

Now we have reached the disk partitioning part. You can choose manual or automatic. Of course automatic is much easier, so mark it and press next.

Figure 2-28: Disk partitioning

If you want to use manual partitioning you have to do the following:


- First delete all existing partitions and begin to create new ones.
- Specify at least 16 GB for the root partition under the mount point /.
- Specify 4 GB for /var.
- Specify twice the size of your RAM for swap.
- Specify the rest of the hard disk for /export.

Figure 2-29: Manual Partitioning

The frontend will format its disk, and then it will ask for the DVD to load the rolls from it.

Figure 2-30: Kernel - Disk1 inquiry


5- Congratulations, you have now finished installing the frontend.

Figure 2-31: Rocks frontend installing status

6- The first time you open the terminal, the system will ask you to configure SSH.
This message will appear:

It doesn't appear that you have set up your ssh key.

This process will make the files:
/root/.ssh/id_rsa.pub
/root/.ssh/id_rsa
/root/.ssh/authorized_keys
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
76:42:7a:14:cf:4c:2f:0e:93:bd:3d:e3:24:94:20:74 root@cluster.hpc.org

At the first prompt just accept the suggested file; at the second, enter the passphrase you want for SSH.

2.4.3.4 Installing the compute nodes:

Installing the compute nodes is very simple, through these steps:
1- First connect the node to the frontend through a switch, or directly through a cross-over cable.
2- Open the terminal on the frontend and write this command:

# insert-ethers

Figure 2-32: Insert-ethers window

3- Choose "Compute" from this menu. You may have to choose "Ethernet Switches" if you are connecting the node to the frontend through a programmable switch; note that if your switch is not programmable you will choose "Compute" as well.
4- Then you will see:

Figure 2-33: Insert-ethers waiting for new compute node


This indicates that insert-ethers is waiting for new compute nodes.

5- Switch on the compute node and boot it from the Rocks CD. Here do nothing, to enter the compute node installation.

Figure 2-34: Rocks's first screen

6- These screens will appear to you respectively on the frontend:

Figure 2-35: Insert-ethers discovering compute node

Now the frontend has discovered the compute node which has this MAC address.


Figure 2-36: Insert-ethers didn't send a kickstart file yet

Insert-ethers discovered the node but the node didn't request a kickstart file yet.

Figure 2-37: The node has successfully received the kickstart file

The node has successfully requested the kickstart file, and the installation will begin to install the files on the compute node.
Note: a message may appear telling you that there is a missing file, with two options "reboot or retry". Just insert the CD again and press retry.
2.4.3.5 Check the connection between the frontend and any compute node:
If there is a problem in the connection between the frontend and a compute node, the installation will not complete.

After you finish the installation of the compute node, you have to check the connection between it and the frontend in either of these ways:
2.4.3.5.1 By SSH:
On the compute node write this command: ssh frontend, or ssh followed by the IP of the frontend.
Then enter the SSH passphrase. If you get into the frontend, the connection is OK. If not, you have to check this problem.
2.4.3.5.2 By ping:
On the compute node write this command: ping frontend, or ping followed by the IP of the frontend. If it replies, the connection is OK. If not, the connection is lost.
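
A minimal sketch of both checks (run from the compute node; the host name frontend is resolved by Rocks):
$ ssh frontend        // a shell prompt on the frontend means SSH works
$ ping -c 3 frontend  // three echo requests; replies mean the network link is up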

2.4.4 Problems we faced during installation:


2.4.4.1 IPs problem
We were choosing the manual IP entry method during the installation, then entering an arbitrary IP in the public IP field. The system was trying to find this real IP but couldn't find it, so it hung and restarted the installation!
Solution: connect the frontend to the DHCP server and begin the installation.
Explanation: the IPs we entered were not real IPs, especially the public IP. When we searched, we found that the cluster must have a DHCP server when you are installing the frontend.

2.4.4.2 RAM problem
At first we had nodes with 256 and 512 MB of RAM. Every time we tried to install, we faced these errors:
- 10.1.1.1/images/minstge.img file not found on server
- Unable to retrieve http://10.1.1.1//install rocks_ist/lon/i386/images/minstge.img
Solution: Finally, after a lot of trials and searches, we found that the minimum requirement for the frontend or the compute nodes is 1 GB of RAM.
Explanation: the cluster couldn't cache all its data in RAM of less than 1 GB, so errors appeared.


2.4.4.3 IP conflict problem

Our DHCP server was assigning addresses from the network 10.44.0.0/24. We left the default private IP as 10.1.1.1/8, so a conflict happened due to the netmask.
Solution: We changed the netmask of the private IP to /24 instead of /8.
Note: the faculty changed the range of its IPs to 172.x.x.x, so this problem will not appear to you now.

2.4.4.4 On-board LAN card on the frontend

We tried a lot to work with the network card built into the motherboard, but we failed.
Solution: We bought two external cards, one for the public network and the other for the private network. On the compute nodes, however, the on-board cards work normally. (The faculty changed the range of the IPs, so this problem does not exist now.)

2.4.4.5 Insert-ethers and switch

We used the "Ethernet" option from the insert-ethers menu since we connected the nodes through a switch.
Solution: the "Ethernet" option is used only for programmable switches, not normal switches. So if you are using a normal switch, select the "Compute" option, not "Ethernet".

2.4.4.6 Installing the compute nodes and the network:

While installing a compute node, during the part where files are loaded to the hard disk, the network went down due to a fault by one of the team. The installation didn't resume until the network came up again.

Explanation: installing a compute node doesn't depend on the CD only, but also depends on some files on the frontend which the compute node gets through the network.

2.4.4.7 Scilab does not open the plotting window

Problem: installing the video driver.
Steps:
1. Get the driver name from the manual of the motherboard: Nvidia MCP73U/PV/V chipset.
2. Download the driver packages from:
ftp://download.nvidia.com/XFree86/Linux-x86/185.18.14

Figure 2-38: Nvidia driver readme file

Then read the README file to understand the steps:
ftp://download.nvidia.com/XFree86/Linux-x86/185.18.14/README/README.txt
3. After installing the two packages, the device manager will keep the status of the video card as unknown device, but it is successfully installed.

2.4.5 Useful Commands:

These commands are the most commonly used, but you can get all the Rocks commands from the user guide on rocksclusters.org.
2.4.5.1 Add interface
# rocks add host interface compute-0-0 eth1 ip=192.168.1.2 subnet=private name=fast-0-0
This command is used to define a new interface eth1 on host compute-0-0 with IP 192.168.1.2 on the private subnet.

2.4.5.2 Disable rolls:
# rocks disable roll ganglia version=5.0 arch=i386
Here we disable the ganglia roll, version 5.0, with arch=i386.
2.4.5.3 Enable rolls:
# rocks enable roll ganglia version=5.0 arch=i386

This enables the ganglia roll, version 5.0.

2.4.5.4 Host interface
# rocks list host interface compute-0-0
Lists all network interfaces on compute-0-0.
2.4.5.5 Host interface configuration
# rocks config host interface compute-0-1 iface=eth0
Output:
DEVICE=eth0
HWADDR=00:12:3f:3a:09:cc
IPADDR=10.255.255.253
NETMASK=255.0.0.0
BOOTPROTO=static
ONBOOT=yes
This command lists all the details of the configuration of interface eth0 of compute-0-1.
2.4.5.6 Dump a specific host
$ rocks dump host compute-0-0
Output:
/opt/rocks/bin/rocks add host compute-0-1 cpus=1 rack=0 rank=1 membership="Compute"
This command lists the details of the host, like its membership type and number of CPUs.
2.4.5.7 Dump all hosts
$ rocks dump host
Output:
/opt/rocks/bin/rocks add host cluster cpus=1 rack=0 rank=0 membership="Frontend"
/opt/rocks/bin/rocks add host compute-0-1 cpus=1 rack=0 rank=1 membership="Compute"
This command lists the details of all hosts.
2.4.5.8 Host appliance
# rocks list host appliance
Output:
HOST         APPLIANCE
cluster:     frontend
compute-0-1: compute
This lists all appliances with their names.


2.4.5.9 List partitions
$ rocks list host partition compute-0-0
Lists all partitions on compute-0-0.
2.4.5.10 List rolls
$ rocks list host roll frontend-0-0-0
Lists all rolls on the frontend.
2.4.5.11 List network
$ rocks list network private
Lists the info about the private network.
2.4.5.12 List roll commands
$ rocks list roll command base
Lists all commands provided by the base roll.
2.4.5.13 Remove hosts
# rocks remove host compute-0-0
Removes compute-0-0 from the database of the cluster.
2.4.5.14 Remove interfaces
# rocks remove host interface compute-0-0 eth1
Removes eth1 from compute-0-0's interfaces.
2.4.5.15 Remove rolls
# rocks remove host roll frontend-0-0-0 base 5.2 x86_64
Removes the base roll version 5.2 from the frontend.
2.4.5.16 Remove networks
# rocks remove network private
Removes the private network.
2.4.5.17 Set IPs for interfaces
# rocks set host interface ip compute-0-0 eth1 192.168.0.10
Sets the IP of eth1 to 192.168.0.10.


2.4.5.18 Set password

# rocks set password
Sets the password of Rocks.
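
A minimal sketch of changing a node's interface with these commands and verifying it (rocks sync config is assumed to be available in your Rocks version for pushing the new configuration):
# rocks set host interface ip compute-0-0 eth1 192.168.0.10
# rocks list host interface compute-0-0   // verify the new address in the database
# rocks sync config                       // regenerate and push the configuration files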

2.4.6 Notes for running clusters:

There is an important note you should know: when you deal with Rocks you don't deal with pure Linux. So to change some configuration you may have to use Rocks commands, not Linux commands.
Network configuration is an example. If you change the network configuration from Linux's GUI, your configuration may be damaged and you will have to reinstall the cluster. So when you want to change the network configuration, use Rocks commands like rocks set, rocks add and rocks remove.
When you are installing the compute nodes or the frontend, be sure that you take out the CD before the reboot happens. Otherwise a reinstallation will start, and in this case you may have two options in the boot loader; the default will be the reinstall option, not the Rocks option. You can edit this by choosing the Rocks option to be first.

2.5 Cbench
2.5.1 What is Benchmarking
The term "benchmarking" is a general term; it exists in many fields. Here, we are interested in the definition of benchmarking for computer clusters.

2.5.2 General definition of Benchmarking

Benchmarking is the process of comparing your resources (cost, cycle time, productivity, quality, ...) to a specific standard or a best practice.

2.5.3 Benchmarking for a cluster

Benchmarking is a tool for testing the performance of your cluster in comparison with the best clusters existing now. One tool for this is Cbench. Cbench is a framework for using various tests, benchmarks, applications, and utilities to stress and analyze *nix-based parallel compute clusters. Cbench is used to facilitate scalable testing, benchmarking, and analysis of a Linux parallel compute cluster.

2.5.4 Levels of Cluster Testing

Cluster testing can be divided into three levels:
1. Node level.
2. Point-to-point system level.
3. MPI system level.

2.5.4.1 Node level:
In this testing level we are only concerned with testing each node separately, without worrying about high-speed interconnects or system MPI libraries, etc.
The goal of this test is to determine whether each node has acceptable performance.
Examples:
- Memory performance tests (STREAMS).
- Disk performance tests (IOzone).
- CPU performance tests like DGEMM.

2.5.4.2 Point-to-point system level:

In this test we test the connection between our nodes. Point-to-point testing is important for the links in a high-speed interconnect such as Infiniband or Myrinet. Since we are using Ethernet, which is not a very high-speed network, we will not use this test level, as it is very easy to check our network without any test.
2.5.4.3 MPI system level:
This test measures how well our system runs MPI applications. This test encompasses everything in the system acting together, including the high-speed interconnect, management networks, OS installs, file systems, parallel file systems, batch system, scheduler, job launch, etc.

2.5.5 Installing Cbench:

First of all you must move to the place of Cbench after expanding it. So execute:
$ cd /opt/cbench

Cbench requires the following environment variables to be set:

1. CBENCHOME - the location of the Cbench distribution tree (i.e. the place of Cbench after expanding it).
We expanded Cbench in /opt/cbench, so execute:
$ export CBENCHOME=/opt/cbench
2. CBENCHTEST - the location of the Cbench testing tree. So execute:
$ export CBENCHTEST=/opt/cbench/cbench_tests
This command will also make a directory cbench_tests in the path /opt/cbench/cbench_tests; you can rename the directory with any name.
3. MPIHOME - the location of the MPI tree (i.e. the place of our MPI after installation).
We chose OpenMPI, which is in /opt/openmpi, so execute:
$ export MPIHOME=/opt/openmpi
4. COMPILERCOLLECTION - pick the compiler collection you want to compile with. Examples include: intel, gcc, pgi.
We chose gcc, so execute:
$ export COMPILERCOLLECTION=gcc
Finally, in order to run Linpack or any other tests we require a BLAS (Basic Linear Algebra Subprograms) library. We chose the LAPACK library as the BLAS library.
Now we must set the BLASLIB and RPATH variables. So execute:
$ export RPATH=/usr/lib
This is the place of the libraries in our cluster.
$ export BLASLIB="-Wl,-rpath,$RPATH -L$RPATH -llapack -lm"
This chooses LAPACK.
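
These exports disappear when the terminal is closed; a minimal sketch for keeping them across sessions is to append the same lines to ~/.bashrc (paths as chosen above):
export CBENCHOME=/opt/cbench
export CBENCHTEST=$CBENCHOME/cbench_tests
export MPIHOME=/opt/openmpi
export COMPILERCOLLECTION=gcc
export RPATH=/usr/lib
export BLASLIB="-Wl,-rpath,$RPATH -L$RPATH -llapack -lm"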
Make and install the standard collection of tests.
Note: While installing, try to be connected to the internet, because some packages will be downloaded.
Execute:
$ make install
This compiles and installs the standard collection of test binaries in $CBENCHOME/bin.
So you find the binaries in /opt/cbench/bin, which was not present before installation.
Install the Cbench test set tree. Execute:
$ make installtests
This will install all test sets that are currently packaged in Cbench into the tree specified by the CBENCHTEST environment variable.
So you will find the installed packages in /opt/cbench/cbench_tests.
Compiling non-default parts of Cbench:
Some code within Cbench is not compiled by default, for example HPC Challenge and the NAS Parallel Benchmarks.
Building HPCC with the system MPI. Execute:
$ make -C opensource/hpcc distclean
$ make -C opensource/hpcc
$ make -C opensource/hpcc install
$ make itests   (to update the CBENCHTEST tree)
Building NPB. Execute:
$ make -C opensource/NPB
To make sure your NPB binaries get updated in the CBENCHTEST tree, from the top level of the Cbench tree run:
$ sbin/install_npb
Building the self-contained MPICH for Cbench. Execute:
$ make -C opensource/mpich
Building only what is required for node-level hardware testing.

Execute:
$ make nodehwtest
ERROR: cannot find -llapack
This error means that the LAPACK library, which we chose as the BLAS library, cannot be found.
The solution: download LAPACK from the package manager. Type lapack in the search and install lapack-devel.
Then execute:
$ make nodehwtest
Building HPCC with the Cbench MPICH. Execute:
$ make -C opensource/hpcc distclean
$ make -C opensource/hpcc local
$ make -C opensource/hpcc install
$ make itests   (to update the CBENCHTEST tree)

At this point we have successfully installed Cbench.



Chapter 3: Parallel programming

3.1 Introduction to parallel programming

3.1.1 Goal
Parallel programming is used to reduce the runtime required for a program by distributing a problem over several processors. The goal is not to maximize efficiency: we may use an algorithm with lower efficiency than another if it needs a smaller number of floating point operations, because that minimizes the runtime required for the problem.
To apply parallel programming we must know about:
- the architectures existing for parallel computers,
- the software needed for parallel computing,
- how to analyze the software,
- how to write parallel programs.
Then we distribute the code and coordinate the communication and data transfer between the head node and the compute nodes.

3.1.2 Types of parallelism

Bit-level parallelism: reduces the number of instructions the processor must execute.
Instruction-level parallelism: gives a higher throughput.
Task parallelism: executes more than one program at the same time.

3.1.3 System architectures

3.1.3.1 Single instruction single data (SISD)
This is the standard sequential computer (scalar computer), i.e. algorithms for SISD computers do not contain any parallelism; there is only one processor.
e.g.: summation of two numbers:

Figure 3-1: SISD performs one instruction per cycle

The above figure shows that SISD performs one instruction per cycle.
The solution for this problem is pipelining; the advantage is that all the functional units are kept busy, so one result is produced per cycle.

Figure 3-2: The advantage of pipelining

3.1.3.2 Single instruction multiple data (SIMD)

It performs one instruction on several data sets; it is called a vector computer.

Figure 3-3: The SIMD

Example:
Adding two matrices A + B = C.
Say we have two matrices A and B of order 2 and we have 4 processors.
A11 + B11 = C11 ... A12 + B12 = C12
A21 + B21 = C21 ... A22 + B22 = C22
The same instruction is issued to all 4 processors (add the two numbers) and all processors execute the instructions simultaneously.
It takes one step, as opposed to four steps on a sequential machine.

3.1.3.3 Multiple instruction multiple data (MIMD)

Combining several processing cores, whether scalar or vector processors, gives us a computer that can process several instructions and data sets per cycle (MIMD).

Figure 3-4: the MIMD

We may build MIMD machines using shared or distributed memory:

3.1.3.3.1 Shared memory
Usually all processors are connected to a common memory, and all processors are identical and have equal memory access (Symmetrical Multi-Processing).


Figure 3-5: shared memory MIMD

The way to connect processors and memory is a dominant issue.

3.1.3.3.1.1 Type 1: shared memory with a bus:

Figure 3-6: shared memory with a bus

There is a huge disadvantage: all processors have to share the bandwidth provided by the bus.
This type of shared memory may be found in desktop systems and small servers. To overcome the bandwidth problem, a direct connection from memory to CPU is desired.

3.1.3.3.1.2 Type 2: shared memory using a crossbar switch.

Figure 3-7: Shared memory using crossbar switch

Advantage:
The big advantage of shared memory systems is that all processors can make use of the whole memory.
Disadvantage:
Shared memory solves the inter-processor communication problem but introduces the problem of simultaneous accessing of the same location in the memory.
The limiting factor to their performance is the number of processors and memory modules that can be connected to each other.
3.1.3.3.2 Distributed memory

Figure 3-8: Distributed memory

Each CPU has its own local memory, and each CPU can only access its own memory.
The importance of the network connections is lower than in the case of a shared memory system, so distributed memory systems can be hugely expanded.
3.1.3.4 Gathering up:

Figure 3-9: Flynn's taxonomy of architectures

Distributed memory                      Shared memory
Large number of processors              Modest number of processors
(100's - 1000's)                        (10's - 100's)
High power                              Modest power
Unlimited expansion                     Limited expansion
Difficult to fully utilize              Easy to fully utilize
Revolutionary parallel programming      Evolutionary parallel programming

Table 3-1: Distributed memory vs. shared memory


3.1.3.5 Cluster
A cluster consists of several cheap computers (nodes) linked together. The simplest case is the combination of several desktop computers.

Figure 3-10: cluster architecture

Advantages:
Clusters offer lots of computing power for little money; they are ideally suited to problems with a high degree of parallelism.
It's easy to upgrade a cluster.

Disadvantages:
32-bit systems cannot address more than 4 GB of RAM, and x86-64 systems are limited by the number of memory slots, the size of the available memory modules, and the chipsets.

3.1.4 Performance analysis

3.1.4.1 Goal
It is used to determine which section of the program to speed up, by increasing its speed or reducing the required memory.
3.1.4.2 Timing
It doesn't make sense to lose your time trying to parallelize a part of a program that cannot be parallelized.
We must know about:

Wall time: the time between the start and the end; it's the time to be minimized.
User time: the actual runtime used by a program.
System time: time used by the OS, not by the program, e.g.: allocating memory or hard disk access.
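
A minimal sketch of measuring these three times with the shell's time command (the program name and the numbers are only illustrative):
$ time ./a.out
real    0m12.64s    // wall time
user    0m12.10s    // user time
sys     0m0.30s     // system time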
3.1.4.3 Profiling:
Profiling is the investigation of a program's behavior using information gathered as the program executes. For profiling, the program has to be built with information for the profiler.
Profiler: the performance analysis tool that measures the behavior of a program during execution, produces a statistical summary (profile), or keeps an ongoing interaction with the virtual machine monitor (VMM)4.
3.1.4.3.1 Profiler types:
Flat profiler: flat profilers compute the average call times.
Call-graph profiler: call-graph profilers show the call times and frequencies of the functions.
3.1.4.4 Measuring performance
Profiling and timing give us an absolute measurement of the computation time used by the program.
3.1.4.4.1 The floating point performance
FLOPS: floating point operations per second; FLOPS is used to measure the floating point performance.

4 A virtual machine monitor (VMM) is a software/hardware platform virtualization layer that allows multiple operating systems to run on a host computer concurrently.


3.1.5 Example to illustrate parallel programming

There are many basic mathematical operations that have a high degree of parallelism, so we can perform them separately and independently.
Example:
Ci = Xi * Yi where i = 1, ..., N; we can calculate the N products independently, so we can call it perfect mathematical parallelism, which we refer to as EP (embarrassingly parallel).
We will take the divide and conquer approach to show how we can extract parallelism.

Figure 3-11: Inner product following the divide and conquer algorithm

Assuming we have:
N: # CPUs
t: time to perform an addition.
q: # stages.
C: time to collect results from sub-sums.


To reduce the evaluation of the dot product to a summation of two numbers on each processor we want to have N = 2^q.
In our case (fig. 3-11) we have N = 8 and q = 3.
Now we want to figure out the speed up, which will be:
Speed up = time before parallelism / time after parallelism
Time before parallelism (straightforward approach) = (N-1)*t, and time after parallelism = q*(t+C).
It is clearly obvious that we have a communication cost = q*C.
Assuming a relative time rt = C/t,
we can write the speed up = (N-1)/((1+rt)*log2(N)).
If we ignore C we will have:
Speed up = (N-1)/log2(N) < N and efficiency = (N-1)/(N*log2(N)).
Equation 1: speed up and efficiency
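
For instance, with N = 8 and q = 3 and ignoring C, equation 1 gives a speed up of (8-1)/log2(8) = 7/3 ≈ 2.33 and an efficiency of 7/(8*3) ≈ 0.29, already well below the ideal values of 8 and 1.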

Clearly, even if we ignore the communication time, the efficiency is still less than perfect.
So we can notice that there are factors limiting the degree of parallelism, e.g. the number of CPUs does not match the problem exactly, the latency may slow the data transfer, and the memory is limited, as the problem may not fit in memory.

3.1.6 Amdahl's law

It's a general model of the speed-up factor, and it says there is some percentage of the program or code that cannot be parallelized, which will be denoted here by α (the serial fraction).

Figure 3-12: Speed up vs. number of processors for three different degrees of parallelization

We can notice from fig. (3-12) that it's not only about the number of processors: we may have only five cores and get the job done exactly like 100 cores. This depends directly on the serial fraction α.
Notes:
1- Amdahl's law relies on the assumption that the serial work (α) does not depend on the size of the problem.
2- In practice, α decreases as a function of problem size, and the upper bound of the speed-up factor usually increases as a function of the problem size.
3- Amdahl's law represents the upper limit of the speed up, but it ignores the processors' efficiency, as it represents the speed up as a function of the number of CPUs only.
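
In its usual form, with α as the serial fraction and P the number of CPUs, Amdahl's law gives:
Speed up = 1/(α + (1-α)/P)
so the speed up can never exceed 1/α, no matter how many processors are added.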

3.1.7 Gustafson's law:

Any sufficiently large problem can be efficiently parallelized. The execution of a program on a parallel computer is represented by:

a(n) + b(n) = 1    (equ. 2)
Equation 2: the execution of a program under Gustafson's law

Where:
a(n) is the serial fraction, b(n) is the parallel fraction, and P is the number of CPUs.
Then:
Speed up = a(n) + P*b(n)    (equ. 3)
From equ. 2 and equ. 3:
Speed up = a(n) + P*(1 - a(n))
Equation 4: speed up under Gustafson's law

It's clear that increasing the number of CPUs will not affect the serial part.
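
For example, assuming a serial fraction a(n) = 0.1 and P = 10 CPUs, equation 4 gives a speed up of 0.1 + 10*0.9 = 9.1, close to the number of processors because the parallel part dominates.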


3.2 MPI
Message Passing Interface (MPI) is a portable standard API for programming parallel computers. It is a language-independent communications protocol used to program parallel computers.
The basic idea behind MPI is that multiple parallel processes work concurrently towards a common goal using "messages" as their means of communicating with each other.
MPI is a library linked to C code that enables you to parallelize your code by sending messages to other processes. See the following example:
Helloworld.c
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank, size;

    MPI_Init (&argc, &argv);                  /* starts MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);    /* get current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size);    /* get number of processes */
    printf ("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize ();
    return 0;
}

The previous code is a simple C file that makes every process print its hello world message in the terminal. Don't worry about understanding this code for now; we will discuss the MPI subroutines called in the code shortly. The important thing that we want to mention is that this file is distributed to each node (process) and each of them executes it according to its rank5.
In the coming part we will discuss the most important subroutines in MPI, which you will need to take your first step in parallel programming.

3.2.1 Basic MPI definitions:


3.2.1.1Rank
Rank is a unique ID for a process.
3.2.1.2Communicator
The Communicator is a local object that represents a communication domain which contains
a number of processes. A communication domain is a global, distributed structure that allows
processes in a group to communicate with each other or to communicate with processes in
another group. A communication domain of the first type (communication within a group) is
represented by an intracommunicator whereas a communication domain of the second type
(communication between groups) is represented by an intercommunicator. The default
Communicator is (MPI_COMM_WORLD)

Figure 3-13: MPI_COMM_WORLD

From the previous figure you can see that each process has its own rank and they all belong
to the default communicator (MPI_COMM_WORLD).

5 Rank is the number of the process in the communicator.


Figure 3-14: Communicator & group

Again, you can see from the previous figure that each process has its own rank inside the default communicator and also its own rank inside another communicator (BLACK, WHITE, BLUE).
3.2.1.3 Group
A group is a set of processes held by a communicator. These processes also reside in the default communicator (MPI_COMM_WORLD).
3.2.1.4 MPI message
An MPI message consists of two parts (data and envelope). The data is the portion that holds information about your message (e.g.: the buffer which holds the data, the length of the data and the data type). The envelope is the other portion that holds information about the sending and receiving operation (e.g.: the destination and the communicator of the process).


3.2.1.5 Blocking vs. non-blocking communication

Figure 3-15: Send operation block diagram (user buffer and system buffer)

When you are sending data, the data is first put in a user buffer and then goes to a system (send) buffer which sends it to the connection medium. In a blocking communication like MPI_Send, the routine doesn't return until the user buffer is empty, which makes the operation safe. In a non-blocking communication like MPI_Isend, the routine returns immediately without checking the user buffer.
Blocking send          MPI_Send(buffer,count,type,dest,tag,comm)
Non-blocking send      MPI_Isend(buffer,count,type,dest,tag,comm,request)
Blocking receive       MPI_Recv(buffer,count,type,source,tag,comm,status)
Non-blocking receive   MPI_Irecv(buffer,count,type,source,tag,comm,request)

Table 3-2: Blocking and non-blocking communication

3.2.2 MPI Subroutines

In this section we will discuss the subroutines of MPI. We will divide this section into two parts: in the first part we will talk about some basic subroutines in MPI, and in the second part we will proceed to other important subroutines of MPI.
3.2.2.1 Basic subroutines
3.2.2.1.1 MPI_Init
Syntax: MPI_Init(int *argc, char ***argv)
Initializes the MPI execution environment. This function must be called in every MPI program, must be called before any other MPI function, and must be called only once in an MPI program. For C programs, MPI_Init may be used to pass the command line arguments to all processes, although this is not required by the standard and is implementation dependent.

argc: the argument counter.
argv: the argument vector; it is a vector of all the arguments that MPI will need during the execution.

3.2.2.1.2 MPI_Comm_size
Syntax: MPI_Comm_size (comm, &size)
This instruction gets you the number of processes in the specified communicator.
comm: the process communicator.
&size: an integer variable to store the number of processes (the & sign means pass by reference).

3.2.2.1.3 MPI_Comm_rank
Syntax: MPI_Comm_rank (comm, &rank)
This instruction is used to get the number (rank) of the process in the specified communicator.
&rank: an integer variable to store the rank of the process.

3.2.2.1.4 MPI_Finalize
Syntax: MPI_Finalize()
Terminates the MPI execution environment. This function should be the last MPI routine called in every MPI program; no other MPI routines may be called after it.

3.2.2.2 Other subroutines
3.2.2.2.1 Point-to-point communication subroutines

MPI point-to-point operations typically involve message passing between two, and only two, different MPI tasks. One task performs a send operation and the other task performs a matching receive operation. You can refer to Table 3-2 to see the basic subroutines of point-to-point communication.
Let's discuss each of these subroutines:
3.2.2.2.1.1 MPI_Send
Syntax: MPI_Send(&buffer,count,type,dest,tag,comm)


This is a blocking send, which means that after the MPI_Send subroutine is called it won't return until it checks that the user buffer is empty, to guarantee that it can place other data in the user buffer.
buffer: variable that stores the data which will be sent.
count: the length of the data.
type: the data type (int, char, ...).
dest: the rank of the destination process.
tag: an arbitrary integer assigned by the programmer to uniquely identify the message; the send and receive operations must use the same tag.
comm: the communicator used in the communication process.
As we mentioned previously, the first three arguments are called the data and the others are called the envelope.
3.2.2.2.1.2 MPI_Recv
Syntax: MPI_Recv (&buffer,count,datatype,source,tag,comm,status)
This subroutine is a blocking receive; blocking has the same meaning discussed before for the send operation.
The arguments here are the same as for the MPI_Send subroutine, except for the source argument, which is used to define the rank of the process from which the data will be received.

Note:
We have two other subroutines, MPI_Isend and MPI_Irecv. The only difference is that MPI_Isend means immediate send: it performs the send but returns immediately without checking that the user buffer is empty (the same concept applies to MPI_Irecv).
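
A minimal sketch of these two subroutines (not part of the report's project code): rank 0 sends one integer to rank 1.
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank, value;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                     /* data to send */
        MPI_Send (&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv (&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf ("Process 1 received %d from process 0\n", value);
    }

    MPI_Finalize ();
    return 0;
}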

3.2.2.2.2 Collective communication routines

Collective communication must involve all processes in the scope of a communicator. All processes are, by default, members of the communicator MPI_COMM_WORLD.

3.2.2.2.2.1 MPI_Bcast
Syntax: MPI_Bcast(&buffer,count,datatype,root,comm)
This subroutine broadcasts the message from the process with rank "root" to all the processes in the communicator.
buffer: variable that stores the message.
count: the length of the message.
datatype: the type of the message.
root: the rank of the process which broadcasts the message.
comm: the communicator.

3.2.2.2.2.2 MPI_Scatter
Syntax: MPI_Scatter(&sendbuf,sendcnt,sendtype,&recvbuf,recvcnt,recvtype,root,comm)
This subroutine distributes pieces of a single message to all the processes in the communicator. To understand this, take a look at the following figure:

Figure 3-16: MPI_Scatter

sendcnt: the length of the sent message.
recvcnt: the length of the message to be received.

3.2.2.2.2.3 MPI_Gather
Syntax: MPI_Gather(&sendbuf,sendcnt,sendtype,&recvbuf,recvcount,recvtype,root,comm)
Gathers messages from the different processes into one single process.

Figure 3-17: MPI_Gather

3.2.2.2.2.4 MPI_Reduce
Syntax: MPI_Reduce(&sendbuf,&recvbuf,count,datatype,operation,root,comm)
This subroutine performs a reduction operation on all the messages and puts the result in the process with rank "root".
operation: an MPI constant that defines the operation to be applied to the messages; see the following table:

Operation    Description
MPI_MAX      Maximum
MPI_MIN      Minimum
MPI_SUM      Summation
MPI_PROD     Product
MPI_LAND     Logical AND
MPI_BAND     Bit-wise AND

Table 3-3: MPI reduction operation
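
A minimal sketch of MPI_Reduce (not part of the report's project code): every process contributes its rank and the sum is collected in the root process (rank 0).
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank, size, sum;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    MPI_Reduce (&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf ("Sum of the ranks of %d processes = %d\n", size, sum);

    MPI_Finalize ();
    return 0;
}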


3.3 Running a code using MPI & testing the cluster.

3.3.1 Creat a new user


You must run MPI programs as a regular user (that is, not root).
Execute:
$ useradd username
$ passwd username
$ rocks sync users
On our cluster we made :
$ useradd commfe
$ passwd commfe (pass word : pdcommfe)
$ rocks sync users

The command ( rocks sync users ) make synchronization for the new user on all nodes (i.e. it
makes directory (commfe) in home).
But sometimes this command does not work , so how to make synchronization.
1. First of all after you make the new user (commfe) change
to this user by executing:
$ su commfe (the OS will not ask for the password as you was the root)
2. When you change to this user for the first time the OS will
ask you for entering the SSH phrase ( we make it
sshcommfe).
3. As we said making new user make new directory use its
name in home (/home)
But if you open (/home) with the Graphical interface you will not find it
but you find it in (/export/home).
So to make this directory appear move to /home when you are commfe
user ,execute:
$ cd /home

83

$ ls
You will find commfe.
4. You must make the previous steps with every node.
A. Change user to commfe
$ su commfe
B. Move to the node.
$ ssh compute-0-0 (for example).
C. Enter the SSH phrase (sshcommfe).
D. Move to /home.
$ cd /home
E. Make sure user directory is made.
$ ls
F. Repeat the previous steps with all nodes.
With these steps we create a new user & make synchronization to
it.

3.3.2 Running the testing code

mpi-ring is a precompiled code that is used to test that our nodes are ready for parallel computing.
The mpi-ring code is in /opt/mpi-tests/bin, and if you move to any node you will find it in the same place. This means that any code that will be computed must be on all nodes in the same place.
To run the code:
o Make a file in your home (/home/commfe); give it the name machines, for example.
o Write in it the names of the nodes you want to run the code on:
compute-0-0
compute-0-1
...
o Make sure that the file machines has all the permissions (read, write, execute)6.
o Change to the commfe user.
o Move to its home (you must run while standing in home):
$ cd
o Run the code:
$ /opt/openmpi/bin/mpirun -np 2 -machinefile machines /opt/mpi-tests/bin/mpi-ring
mpirun is our command, so we give its path /opt/openmpi/bin.
-machinefile is the option that lets us specify the nodes.
machines is the file that contains the names of the nodes.
mpi-ring is the code that will run, so we give its path /opt/mpi-tests/bin/mpi-ring.
-np 2 means the number of processes (NOT processors).
For example: if you make it -np 2 and run on 3 nodes, the cluster will use 2 nodes and 1 node will be idle, as 1 process will run on each node.

3.3.3 Running a satisfiability code

1. Compile the C code first: put the code in any place, move to that place, then execute:
$ mpicc satisfiability_mpi_c.c
2. The output code will be a.out (or any name you give with the -o option).

6 Revise setting permissions, p. 86.


3. Now you must this code to all nodes we will use.


4. We put a.out in / to be easy for us then copy it to all node in the same place.
$ scp /a.out root@compute-0-0 /a.out
The OS will ask SSH phrase enter sshroot.
scp:secure copy work when you are root only.
repeat this step with all nodes.
Note : you might use the command cluster-fork to make the same
command on all nodes.
Cluster-fork poweroff
5. Change to the commfe user and move to its home directory:
$ su commfe
$ cd
6. Run the code:
$ ssh-agent $SHELL
$ ssh-add
$ /opt/openmpi/bin/mpirun -np 5 -machinefile machines /a.out
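For reference, an MPI program of this kind typically has the structure sketched below. This is only an illustrative skeleton and not the project's satisfiability_mpi_c.c; the check_circuit helper and the 16-variable search space are assumptions made for the example. Each rank tests a disjoint share of the candidate assignments and the per-rank counts are combined with MPI_Reduce.

/* Minimal sketch of an MPI satisfiability-style search (illustrative only;
 * not the project's satisfiability_mpi_c.c). Each rank checks a disjoint
 * share of the 2^16 candidate assignments; counts are combined with MPI_SUM. */
#include <stdio.h>
#include <mpi.h>

/* Hypothetical helper: returns 1 if the bit pattern z satisfies the circuit. */
static int check_circuit(unsigned int z)
{
    return (z & 0x3) == 0x3;   /* placeholder condition for the example */
}

int main(int argc, char *argv[])
{
    int rank, size, count = 0, total = 0;
    unsigned int z;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Cyclic distribution of the candidate inputs across the processes. */
    for (z = (unsigned int)rank; z < (1u << 16); z += (unsigned int)size)
        count += check_circuit(z);

    MPI_Reduce(&count, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("satisfying assignments found: %d\n", total);

    MPI_Finalize();
    return 0;
}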


3.3.4 Results of running the satisfiability code


The number of cores (processors) on each node:
Compute-0-0, 0-1, 0-2 and 0-5 have 2 cores.
Compute-0-3 and 0-4 have 1 core.
So we can run up to 2 processes on the nodes that have 2 cores & 1 process on the nodes that
have 1 core.
When running 1 process on each node we have the following results:
Number of used nodes    1       2       3       4       5       6
Time in sec             58.11   29.10   19.40   14.53   12.64   10.76

Table 3-4: Results of running the satisfiability problem code

We can also achieve better times by increasing the number of processes, since there are still idle cores.

Figure 3-18: Results of running the satisfiability problem code (time in seconds versus number of used nodes)


3.4 PVM
3.4.1 Introduction
Parallel processing, the method of having many small tasks solve one large problem, has
emerged as a key enabling technology in modern computing. The past several years have
witnessed an ever-increasing acceptance and adoption of parallel processing, both for high-performance
scientific computing and for more ``general-purpose'' applications, as a result
of the demand for higher performance, lower cost, and sustained productivity. The
acceptance has been facilitated by two major developments: massively parallel processors
(MPPs) and the widespread use of distributed computing.
The Parallel Virtual Machine (PVM) system uses the message-passing model to allow
programmers to exploit distributed computing across a wide variety of computer types,
including MPPs. A key concept in PVM is that it makes a collection of computers appear as
one large virtual machine, hence its name.

3.4.2 PVM Overview


The PVM software provides a unified framework within which parallel programs can be
developed in an efficient and straightforward manner using existing hardware. PVM enables
a collection of heterogeneous computer systems to be viewed as a single parallel virtual
machine. PVM transparently handles all message routing, data conversion, and task
scheduling across a network of incompatible computer architectures.
The PVM system is composed of two parts. The first part is a daemon, called pvmd3 and
sometimes abbreviated pvmd, that resides on all the computers making up the virtual
machine. (An example of a daemon program is the mail program that runs in the background
and handles all the incoming and outgoing electronic mail on a computer.) Pvmd3 is designed
so any user with a valid login can install this daemon on a machine. When a user wishes to
run a PVM application, he first creates a virtual machine by starting up PVM (Section 3.4.5
details how this is done). The PVM application can then be started from a Unix prompt on
any of the hosts.
The PVM system currently supports C, C++, and Fortran languages. This set of language
interfaces has been included based on the observation that the predominant majority of


target applications are written in C and Fortran, with an emerging trend in experimenting
with object-based languages and methodologies.
The C and C++ language bindings for the PVM user interface library are implemented as
functions, following the general conventions used by most C systems, including Unix-like
operating systems. To elaborate, function arguments are a combination of value parameters
and pointers as appropriate, and function result values indicate the outcome of the call. In
addition, macro definitions are used for system constants, and global variables such as errno
and pvm_errno are the mechanism for discriminating between multiple possible outcomes.
Application programs written in C and C++ access PVM library functions by linking against
an archival library (libpvm3.a) that is part of the standard distribution.
Below is a simple PVM example:
#include <stdio.h>
#include "pvm3.h"   /* missing in the original listing; needed for the pvm_* calls */

main()
{
    int cc, tid, msgtag;
    char buf[100];

    printf("i'm t%x\n", pvm_mytid());
    cc = pvm_spawn("hello_other", (char**)0, 0, "", 1, &tid);
    if (cc == 1) {
        msgtag = 1;
        pvm_recv(tid, msgtag);
        pvm_upkstr(buf);
        printf("from t%x: %s\n", tid, buf);
    } else
        printf("can't start hello_other\n");
    pvm_exit();
}
Figure 3-19: PVM program hello.c

Shown in Figure 3-19 is the body of the PVM program hello, a simple example that
illustrates the basic concepts of PVM programming. This program is intended to be invoked
manually; after printing its task id (obtained with pvm_mytid()), it initiates a copy of another
program called hello_other using the pvm_spawn() function. A successful spawn causes the
program to execute a blocking receive using pvm_recv. After receiving the message, the
program prints the message sent by its counterpart, as well as its task id; the buffer is extracted
from the message using pvm_upkstr. The final pvm_exit call dissociates the program from
the PVM system.


#include <string.h>   /* strcpy, strlen */
#include <unistd.h>   /* gethostname */
#include "pvm3.h"

main()
{
    int ptid, msgtag;
    char buf[100];

    ptid = pvm_parent();
    strcpy(buf, "hello, world from ");
    gethostname(buf + strlen(buf), 64);
    msgtag = 1;
    pvm_initsend(PvmDataDefault);
    pvm_pkstr(buf);
    pvm_send(ptid, msgtag);
    pvm_exit();
}
Figure 3-20: PVM program hello_other.c

Figure 3-20 is a listing of the ``slave'' or spawned program; its first PVM action is to obtain
the task id of the ``master'' using the pvm_parent call. This program then obtains its
hostname and transmits it to the master using the three-call sequence - pvm_initsend to
initialize the send buffer; pvm_pkstr to place a string, in a strongly typed and
architecture-independent manner, into the send buffer; and pvm_send to transmit it to the destination
process specified by ptid, ``tagging'' the message with the number 1.

3.4.3 How to Obtain the PVM Software


The latest version of the PVM source code and documentation is always available through
netlib. Netlib is a software distribution service set up on the Internet that contains a wide
range of computer software. Software can be retrieved from netlib by ftp, WWW, xnetlib, or
email.
PVM files can be obtained by anonymous ftp to ftp.netlib.org. Look in directory pvm3.
The file index describes the files in this directory and its subdirectories.
Using a world wide web tool like Xmosaic the PVM files are accessed by using the address
http://www.netlib.org/pvm3/index.html.

Xnetlib is an X-Window interface that allows a user to browse or query netlib for available
software and to automatically transfer the selected software to the user's computer. To get
xnetlib, send email to netlib@netlib.org with the message ``send xnetlib.shar from xnetlib'',
or use anonymous ftp from ftp.netlib.org, file xnetlib/xnetlib.shar.

The PVM software can be requested by email. To receive this software, send email to
netlib@netlib.org with the message ``send index from pvm3''. An automatic mail handler
will return a list of available files and further instructions by email. The advantage of this
method is that anyone with email access to the Internet can obtain the software.
The PVM software is distributed as a uuencoded, compressed, tar file.

3.4.4 Setup to Use PVM


One of the reasons for PVM's popularity is that it is simple to set up and use. PVM does not
require special privileges to be installed. Anyone with a valid login on the hosts can do so. In
addition, only one person at an organization needs to get and install PVM for everyone at that
organization to use it.
PVM uses two environment variables when starting and running. Each PVM user needs to set
these two variables to use PVM. The first variable is PVM_ROOT , which is set to the
location of the installed pvm3 directory. The second variable is PVM_ARCH , which tells
PVM the architecture of this host and thus what executables to pick from the PVM_ROOT
directory.
The easiest method is to set these two variables in your .cshrc file. We assume you are using
csh as you follow along this tutorial. Here is an example for setting PVM_ROOT:
setenv PVM_ROOT $HOME/pvm3 (or export PVM_ROOT=$HOME/pvm3 if using ksh)
It is recommended that the user set PVM_ARCH by concatenating to the file .cshrc the
content of the file $PVM_ROOT/lib/cshrc.stub. The stub should be placed after PATH and
PVM_ROOT are defined. This stub automatically determines the PVM_ARCH for this host
and is particularly useful when the user shares a common file system (such as NFS) across
several different architectures.
The PVM source comes with directories and makefiles for most architectures you are likely
to have. Building for each architecture type is done automatically by logging on to a host,
going into the PVM_ROOT directory, and typing make. The makefile will automatically
determine which architecture it is being executed on, create appropriate subdirectories, and
build pvm, pvmd3, libpvm3.a, libfpvm3.a, pvmgs, and libgpvm3.a. It places all these
files in $PVM_ROOT/lib/PVM_ARCH, with the exception of pvmgs, which is placed in
$PVM_ROOT/bin/PVM_ARCH.

3.4.4.1 Setup Summary

Set PVM_ROOT and PVM_ARCH in your .cshrc file

Build PVM for each architecture type

Create a .rhosts file on each host listing all the hosts you wish to use

Create a $HOME/.xpvm_hosts file listing all the hosts you wish to use prepended by
an ``&''.

3.4.5 Starting PVM


Before we go over the steps to compile and run parallel PVM programs, you should be sure
you can start up PVM and configure a virtual machine. On any host on which PVM has been
installed you can type
% pvm

and you should get back a PVM console prompt signifying that PVM is now running on this
host. You can add hosts to your virtual machine by typing at the console prompt
pvm> add hostname

And you can delete hosts (except the one you are on) from your virtual machine by typing
pvm> delete hostname

If you get the message ``Can't Start pvmd,'' then check the common startup problems section
and try again.
To see what the present virtual machine looks like, you can type
pvm> conf

To see what PVM tasks are running on the virtual machine, you type
pvm> ps -a

Of course you don't have any tasks running yet; that's in the next section. If you type ``quit"
at the console prompt, the console will quit but your virtual machine and tasks will continue
to run. At any Unix prompt on any host in the virtual machine, you can type
% pvm

and you will get the message ``pvm already running" and the console prompt. When you are
finished with the virtual machine, you should type

pvm> halt

This command kills any PVM tasks, shuts down the virtual machine, and exits the console.
This is the recommended method to stop PVM because it makes sure that the virtual machine
shuts down cleanly.
You should practice starting and stopping and adding hosts to PVM until you are comfortable
with the PVM console. A full description of the PVM console and its many command options
is given at the end of this chapter.
If you don't want to type in a bunch of host names each time, there is a hostfile option. You
can list the hostnames in a file one per line and then type
% pvm hostfile

PVM will then add all the listed hosts simultaneously before the console prompt appears.
Several options can be specified on a per-host basis in the hostfile . These are described at the
end of this chapter for the user who wishes to customize his virtual machine for a particular
application or environment.
There are other ways to start up PVM. The functions of the console and a performance
monitor have been combined in a graphical user interface called XPVM, which is available
precompiled on netlib (see the PVM book for XPVM details). If XPVM has been installed at your
site, then it can be used to start PVM. To start PVM with this X window interface, type
% xpvm

The menu button labeled ``hosts" will pull down a list of hosts you can add. If you click on a
hostname, it is added and an icon of the machine appears in an animation of the virtual
machine. A host is deleted if you click on a hostname that is already in the virtual machine.
On startup XPVM reads the file $HOME/.xpvm_hosts, which is a list of hosts to display in
this menu. Hosts without a leading ``&" are added all at once at startup.
The quit and halt buttons work just like the PVM console. If you quit XPVM and then restart
it, XPVM will automatically display what the running virtual machine looks like. Practice
starting and stopping and adding hosts with XPVM. If there are errors, they should appear in
the window where you started XPVM.


3.4.6 Common Startup Problems


If PVM has a problem starting up, it will print an error message either to the screen or in the
log file /tmp/pvml.<uid>. This section describes the most common startup problems and
how to solve them. The PVM book contains a more complete troubleshooting guide.
If the message says
[t80040000] Can't start pvmd

first check that your .rhosts file on the remote host contains the name of the host from
which you are starting PVM. An external check that your .rhosts file is set correctly is to
type
% rsh remote_host ls
If your .rhosts is set up correctly, then you will see a listing of your files on the remote
host.
Other reasons to get this message include not having PVM installed on a host or not having
PVM_ROOT set correctly on some host. You can check these by typing
% rsh remote_host $PVM_ROOT/lib/pvmd

Some Unix shells, for example ksh, do not set environment variables on remote hosts when
using rsh. In PVM 3.3 there are two workarounds for such shells. First, if you set the
environment variable PVM_DPATH on the master host to pvm3/lib/pvmd, then this will
override the default dx path. The second method is to tell PVM explicitly where to find the
remote pvmd executable by using the dx= option in the hostfile.
If PVM is manually killed, or stopped abnormally (e.g., by a system crash), then check for
the existence of the file /tmp/pvmd.<uid>. This file is used for authentication and should
exist only while PVM is running. If this file is left behind, it prevents PVM from starting.
Simply delete this file.
If the message says
[t80040000] Login incorrect

it probably means that no account is on the remote machine with your login name. If your
login name is different on the remote machine, then you must use the lo= option in the
hostfile.


If you get any other strange messages, then check your .cshrc file. It is important that you
not have any I/O in the .cshrc file because this will interfere with the startup of PVM. If you
wish to print out information (such as who or uptime) when you log in, you should do it in
your .login script, not when you're running a csh command script.

3.4.7 Running PVM Programs


In this section you'll learn how to compile and run PVM programs. We will
work with the example programs supplied with the PVM software. These example programs
make useful templates on which to base your own PVM programs.
The first step is to copy the example programs into your own area:
% cp -r $PVM_ROOT/examples $HOME/pvm3/examples
% cd $HOME/pvm3/examples

The examples directory contains a Makefile.aimk and Readme file that describe how to build
the examples. PVM supplies an architecture-independent make, aimk, that automatically
determines PVM_ARCH and links any operating system specific libraries to your
application. aimk was automatically added to your $PATH when you placed the cshrc.stub
in your .cshrc file. Using aimk allows you to leave the source code and makefile unchanged
as you compile across different architectures.
The master/slave programming model is the most popular model used in distributed
computing. (In the general parallel programming arena, the SPMD model is more popular.)
To compile the master/slave C example, type
% aimk master slave

If you prefer to work with Fortran, compile the Fortran version with
% aimk fmaster fslave

Depending on the location of PVM_ROOT, the INCLUDE statement at the top of the Fortran
examples may need to be changed. If PVM_ROOT is not $HOME/pvm3, then change the
include to point to $PVM_ROOT/include/fpvm3.h. Note that PVM_ROOT is not expanded
inside the Fortran, so you must insert the actual path.
The makefile moves the executables to $HOME/pvm3/bin/PVM_ARCH, which is the default
location PVM will look for them on all hosts. If your file system is not common across all


your PVM hosts, then you will have to build or copy (depending on the architectures) these
executables on all your PVM hosts.
Now, from one window, start PVM and configure some hosts. These examples are designed
to run on any number of hosts, including one. In another window cd to
$HOME/pvm3/bin/PVM_ARCH

and type

% master

The program will ask how many tasks. The number of tasks does not have to match the
number of hosts in these examples. Try several combinations.
The first example illustrates the ability to run a PVM program from a Unix prompt on any
host in the virtual machine. This is just like the way you would run a serial a.out program on
a workstation. In the next example, which is also a master/slave model called hitc, you will
see how to spawn PVM jobs from the PVM console and also from XPVM.
hitc illustrates dynamic load balancing using the pool-of-tasks paradigm. In the pool-of-tasks
paradigm, the master program manages a large queue of tasks, always sending idle slave
programs more work to do until the queue is empty. This paradigm is effective in situations
where the hosts have very different computational powers, because the least loaded or more
powerful hosts do more of the work and all the hosts stay busy until the end of the problem.
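The following sketch shows the general shape of such a pool-of-tasks master. It is a simplified, hypothetical illustration (the slave name ``worker'', the message tags, and the integer task descriptors are assumptions), not the actual hitc source.

/* A simplified, hypothetical pool-of-tasks master (not the hitc source).
 * Tasks are plain integers; each slave sends back one result per task. */
#include <stdio.h>
#include "pvm3.h"

#define NSLAVES  4
#define NTASKS   100
#define TAG_TASK 1      /* master -> slave: here is a task     */
#define TAG_DONE 2      /* slave -> master: here is a result   */
#define TAG_STOP 3      /* master -> slave: no more work, exit */

int main(void)
{
    int tids[NSLAVES];
    int started, i, next = 0, received = 0;
    int result, bytes, tag, slave;

    /* Start the (assumed) slave executable "worker" on any available hosts. */
    started = pvm_spawn("worker", (char **)0, PvmTaskDefault, "", NSLAVES, tids);

    /* Prime every slave with one task from the queue. */
    for (i = 0; i < started && next < NTASKS; i++, next++) {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&next, 1, 1);
        pvm_send(tids[i], TAG_TASK);
    }

    /* Hand out the remaining tasks as results arrive, so faster or less
     * loaded slaves automatically receive more work (dynamic balancing). */
    while (received < next) {
        int bufid = pvm_recv(-1, TAG_DONE);
        pvm_upkint(&result, 1, 1);
        pvm_bufinfo(bufid, &bytes, &tag, &slave);   /* who sent this result? */
        received++;
        if (next < NTASKS) {
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&next, 1, 1);
            pvm_send(slave, TAG_TASK);
            next++;
        }
    }

    /* The queue is empty: tell every slave to exit. */
    for (i = 0; i < started; i++) {
        pvm_initsend(PvmDataDefault);
        pvm_send(tids[i], TAG_STOP);
    }

    printf("collected %d results\n", received);
    pvm_exit();
    return 0;
}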
To compile hitc, type
% aimk hitc hitc_slave

Since hitc does not require any user input, it can be spawned directly from the PVM
console. Start up the PVM console and add a few hosts. At the PVM console prompt type
pvm> spawn -> hitc

The ``->" spawn option causes all the print statements in hitc and in the slaves to appear in
the console window. This feature can be useful when debugging your first few PVM
programs. You may wish to experiment with this option by placing print statements in hitc.f
and hitc_slave.f and recompiling.
hitc can be used to illustrate XPVM's real-time animation capabilities. Start up XPVM and
build a virtual machine with four hosts. Click on the ``tasks" button and select ``spawn" from
the menu. Type ``hitc" where XPVM asks for the command, and click on ``start". You will
see the host icons light up as the machines become busy. You will see the hitc_slave tasks get

spawned and see all the messages that travel between the tasks in the Space Time display.
Several other views are selectable from the XPVM ``views" menu. The ``task output" view is
equivalent to the ``->" option in the PVM console. It causes the standard output from all tasks
to appear in the window that pops up.
There is one restriction on programs that are spawned from XPVM (and the PVM console).
The programs must not contain any interactive input, such as asking for how many slaves to
start up or how big a problem to solve. This type of information can be read from a file or put
on the command line as arguments, but there is nothing in place to get user input from the
keyboard to a potentially remote task.
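One common way around this restriction is to pass the parameters on the command line of the spawned tasks. The fragment below is a hedged illustration of that idea using the argv argument of pvm_spawn; the task name and the parameter value are made up for the example.

/* Hypothetical fragment (names and values invented for illustration):
 * the spawned tasks read their problem size from argv[1] instead of stdin,
 * so the job can also be started from the PVM console or from XPVM. */
#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int tids[4];
    char *args[] = { "10000", (char *)0 };   /* becomes argv[1] of each task */

    int started = pvm_spawn("mytask", args, PvmTaskDefault, "", 4, tids);
    printf("started %d copies of mytask with problem size %s\n", started, args[0]);

    pvm_exit();
    return 0;
}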

3.4.8 PVM Console Details


The PVM console, called pvm, is a stand-alone PVM task that allows the user to interactively
start, query, and modify the virtual machine. The console may be started and stopped
multiple times on any of the hosts in the virtual machine without affecting PVM or any
applications that may be running.
When started, pvm determines whether PVM is already running; if it is not, pvm automatically
executes pvmd on this host, passing pvmd the command line options and hostfile. Thus PVM
need not be running to start the console.
pvm [-n<hostname>] [hostfile]

The -n option is useful for specifying an alternative name for the master pvmd (in case
hostname doesn't match the IP address you want). Once PVM is started, the console prints the
prompt
pvm>

and accepts commands from standard input. The available commands are
add: followed by one or more host names, adds these hosts to the virtual machine.
alias: defines or lists command aliases.
conf: lists the configuration of the virtual machine including hostname, pvmd task ID,
architecture type, and a relative speed rating.
delete: followed by one or more host names, deletes these hosts from the virtual machine.
PVM processes still running on these hosts are lost.
echo: echo arguments.
halt: kills all PVM processes including the console, and then shuts down PVM. All daemons
exit.
help: can be used to get information about any of the interactive commands. help may be
followed by a command name that lists options and flags available for this command.
id: prints the console task id.
jobs: lists running jobs.
kill: can be used to terminate any PVM process.
mstat: shows the status of specified hosts.
ps -a: lists all processes currently on the virtual machine, their locations, their task id's, and
their parents' task id's.
pstat: shows the status of a single PVM process.
quit: exits the console, leaving daemons and PVM jobs running.
reset: kills all PVM processes except consoles, and resets all the internal PVM tables and
message queues. The daemons are left in an idle state.
setenv: displays or sets environment variables.
sig: followed by a signal number and TID, sends the signal to the task.
spawn: starts a PVM application. Options include the following:
-count: number of tasks; default is 1.
-host: spawn on host; default is any.
-ARCH: spawn on hosts of type ARCH.
-?: enable debugging.
->: redirect task output to console.
->file: redirect task output to file.
->>file: redirect task output, appending to file.
-@: trace job, display output on console.
-@file: trace job, output to file.
trace: sets or displays the trace event mask.
unalias: undefines a command alias.
version: prints the version of PVM being used.
The console reads $HOME/.pvmrc before reading commands from the tty, so
you can do things like
alias ? help
alias h help
alias j jobs
setenv PVM_EXPORT DISPLAY
# print my id
echo new pvm shell
id

PVM supports the use of multiple consoles. It is possible to run a console on any host in an
existing virtual machine and even multiple consoles on the same machine. It is also possible
to start up a console in the middle of a PVM application and check on its progress.

3.4.9 Errors we faced


After starting the PVM console by typing pvm on the frontend, the PVM console starts
normally:
pvm>
We tried to add the first host (compute-0-0), so we typed:
pvm> add compute-0-0
The following error appears:
add compute-0-0
0 successful
HOST        DTID
lucifer     Can't start pvmd

Auto-Diagnosing Failed Hosts...


Compute-0-0...
Verifying Local Path to "rsh"...
Rsh found in /usr/bin/rsh - O.K.
Testing Rsh/Rhosts Access to Host "compute-0-0"...
Rsh/Rhosts Access FAILED - "compute-0-0: Connection refused"
Connect to compute-0-0 port 543: Connection refused
Trying krb4 rlogin
Connect to 10.255.255.254 port 543: Connection refused
Trying normal rlogin
(/usr/bin/rlogin)

The connection fails. By searching in /etc/xinetd.d you will find the rsh and rlogin files; when
we opened them we found disable = yes, so we changed it to disable = no in both files to
enable the two services.
Trying to connect again, we found the same error message.
When logging into compute-0-0 we found that rsh and rlogin are not installed, so we installed
rsh-server and xinetd (which is needed by rsh) on the compute node; unfortunately the
connection failed again.
At last we found out that Rocks by default disables the rsh and rlogin services; they are
replaced by ssh, which is safer.
So to enable rsh do the following steps:
1- Execute:
# cp /state/partition1/home/install/rocks-dist/lan/i386/build/graphs/default/base-rsh.xml \
/state/partition1/home/install/site-profiles/4.3/graphs/default/

2- When you open the base-rsh.xml file from the new location you will find:
<!-- Uncomment to enable RSH on your cluster
<edge from="client">
<to>xinetd</to>
<to>rsh</to>
</edge>
-->

3- Follow the instruction and uncomment this block. This will force all appliance
types that reference the client class (compute nodes, nas nodes, ...) to enable an rsh
service that trusts all hosts on the private side network. The uncommented block should
look like this:

<edge from="client">
<to>xinetd</to>
<to>rsh</to>
</edge>

4- To apply your customized configuration scripts to compute nodes, rebuild the distribution:
# cd /home/install
# rocks-dist dist
Then, reinstall your compute nodes, and it should work.


Note: rlogin and rsh first try to authenticate through Kerberos: they try Kerberos
version 5, then Kerberos version 4 if that fails, and finally fall back to normal
rlogin. To use Kerberos 5 authentication, put a file called .k5login in the client's
home directory and edit it so that it contains the users authorized to access this
computer; entries should be in the form: principal/instance@realm
For more info visit: http://web.mit.edu/kerberos/

3.4.10 Host File Options


As we stated earlier, only one person at a site needs to install PVM, but each PVM user can
have his own hostfile, which describes his own personal virtual machine.
The hostfile defines the initial configuration of hosts that PVM combines into a virtual
machine. It also contains information about hosts that you may wish to add to the
configuration later.
The hostfile in its simplest form is just a list of hostnames one to a line. Blank lines are
ignored, and lines that begin with a # are comment lines. This allows you to document the
hostfile and also provides a handy way to modify the initial configuration by commenting out
various hostnames (see Figure 3-21).
# configuration used for my run
sparky
azure.epm.ornl.gov
thud.cs.utk.edu
sun4
Figure 3-21: Simple hostfile listing virtual machine configuration

Several options can be specified on each line after the hostname. The options are separated by
white space.

lo= userid
allows you to specify an alternative login name for this host; otherwise, your login
name on the start-up machine is used.
so=pw
will cause PVM to prompt you for a password on this host. This is useful in the cases
where you have a different userid and password on a remote system. PVM uses rsh by
default to start up remote pvmd's, but when pw is specified, PVM will use rexec()
instead.
dx= location of pvmd
allows you to specify a location other than the default for this host. This is useful if
you want to use your own personal copy of pvmd.
ep= paths to user executables
allows you to specify a series of paths to search down to find the requested files to
spawn on this host. Multiple paths are separated by a colon. If ep= is not specified,
then PVM looks in $HOME/pvm3/bin/PVM_ARCH for the application tasks.
sp= value
specifies the relative computational speed of the host compared with other hosts in the
configuration. The range of possible values is 1 to 1000000 with 1000 as the default.
bx= location of debugger
specifies which debugger script to invoke on this host if debugging is requested in the
spawn routine.
Note: The environment variable PVM_DEBUGGER can also be set. The default
debugger is pvm3/lib/debugger.
wd= working_directory
specifies a working directory in which all spawned tasks on this host will execute.
The default is $HOME.
ip= hostname
specifies an alternate name to resolve to the host IP address.


so=ms
specifies that a slave pvmd will be started manually on this host. This is useful if rsh
and rexec network services are disabled but IP connectivity exists. When using this
option you will see in the tty of the pvmd3
[t80040000] ready
Fri Aug 27 18:47:47 1993
*** Manual startup ***
Login to "honk" and type:
pvm3/lib/pvmd -S -d0 -nhonk 1 80a9ca95:0cb6 4096 2
80a95c43:0000
Type response:

On honk, after typing the given line, you should see


ddpro<2312> arch<ALPHA> ip<80a95c43:0a8e> mtu<4096>

which you should relay back to the master pvmd. At that point, you will see
Thanks

and the two pvmds should be able to communicate.


If you want to set any of the above options as defaults for a series of hosts, you can place
these options on a single line with a * for the hostname field. The defaults will be in effect for
all the following hosts until they are overridden by another set-defaults line.
Hosts that you don't want in the initial configuration but may add later can be specified in the
hostfile by beginning those lines with an &. An example hostfile displaying most of these
options is shown in (Figure 3-22).
# Comment lines start with a # (blank lines ignored)
gstws
ipsc dx=/usr/geist/pvm3/lib/I860/pvmd3
ibm1.scri.fsu.edu lo=gst so=pw
# set default options for following hosts with *
* ep=$sun/problem1:~/nla/mathlib
sparky
#azure.epm.ornl.gov
midnight.epm.ornl.gov
# replace default options with new values
* lo=gageist so=pw ep=problem1
thud.cs.utk.edu
speedy.cs.utk.edu
# machines for adding later are specified with &
# these only need listing if options are required
&sun4 ep=problem1
&castor dx=/usr/local/bin/pvmd3
&dasher.cs.utk.edu lo=gageist
&elvis dx=~/pvm3/lib/SUN4/pvmd3
Figure 3-22: PVM hostfile illustrating customizing options

For more details about PVM, visit: http://www.netlib.org/pvm3/book/pvm-book.html


References
[1]  https://computing.llnl.gov/tutorials/mpi/#Collective_Communication_Routines
[2]  Practical MPI Programming (Yukiya Aoyama, Jun Nakano)
[3]  Parallel Scientific Computing in C++ and MPI - George Em Karniadakis and Robert M. Kirby II - Cambridge University Press
[4]  High Performance Linux Clusters with OSCAR, Rocks, openMosix, and MPI - Joseph D. Sloan
[5]  http://www.cm.cf.ac.uk/Parallel/Year2/
[6]  An Introduction to Parallel Programming - Tobias Wittwer - VSSD
[7]  http://en.wikipedia.org/wiki/Gustafson's_Law
[8]  www.gnome.org
[9]  www.kde.org
[10] www.linuxreviews.org/software/desktops
[11] www.wikipedia.org
[12] www.rocksclusters.org
[13] http://www.netlib.org/pvm3/book/pvm-book.html
[14] Scilab help file
[15] http://web.mit.edu/kerberos/


Appendix A: Linux commands


Command                          Job

halt                             shut down system
init 0                           shut down system
reboot                           restart system
locate missingfilename           find a file called missingfilename
which missingfilename            show the subdirectory containing the executable
                                 file called missingfilename
ls -l                            list files in current directory using long format
rm name                          remove a file or directory called name
rm -rf name                      kill off an entire directory and all its included
                                 files and subdirectories
cat filetoview                   display the file called filetoview
cp filename /home/dirname        copy the file called filename to the /home/dirname
                                 directory
mv filename /home/dirname        move the file called filename to the /home/dirname
                                 directory
tail filetoview                  display the last 10 lines of the file called
                                 filetoview
head filetoview                  display the first 10 lines of the file called
                                 filetoview
tar -zxvf archive.tar.gz         decompress the files contained in the gzipped and
                                 tarred archive called archive
tar -jxvf archive.tar.bz2        decompress the files contained in the bzipped and
                                 tarred archive called archive
./configure                      execute the script preparing the installed files for
                                 compiling
adduser accountname              create a new user called accountname
passwd accountname               give accountname a new password
su                               log in as superuser from current login
exit                             stop being superuser and revert to normal user
chmod 755 filename               full permission for the owner, read and execute
                                 access for the group and others
                                 (where read=4, write=2, execute=1)


Appendix B: list of the open source tools in Cbench


* Intel MPI Benchmarks (IMB).
* IOR .
* Iozone .
* b_eff .
* mpi_examples and perftest .
* MPICH .
* netperf.
* NAS Parallel Benchmarks.
* Pallas MPI Benchmarks.
* Presta Benchmarks from ASCI Purple Benchmarks.
* STREAMS.
* OSU MPI Benchmarks .
* LLCbench - Low Level Architectural Characterization Benchmark .
* PSNAP - PAL System Noise Activity Program .
* CTCS - Cerberus Test Control System .
* HPCC - HPC Challenge Benchmark .
* HP Linpack.
* Memtester - userspace memory tester .
* PIOB - Sandia parallel file i/o testing utility
* Stress - a simple tool that imposes certain types of compute stress on UNIX-like operating
systems .
* Software that is intrinsically part of Cbench
* 300MB - 300 megabyte MPI executable for stressing MPI launches
* fpck - floating point correctness diagnostic
* mpi_routecheck - serialized N**2 MPI communication sanity checker
* rotate, rotate_latency - Rotate is a cross-sectional bandwidth/latency benchmark
developed at Sandia National Labs. It is useful for measuring how well an interconnect
takes advantage of the available hardware bandwidth at the MPI level. Specifically, this
benchmark is useful to see the effects of static routing in a fully connected CLOS network.
* HA4, MP3, RS5, RS12
* mpi_hello

* mpi_latency
* mpi_overhead - simple MPI memory overhead measurer
* mpi_slowcpu
* mpi_tokensmash
* word9
* LAMMPS .
* MPQC .
To find the locations where you can get source code of each package check the
documentation downloaded with Cbench.


Appendix C: PVM functions in Scilab


AdCommunications        advanced communication toolbox for parallel programming
pvm                     communications with other applications using Parallel Virtual Machine
pvm_addhosts            add hosts to the virtual machine
pvm_barrier             blocks the calling process until all processes in a group have called it
pvm_bcast               broadcasts a message to all members of a group
pvm_bufinfo             returns information about a message buffer
pvm_config              returns information about the virtual machine configuration
pvm_delhosts            deletes hosts from the virtual machine
pvm_error               prints a message describing an error returned by a PVM call
pvm_exit                tells the local pvmd that this process is leaving PVM
pvm_f772sci             convert a F77 complex into a complex scalar
pvm_get_timer           gets the system's notion of the current time
pvm_getinst             returns the instance number in a group of a PVM process
pvm_gettid              returns the tid of the process identified by a group name and instance number
pvm_gsize               returns the number of members presently in the named group
pvm_halt                stops the PVM daemon
pvm_joingroup           enrolls the calling process in a named group
pvm_kill                terminates a specified PVM process
pvm_lvgroup             unenrolls the calling process from a named group
pvm_mytid               returns the tid of the calling process
pvm_parent              returns the tid of the process that spawned the calling process
pvm_probe               check if a message has arrived
pvm_recv                receive a message
pvm_reduce              performs a reduce operation over members of the specified group
pvm_sci2f77             convert a complex scalar into F77
pvm_send                immediately sends (or multicasts) data
pvm_set_timer           sets the system's notion of the current time
pvm_spawn               starts new Scilab processes
pvm_spawn_independent   starts new PVM processes
pvm_start               start the PVM daemon
pvm_tasks               information about the tasks running on the virtual machine
pvm_tidtohost           returns the host of the specified PVM process
pvmd3                   PVM daemon


Appendix D: The H.W and the budget.


Description                              # of units   Unit price   Total

Mouse                                        2            15          30
Keyboard                                     2            15          30
Mouse USB                                    1            20          20
Keyboard USB                                 1            30          30

Master node:
  CPU 2.6 Core 2 Duo                         1           645         645
  RAM DDR2 2 GB                              1           105         105
  Case DR66                                  1           110         110
  Motherboard MSI                            1           275         275
  DVD ROM                                    1            90          90
  LAN card                                   2            20          40
  TOTAL MASTER NODE                                                  1265

Computing nodes:
  Type 1: COMPAQ, CPU 2.4 (512 cache),
          H.D 40 GB, RAM 256 DDR1                        410          820
  Type 2: DELL, CPU 2.8 (1 M cache),
          H.D 40 GB, RAM 1 GB DDR2                       625         1250
  Type 3: DELL, CPU 2.8,
          H.D 40 GB, RAM 1 GB DDR1                       500         1500
  TOTAL COMPUTING NODES                                              3445

RAM 1 GB DDR1                                1           130          130
RAM 1 GB DDR1                                1            60           60
Cables (8 m)                                              1.3         10.4

TOTAL ALL                                                           5145.4
I/P                                                                 7000
BUDGET                                                              1854.6

