
PARALLEL AND DISTRIBUTED ALGORITHMS - IMPORTANT QUESTIONS

Q1. Differentiate between parallel and distributed computing systems, with appropriate diagrams.

DIAGRAMS OF DISTRIBUTED AND PARALLEL SYSTEMS:
[Figure: (a), (b) a distributed system; (c) a parallel system.]

Q2. Explain the different stages of designing parallel algorithms.

Designing Parallel Algorithms
Parallel algorithm design is not easily reduced to simple recipes. The goal is to suggest a framework within which parallel algorithm design can be explored and, in the process, to develop intuition about what constitutes a good parallel algorithm.

Some issues in the design of parallel algorithms:

Efficiency
Scalability
Partitioning of computations
Domain decomposition
Functional decomposition techniques
Locality
Synchronous and asynchronous communication
Agglomeration as a means of reducing communication
Load-balancing strategies

STAGES OF DESIGNING PARALLEL ALGORITHMS

1. Methodical Design

This methodology structures the design process as four distinct stages:
Partitioning, Communication, Agglomeration, and Mapping.

In the first two stages, we focus on concurrency and scalability. The third and fourth stages deal with locality and other performance-related issues.

Partitioning
Decompose the computation, and the data operated on by this computation, into small tasks. Practical issues such as the number of processors in the target computer are ignored; the focus is on recognizing opportunities for parallel execution.

Communication
The communication required to coordinate task execution is determined, and appropriate communication structures and algorithms are defined.

Agglomeration
The task and communication structures defined in the first two stages of a design are evaluated with respect to performance requirements and implementation costs. If necessary, tasks are combined into larger tasks to improve performance or reduce development costs.

Mapping

Each task is assigned to a processor in a manner that attempts to satisfy the competing goals of maximizing processor utilization and minimizing communication costs. Mapping can be specified statically or determined at runtime by load-balancing algorithms.

DIAGRAM:

Figure 2.1: PCAM, a design methodology for parallel programs. Starting with a problem specification, we develop a partition, determine communication requirements, agglomerate tasks, and finally map tasks to processors.

1. Partitioning
The partitioning stage of a design is intended to expose opportunities for parallel execution. The focus is on defining a large number of small tasks in order to yield what is termed a fine-grained decomposition of a problem. In later design stages, evaluation of communication requirements, the target architecture, or software engineering issues may lead us to forgo opportunities for parallel execution identified at this stage. In this first stage we avoid prejudging alternative partitioning strategies. A good partition divides into small pieces both the computation associated with a problem and the data on which this computation operates.

Programmers most commonly focus first on the data associated with a problem: they first determine an appropriate partition for the data, and then work out how to associate computation with that data. This partitioning technique is termed domain decomposition.

Domain Decomposition
In domain decomposition we divide the data into small pieces of approximately equal size. Next, we partition the computation that is to be performed. This partitioning yields a number of tasks, each comprising some data and a set of operations on that data. Typically, communication is required to move data between tasks; this requirement is addressed in the next phase of the design process.

Example: consider domain decomposition of a problem involving a three-dimensional grid, where computation is performed repeatedly on each grid point. In the early stages of a design, we favour the most aggressive decomposition possible, which in this case defines one task for each grid point. A small code sketch of such a decomposition is given below.
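To make the idea concrete, here is a minimal sketch of a block decomposition in C. It is an added illustration, not from the source: it splits the grid along one dimension only (the figure below notes that one-, two-, and three-dimensional splits are possible), and the function name block_bounds and the sizes N and P are assumed for the example.

#include <stdio.h>

/* Illustrative sketch: a one-dimensional block decomposition of N grid
 * points among P tasks.  Each task owns a contiguous block of roughly
 * N/P points; the first N % P tasks receive one extra point. */
void block_bounds(int N, int P, int task, int *lo, int *hi)
{
    int base  = N / P;        /* minimum points per task           */
    int extra = N % P;        /* leftover points spread over tasks */
    *lo = task * base + (task < extra ? task : extra);
    *hi = *lo + base + (task < extra ? 1 : 0);   /* half-open range [lo, hi) */
}

int main(void)
{
    int N = 10, P = 4;        /* example sizes, chosen arbitrarily */
    for (int task = 0; task < P; task++) {
        int lo, hi;
        block_bounds(N, P, task, &lo, &hi);
        printf("task %d owns grid points [%d, %d)\n", task, lo, hi);
    }
    return 0;
}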

Figure 2.2: Domain decompositions for a problem involving a three-dimensional grid. One-, two-, and three-dimensional decompositions are possible; in each case, data associated with a single task are shaded. A three-dimensional decomposition offers the greatest flexibility and is adopted in the early stages of a design.

Functional Decomposition
Domain decomposition forms the foundation for most parallel algorithms. Functional decomposition is valuable as a different way of thinking about problems: a focus on the computations that are to be performed can sometimes reveal structure in a problem that would not be obvious from a study of the data alone.

Figure 2.3: Functional decomposition in a computer model of climate. Each model component can be thought of as a separate task, to be parallelized by domain decomposition. Arrows represent exchanges of data between components during computation: the atmosphere model generates wind velocity data that are used by the ocean model, the ocean model generates sea surface temperature data that are used by the atmosphere model, and so on.

Q3. Write an algorithm for list ranking.

LIST-RANK(L)    (runs in O(lg n) time)
1. for each processor i, in parallel
2.     do if next[i] = nil
3.            then d[i] ← 0
4.            else d[i] ← 1
5. while there exists an object i such that next[i] ≠ nil
6.     do for each processor i, in parallel
7.            do if next[i] ≠ nil
8.                   then d[i] ← d[i] + d[next[i]]
9.                        next[i] ← next[next[i]]
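A sequential C simulation of this pointer-jumping procedure can make the updates concrete. This is an added sketch, not from the source: the list contents, the size N, and the helper name list_rank are assumptions. The snapshot copies of d and next at the start of each round mimic the synchronous "in parallel" reads of a PRAM.

#include <stdio.h>
#include <string.h>

#define N   6          /* number of list nodes (example size) */
#define NIL -1

/* Sequential simulation of pointer-jumping LIST-RANK.  In each round every
 * node reads a snapshot of d[] and next[] taken before any writes, then
 * doubles its pointer.  After O(log n) rounds d[i] holds the distance from
 * node i to the end of the list. */
void list_rank(int next[N], int d[N])
{
    for (int i = 0; i < N; i++)
        d[i] = (next[i] == NIL) ? 0 : 1;

    int active = 1;
    while (active) {
        int d_old[N], next_old[N];
        memcpy(d_old, d, sizeof d_old);          /* snapshot of previous round */
        memcpy(next_old, next, sizeof next_old);

        active = 0;
        for (int i = 0; i < N; i++) {            /* "in parallel" over all i */
            if (next_old[i] != NIL) {
                d[i]    = d_old[i] + d_old[next_old[i]];
                next[i] = next_old[next_old[i]];
                if (next[i] != NIL) active = 1;
            }
        }
    }
}

int main(void)
{
    /* Example list: 3 -> 4 -> 0 -> 5 -> 2 -> 1 -> nil */
    int next[N] = {5, NIL, 1, 4, 0, 2};
    int d[N];
    list_rank(next, d);
    for (int i = 0; i < N; i++)
        printf("node %d: rank %d\n", i, d[i]);
    return 0;
}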

Q4. Describe Flynn's classification of parallel computers, with suitable diagrams.

Flynn's Classical Taxonomy
There are different ways to classify parallel computers. One of the more widely used classifications, in use since 1966, is called Flynn's Taxonomy. Flynn's taxonomy distinguishes multi-processor computer architectures according to how they can be classified along the two independent dimensions of Instruction and Data. Each of these dimensions can have only one of two possible states: Single or Multiple. This gives the four possible classifications according to Flynn:

SISD - Single Instruction, Single Data
SIMD - Single Instruction, Multiple Data
MISD - Multiple Instruction, Single Data
MIMD - Multiple Instruction, Multiple Data

Single Instruction, Single Data (SISD):
A serial (non-parallel) computer.
Single Instruction: only one instruction stream is being acted on by the CPU during any one clock cycle.
Single Data: only one data stream is being used as input during any one clock cycle.
Deterministic execution.
This is the oldest type of computer and historically the most common.
Examples: older generation mainframes, minicomputers and workstations; single-core PCs.

Single Instruction, Multiple Data (SIMD):


A type of parallel computer.
Single Instruction: all processing units execute the same instruction at any given clock cycle.
Multiple Data: each processing unit can operate on a different data element.
Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing.
Synchronous (lockstep) and deterministic execution.
Two varieties: processor arrays and vector pipelines.
Examples:

Processor Arrays: Connection Machine CM-2, MasPar MP-1 & MP-2, ILLIAC IV

Vector Pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10

Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
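The lockstep idea can be illustrated with the kind of loop SIMD hardware handles well. This is an added sketch, not from the source; the function name saxpy follows the usual BLAS-style convention and is an assumption here.

#include <stddef.h>

/* Illustrative SIMD-friendly computation: the same instruction
 * (multiply-add) is applied to many data elements.  On SIMD hardware a
 * vectorizing compiler can issue one instruction that processes several
 * elements of a[] and b[] per clock cycle. */
void saxpy(size_t n, float alpha, const float *a, float *b)
{
    for (size_t i = 0; i < n; i++)
        b[i] = alpha * a[i] + b[i];   /* identical operation on every element */
}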

Multiple Instruction, Single Data (MISD):


A type of parallel computer.
Multiple Instruction: each processing unit operates on the data independently via separate instruction streams.
Single Data: a single data stream is fed into multiple processing units.
Few actual examples of this class of parallel computer have ever existed; one is the experimental Carnegie-Mellon C.mmp computer (1971).
Some conceivable uses might be:
multiple frequency filters operating on a single signal stream

multiple cryptography algorithms attempting to crack a single coded message.

Multiple Instruction, Multiple Data (MIMD):


A type of parallel computer.
Multiple Instruction: every processor may be executing a different instruction stream.
Multiple Data: every processor may be working with a different data stream.
Execution can be synchronous or asynchronous, deterministic or non-deterministic.
Currently the most common type of parallel computer; most modern supercomputers fall into this category.
Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs.
Note: many MIMD architectures also include SIMD execution subcomponents.
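A minimal MIMD-style sketch, added as an illustration (the source names no API; POSIX threads and the task names below are assumptions): two threads run different instruction streams on different data streams within one program.

#include <stdio.h>
#include <pthread.h>

/* Compile with: cc mimd.c -lpthread */

static void *sum_task(void *arg)          /* instruction stream 1 */
{
    int *data = arg;
    long sum = 0;
    for (int i = 0; i < 4; i++)
        sum += data[i];
    printf("sum task:     %ld\n", sum);
    return NULL;
}

static void *product_task(void *arg)      /* instruction stream 2 */
{
    double *data = arg;
    double prod = 1.0;
    for (int i = 0; i < 4; i++)
        prod *= data[i];
    printf("product task: %f\n", prod);
    return NULL;
}

int main(void)
{
    int    ints[4]    = {1, 2, 3, 4};          /* data stream 1 */
    double doubles[4] = {1.5, 2.0, 2.5, 3.0};  /* data stream 2 */

    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_task, ints);
    pthread_create(&t2, NULL, product_task, doubles);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}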

Q5. Describe the parallel computer memory architectures.

Parallel Computer Memory Architectures: Shared Memory

General characteristics: shared memory parallel computers vary widely, but generally have in common the ability for all processors to access all memory as a global address space.

Multiple processors can operate independently but share the same memory resources. Changes in a memory location effected by one processor are visible to all other processors. Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA

Uniform Memory Access (UMA):

Most commonly represented today by Symmetric Multiprocessor (SMP) machines.
Identical processors.
Equal access and access times to memory.
Sometimes called CC-UMA (Cache Coherent UMA). Cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level.

Non-Uniform Memory Access (NUMA):

Often made by physically linking two or more SMPs.
One SMP can directly access the memory of another SMP.
Not all processors have equal access time to all memories.
Memory access across the link is slower.
If cache coherency is maintained, it may also be called CC-NUMA (Cache Coherent NUMA).

Shared Memory: Advantages
A global address space provides a user-friendly programming perspective to memory.
Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs.

Shared Memory: Disadvantages
The primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increase traffic associated with cache/memory management.
The programmer is responsible for synchronization constructs that ensure "correct" access to global memory.
Expense: it becomes increasingly difficult and expensive to design and produce shared memory machines with ever-increasing numbers of processors.
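A minimal shared-memory sketch, added as an illustration (the source mentions no specific API; OpenMP, the array size, and its contents are assumptions): all threads operate on one array held in a single global address space, and the reduction clause provides the synchronization the text says the programmer must supply.

#include <stdio.h>

#define N 1000000

/* Compile with OpenMP enabled, e.g. cc -fopenmp shared.c */
int main(void)
{
    static double a[N];
    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];              /* every thread reads the same shared array */

    printf("sum = %f (all threads share one memory)\n", sum);
    return 0;
}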

Parallel Computer Memory Architectures: Distributed Memory

General characteristics: like shared memory systems, distributed memory systems vary widely but share a common characteristic: distributed memory systems require a communication network to connect inter-processor memory.

Processors have their own local memory. Memory addresses in one processor do not map to another processor, so there is no concept of global address space across all processors. Because each processor has its own local memory, it operates independently; changes it makes to its local memory have no effect on the memory of other processors. Hence, the concept of cache coherency does not apply. When a processor needs access to data in another processor, it is usually the task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility. The network "fabric" used for data transfer varies widely, though it can be as simple as Ethernet.

Advantages:
Memory is scalable with the number of processors: increase the number of processors and the size of memory increases proportionately.
Each processor can rapidly access its own memory without interference and without the overhead incurred in trying to maintain cache coherency.
Cost effectiveness: can use commodity, off-the-shelf processors and networking.

Disadvantages:

The programmer is responsible for many of the details associated with data communication between processors.
It may be difficult to map existing data structures, based on global memory, to this memory organization.
Non-uniform memory access (NUMA) times.
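A minimal distributed-memory sketch, added as an illustration (the source does not prescribe an API; MPI and the toy values are assumptions): each process holds a value in its own local memory, and data moves only through an explicit, programmer-specified communication call.

#include <stdio.h>
#include <mpi.h>

/* Compile with mpicc and run with mpirun -np <P> ./a.out */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = rank + 1;    /* lives only in this process's local memory */
    int total = 0;

    /* Explicit communication: combine the local values on process 0. */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d processes = %d\n", size, total);

    MPI_Finalize();
    return 0;
}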

Q6. Differentiate between: (i) UMA and NUMA (ii) Shared Memory and Distributed Memory

(For the UMA versus NUMA distinction, see the UMA and NUMA characteristics listed under Q5 above.)

Shared Memory - General Characteristics:
Shared memory parallel computers vary widely, but generally have in common the ability for all processors to access all memory as a global address space. Multiple processors can operate independently but share the same memory resources. Changes in a memory location effected by one processor are visible to all other processors. Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA.

Distributed Memory - General Characteristics:
Like shared memory systems, distributed memory systems vary widely but share a common characteristic: distributed memory systems require a communication network to connect inter-processor memory.

Processors have their own local memory. Memory addresses in one processor do not map to another processor, so there is no concept of global address space across all processors. Because each processor has its own local memory, it operates independently. Changes it makes to its local memory have no effect on the memory of other processors. Hence, the concept of cache coherency does not apply. When a processor needs access to data in another processor, it is usually the task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility.

The network "fabric" used for data transfer varies widely, though it can be as simple as Ethernet.

Q7. What do you mean by binary search? Write an algorithm for binary search? In computer science, a binary search or half-interval search algorithm finds the position of a specified value (the input "key") within a sorted array. At each stage, the algorithm compares the input key value with the key value of the middle element of the array. If the keys match, then a matching element has been found so its index, or position, is returned. Otherwise, if the sought key is less than the middle element's key, then the algorithm repeats its action on the subarray to the left of the middle element or, if the input key is greater, on the sub-array to the right. If the remaining array to be searched is reduced to zero, then the key cannot be found in the array and a special "Not found" indication is returned. A binary search halves the number of items to check with each iteration, so locating the item (or determining its absence) takes logarithmic time.

int BinarySearch(const int A[], int value, int low, int high)
{
    if (high < low)
        return -1;                          /* not found */
    int mid = low + (high - low) / 2;       /* avoids overflow of (low + high) / 2 */
    if (A[mid] > value)
        return BinarySearch(A, value, low, mid - 1);
    else if (A[mid] < value)
        return BinarySearch(A, value, mid + 1, high);
    else
        return mid;                         /* found */
}
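A small driver, added as an illustration (the array contents are made up), shows how the routine above is called over the inclusive index range 0..N-1 of a sorted array.

#include <stdio.h>

/* Assumes the BinarySearch function defined above is in the same file. */
int BinarySearch(const int A[], int value, int low, int high);

int main(void)
{
    int A[] = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};   /* sorted input (example data) */
    int n = sizeof A / sizeof A[0];

    printf("index of 23: %d\n", BinarySearch(A, 23, 0, n - 1));   /* prints 5  */
    printf("index of 7 : %d\n", BinarySearch(A, 7,  0, n - 1));   /* prints -1 */
    return 0;
}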

Q9. What is Parallel Computing?

Traditionally, software has been written for serial computation:

To be run on a single computer having a single Central Processing Unit (CPU);
A problem is broken into a discrete series of instructions;
Instructions are executed one after another;
Only one instruction may execute at any moment in time.

In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
To be run using multiple CPUs;
A problem is broken into discrete parts that can be solved concurrently;
Each part is further broken down to a series of instructions;
Instructions from each part execute simultaneously on different CPUs.

For example, the compute resources might be: a single computer with multiple processors; an arbitrary number of computers connected by a network; or a combination of both. The computational problem should be able to:

Be broken apart into discrete pieces of work that can be solved simultaneously;
Execute multiple program instructions at any moment in time;
Be solved in less time with multiple compute resources than with a single compute resource.

Distributed Computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal. A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs. Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers.

Q10. COMPLEXITY MEASURES

The three basic aims of complexity theory are:

1. Introducing a notation to specify complexity: the first aim is to introduce a mathematical notation to specify the functional relationship between the input size of the problem and the consumption of computational resources, e.g., computational time and memory space.

2. Choice of a machine model to standardize the measures: the second aim is to specify an underlying machine model that prescribes an associated set of measures for the consumption of resources. These measures are standardized so that they are invariant under possible variations in the algorithm.

3. Refinement of the measures for parallel computation: having obtained the Turing-machine measures, the third aim is to understand how fast we can solve certain problems when a large number of processors are put together to work in parallel.
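As a small illustrative instance of such a functional relationship (an added example under common PRAM-style assumptions, not taken from the source): summing n numbers sequentially takes time proportional to n, while a parallel reduction on p processors does n/p local work followed by a logarithmic combining phase, so

T_1(n) = Θ(n)                  (sequential sum of n numbers)
T_p(n) = Θ(n/p + log p)        (parallel reduction on p processors)
S_p(n) = T_1(n) / T_p(n)       (speedup)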

SOURCE:INTERNET
