
Parallel Patterns

Polina Vyshnevska, Oleh Mizov, 2011

Content
The Basics
Parallel Patterns: Tips & Tricks
The Dessert: Debugging Tools

Multithreading Basics

What Is a Thread?
A thread:
Has its own stack
Has its own state (context)
Executes independently

How It Works
The OS slices out short amounts of CPU time for each thread
That creates an illusion of simultaneous execution
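As a minimal sketch of these basics in C# (the worker body is invented purely for illustration):

using System;
using System.Threading;

class ThreadBasics
{
    static void Main()
    {
        // Each thread has its own stack and context and runs independently;
        // the OS scheduler interleaves threads on the available cores.
        var worker = new Thread(() =>
        {
            for (int i = 0; i < 3; i++)
                Console.WriteLine("worker: " + i);
        });

        worker.Start();        // ask the OS to schedule the new thread
        Console.WriteLine("main continues while the worker runs");
        worker.Join();         // wait for the worker to finish
    }
}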

Synchronization

Condition synchronization: when one thread needs to notify one or more other threads that a specific task has been completed
Mutual exclusion: when multiple threads access a shared resource in such a way that the resource does not become corrupted

How Do We Do Synchronization?
Critical sections
Condition variables
Interlocked functions
Waiting functions
Kernel objects
Slim reader/writer locks
Timers
One-time initialization

(A small C# sketch of the two most common mechanisms follows below.)
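As a rough illustration (not part of the original slides), mutual exclusion and an atomic counter might look like this in C#, using the standard lock statement and the Interlocked class:

using System.Threading;

class Counter
{
    private readonly object _gate = new object();
    private int _guarded;      // protected by _gate
    private int _lockFree;     // updated atomically

    public void IncrementGuarded()
    {
        lock (_gate)           // mutual exclusion: only one thread at a time
        {
            _guarded++;
        }
    }

    public void IncrementLockFree()
    {
        Interlocked.Increment(ref _lockFree);   // atomic update, no lock needed
    }
}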

DeadLocks
Deadlock is a situation where two or more threads are waiting for the other to finish and thus neither ever does

It might be caused by:


Mutual exclusion. Only a limited number of threads may use a resource concurrently.
Hold and wait. A thread holding a resource requests access to other resources and waits for them.
No preemption. Resources are released only voluntarily by the thread holding them.
Circular wait. Several threads form a chain in which each thread waits for a resource held by the next thread.

A minimal example of how such a cycle can arise is sketched below.
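For instance, a hold-and-wait plus circular-wait situation might look like this in C# (lockA, lockB and the two worker methods are hypothetical names used only to show the cycle):

using System.Threading;

class DeadlockDemo
{
    static readonly object lockA = new object();
    static readonly object lockB = new object();

    static void Worker1()
    {
        lock (lockA)            // holds A ...
        {
            Thread.Sleep(100);
            lock (lockB) { }    // ... and waits for B
        }
    }

    static void Worker2()
    {
        lock (lockB)            // holds B ...
        {
            Thread.Sleep(100);
            lock (lockA) { }    // ... and waits for A -> circular wait
        }
    }
}

One common fix is to acquire locks in a single global order in every thread, which breaks the circular-wait condition.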

Parallel Loop
for (int i = 0; i < n; i++) { }

Steps of the loop have to be independent!

Parallel.For(0, n, i => { });
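As a small example (the array and the work inside the loop are made up for illustration), a loop whose iterations are independent can be parallelized directly with .NET's Parallel.For:

using System;
using System.Threading.Tasks;

class ParallelLoopSample
{
    static void Main()
    {
        int n = 1000;
        double[] results = new double[n];

        // Safe to parallelize: iteration i touches only results[i]
        Parallel.For(0, n, i =>
        {
            results[i] = Math.Sqrt(i);
        });

        Console.WriteLine(results[n - 1]);
    }
}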

Movie: Parallelizing Blur algorithm

Shared State
To keep our threads alive and the application fast, we should follow one of these approaches:

Synchronization

Immutability

Isolation

Fork/Join
The problem arises when a set of functions or tasks is executed within a shared address space. The solution is to logically create threads (fork), carry out concurrent computations, and then terminate them after possibly combining results from the computations (join).

Sample: Fork/Join
The Fork/Join pattern can be easily implemented via Tasks in .NET 4.0

static void MyForkJoin(params Action[] actions)
{
    var tasks = new Task[actions.Length];
    for (int i = 0; i < actions.Length; i++)
    {
        tasks[i] = Task.Factory.StartNew(actions[i]);
    }
    Task.WaitAll(tasks);
}
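A call might then look like this (DownloadData, UpdateIndex, and RenderPreview are hypothetical methods, shown only to illustrate the shape of the call):

MyForkJoin(
    () => DownloadData(),
    () => UpdateIndex(),
    () => RenderPreview());   // returns only after all three actions complete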

Producer/Consumer Pattern

(Diagram: a queue connecting producer loops and consumer loops)

Producer/Consumer Pattern
Producer Loops and Consumer Loops communicate through the Queue

Common solution for network communication

Edge cases to handle: adding data to a full queue; retrieving data from an empty queue (see the sketch below)
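In .NET 4.0 this pattern, including the full-queue and empty-queue cases, can be sketched with System.Collections.Concurrent.BlockingCollection<T>; the item counts and capacity below are arbitrary:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ProducerConsumerSample
{
    static void Main()
    {
        // Bounded capacity: Add blocks when the queue is full,
        // Take/GetConsumingEnumerable blocks when the queue is empty.
        var queue = new BlockingCollection<int>(boundedCapacity: 10);

        var producer = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 100; i++)
                queue.Add(i);
            queue.CompleteAdding();          // signal: no more items
        });

        var consumer = Task.Factory.StartNew(() =>
        {
            foreach (int item in queue.GetConsumingEnumerable())
                Console.WriteLine(item);
        });

        Task.WaitAll(producer, consumer);
    }
}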

MapReduce
Is widely used in:
Quick search in databases;
Advanced analytical queries;
Document/table indexing;
Parsing.

Sample: MapReduce
Map function
void Map(String name, String document) {
    // name: document name
    // document: document contents
    for each word w in document:
        EmitIntermediate(w, "1");
}

Grouping pairs

Reduce function
void Reduce(String word, Iterator partialCounts) {
    // partialCounts: a set of grouped partial counts
    int result = 0;
    for each v in partialCounts:
        result += ParseInt(v);
    Emit(AsString(result));
}
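The same word-count idea can be sketched in C# with PLINQ; this is not the distributed MapReduce runtime, but it shows the map / group / reduce shape (documents is a hypothetical collection of strings):

using System.Collections.Generic;
using System.Linq;

static class WordCount
{
    public static Dictionary<string, int> Count(IEnumerable<string> documents)
    {
        return documents
            .AsParallel()
            // Map: emit every word of every document
            .SelectMany(doc => doc.Split(' ', '\t', '\n'))
            .Where(word => word.Length > 0)
            // Group pairs by key
            .GroupBy(word => word)
            // Reduce: sum the partial counts for each word
            .ToDictionary(group => group.Key, group => group.Count());
    }
}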

Discrete Event Pattern


(Diagram: events are generated, routed through a dynamic decision, and processed)

Discrete Event Pattern

Known problem: out-of-order events

Approaches:
Optimistic
Roll back the effects of events that are mistakenly executed (including any new events they generated)
Not feasible if an event causes interaction with the outside world
Pessimistic
Ensures that events are always executed in order
Costs: increased latency; communication overhead

Dynamic Programming Pattern


Optimally solving subproblems => globally optimal solution

Top-down

Bottom-up

Should involve memoization to avoid redundant computations (see the sketch below)
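A minimal top-down sketch of memoization in C#, using Fibonacci purely as a stand-in subproblem:

using System.Collections.Generic;

static class Memoized
{
    static readonly Dictionary<int, long> cache = new Dictionary<int, long>();

    // Top-down dynamic programming: each subproblem is solved once,
    // then reused from the cache on every later request.
    public static long Fib(int n)
    {
        if (n < 2) return n;

        long value;
        if (cache.TryGetValue(n, out value))
            return value;

        value = Fib(n - 1) + Fib(n - 2);
        cache[n] = value;
        return value;
    }
}

Note that this cache is not thread-safe; if subproblems are solved in parallel, as discussed on the next slide, a ConcurrentDictionary or per-subproblem synchronization would be needed.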

Dynamic Programming Pattern


Synchronization:
Parent pulls the result from its children, or child pushes the result to its parents
Can be done across an entire level of subproblems to amortize overhead
Requires good load balancing
Deadlocks:
Possible with individual locks on each subproblem

Recursion MAY Be Parallelized!


{ Parallel_Work(A); ProcessNode(B); ProcessNode(C); }
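For example, independent recursive calls like those above can be run concurrently with .NET's Parallel.Invoke (a sketch only; ProcessNode and the nodes A, B, C stand in for whatever the recursion actually does):

// requires System.Threading.Tasks
Parallel.Invoke(
    () => ProcessNode(A),
    () => ProcessNode(B),
    () => ProcessNode(C));   // blocks until all three branches finish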

Recursive Splitting Pattern


Generate more than one task per call

Use a balanced data structure

Use a fork-join or task-queue implementation

Use optimizations to improve locality

Recursive Splitting
Data structure
class Tree<T>
{
    public Tree<T> Left, Right;   // children
    public T Data;                // data for the node
}

Sequential approach
public void Process<T>(Tree<T> tree, Action<T> action)
{
    if (tree == null) return;
    // Process the current node, then left and right
    action(tree.Data);
    Process(tree.Left, action);
    Process(tree.Right, action);
}

Parallel approach
// Recursive delegate to walk the tree
Action<Tree<T>> processNode = null;
processNode = node =>
{
    if (node == null) return;
    // Asynchronously run the action on the current node
    ThreadPool.QueueUserWorkItem(delegate { action(node.Data); });
    // Process the children
    processNode(node.Left);
    processNode(node.Right);
};
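A variation on the parallel walk above, assuming the same Tree<T> class and .NET 4.0 Tasks, lets the caller wait for the whole subtree to finish instead of fire-and-forgetting work items (a sketch, not the slide's original code):

using System;
using System.Threading.Tasks;

static class TreeWalker
{
    public static void ProcessParallel<T>(Tree<T> tree, Action<T> action)
    {
        if (tree == null) return;

        action(tree.Data);

        // Recursive splitting: one task per child subtree;
        // WaitAll makes completion visible to the caller.
        var left  = Task.Factory.StartNew(() => ProcessParallel(tree.Left, action));
        var right = Task.Factory.StartNew(() => ProcessParallel(tree.Right, action));
        Task.WaitAll(left, right);
    }
}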

Debugging Tools

Performance Analyzer Tools


Visual Studio Profiler
The following views are available: Call Tree, Modules, Caller/Callee, Functions, Process, Lines, IPs, Marks

Intel VTune Performance Analyzer (Intel CPUs only)
Collects performance data from the system or an application: Call Graph and Sampling (event-based or time-based)

Intel Thread Checker
Two modes of operation: source and binary instrumentation

Or you may use Task Manager, Process Explorer, Process Viewer, Performance Monitor, etc.

Movie: Debugging with VS2010

GetTickCount() vs GetThreadTimes()
GetTickCount(void)
Retrieves the number of milliseconds that have elapsed since the system was started
Counts all elapsed time, no matter which thread was running
Only produces correct per-thread values if each thread consumes all of its time slice

GetThreadTimes(hThread, lpCreationTime, lpExitTime, lpKernelTime, lpUserTime)
Retrieves timing information for the specified thread
Takes into account that the thread might have been suspended
Returns correct values regardless of the threads' behavior (see the sketch below)
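The Win32 GetThreadTimes call can be reached from C# via P/Invoke; the helper name and the millisecond conversion below are my own, and the FILETIME values are marshaled as 64-bit integers in 100-nanosecond units:

using System;
using System.Runtime.InteropServices;

static class ThreadTiming
{
    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetThreadTimes(IntPtr hThread,
        out long creationTime, out long exitTime,
        out long kernelTime, out long userTime);

    // CPU time (kernel + user) actually consumed by the calling thread, in milliseconds.
    public static double CurrentThreadCpuMilliseconds()
    {
        long creation, exit, kernel, user;
        if (!GetThreadTimes(GetCurrentThread(), out creation, out exit, out kernel, out user))
            throw new System.ComponentModel.Win32Exception();

        return (kernel + user) / 10000.0;   // FILETIME ticks are 100 ns
    }
}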

Performance Counters
Provide information as to how well the OS or an app is performing Can help determine system bottlenecks and fine-tune app performance
Over 40 counters for memory performance objects Over 30 counters for process performance objects

Consuming: Registry vs PDH interface


The PDH interface is a higher-level abstraction over the registry interface
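From managed code, these counters are typically consumed through System.Diagnostics.PerformanceCounter, which sits on top of the same data; the category and counter names below are standard Windows ones, but the sampling loop is only an illustrative sketch:

using System;
using System.Diagnostics;
using System.Threading;

class CounterSample
{
    static void Main()
    {
        // Total CPU usage across all processors
        var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");

        // The first NextValue() call returns 0; later calls return real samples.
        cpu.NextValue();
        for (int i = 0; i < 5; i++)
        {
            Thread.Sleep(1000);
            Console.WriteLine("CPU: {0:F1}%", cpu.NextValue());
        }
    }
}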

Performance Tuning Steps


First, write straightforward code as quickly as you can to get the functionality working

Measure performance and find real bottlenecks

Fix only the code that really impacts performance

Summary

Parallel programming is supported by a number of third-party frameworks

These frameworks implement efficient patterns

Avoid low-level multithreading => use the patterns implemented in parallel programming frameworks

QUESTIONS ?
