
By Kelvin Chou

Programming Parallelism

Data Level Parallelism


                 Single Instruction    Multiple Instruction
Single Data      SISD                  MISD
Multiple Data    SIMD                  MIMD

Single Instruction Multiple Data

One instruction, several data elements, one clock cycle
Vectorized
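As a rough illustration of what "one instruction, several data elements" means, here is a portable sketch. The `vec4` type and `vec4_add` function are hypothetical names standing in for a 128-bit register and a single vector add; real SIMD hardware would do the four additions in one instruction rather than a loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit SIMD register: four 32-bit lanes. */
typedef struct { int32_t lane[4]; } vec4;

/* Four 32-bit lanes fill exactly 128 bits. */
static_assert(sizeof(vec4) == 16, "four 32-bit lanes fill 128 bits");

/* Lane-wise add. Conceptually this is one SIMD instruction; it is written
   as a loop here only so the sketch compiles on any platform. */
vec4 vec4_add(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}
```

Adding {1, 2, 3, 4} and {10, 20, 30, 40} this way gives {11, 22, 33, 44}, with no lane depending on any other.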

Question from Fall 2012


What would we
want START set
to be?

Question from Fall 2012

We can see that we want to handle the fringe first!
What part does the fringe take care of for us?

Question from Fall 2012


Since we are packing 32-bit integers into a 128-bit vector,
we can fit 4 integers into one vector.

Question from Fall 2012


We would want START set to be:
(n % 4)

#define START (n % 4)
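The split between fringe and vector loop can be sketched as below. This assumes `n` counts the elements to process; `LANES`, `fringe_count` and `vector_iterations` are hypothetical names, not taken from the exam.

```c
#include <assert.h>

#define LANES 4  /* 128-bit vector / 32-bit int */

/* Elements handled one at a time before the vector loop (START on the slides). */
int fringe_count(int n) {
    assert(n >= 0);
    return n % LANES;
}

/* Full 4-wide vector iterations that remain once the fringe is done. */
int vector_iterations(int n) {
    assert(n >= 0);
    return (n - n % LANES) / LANES;
}
```

For n = 10 the fringe covers 2 elements and the vector loop runs twice, covering the remaining 8.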

Question from Fall 2012


What would we
want J_INIT set
to be?

#define START (n % 4)

Question from Fall 2012

Notice what
we are loading
into j_on_steroids?

#define START (n % 4)

Question from Fall 2012


j_on_steroids is being added to a vector
filled with the value of START.

#define START (n % 4)

Question from Fall 2012

We see that j += START. Where else do we notice
that START is being used in a calculation?

#define START (n % 4)

Question from Fall 2012


There is a
START + 1 in
the fringe case!
That is where we
will start.
Therefore,
J_INIT = 1

#define START (n % 4)
#define J_INIT 1

Question from Fall 2012

Next is STEROIDS_INIT.
We see that it is being added to the set1 of START.

#define START (n % 4)
#define J_INIT 1

Question from Fall 2012

Given that we want to start at START, we know that
for differentiation we multiply by one greater than
the current index.

#define START (n % 4)
#define J_INIT 1

Question from Fall 2012

Since the initial value is added to the set1,
we can see that we want each field
initialized differently.

#define START (n % 4)
#define J_INIT 1

Question from Fall 2012

Example:
If START = 2, we want the vector going into the
main loop to contain {3, 4, 5, 6}

#define START (n % 4)
#define J_INIT 1

Question from Fall 2012

We want {3, 4, 5, 6}, so:
{2, 2, 2, 2} + {a, b, c, d} = {3, 4, 5, 6}
Therefore {a, b, c, d} = {1, 2, 3, 4}

#define START (n % 4)
#define J_INIT 1
#define STEROIDS_INIT \
{1, 2, 3, 4}
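The arithmetic behind STEROIDS_INIT can be checked with a small scalar sketch. `initial_multipliers` is a hypothetical helper that mimics broadcasting START into four lanes (what a set1 does) and adding {1, 2, 3, 4} lane by lane; it is not the exam's code.

```c
#include <assert.h>

/* Build the initial multiplier lanes: START in every lane, plus {1, 2, 3, 4}. */
void initial_multipliers(int start, int out[4]) {
    static const int steroids_init[4] = {1, 2, 3, 4};
    assert(start >= 0);
    for (int i = 0; i < 4; i++) {
        out[i] = start + steroids_init[i];
    }
}
```

With START = 2 the lanes come out as {3, 4, 5, 6}, matching the example on the slide.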

Question from Fall 2012

The last thing we want to find
is where to END the loop.

#define START (n % 4)
#define J_INIT 1
#define STEROIDS_INIT \
{1, 2, 3, 4}

Question from Fall 2012

Keep in mind the constraints of the problem.
What is the last index that we need to keep track of?

#define START (n % 4)
#define J_INIT 1
#define STEROIDS_INIT \
{1, 2, 3, 4}

Question from Fall 2012

In differentiation
the nth term falls
off!
So this means
END = (n-1)
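Putting the four answers together, here is a hedged scalar sketch of how they could fit. The exam's actual code is not shown on these slides, so the function name, the array layout (`a[0..n]` for a degree-n polynomial, `d[0..n-1]` for its derivative) and the inclusive use of END are all assumptions. The inner four-lane loop stands in for a single vector multiply.

```c
#include <assert.h>

#define J_INIT 1

/* Sketch: a[0..n] holds the coefficients of a degree-n polynomial,
   d[0..n-1] receives the coefficients of its derivative. */
void differentiate(int n, const int *a, int *d) {
    assert(n >= 0);
    int start = n % 4;  /* START on the slides */
    int end = n - 1;    /* END on the slides: last output index (assumed inclusive) */

    /* Fringe: the first n % 4 terms, one at a time. */
    for (int j = J_INIT; j < start + 1; j++) {
        d[j - 1] = j * a[j];
    }

    /* Multiplier lanes: START + {1, 2, 3, 4}. */
    int mult[4] = {start + 1, start + 2, start + 3, start + 4};

    /* Main loop: four outputs per iteration, as one vector multiply would. */
    for (int j = start; j <= end; j += 4) {
        for (int i = 0; i < 4; i++) {
            d[j + i] = mult[i] * a[j + i + 1];
            mult[i] += 4;  /* advance every lane to the next group of four */
        }
    }
}
```

For a degree-6 polynomial, START is 2, so the fringe fills d[0] and d[1] and a single main-loop iteration fills d[2] through d[5] using the multipliers {3, 4, 5, 6}.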

Thread Level Parallelism


Each thread executes on
different data
Allows things to be run in
parallel
OpenMP is an example

Question from Spring 2013


Suppose we have int *A that points to the head of an array of length len.
Assume we have n > 1 threads.
#pragma omp parallel for
for (int x = 0; x < len; x++) {
    *A = x;
    A++;
}
Is this always incorrect, sometimes incorrect, or always correct?

Question from Spring 2013


#pragma omp parallel for
for (int x = 0; x < len; x++) {
    *A = x;
    A++;
}
Is this always incorrect, sometimes incorrect, or always correct?
Sometimes incorrect. The pointer A is shared, so the threads race to
increment it and a write can land in the wrong slot. It is possibly
correct if the stars align.
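One way to remove the race, sketched here as an assumption rather than the exam's official fix, is to stop mutating the shared pointer and compute each address from the loop index instead. `fill_indexed` is a hypothetical name. Without OpenMP enabled at compile time the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>

/* Each iteration derives its own address from x, so no shared state is
   modified and the iterations can run in any order on any thread. */
void fill_indexed(int *A, int len) {
    assert(len >= 0);
    #pragma omp parallel for
    for (int x = 0; x < len; x++) {
        A[x] = x;
    }
}
```

The loop variable x is private to each thread under `parallel for`, which is what makes indexing by it safe.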

Question from Spring 2013


#pragma omp parallel
{
    for (int x = 0; x < len; x++) {
        *(A+x) = x;
    }
}
Is this always incorrect, sometimes incorrect, or always correct?

Question from Spring 2013


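#pragma omp parallel
{
    for (int x = 0; x < len; x++) {
        *(A+x) = x;
    }
}
Is this always incorrect, sometimes incorrect, or always correct?
Always correct: every thread writes the same value x to A[x].
Now, is this faster or slower than serial?
Slower. Without the "for", every thread runs the entire loop, so the work
is duplicated rather than divided, and the threads also suffer from false
sharing.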
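To see what dividing the work would look like inside a plain `parallel` region, here is a sketch that hands each thread one contiguous chunk. `fill_chunked` is a hypothetical name and the chunking scheme is an assumption for illustration; `#pragma omp parallel for` does a similar split automatically. The `_OPENMP` guard lets the code fall back to a single chunk when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Each thread fills only its own contiguous slice of A, so no element is
   written by more than one thread. */
void fill_chunked(int *A, int len) {
    assert(len >= 0);
    #pragma omp parallel
    {
#ifdef _OPENMP
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
#else
        int tid = 0;       /* serial fallback: one "thread" owns everything */
        int nthreads = 1;
#endif
        int chunk = (len + nthreads - 1) / nthreads;  /* ceiling division */
        int lo = tid * chunk;
        int hi = (lo + chunk < len) ? lo + chunk : len;
        for (int x = lo; x < hi; x++) {
            A[x] = x;
        }
    }
}
```

Contiguous chunks also mean that neighbouring threads can touch the same cache line only at chunk boundaries, which limits false sharing compared with every thread writing everywhere.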
