Agenda
Multithreading Overview
Parallel Programming Techniques
Multicore Programming Challenges
Application Examples
Conclusions
Multithreading Overview
Thread
Entity within a process that can be executed
Shares resources of the process
Has individual thread resources
[Diagram: threads move among the Waiting, Ready to Run, and Running states; the scheduler runs ready threads on Core 1 and Core 2]
DEMOS
Multithreading in LabVIEW
Implicit Parallelism (Automatic Multithreading in LabVIEW)
Explicit Parallelism (Control of threads with Timed Loops)
LabVIEW Disadvantages
Inherent Parallelism
Automatic multithreading
LabVIEW is cross-platform (Windows, Mac, Linux), so there is no need to learn different threading APIs
Support for parallel libraries such as the Intel Math Kernel Library (MKL)
[Diagram: tasks A-F executing over time; stacking two tasks per CPU core serializes them, while distributing one task per core lets them all run in parallel]
Application Decomposition
The first step is to break the program down into its core components
Application
Tasks
Data
Data Flow
Parallel Strategies
An application can be decomposed under three paradigms:
Task Paradigm: Task Parallelization, Divide & Conquer
Data Paradigm: Geometric, Recursive (best suited for data operations that are completely independent)
Data Flow Paradigm: Pipeline, Wave Front
Task Parallelism
Tasks
Code is composed of logically independent blocks of functionality
Task Parallelism
Not all code requires sequential execution
Isolate independent chunks of code and
mark them as tasks
[Diagram: independent Tasks A, B, and C marked for parallel execution]
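LabVIEW expresses task parallelism graphically as side-by-side code on the diagram. As a rough textual sketch, the same idea in Python with `concurrent.futures` (the three task bodies are hypothetical placeholders for logically independent work):

```python
from concurrent.futures import ThreadPoolExecutor

# Three logically independent tasks (hypothetical placeholder bodies).
def task_a():
    return sum(range(1000))

def task_b():
    return max(range(1000))

def task_c():
    return len(range(1000))

# Mark the independent chunks as tasks and let the pool schedule them.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(t) for t in (task_a, task_b, task_c)]
    results = [f.result() for f in futures]

print(results)  # [499500, 999, 1000]
```

The key property mirrors the slide: none of the tasks depends on another's output, so no ordering between them needs to be enforced.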
[Diagram (Divide & Conquer): the problem is split recursively into subproblems, each subproblem is solved, and the subsolutions are merged back into the final solution]
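The split/solve/merge pattern above can be sketched with a merge sort, where one half of each split is optionally handed to a worker thread (a minimal illustration, not a tuned implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    # Merge two sorted subsolutions into one sorted solution.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_sort(data, pool=None):
    if len(data) <= 1:              # solve: a 0/1-element list is sorted
        return data
    mid = len(data) // 2            # split
    if pool is not None:            # solve one half on a worker thread
        left_future = pool.submit(merge_sort, data[:mid])
        right = merge_sort(data[mid:])
        left = left_future.result()
    else:
        left = merge_sort(data[:mid])
        right = merge_sort(data[mid:])
    return merge(left, right)       # merge

with ThreadPoolExecutor() as pool:
    print(merge_sort([5, 2, 9, 1, 7, 3], pool))  # [1, 2, 3, 5, 7, 9]
```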
Data Parallelism
Geometric Decomposition
No dependencies in data
Could be completely parallelized if enough
resources were available
Recursive Structure
Similar to Divide & Conquer strategy. Data is
inherently recursive and can be split up into
parallelized subsets
Data Parallelism
You can speed up processor-intensive operations
on large data sets by segmenting the data.
[Diagram: a single CPU core runs signal processing over the entire data set to produce the result]
[Diagram: the data set is segmented across four CPU cores, signal processing runs on each segment in parallel, and the partial results are combined]
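The segment-process-combine flow above can be sketched in Python as follows. Threads keep the example simple; in CPython, a `ProcessPoolExecutor` would be the usual swap-in for CPU-bound work to bypass the GIL. The "signal processing" step is a hypothetical stand-in:

```python
from concurrent.futures import ThreadPoolExecutor

def process_segment(segment):
    # Stand-in for a processor-intensive signal-processing step.
    return [x * x for x in segment]

def parallel_process(data, n_workers=4):
    # Geometric decomposition: split the data set into equal segments.
    step = -(-len(data) // n_workers)   # ceiling division
    segments = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        processed = pool.map(process_segment, segments)
    combined = []
    for seg in processed:               # combine partial results in order
        combined.extend(seg)
    return combined

print(parallel_process(list(range(8))))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because the segments are completely independent, no synchronization is needed until the final combine step.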
Wave Front
Data has dependencies but can be computed once the prior elements have been computed
Pipelining
Many applications involve sequential,
multistep algorithms
Applying pipelining can increase performance
[Diagram: timeline from t0 to t7 showing successive Acquire iterations executing one after another]
Pipelining Strategy
[Diagram: a four-stage pipeline (Acquire, Filter, Analyze, Log) mapped onto four CPU cores; once the pipeline fills at t3, all four stages execute in parallel on successive data items]
Pipelining in LabVIEW
[Diagram: sequential versus pipelined LabVIEW block diagrams]
Note: Queues may also be used to pipeline data between different loops
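The queue-between-loops idea in the note maps directly onto queues between threads in a textual language. A minimal Python sketch, with hypothetical stand-ins for the Filter and Analyze stages:

```python
import queue
import threading

def stage(func, inq, outq):
    # Each pipeline stage runs in its own thread (its own "loop"),
    # pulling items from an input queue and pushing results downstream.
    while True:
        item = inq.get()
        if item is None:        # sentinel: propagate shutdown and stop
            outq.put(None)
            break
        outq.put(func(item))

# Queues connect the stages: Acquire -> Filter -> Analyze (stand-ins).
acquire_out, filter_out, analyze_out = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x + 0.5, acquire_out, filter_out)).start()
threading.Thread(target=stage, args=(lambda x: x * 2, filter_out, analyze_out)).start()

for sample in [1, 2, 3]:        # the "Acquire" loop produces samples
    acquire_out.put(sample)
acquire_out.put(None)

results = []
while (item := analyze_out.get()) is not None:
    results.append(item)
print(results)  # [3.0, 5.0, 7.0]
```

As in the slide, each stage works on a different data item at the same time, so throughput approaches the rate of the slowest stage.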
Wave Front
Dependencies exist in elements of the data structure. For instance, the value at (i,j) requires the computed value at (i-1,j-1).
Wave Front
As long as the dependencies are satisfied, multiple operations can be carried out on the data in parallel
Wave Front
The wave-front effect appears as parallel executions iterate over the data
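One common way to realize a wave front is to sweep anti-diagonals of the grid: every cell on a given diagonal has all of its dependencies on earlier diagonals, so the cells of one diagonal can be computed in parallel. A sketch with a made-up recurrence (each cell sums its upper, left, and upper-left neighbors plus one):

```python
from concurrent.futures import ThreadPoolExecutor

N = 4
grid = [[0] * N for _ in range(N)]

def compute(i, j):
    # Cell (i,j) depends on (i-1,j), (i,j-1), and (i-1,j-1).
    up = grid[i - 1][j] if i else 0
    left = grid[i][j - 1] if j else 0
    diag = grid[i - 1][j - 1] if i and j else 0
    grid[i][j] = up + left + diag + 1

with ThreadPoolExecutor() as pool:
    # Sweep anti-diagonals d = 0 .. 2N-2; all cells on one diagonal
    # have their dependencies satisfied and can run concurrently.
    for d in range(2 * N - 1):
        cells = [(i, d - i) for i in range(N) if 0 <= d - i < N]
        list(pool.map(lambda c: compute(*c), cells))

print(grid[0])  # [1, 2, 3, 4]
```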
Wave Front
Practical Applications
Error Diffusion for Black & White Printers
Image Processing & Filtering
Amdahl's Law (generalized)
Stotal = 1 / ( sum over k = 1 to N of Pk / Sk )
Pk = % of instructions affected
Sk = speed increase factor
k = section-of-code label
N = total number of sections
Stotal = total speed change factor
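Plugging in numbers makes the law concrete. With hypothetical figures (60% of the instructions sped up 4x, the remaining 40% left serial), the overall speedup is well below 4x:

```python
def amdahl_speedup(sections):
    # sections: list of (Pk, Sk) pairs; the Pk fractions must sum to 1.
    return 1.0 / sum(p / s for p, s in sections)

# Hypothetical figures: 60% of instructions sped up 4x, 40% unchanged.
s = amdahl_speedup([(0.6, 4.0), (0.4, 1.0)])
print(round(s, 3))  # 1.818
```

The serial 40% dominates: even infinite speedup of the parallel section would cap Stotal at 1/0.4 = 2.5x.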
DEMOS
Parallel Programming Techniques
Divide and Conquer/Recursion
Data Parallelism
Pipelining
Amdahl's Law
Multicore Programming Challenges
Memory
Thread Synchronization
With OS scheduling, there is no guarantee when threads will execute unless synchronization primitives are used
The order of events may change from one execution to the next, depending on how the threads are scheduled
First execution: Thread 1, Thread 2, Thread 3
Second execution: Thread 2, Thread 3, Thread 1
Third execution: Thread 3, Thread 1, Thread 2
Race Conditions
This issue occurs when threads manipulate shared resources simultaneously
A common problem when code is migrated from a single-CPU system to multicore (the software was not originally written for multicore)
Without synchronization between threads, the result is anomalous behavior
Ex: Two threads simultaneously writing to one memory location
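The two-threads-one-location example can be sketched in Python: an unsynchronized `counter += 1` is a read-modify-write that two threads can interleave, losing updates, while a lock serializes the critical section:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n, use_lock):
    global counter
    for _ in range(n):
        if use_lock:
            with lock:          # critical section is serialized
                counter += 1
        else:
            counter += 1        # read-modify-write: not atomic

def run(use_lock):
    global counter
    counter = 0
    threads = [threading.Thread(target=add, args=(100_000, use_lock))
               for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run(use_lock=True))   # always 200000
print(run(use_lock=False))  # may be less than 200000 (lost updates)
```

Whether the unlocked run actually loses updates depends on the interpreter and scheduler, which is exactly the point: race conditions are intermittent and execution-order dependent.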
Synchronization in LabVIEW
When more synchronization
is required, use
synchronization mechanisms:
Notifiers
Queues
Semaphores
Rendezvous
Occurrences
2. Hard Disk
Example: Computers can only read or write to the hard disk one item at a time (file I/O cannot be parallelized)
3. Blocking Items
Example: non-reentrant VIs
Non-reentrant VIs, libraries, and device drivers can block the multithreaded nature of LabVIEW and the structures that provide optimization
Debugging Methods
Functional Debugging
Trace Debugging
Performance Counters
Functional Debugging
LabVIEW supports debugging parallel code for
functional correctness
Use basic LabVIEW debugging tools (highlight
execution, probes, etc.) to ensure code is
functionally correct
Trace Debugging
On real-time systems, trace debugging can show thread
activity at the OS level
Thread activity on each core is displayed by selecting a
particular CPU
Performance Counters
Performance counters provide
detailed system information such as
CPU usage, memory usage, and cache
hits/misses
LabVIEW does not natively support
performance counters but can call
Windows counters programmatically
Example utilities for performance
counting include:
Windows Perfmon
Intel's VTune
DEMOS
Debugging Methods
Functional Debugging
Trace Debugging
Performance Counters
Memory Considerations
Data transfer between cores
Cache considerations
Cache Considerations
Multicore processors
typically utilize a shared
cache
A common cache problem is false sharing, where two cores write to the same cache line, causing performance degradation
For cache optimization, use
processor affinity
Application Examples
2 Channels from a Digitizer
Recommendation
1. Read data channels separately
2. Perform FFT operations in parallel
Result
Recommendation
Split data into subsets and then perform the operation.
Recommendation
Balance acquisition rate and processing rate for
maximum throughput
[Block diagram: Acquire from Scope, Enqueue Element, Dequeue Element, 7th-order low-pass filter (data decomposition), Digital Output]
DEMO
Multiloop Producer / Consumer
Conclusions
Multithreading - LabVIEW offers implicit parallelism
(automatic multithreading) and explicit parallelism
(timed structures)
Parallel Programming Techniques - There is no silver bullet; the advantage of LabVIEW is that parallelism is much more easily expressed in the language
Conclusions (continued)
Multicore Programming Challenges - Debugging and
memory considerations have evolved with multicore
and play an important role
Application Examples - With minor modifications,
typical LabVIEW applications can be optimized for
multicore
Resources
www.ni.com/multicore
How to Develop your LabVIEW skills
Begin here, Core Courses: LabVIEW Basics I, LabVIEW Basics II
Certification: Certified LabVIEW Associate Developer Exam
Experienced User Courses: LabVIEW Intermediate I, LabVIEW Intermediate II
Certification: Certified LabVIEW Developer Exam
Advanced User Courses: LabVIEW Advanced I
Certification: Certified LabVIEW Architect Exam
ni.com/training
Next Steps
Visit ni.com/training
Identify your current expertise level and
desired level
Register for appropriate courses
$200 discount for attending LV Dev Day event!