Sei sulla pagina 1di 21

How multithreaded architecture works in

DB2 9.5
An overview

Shashank Kharche, Staff Software Engineer, IBM

Summary: New multithreaded capabilities were introduced inDB2® 9.5 for Linux®, UNIX®,
and Windows®, codenamed “Viper 2." Learn how these new capabilities affect you if you
regularly monitor processes or threads, if you need to understand how much memory your
database is using, or if you want to simplify mission-critical tasks such as backup, restore, and
roll forward. You'll learn how these changes affect configuration parameters, and gain
knowledge of the new technology in DB2 9.5.

Tag this!
Update My dW interests (Log in | What's this?) Skip to help for Update My dW interests

Date: 17 Jul 2008


Level: Intermediate
PDF: A4 and Letter (164KB | 14 pages)Get Adobe® Reader®
Also available in: Chinese Russian

Activity: 7058 views


Comments: 0 (Add comments)

Average rating (based on 18 votes)

Introduction

In order to understand the new multithreaded capabilities in DB2 9.5, this article starts with a
look at the DB2 process model. The entire DB2 process model is controlled by Base System
Utilities (BSUs). BSUs allocate memory for the instance and database, intercept and handle
signals, and handle exceptions sent to DB2. Figure 1 shows the old DB2 process model for the
Linux and UNIX platforms.
Figure 1. Old DB2 process model on Linux and UNIX

Figure 2 illustrates the new process model on Linux and UNIX.


Figure 2. New DB2 process model on Linux UNIX

The communication between database servers, clients, and applications is taken care of by a
framework. This kind of framework is nothing but the process model used by all DB2 servers. It
makes sure that internally used database files won't interfere with user or database applications.

Engine dispatchable units (EDUs) are responsible for performing various tasks such as
processing database application requests, reading database log files, and flushing log records
from the log buffer to the log files on disk. Typically the DB2 server handles this as a separate
EDU per task. Prior to DB2 9.5, most of these EDUs were processed based on UNIX and Linux
environments and were thread based on a Windows environment. Now, in 9.5 there is uniformity
in the process model of DB2 as EDUs are now thread based on Linux, UNIX, and Windows
environments.

Here are some of the advantages of the new memory model:

• The new memory model is simpler and more easily configured. See the following entries
in the DB2 Information Center:
o Configuring memory and memory heaps
o Memory configuration has been simplified
• This model saves resources:
Significantly fewer system file descriptors are used. The most obvious distinction
between processes and threads is that all threads of a process share the same memory
space and system-defined facilities. Facilities include open file handles (file descriptors),
shared memory, process synchronization primitives, and current directory. All threads in
a process can share the same file descriptors. There is no need to have each agent
maintain its own file descriptor table.
• Performance is enhanced:
Operating systems can generally switch (context switching) faster between threads of the
same process than between different process. There is no need to switch address space.
Because global memory is shared and almost no new memory must be allocated, creating
a thread is simpler and faster than creating a process. Process creation is expensive in
terms of processor cycles and memory usage.
• There are more automatic and dynamic configurable parameters, so less is required from
the DBA.
This is covered in the Process model configuration simplification section of this article.
• The process model is the same now across all three platforms: Linux, UNIX, and
Windows.

Monitor threads with db2pd and mapping it with ps output

Prior to DB2 9.5, on UNIX and Linux environments with the help of the ps system command or
the db2_local_ps command, you were able to list all active DB2 EDUs. However, in DB2 9.5
those commands no longer list any EDU threads within the db2sysc process. Therefore, one of
the changes DB2 users and DBAs will see when they use an OS command to look at the
processes running on the system is that they will see only one process as apposed to several. This
is an administrative change you might expect from a DBA perspective.

$ ps -fu db2ins10
UID PID PPID C STIME TTY TIME CMD
db2ins10 1237176 2109662 0 Feb 28 - 0:12 db2acd 0
db2ins10 1921136 2109662 0 Feb 28 - 0:14 db2sysc 0
db2ins10 2101494 1941686 0 14:22:34 pts/1 0:00 -ksh
db2ins10 2420958 2101494 0 15:25:33 pts/1 0:00 ps -fu db2ins10

On AIX:
To view all threads of the db2sysc process (PID = 1921136):

Listing 1. View all threads of the db2sysc process on an AIX system


$ ps -mo THREAD -p 1921136

USER PID PPID TID ST CP PRI SC WCHAN F TT BND


COMMAND
db2ins10 1921136 2109662 - A 0 60 26 * 40401 - -
db2sysc 0
- - - 1273899 S 0 60 1 f1000100403674b0 410400
- - -
- - - 1327331 Z 0 60 1 - c00001 - -
-
- - - 1392805 Z 0 60 1 - c00001 - -
-
- - - 1601705 Z 0 60 1 - c00001 - -
-
- - - 1814627 Z 0 60 1 - c00001 - -
-
- - - 1851457 S 0 60 1 f1000004f010de00 410400
- - -
- - - 1961987 Z 0 60 1 - c00001 - -
-
- - - 1974311 Z 0 60 1 - c00001 - -
-
- - - 2023571 S 0 60 1 f100010041b401b0 410400
- - -
- - - 2068591 Z 0 60 1 - c00001 - -
-
- - - 2179161 Z 0 60 1 - c00001 - -
-
- - - 2187515 Z 0 60 1 - c00001 - -
-
- - - 2216003 S 0 60 1 - 400400 - -
-
- - - 2412647 Z 0 60 1 - c00001 - -
-
- - - 2551911 Z 0 60 1 - c00001 - -
-
- - - 2592969 Z 0 60 1 - c00001 - -
-
- - - 2621455 S 0 60 1 f1000100407f7e30 410400
- - -
- - - 2658531 S 0 60 1 - 418400 - -
-
- - - 3031171 Z 0 60 1 - c00001 - -
-
- - - 3457047 Z 0 60 1 - c00001 - -
-
- - - 3899477 Z 0 60 1 - c00001 - -
-
- - - 4157609 Z 0 60 1 - c00001 - -
-
- - - 4390991 S 0 60 1 - 400400 - -
-
- - - 4636819 Z 0 60 1 - c00001 - -
-
- - - 5628153 S 0 60 1 - 400400 - -
-
- - - 6783009 Z 0 60 1 - c00001 - -
-

On Linux:

To view all threads of the db2sysc process (PID = 1921136): ps -lLfp 1921136

The DBA's life is made easier now. Improvements were done in db2pd to list processes and
threads. You can now use the db2pd command, with the -edu option, to list all EDU threads that
are active. It can be used on UNIX, Linux, and Windows systems.
Listing 2. view all threads of the db2sysc process on a Linux system
$ db2pd -edu

Database Partition 0 -- Active -- Up 1 days 01:05:54


List of all EDUs for database partition 0

db2sysc PID: 1921136


db2wdog PID: 2109662
db2acd PID: 1237176

EDU ID TID Kernel TID EDU Name USR


SYS
=============================================================================
======
1801 1801 2216003 db2agent (idle) 0 0.706935
1.071737
1543 1543 5628153 db2resync 0 0.002641
0.004271
1286 1286 1851457 db2ipccm 0 0.082388
0.044037
1029 1029 2023571 db2licc 0 0.000211
0.001055
772 772 4390991 db2thcln 0 0.000244
0.000105
515 515 2621455 db2aiothr 0 2.740874
6.287562
2 2 1273899 db2alarm 0 0.274076
0.408226
258 258 2658531 db2sysc 0 2.085981
1.379128

Back to top

How much memory is used by DB2?

There are few ways to check memory usage:

• db2pd -dbptnmem
• db2 get snapshot for applications on sample
• select * from table(admin_get_dbp_mem_usage())
• db2mtrk -a and db2mtrk -p

Note the following information:

• db2pd shows accurate representation of shared memory hierarchy


• db2pd still can't report private memory allocations
• db2mtrk can report private memory allocations, but weak in other areas
• Private memory usage is not so interesting anymore
• db2pd -dbpntmem high-level reporting may be sufficient

Using db2pd

Listing 3. Example of db2pd


$ db2pd -dbptnmem

Database Partition 0 -- Active -- Up 1 days 01:11:27

Database Partition Memory Controller Statistics

Controller Automatic: Y
Memory Limit: 13994636 KB
Current usage: 76608 KB
HWM usage: 332736 KB
Cached memory: 16064 KB

Individual Memory Consumers:

Name Mem Used (KB) HWM Used (KB) Cached (KB)


========================================================
DBMS-db2ins10 46784 46784 10048
FMP_RESOURCES 22528 22528 0
PRIVATE 7296 7296 6016

Fields information:

Controller Automatic is set to Y if the INSTANCE_MEMORY configuration parameter is set to


AUTOMATIC. This means that database manager automatically determines the upper boundary
on memory consumption.

Memory Limit is the DB2 server's upper bound of memory that can be consumed. It is the value
of the INSTANCE_MEMORY configuration parameter.

Current usage is the amount of memory the server is currently consuming.

HWM usage is the high water mark (HWM) or peak memory usage that has been consumed
since the activation of the database partition when the db2start command was run.

Cached memory is how much of the current usage is not currently being used, but is cached for
performance reasons for future memory requests.

Individual Memory Consumers section:

• All registered "consumers" of memory within the DB2 server are listed with the amount
of the total memory they are consuming.
• Name: A brief, distinguishing name of a "consumer" of memory. Examples include:
o APPL-<dbname> for application memory consumed for a database <dbname>
o DBMS-xxx for global database manager memory requirements
o FMP_RESOURCES for memory required to communicate with db2fmps
o PRIVATE for miscellaneous private memory requirements
o FCM_RESOURCES for Fast Communication Manager resources
o LCL-<pid> for memory segment used to communicate with local applications
o DB-<dbname> for database memory consumed for a database <dbname>
• Mem Used (KB): How much memory is currently allotted to that consumer.
• HWM Used (KB): High-water mark, or peak, memory that the consumer has used.
• Cached (KB): Of the Mem Used (KB), the amount of memory that is not currently being
used but is immediately available for future memory allocations.

Using db2 get snapshot

Listing 4. Example of db2 get snapshot


$ db2 get snapshot for applications on sample

Memory usage for application:

Memory Pool Type = Application Heap


Current size (bytes) = 65536
High water mark (bytes) = 65536
Configured size (bytes) = 1048576

Agent process/thread ID = 6463


Agent Lock timeout (seconds) = -1
Memory usage for agent:

Memory Pool Type = Other Memory


Current size (bytes) = 196608
High water mark (bytes) = 196608
Configured size (bytes) = 16710107136

Using SQL

Listing 5. Example of using SQL


$ db2 "select * from table(admin_get_dbp_mem_usage())"

DBPARTITIONNUM MAX_PARTITION_MEM CURRENT_PARTITION_MEM PEAK_PARTITION_MEM


-------------- -------------------- --------------------- --------------------
0 14330507264 340590592 340852736

1 record(s) selected.
Using db2mtrk

Listing 6. Example of db2mtrk -a


$ db2mtrk -a
Tracking Memory on: 2008/02/29 at 15:51:00

Application Memory for database: SAMPLE


appshrh
128.0K

Memory for application 546


apph other
64.0K 192.0K

Memory for application 545


apph other
64.0K 192.0K

Memory for application 544


apph other
64.0K 320.0K

Memory for application 543


apph other
64.0K 576.0K

Memory for application 547


apph other
64.0K 192.0K

Listing 7. Example of db2mtrk -p


$ db2mtrk -p
Tracking Memory on: 2008/02/29 at 15:51:37

Memory for agent 6463


other
192.0K

Memory for agent 6206


other
192.0K
Memory for agent 5949
other
320.0K

Memory for agent 2094


other
576.0K

Memory for agent 6720


other
192.0K

Note: By default, INSTANCE_MEMORY is set to AUTOMATIC, which means that the


instance is allowed a maximum of some percentage of RAM (the range is 75 percent for smaller
systems and 95 percent for larger systems ). This includes all local partitions for a single
instance.

db2 get dbm cfg show detail|grep INSTANCE_MEMORY


Size of instance shared memory(4KB)
(INSTANCE_MEMORY)=AUTOMATIC(3498659)AUTOMATIC(3498659)

You cannot permanently set different INSTANCE_MEMORY values for different database
partitions. For a DB2 Express licenses, the upper bound on INSTANCE_MEMORY is further
restricted to at most 4GB of memory (1,048,576 * 4KB pages). DB2 Workgroup licenses are
restricted to at most 16GB of memory (4,194,304 * 4KB pages). Attempts to update the
INSTANCE_MEMORY configuration parameter to values larger than these limits will fail with
a SQL5130N return code, specifying the restricted range allowed for the license. Other license
types have no additional restrictions. You cannot set INSTANCE_MEMORY to be more than
RAM.

Back to top

Get rid of DPF backup and recovery problems

Each partition gets a different timestamp

On previous DB2 versions:

$ db2_all " db2 backup db test"

Backup successful. The timestamp for this backup image is : 20080304124529


eva88: db2 backup db test completed ok

Backup successful. The timestamp for this backup image is : 20080304124544


eva88: db2 backup db test completed ok

Backup successful. The timestamp for this backup image is : 20080304124554


eva88: db2 backup db test completed ok

Whereas, in DB2 9.5, the BACKUP command is enhanced to take a list of database partitions,
which provides a single system view.

$ db2 backup db test on all dbpartitionnums


Part Result
---- -------------------------------
0000 DB20000I The BACKUP DATABASE command completed successfully.
0010 DB20000I The BACKUP DATABASE command completed successfully.

Backup successful. The timestamp for this backup image is : 20080304135942

How do you determine what log files are required during roll-forward?

$ db2 rollforward db test to 2008-03-01 and stop


SQL1275N The stoptime passed to roll-forward must be greater than or equal
to
"2008-03-04-12.45.54.000000 UTC", because database "TEST" on node(s) "0,1"
contains information later than the specified time.

$ db2 rollforward db test to 2008-03-04-12.45.54.000000 and stop


DB20000I The ROLLFORWARD command completed successfully.

The above example shows that during roll-forward if point in time (PIT) specified in the
command is old or early, you get error message (SQL1275N). The error message tells you about
the correct PIT. You might consider using BACKUP with INCLUDE logs. However, in a DPF
database, BACKUP with INCLUDE logs generates error message (SQL2032N). Therefore, you
cannot use this option.

Whereas, in DB2 9.5 you can use the "TO END OF BACKUP" clause with the
ROLLFORWARD command to roll forward all partitions in a partitioned database to the
minimum recovery time. The minimum recovery time is the earliest point in time during a roll-
forward when a database is consistent (when the objects listed in the database catalogs match the
objects that physically exist on disk). Manually determining the correct point in time to which to
roll forward a database is difficult, particularly for a partitioned database. The "END OF
BACKUP" option makes it easy.
$ db2 rollforward db test to end of backup and stop
DB20000I The ROLLFORWARD command completed successfully

Back to top

What's important about user limits?

User limits set or show various restrictions on resource usage for a shell. It's a good practice to
set some of these limitations to prevent such issues as a faulty shell script to start unlimited
copies of itself or to prevent users on the system to start processes that run forever. But, what to
set it to? Below are the few considerations for various restrictions on resources:

• At db2 start, data and nofiles are unlimited.


• The stack limit is irrelevant because DB2 creates its own stack space
(AGENT_STACK_SZ dbm cfg)
On 64-bit UNIX
Default : 4 MB
Minimum : 1 MB
Maximum : 128 MB

On 32-bit LINUX
Default : 1MB
Minimum : 64 KB
Maximum : 4MB

• MAXFILOP is the maximum per database per partition. New high defaults of ~32K for
32-bit and ~64K for 64-bit.
• Current ulimit setting (or 8GB on AIX if ulimit is set to unlimited). DB2 overrides an
unlimited core limit. In order to get a core larger than 8GB, you have to explicitly set the
core limit to something larger than 8GB, but not unlimited.

Back to top

Process model configuration simplification

In this section you will see how the configuration parameters behave differently in DB2 9.5.
Take note of the default values and ranges, as they are different than before.
Figure 3. Configuration parameters

If you have performance-critical unfenced, external stored procedures (SPs) or user-defined


functions (UDFs), ensure they are thread-safe. Upon migration, all external, unfenced SPs and
UDFs, will become fenced.
For data integrity, by convention, unfenced SPs and UDFs should already be thread-safe, but this
cannot be enforced. Running a non-thread-safe SP or UDF in a multi-threaded process could
cause unpredictable problems. Therefore, as a migration procedure, create a script to facilitate
the conversion to unfenced.

Before taking Schooner, have a quick look over newly introduced threads and processes:

• db2thcln (thread stack cleanup): Recycles resources when an EDU terminates (UNIX-
only).
• db2aiothr (aio collector thread): Manages asynchronous I/O requests for the database
partition (UNIX-only).
• db2alarm (alarm thread): Notifies EDUs when their requested timer has expired (UNIX-
only).
• db2vend (fenced vendor process): Executes vendor code on behalf of an EDU, for
instance to execute the user-exit program for log archiving (UNIX-only).
• db2extev (external event handler thread): The same as SIGUSR2.
• db2acd: A health monitor process.

Finally, does it impact the applications that you currently have if you are moving up DB2 9.5?
The answer to this question is NO, absolutely NOT. The internal change this does not affect the
application at all. In fact, it is largely transparent from an administration and application
programming prospective.

Back to top

Acknowledgements

Special thanks to Amar Thakkar and Samir Kapoor for their technical review of this article.

Resources

Learn

• IBM DB2 9.5 Information Center for Linux, UNIX and Windows: Find information
describing how to use the DB2 family of products and features, as well as related
WebSphere® Information Integration products and features.

• IBM DB2 Express-C 9.5: Download DB2 Express-C 9.5, a no-charge version of DB2
Express 9 database server.

• IBM DB2 Training and Certification: Find award winning instructors, industry leading
software, hands-on labs, and more.

• DB2 for Linux, UNIX, and Windows Forum: Share questions, thoughts, and ideas with
others DB2 users and developers.

• Stay current with developerWorks technical events and webcasts.

• developerWorks Information Management zone: Learn more about DB2. Find technical
documentation, how-to articles, education, downloads, product information, and more.

Get products and technologies

• Build your next development project with IBM trial software, available for download
directly from developerWorks.
• Download IBM product evaluation versions and get your hands on application
development tools and middleware products from DB2, Lotus®, Rational, Tivoli®, and
WebSphere.

Discuss

• Participate in the discussion forum.

• Participate in developerWorks blogs and get involved in the developerWorks community.

About the author

Shashank Kharche is a staff software engineer with the IBM Australia Development Lab in
Sydney, Australia. He is an IBM certified DB2 administrator. Shashank currently works as part
of the Down Systems Division, Asia Pacific region, and has widespread experience in DB2
database and the diagnosis and resolution of critical problems. He has published several
technotes for IBM. He holds a Bachelor's degree in Computers Science and Engineering. You
can reach him at shashank.kharche@au1.ibm.com.

Simultaneous multithreading
From Wikipedia, the free encyclopedia

This section may require cleanup to meet Wikipedia's quality


standards. Please improve this section if you can. The talk page may contain
suggestions. (November 2007)

Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the


overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple
independent threads of execution to better utilize the resources provided by modern processor
architectures.
Contents
[hide]

• 1 Details
• 2 Taxonomy
• 3 Historical implementations
• 4 Modern commercial
implementations
• 5 Disadvantages
• 6 See also
• 7 References

• 8 External links

[edit] Details

Multithreading is similar in concept to preemptive multitasking but is implemented at the thread


level of execution in modern superscalar processors.

Simultaneous multithreading (SMT) is one of the two main implementations of multithreading,


the other form being temporal multithreading. In temporal multithreading, only one thread of
instructions can execute in any given pipeline stage at a time. In simultaneous multithreading,
instructions from more than one thread can be executing in any given pipeline stage at a time.
This is done without great changes to the basic processor architecture: the main additions needed
are the ability to fetch instructions from multiple threads in a cycle, and a larger register file to
hold data from multiple threads. The number of concurrent threads can be decided by the chip
designers, but practical restrictions on chip complexity have limited the number to two for most
SMT implementations, though there have been as many as 8 threads per core in, for example, the
UltraSPARC T2.

Because the technique is really an efficiency solution and there is inevitable increased conflict on
shared resources, measuring or agreeing on the effectiveness of the solution can be difficult.
Some researchers have shown that the extra threads can be used to proactively seed a shared
resource like a cache, to improve the performance of another single thread, and claim this shows
that SMT is not just an efficiency solution. Others use SMT to provide redundant computation,
for some level of error detection and recovery.

However, in most current cases, SMT is about hiding memory latency, increasing efficiency, and
increasing throughput of computations per amount of hardware used.

[edit] Taxonomy

In processor design, there are two ways to increase on-chip parallelism with less resource
requirements: one is superscalar technique which tries to increase Instruction Level Parallelism
(ILP), the other is multithreading approach exploiting Thread Level Parallelism (TLP).
Superscalar means executing multiple instructions at the same time while chip-level
multithreading (CMT) executes instructions from multiple threads within one processor chip at
the same time. There are many ways to support more than one thread within a chip, namely:

• Interleaved multithreading: Interleaved issue of multiple instructions from


different threads, also referred to as Temporal multithreading. It can be
further divided into fine-grain multithreading or coarse-grain multithreading
depending on the frequency of interleaved issues. Fine-grain multithreading
-- such as in a barrel processor -- issues instructions for different threads after
every cycle, while coarse-grain multithreading only switches to issue
instructions from another thread when the current executing thread causes
some long latency events (like page fault etc.). Coarse-grain multithreading is
more common for less context switch between threads. For example, Intel's
Montecito processor uses coarse-grain multithreading, while Sun's
UltraSPARC T1 uses fine-grain multithreading. For those processors that have
only one pipeline per core, interleaved multithreading is the only possible
way, because it can issue at most one instruction per cycle.
• Simultaneous multithreading (SMT): Issue multiple instructions from multiple
threads in one cycle. The processor must be superscalar to do so.
• Chip-level multiprocessing (CMP or multicore): integrates two or more
processors into one chip, each executing threads independently
• Any combination of multithreaded/SMT/CMP

The key factor to distinguish them is to look at how many instructions the processor can issue in
one cycle and how many threads from which the instructions come. For example, Sun
Microsystems' UltraSPARC T1 (known as "Niagara" until its November 14, 2005 release) is a
multicore processor combined with fine-grain multithreading technique instead of simultaneous
multithreading because each core can only issue one instruction at a time.

[edit] Historical implementations

While multithreading CPUs have been around since the 1950s, simultaneous multithreading was
first researched by IBM in 1968. The first major commercial microprocessor developed with
SMT was the Alpha 21464 (EV8). This microprocessor was developed by DEC in coordination
with Dean Tullsen of the University of California, San Diego, and Susan Eggers and Hank Levy
of the University of Washington. The microprocessor was never released, since the Alpha line of
microprocessors was discontinued shortly before HP acquired Compaq which had in turn
acquired DEC. Dean Tullsen's work was also used to develop the "Hyper-threading" (or "HTT")
versions of the Intel Pentium 4 microprocessors, such as the "Northwood" and "Prescott".

[edit] Modern commercial implementations

The Intel Pentium 4 was the first modern desktop processor to implement simultaneous
multithreading, starting from the 3.06GHz model released in 2002, and since introduced into a
number of their processors. Intel calls the functionality Hyper-Threading Technology (HTT), and
provides a basic two-thread SMT engine. Intel claims up to a 30% speed improvement compared
against an otherwise identical, non-SMT Pentium 4. The performance improvement seen is very
application dependent, and some programs actually slow down slightly when HTT is turned on
due to increased contention for resources such as bandwidth, caches, TLBs, re-order buffer
entries, etc. This is generally the case for poorly written data access routines that cause high
latency intercache transactions (cache thrashing) on multi-processor systems. Programs written
before multiprocessor and multicore designs were prevalent commonly did not optimize cache
access because on a single CPU system there is only a single cache which is always coherent
with itself. On a multiprocessor system each CPU or core will typically have its own cache,
which is interlinked with the cache of other CPU/cores in the system to maintain cache
coherency. If thread A accesses a memory location [00] and thread B then accesses memory
location [01] it can cause an intercache transaction particularly where the cache line fill exceeds
2 bytes, as is the case for all modern processors.

The latest[when?] MIPS architecture designs include an SMT system known as "MIPS MT". MIPS
MT provides for both heavyweight virtual processing elements and lighter-weight hardware
microthreads. RMI, a Cupertino-based startup, is the first MIPS vendor to provide a processor
SOC based on 8 cores, each of which runs 4 threads. The threads can be run in fine-grain mode
where a different thread can be executed each cycle. The threads can also be assigned priorities.

The IBM POWER5, announced in May 2004, comes as either a dual core DCM, or quad-core or
oct-core MCM, with each core including a two-thread SMT engine. IBM's implementation is
more sophisticated than the previous ones, because it can assign a different priority to the various
threads, is more fine-grained, and the SMT engine can be turned on and off dynamically, to
better execute those workloads where an SMT processor would not increase performance. This is
IBM's second implementation of generally available hardware multithreading.

Although many people reported that Sun Microsystems' UltraSPARC T1 (known as "Niagara"
until its 14 November 2005 release) and the upcoming processor codenamed "Rock" (to be
launched ~2009 [1]) are implementations of SPARC focused almost entirely on exploiting SMT
and CMP techniques, Niagara is not actually using SMT. Sun refers to these combined
approaches as "CMT", and the overall concept as "Throughput Computing". The Niagara has 8
cores, but each core has only one pipeline, so actually it uses fine-grained multithreading. Unlike
SMT, where instructions from multiple threads share the issue window each cycle, the processor
uses a round robin policy to issue instructions from the next active thread each cycle. This makes
it more similar to a barrel processor. Sun Microsystems' Rock processor is different, it has more
complex cores that have more than one pipeline.

The Intel Atom, released in 2008, is the first Intel product to feature SMT (marketed as Hyper-
Threading) without supporting instruction reordering, speculative execution, or register
renaming. Intel reintroduced Hyper-Threading with the Nehalem microarchitecture, after its
absence on the Core microarchitecture.

[edit] Disadvantages

Simultaneous multithreading cannot improve performance if any of the shared resources are
limiting bottlenecks for the performance. In fact, some applications run slower when
simultaneous multithreading is enabled. Critics argue that it is a considerable burden to put on
software developers that they have to test whether simultaneous multithreading is good or bad
for their application in various situations and insert extra logic to turn it off if it decreases
performance. Current operating systems lack convenient API calls for this purpose and for
preventing processes with different priority from taking resources from each other [2].

There is also a security problem with simultaneous multithreading. It has been proven that it is
possible for one application to steal a cryptographic key from another application running in the
same processor by monitoring its cache use [3].

[edit] See also

• Temporal multithreading, another implementation of hardware


multithreading
• Thread (computer science), the fundamental software entity scheduled by
the operating system kernel to execute on a CPU or processor (core)
• Symmetric multiprocessing, where the system (or partition of a larger
computer hardware platform) contains more than one CPU or processor
(core) and where the operating system kernel is not limited to which of the
available CPUs (cores) a given thread can be scheduled to execute on

[edit] References

1. ^ http://www.theregister.co.uk/2007/12/14/sun_rock_delays/
2. ^ How good is hyperthreading?
3. ^ Hyper-Threading Considered Harmful

• LE Shar and ES Davidson, "A Multiminiprocessor System Implemented


through Pipelining", Computer Feb 1974
• D.M. Tullsen, S.J. Eggers, and H.M. Levy, "Simultaneous Multithreading:
Maximizing On-Chip Parallelism," In 22nd Annual International Symposium on
Computer Architecture, June, 1995
• D.M. Tullsen, S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, and R.L. Stamm,
"Exploiting Choice: Instruction Fetch and Issue on an Implementable
Simultaneous Multithreading Processor," In 23rd Annual International
Symposium on Computer Architecture, May, 1996

[edit] External links

• SMT news articles and academic papers


• SMT research at the University of Washington
• Timeline of multithreading technologies

Queue-based primitives for a multithreaded architecture

Source Proceedings of the seventeenth Euromicro conference on Software


and hardware : specification and design: specification and design
table of contents
Vienna, Austria
Pages: 113 - 116
Year of Publication: 1992
ISSN:0165-6074
Also published in ...
Author Xiaoming Fan
Publisher Elsevier North-Holland, Inc. Amsterdam, The Netherlands, The
Netherlands
Bibliometrics Downloads (6 Weeks): n/a, Downloads (12 Months): n/a,
Citation Count: 0

Additional Information: index terms

Tools and Review this Article


Actions: Save this Article to a Binder Display Formats: BibTeX EndNote ACM
Ref

INDEX TERMS

Primary Classification:
D. Software
D.4 OPERATING SYSTEMS
D.4.1 Process Management
Subjects: Multiprocessing/multiprogramming/multitasking

Additional Classification:
C. Computer Systems Organization
C.1 PROCESSOR ARCHITECTURES

General Terms:
Design, Measurement, Performance

This Article has also been published in:

• Microprocessing and Microprogramming

Potrebbero piacerti anche