
Marcel Mitran - Lead Architect, IBM Java on Z

November 19th, 2013

The Evolution of the Hardware/Software Interface

2013 IBM Corporation


Who am I, and What Do I Do
School
- McGill BEng ECE, class of '97
- McGill MEng CIM, class of '01 (computer vision)

Joined IBM Sept 10th, 2001
- IBM Master Inventor and Senior Technical Staff Member (STSM)
- Lead Architect, IBM Java on System z
  - Corporate-wide responsibility for developing IBM's JDK on System z
- World-wide technical leader for compiler development for System z
  - Hardware/Software interface

Work closely with:
- JVM (Ottawa Lab), C/C++, COBOL (Silicon Valley Lab), XML compiler teams
- Hardware designers (Power, System z, Intel, AMD)
- O/S developers (AIX, z/OS, Linux)
- Middleware developers (WebSphere, DB2, Tivoli)
- Research (Tokyo Research Lab, Watson Research Lab)

Did you say Mainframe!?!?
System z runs the applications that run my life
- Used by 95% of the Fortune 500 Companies
- 80% of corporate data resides or originates on mainframes
- Runs everything from your class registration to airplane
reservations
- 2/3 of business transactions for US retail banks run directly
on mainframes
System z is secure
- Highest level of industry security rating, EAL5, awarded to
the mainframe
System z is dependable and available
- Less than 5 minutes down time per year
- Mean time to failure is 40 years
System z is virtualized
- Create a new Linux image in as little as 90 seconds
System z is expandable, adaptable and flexible
- Add capacity and software updates without a reboot
- Respond automatically to spikes in workload demands
- Align processing priorities with business priorities

"The return of the mainframe: Back in fashion", Jan 14th, 2010

Toni Sacconaghi of Bernstein Research estimates that 40% of IBM's profits are mainframe-related.



Classical View of the Hardware/Software Interface

- Classical view of the h/w + s/w interface (at least with my 1st Edition copy of the text)
  - Data representation
  - Instruction architecture
  - Pipelining
  - Memory hierarchy

- Little to no reference to
  - Multi-core/SMT
  - Dynamic compilation/managed runtimes
  - I/O attached accelerators
  - Domain-specific devices/languages
  - Distributed computing
  - Parallel programming models
  - etc.

- Up until ~5 years ago, largely OK...



zEnterprise EC12 Hardware
Available since Sept 2012

Continued aggressive investment in H/W + S/W co-design

- Hardware Transactional Memory (HTM)
  - Better concurrency for multi-threaded/parallel applications
  - Fine-grained concurrency
- Run-time Instrumentation (RI)
  - Real-time feedback on dynamic program characteristics
  - Enables increased optimization by Java
- 2GB page frames
  - Improved performance targeting 64-bit heaps
- Page-able 1MB large pages using Flash Express
  - Better versatility of managing memory
- Shared-Memory-Communication
  - RDMA over Converged Ethernet
- zEnterprise Data Compression accelerator
  - FPGA-based acceleration of gzip
- Misc new instructions
  - Software hints directives
  - Traps
  - Decimal conversion

System Specs
- 120-way machine
- 3TB of real storage
- IBM zAware autonomic monitoring
- Common Criteria Evaluation Assurance Level 5+ (un-matched)
- IBM DB2 Analytics Accelerator





Inflection Point in Computing?
Topics discussed in a typical week at work
- The end of single-thread performance
- The evolution of HPC into analytics and optimization in the enterprise
- BigData
- The Cloud
- Systems of engagement, scripting

[Figure labels: XCAT; Scripting Ecosystem]



Evolution of the Enterprise Computing Ecosystem
OLTP
Industry has spent the last decade focusing on OnLine Transaction Processing (OLTP)
Enabling/optimizing data persistency and serving
Internet of Things
Trillions of transactions/day
Massive amounts of data (structured/unstructured)

[Diagram: enterprise ecosystem: Scoring, WebSphere, CICS/IMS, DB2]



Evolution of the Enterprise Computing Ecosystem
BAO
Business Intelligence, Analytics and Optimization
A clear need to understand how to interpret/optimize/predict the data
e.g. fraud detection, customer relationship management, low-latency trading, etc.

[Diagram: enterprise ecosystem: Scoring, Rules, SPSS, ILOG, CPLEX, WebSphere, CICS/IMS, DB2, InfoSphere Infoserver, Cognos, InfoSphere Warehouse]



Evolution of the Enterprise Computing Ecosystem
Cloud
Economies of scale for computational infrastructure through 3rd party hosting
Driving down cost
Disruptive changes to IT industry



Java leads developer language choice

"How do you allocate the time you spend writing code across the following programming languages? (Enter a percentage for each)"

Significant presence for traditional enterprise languages
=> ~37%

Scripting languages gaining significant momentum
=> ~26%

Base: North American and European enterprise software developers; Source: Forrsights Developer Survey, Q1 2013



Evolution of the Enterprise Computing Ecosystem
Scripting
Cloud and DevOps represent a new and fast-evolving tool-chain
Client-side scripting languages migrating to the servers
Somewhat foreign to the traditional enterprise space
Moving very quickly

[Diagram: the enterprise ecosystem, now including a Scripting Ecosystem]



(Perceived) Single Thread Performance/Latency Matters

- SLA -> Service Level Agreements
  - Vendor guarantees response-time of transactions
  - Increasingly intelligent transactions
- Batch windows
  - Fixed elapsed window to complete
  - e.g. balance books overnight before opening for business next day
- Plethora of S/W idioms that do not fall easily into divide-and-conquer
  - Finite-state-machine
  - Real-time analytics
  - Queuing/dispatching
  - Enterprise middleware
- Practical challenges of coarse-grain parallelism
  - Even very coarse parallelism can be non-trivial to implement
- Fine-grained parallelism: hold your breath?



Performance Innovation Is No Longer Just a Processor Game

A range of aggressive hardware and systems designs
- Fit-for-purpose/Hybrid systems
- Appliances (e.g. Netezza)
- GPUs/FPGAs
- Co-processors (e.g. crypto)
- Transactional memory

Winners/losers will be defined by a few key criteria
- Time-to-market and prevalence of core infrastructure
  - Stack opportunity is typically unclear + long lead-time/high cost => risk
- Ease of adoption/integration
  1. Runtime and middleware (easiest)
  2. Compiler optimization
  3. New programming models/libraries
  4. Hand-written assembler (hardest)

IBM DB2 Analytics Accelerator
IBM DB2 Analytics Accelerator for z/OS is a high-performance appliance that integrates IBM Netezza and zEnterprise technologies. The solution delivers extremely fast results for complex and data-intensive DB2 queries on data warehousing, business intelligence and analytic workloads.



Shared Memory Communications (SMC-R):
Exploit RDMA over Converged Ethernet (RoCE) with qualities of service support for dynamic failover to redundant hardware

[Chart: SMC-R vs TCP/IP (OSA), WAS Liberty <-> DB2 workload (40 client connections). Percent change: latency -40.00%, CPU consumption per tran +3.11%]

[Diagram: System z SMC-R enabled platform. On each side: Middleware/Application, Sockets, TCP, IP and SMC-R, Interface (OSA or ETH, plus RNIC). TCP connection establishment runs over the IP network (Ethernet) with dynamic negotiation for SMC-R; data flows use RDMA over the RoCE (CEE) RDMA network]

- Transparent exploitation for TCP sockets based applications
- Compatible with existing TCP/IP based load balancing solutions
- Up to 40% reduction in end-to-end transaction latency
- Slight increase in CPU is due to very small message size in this workload (~100 bytes). Workloads with larger payloads are expected to show a CPU savings

(Controlled measurement environment, results may vary)

zEnterprise Data Compression (zEDC)

- Exploited transparently through standard Java APIs (e.g. java.util.zip)
- Up to 10x improvement in CPU time compressing data compared to L1 zlib
- Compression ratio of ~4x

(Controlled measurement environment, results may vary)

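A minimal sketch of what "standard Java APIs" means here: plain java.util.zip gzip streams, with nothing accelerator-specific in the code. Whether the work is actually offloaded to zEDC depends on the platform and JDK configuration, which this sketch does not show. The class name GzipRoundTrip and its helper methods are illustrative assumptions, not part of any IBM API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compresses and restores a byte array with java.util.zip only.
class GzipRoundTrip {
    /** Compresses the input with the standard gzip stream. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Restores the original bytes from gzip-compressed data. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive, record-like text (made-up sample data) compresses well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("account=").append(i % 10).append(";status=OK\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(original);
        boolean same = Arrays.equals(original, gunzip(packed));
        System.out.println(original.length + " bytes -> " + packed.length + " bytes, round trip ok: " + same);
    }
}
```

The application code stays the same with or without an accelerator; that is the sense of "transparently" on this slide.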
GPU Acceleration on Standard Java Arrays

[Chart, logarithmic scale: callouts of "48x faster" and "2x faster"]



A New Age for Compilers and Managed Runtimes

Compilers and managed runtimes will need to evolve

Today:
- Optimize once for single static architecture
- Synergy with micro-architecture is open-loop
- Parallelism is in its infancy
- Some dynamism in Java (Just-in-time)

Tomorrow:
- Deeper synergy with micro-architecture
- Providing separation of interests to hedge risk from large changes to h/w
- Automatic parallelism (thread, data, etc.)
- Hybrid systems, accelerators, fit-for-purpose
- Adaptive, dynamic and specialized
- JITing for light weight scripting environments

[Chart: z/OS Multi-Threaded 64 bit Java Workload, 16-Way; ~12x improvement in hardware and software. Normalized throughput (0 to 160) vs. threads (1 to 32). Series: z9 Java 5 SR5; z10 SDK 6 GM; z10 SDK 6 SR4; z196 SDK 6 SR8; z196 SDK 7 SR1; zEC12 SDK 7 SR1; zEC12 SDK 7 SR3. Annotations: "Aggressive + LP Code Cache"; "NO (CR or Heap LP)"]



zEC12 Runtime-Instrumentation: H/W for Managed Runtimes
- Highly configurable trace-sampling mechanism
  - Events: Data/instruction cache miss information, register values
  - Paths: Last N taken branches
  - Correlated value, event and path profiling

- Integrating into IBM JVM
  - Java Runtime Environment is designed to be adaptive and self-tuning
  - RI enhances JRE decision-making by providing real-time feedback

[Diagram: RI in the IBM JVM. Compilation thread: Java code -> generate JITCODE. Application thread: executes JITCODE (Method A, Method B) with RION/RIOFF around J9 runtime calls (e.g., interpreter). RI buffer: this buffer is filled every 20,000 cycles by the CPU. An asynchronous event is sent every 2 ms to the sampling thread, whose C code (void processBuffer()) maps binary addresses of JITCODE to bytecode and updates the counts.]

Binary address   Bytecode index   Count
0x0020           1                290
0x0080           20               3000

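To make the "map binary addresses of JITCODE to bytecode" step concrete, here is a minimal sketch in Java. It assumes each piece of JIT-generated code is registered as an address range tagged with a method name and bytecode index, and that a sample buffer is simply an array of instruction addresses. The class SampleAttribution, its methods and this layout are hypothetical illustrations; the slide's actual routine is C code inside the JVM and is not shown in the deck.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: attributes sampled instruction addresses to bytecode locations.
class SampleAttribution {
    /** One contiguous range of generated code and the bytecode it was compiled from. */
    private static final class CodeRange {
        final long endExclusive;
        final String key; // "method@bytecodeIndex"

        CodeRange(long endExclusive, String key) {
            this.endExclusive = endExclusive;
            this.key = key;
        }
    }

    // Ranges keyed by start address so floorEntry() finds the candidate range.
    private final TreeMap<Long, CodeRange> rangesByStart = new TreeMap<>();
    private final Map<String, Integer> counts = new HashMap<>();

    /** Records that [start, endExclusive) was generated for the given bytecode index. */
    void register(long start, long endExclusive, String method, int bytecodeIndex) {
        rangesByStart.put(start, new CodeRange(endExclusive, method + "@" + bytecodeIndex));
    }

    /** Counts each sampled address against its bytecode; addresses outside JIT code are skipped. */
    void processBuffer(long[] sampledAddresses) {
        for (long address : sampledAddresses) {
            Map.Entry<Long, CodeRange> entry = rangesByStart.floorEntry(address);
            if (entry == null || address >= entry.getValue().endExclusive) {
                continue; // not in any registered range (e.g., runtime or interpreter code)
            }
            counts.merge(entry.getValue().key, 1, Integer::sum);
        }
    }

    int count(String method, int bytecodeIndex) {
        return counts.getOrDefault(method + "@" + bytecodeIndex, 0);
    }

    public static void main(String[] args) {
        SampleAttribution profile = new SampleAttribution();
        profile.register(0x0020L, 0x0080L, "A", 1);   // addresses borrowed from the table above
        profile.register(0x0080L, 0x0100L, "A", 20);
        profile.processBuffer(new long[] {0x0024L, 0x0090L, 0x00A0L});
        System.out.println("A@1=" + profile.count("A", 1) + " A@20=" + profile.count("A", 20));
    }
}
```

A sorted map keeps the lookup logarithmic in the number of code ranges, which matters when a buffer of samples is drained every few milliseconds.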
RI Example: Path Splitting/Specialization

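[Diagram: control-flow graph before and after the compiler. Before: B1 branches to B2 (x=i) and B3 (x=4), the two paths merge at B4, and a later block computes j = y/x. After: the merge block is duplicated per path, so the x=i path keeps j = y/x while the x=4 path computes j = y >> 2.]

Path can be specialized
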
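The same transformation written out by hand in Java, as a sketch only: a JIT would do this on its internal representation, guided by path profiles. The class and method names are hypothetical. One detail the diagram glosses over: for Java's signed int, y / 4 equals y >> 2 only when y is non-negative, so the specialized path below adds the usual rounding bias to stay exact for negative y as well.

```java
// Hypothetical illustration of path splitting followed by specialization.
class PathSplitting {
    /** Original shape: both paths merge before the divide, so x is unknown there. */
    static int original(boolean constantPath, int i, int y) {
        int x;
        if (constantPath) {
            x = 4;
        } else {
            x = i;
        }
        return y / x; // generic divide on the merged path
    }

    /** Split shape: the divide is duplicated per path and the x = 4 copy becomes a shift. */
    static int split(boolean constantPath, int i, int y) {
        if (constantPath) {
            // Exact signed divide by 4: add 3 when y is negative, then shift.
            // For y >= 0 the bias is 0 and this is just y >> 2.
            return (y + ((y >> 31) >>> 30)) >> 2;
        }
        return y / i; // the other path keeps the generic divide
    }

    public static void main(String[] args) {
        System.out.println(original(true, 7, 100) + " " + split(true, 7, 100));
        System.out.println(original(true, 7, -7) + " " + split(true, 7, -7));
    }
}
```

The payoff depends on how often the constant path is taken, which is the kind of information path profiling is meant to supply.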
Binary Optimization
Technology that enables re-optimization of legacy COBOL binaries on the latest System z without requiring source-level re-compilation.

- Binary Optimization technology being built by IBM Research (Tokyo)
  - Solution for the "Dusty Deck Problem": can re-optimize the existing large body of legacy code that customers are unable and/or unwilling to recompile
  - Built on top of IBM Testarossa Optimizer
- Value
  - Upgrade the binary from a really old ISA to the exact ISA of the machine the code is running on
  - Inject the latest COBOL optimization technology into legacy programs
  - Ability to start utilizing profiling and RI-based feedback to optimize the program
- Experimental Results*
  - up to 4.62x and average 1.89x over the original binary on z196
  - up to 3.31x and average 1.94x over the original binary on zEC12
- Alpha-level prototype available on developerWorks

[Diagram: legacy COBOL binary with ESA390 instructions -> Binary Optimizer -> optimized COBOL binary with zEC12 instructions]

* Measured using twelve benchmarks in the internal testsuite, used by the COBOL compiler dev. team.

Concurrency and Parallelism in the Enterprise

- Traditionally a play in niche spaces (e.g. HPC)
  - With industry focus on business intelligence, analytics and optimization, stakes have reached a new high
- Programming models for parallelism becoming mainstream
  - E.g. java.util.concurrent, fork/join and lambdas
- Traditional programming models evolving to meet the needs of enterprise computing
  - E.g. OpenMP adds tasks
- Hardware transactional memory: it's here!
- Auto-parallel remains elusive!

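For readers who have not met fork/join, a minimal sketch using the standard java.util.concurrent classes: an array sum split recursively into subtasks. The class name ParallelSum and the threshold of 1,000 elements are arbitrary choices for illustration, not tuned values.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: divide-and-conquer array sum on the fork/join framework.
class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this size, sum sequentially
    private final int[] data;
    private final int from;
    private final int to; // exclusive

    ParallelSum(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half here, then combine
    }

    /** Sums the array on the common fork/join pool. */
    static long sum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(sum(data));
    }
}
```

Note that the parallelism here is still explicit: the programmer chooses the split point and the threshold, which is exactly the work auto-parallelization would have to take over.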
Transactional Execution: Concurrent Linked Queue
- ~2x improved scalability of juc.ConcurrentLinkedQueue
- Unbounded thread-safe linked queue
  - First-in-first-out (FIFO)
  - Insert elements into tail (en-queue)
  - Poll elements from head (de-queue)
  - No explicit locking required
- Example usage: a multi-threaded work queue
  - Tasks are inserted into a concurrent linked queue as multiple worker threads poll work from it concurrently
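
The work-queue usage described above can be sketched with the standard java.util.concurrent.ConcurrentLinkedQueue. For simplicity this sketch fills the queue before the workers start, whereas the slide has producers inserting while workers poll. The class name WorkQueueDemo and the "task" payload (an integer to add up) are made up for illustration; nothing here is specific to transactional execution, which would sit underneath the queue implementation.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: worker threads draining a shared lock-free queue.
class WorkQueueDemo {
    /** Drains the queue with the given number of workers and returns the sum of all tasks. */
    static long drain(ConcurrentLinkedQueue<Integer> tasks, int workers) {
        AtomicLong total = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                Integer task;
                // poll() de-queues from the head and returns null once the queue is empty
                while ((task = tasks.poll()) != null) {
                    total.addAndGet(task);
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<Integer> tasks = new ConcurrentLinkedQueue<>();
        for (int i = 1; i <= 1000; i++) {
            tasks.offer(i); // offer() en-queues at the tail
        }
        System.out.println(drain(tasks, 4)); // prints 500500
    }
}
```

No worker takes a lock; each poll() either claims a distinct element or observes an empty queue.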

[Diagram: linked queue of nodes; de-queue at the head (first node), en-queue at the tail (last node)]
[Chart: new TX-based implementation vs. traditional CAS-based implementation]

(Controlled measurement environment, results may vary)

HTM Example: Transactional Lock Elision (TLE)

Without lock elision: threads must serialize despite only reading, just in case a writer updates the hash.

read_hash(key) {
  Wait_for_lock();
  read(hash, key);
  Release_lock();
}

With lock elision: readers execute in parallel, and safely back out should a writer update the hash.

read_hash(key)
  TRANSACTION_BEGIN
  read hash.lock;
  BRNE serialize_on_hash_lock
  read(hash, key);
  TRANSACTION_END

[Diagram: without elision, Thr1, Thr2 and Thr3 run read_hash() one after another; with elision, the three run read_hash() at the same time]
[Chart: Transaction Lock Elision on HashTable.get(), Java prototype; throughput (ops/sec) vs. threads (2 to 16)]

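Java has no direct API for starting a hardware transaction, so the sketch below swaps in a software analogue: optimistic reads with java.util.concurrent.locks.StampedLock. The shape mirrors the pseudo-code above (try the read without taking the lock, check that no writer interfered, otherwise fall back to serializing on the lock), but it is not HTM and not the prototype measured on this slide. The class OptimisticReadTable and its fixed-size, direct-mapped layout are hypothetical, chosen so that an optimistic read never walks a structure a writer could be reshaping.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical software analogue of lock elision for readers.
class OptimisticReadTable {
    private final StampedLock lock = new StampedLock();
    private final long[] slots = new long[16]; // fixed size; key is mapped to a slot by masking

    /** Reads without blocking when no writer interferes; otherwise falls back to the read lock. */
    long read(int key) {
        int index = key & (slots.length - 1);
        long stamp = lock.tryOptimisticRead(); // analogue of TRANSACTION_BEGIN: no lock taken
        long value = slots[index];
        if (lock.validate(stamp)) {            // analogue of a successful TRANSACTION_END
            return value;
        }
        stamp = lock.readLock();               // analogue of serialize_on_hash_lock
        try {
            return slots[index];
        } finally {
            lock.unlockRead(stamp);
        }
    }

    /** Writers always take the exclusive lock, which invalidates in-flight optimistic reads. */
    void write(int key, long value) {
        long stamp = lock.writeLock();
        try {
            slots[key & (slots.length - 1)] = value;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public static void main(String[] args) {
        OptimisticReadTable table = new OptimisticReadTable();
        table.write(3, 42L);
        System.out.println(table.read(3));
    }
}
```

The key difference from hardware TLE is that here the reader must validate explicitly and may only touch data that is safe to read mid-update, whereas a hardware transaction is aborted and rolled back automatically on conflict.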


Java 8: Language Innovation, Lambdas and Parallelism

- New syntax to allow concise code snippets and expressions
- Useful for sending code to java.util.concurrent
- On the path to enabling more parallelism

http://www.dzone.com/links/presentation_languagelibraryvm_coevolution_in_jav.html
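
A small sketch of "sending code to java.util.concurrent" with Java 8 lambdas: each task is a one-line lambda handed to an ExecutorService. The class LambdaTasks and the sum-of-squares workload are made-up illustrations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: lambdas as tasks for a thread pool.
class LambdaTasks {
    /** Computes 1^2 + 2^2 + ... + n^2, one pooled task per term. */
    static long sumOfSquares(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long value = i;
                // The lambda returns a value, so it is submitted as a Callable<Long>
                results.add(pool.submit(() -> value * value));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for that task to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10, 4)); // prints 385
    }
}
```

Before lambdas, each task would have been an anonymous Callable class of several lines; the concise form is what makes task-based APIs pleasant to use.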



So what about auto-parallelism?

A MOUTHFUL to chew on:

- Will the convergence of analytics/optimization and enterprise, in the context of the end of the single-thread-performance roadmap on the cloud, provide enough momentum/focus to see some real-world (e.g. Java) breakthroughs in auto-parallel technology?



Concluding Remarks
Lots to be excited about
Significant acceleration in innovation in the hardware/software interface
It's never been a better time to be a runtime/compiler developer



Copyright IBM Corporation 2013. All rights reserved. The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind,
express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have
the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM
software. References in these materials to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities
referenced in these materials may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature
availability in any way. IBM, the IBM logo, Rational, the Rational logo, Telelogic, the Telelogic logo, and other IBM products and services are trademarks of the International Business Machines
Corporation, in the United States, other countries or both. Other company, product, or service names may be trademarks or service marks of others.



Creating an Open Community to drive full stack innovation for the cloud: OpenPOWER

OpenPOWER Consortium creates an open community to drive innovation for the Cloud
- Industry's first open system design for cloud data centers
- Custom development group for hyperscale servers including hardware designs, firmware and software

OpenPOWER Consortium
- Addresses need for industry-based innovation across processors, network and storage I/O
- OpenPOWER will create an ecosystem for Power Systems
  - IBM will contribute open source software
  - IBM will enable industry participation through open documentation
  - IBM will license chip design intellectual property (IP) to allow customization

[Diagram: IBM POWER server technology stack: Systems Management; Firmware, Hypervisors and Operating Systems; Systems; Processors; Semiconductor Technology]



Co-Location by GC

[Diagram: Java heap before and after GC, showing Java objects A, B, C and D with a "hot" marker; legend: in cache/TLB, out of cache/TLB]
