
ANSYS HPC

High Performance Computing Leadership

Barbara Hutchings

barbara.hutchings@ansys.com

Why HPC for ANSYS?

Insight you can't get any other way

HPC enables high fidelity
Solve the un-solvable
Be sure your design is right
Innovate with confidence

HPC delivers throughput
Consider multiple design ideas
Optimize your design
Ensure performance across a range of conditions


HPC has become a software issue.

Clock speed: leveling off
Core counts: growing, and exploding (GPUs)

Future performance depends on highly scalable parallel software.
ANSYS's goal: lead the way into this era of scalable parallel software.

Source: http://www.lanl.gov/news/index.php/fuseaction/1663.article/d/20085/id/13277

HPC Deployment - Trends / Challenges

Local computing infrastructure for simulation is being replaced by centralized HPC resources, shared by a globally distributed workforce.
Remote access, scheduling, and visualization tools
Data sharing, IP protection, central data archiving

Mega-simulations for high-fidelity understanding
Design exploration for improved product integrity
(Intermittent) need for 1000s of processors, terabytes of RAM
Storage / data management


Agenda
Performance / Software Milestones
  Scalability, GPUs
Current HPC Practice
  Customer case studies
Deployment Trends
  ANSYS vision

ANSYS FLUENT Scaling Achievement
[Chart: solver rating vs. number of cores (up to 1536 cores), comparing 2008 hardware (Intel Harpertown, DDR IB) with 2010 hardware (Intel Westmere, QDR IB)]

Systems keep improving: faster processors, more cores
Ideal rating (speed) doubled in two years!
Memory bandwidth per core and network latency/bandwidth stress scalability
The 2008 release (12.0) re-architected MPI: a huge scaling improvement, for a while

Extreme CFD Scaling - 1000s of cores

Enabled by ongoing software innovation
Hybrid parallel: fast shared-memory communication (OpenMP) within a machine to speed up overall solver performance, with distributed-memory communication (MPI) between machines (a minimal sketch of this pattern follows below).
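The hybrid layout can be illustrated with a small, self-contained sketch. This is a hypothetical example of the general MPI-plus-OpenMP pattern, not ANSYS solver code; the loop body simply stands in for per-cell work, and rank/thread counts are whatever the job launcher provides.

```c
/* Hypothetical hybrid-parallel sketch: OpenMP threads share memory within
   a node, MPI ranks communicate between nodes. Not ANSYS source code. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    /* Ask MPI for thread support so OpenMP threads can coexist with MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long n = 1000000;          /* cells owned by this rank (illustrative) */
    double local_sum = 0.0;

    /* Shared-memory parallelism inside the node: no message passing here. */
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < n; ++i)
        local_sum += 1.0;            /* stand-in for per-cell solver work */

    /* Distributed-memory step: one message per rank instead of per core. */
    double global_sum = 0.0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%g\n",
               nranks, omp_get_max_threads(), global_sum);
    MPI_Finalize();
    return 0;
}
```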

Parallel Scaling - ANSYS Mechanical

Focus on bottlenecks in the distributed-memory solvers (DANSYS).

[Chart: Sparse solver (parallel re-ordering) - solution rating vs. number of cores, up to 256 cores]
[Chart: PCG solver (preconditioner scaling) - solution rating vs. number of cores, up to 64 cores]

Sparse solver: parallelized equation ordering; 40% faster with updated Intel MKL.
Preconditioned Conjugate Gradient (PCG) solver: parallelized preconditioner (an illustrative PCG sketch follows below).
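For readers unfamiliar with the solver named above, here is a minimal, generic preconditioned conjugate gradient iteration with a Jacobi (diagonal) preconditioner for a dense symmetric positive-definite system. It illustrates the algorithm the slide refers to; it is not ANSYS's implementation, which works on sparse matrices in parallel.

```c
/* Generic PCG sketch (Jacobi preconditioner, dense SPD matrix, row-major).
   Illustrative only. */
#include <math.h>
#include <stdlib.h>

void pcg(int n, const double *A, const double *b, double *x,
         int max_it, double tol)
{
    double *r = malloc(n * sizeof *r), *z = malloc(n * sizeof *z);
    double *p = malloc(n * sizeof *p), *Ap = malloc(n * sizeof *Ap);

    /* x = 0, r = b - A*x = b, z = M^{-1} r with M = diag(A) */
    for (int i = 0; i < n; ++i) { x[i] = 0.0; r[i] = b[i]; z[i] = r[i] / A[i*n+i]; p[i] = z[i]; }
    double rz = 0.0;
    for (int i = 0; i < n; ++i) rz += r[i] * z[i];

    for (int it = 0; it < max_it; ++it) {
        double pAp = 0.0;
        for (int i = 0; i < n; ++i) {              /* Ap = A*p */
            Ap[i] = 0.0;
            for (int j = 0; j < n; ++j) Ap[i] += A[i*n+j] * p[j];
            pAp += p[i] * Ap[i];
        }
        double alpha = rz / pAp;
        double rnorm = 0.0;
        for (int i = 0; i < n; ++i) {
            x[i] += alpha * p[i];                  /* update solution */
            r[i] -= alpha * Ap[i];                 /* update residual */
            rnorm += r[i] * r[i];
        }
        if (sqrt(rnorm) < tol) break;
        double rz_new = 0.0;
        for (int i = 0; i < n; ++i) { z[i] = r[i] / A[i*n+i]; rz_new += r[i] * z[i]; }
        double beta = rz_new / rz;                 /* standard PCG update */
        for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    free(r); free(z); free(p); free(Ap);
}
```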

Architecture-Aware Partitioning

Original partitions are remapped to the cluster considering the network topology and latencies
Minimizes inter-machine traffic, reducing load on network switches
Improves performance, particularly on slow interconnects and/or large clusters
(A toy remapping sketch follows after the figure below.)


[Figure: partition graph for 3 machines with 8 cores each; colors indicate machines; original mapping vs. new mapping]
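A greedy toy version of the remapping idea, assuming a known communication-volume matrix between partitions and fixed machine capacities (both hypothetical names here). It is purely illustrative of the concept of packing heavily-communicating partitions onto the same machine; it is not the algorithm ANSYS ships.

```c
/* Toy architecture-aware remapping sketch: given comm[i][j], the traffic
   between partitions i and j, place partitions on machines so that less
   traffic crosses the network. Illustrative only. */
#define NPART 24   /* partitions (one per core) */
#define NMACH 3    /* machines */
#define CAP   8    /* cores per machine */

void remap(const double comm[NPART][NPART], int machine_of[NPART])
{
    int load[NMACH] = {0};
    for (int i = 0; i < NPART; ++i) machine_of[i] = -1;

    for (int step = 0; step < NPART; ++step) {
        /* pick the not-yet-placed partition with the most total traffic */
        int p = 0; double best = -1.0;
        for (int i = 0; i < NPART; ++i) {
            if (machine_of[i] != -1) continue;
            double t = 0.0;
            for (int j = 0; j < NPART; ++j) t += comm[i][j];
            if (t > best) { best = t; p = i; }
        }
        /* place it on the non-full machine it already talks to the most */
        int m_best = 0; double gain_best = -1.0;
        for (int m = 0; m < NMACH; ++m) {
            if (load[m] >= CAP) continue;
            double gain = 0.0;
            for (int j = 0; j < NPART; ++j)
                if (machine_of[j] == m) gain += comm[p][j];
            if (gain > gain_best) { gain_best = gain; m_best = m; }
        }
        machine_of[p] = m_best;
        load[m_best]++;
    }
}
```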

File I/O Performance

Case file I/O
Both read and write significantly faster in R13
A combination of serial-I/O optimizations as well as parallel-I/O techniques, where available

Parallel I/O (.pdat)
Significant speedup of parallel I/O, particularly for cases with a large number of zones (a minimal collective-write sketch follows after the table below)
Support for Lustre, EMC/MPFS, and AIX/GPFS file systems added

Data file I/O (.dat)
Performance in R12 was already highly optimized; further incremental improvements were made in R13

[Chart: truck_14m, case read times, R12 vs. R13]

Parallel data write, R12 vs. R13 (change in write time):
  BMW: -68%
  FL5L2 4M: -63%
  Circuit: -97%
  Truck 14M: -64%
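In general terms, the parallel-I/O technique mentioned above means every rank writing its own slice of one shared file through MPI-IO rather than funneling data through a single process. The sketch below shows that pattern in generic form; the file name and data layout are made up for illustration and have nothing to do with FLUENT's actual file format.

```c
/* Minimal collective parallel-write sketch with MPI-IO. Illustrative only. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int ncells = 100000;                 /* cells owned by this rank */
    double *field = malloc(ncells * sizeof *field);
    for (int i = 0; i < ncells; ++i) field[i] = (double)rank;

    /* Every rank writes its own slice of one shared file in a single
       collective call, instead of sending everything to rank 0. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "field.pdat",  /* hypothetical file name */
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * ncells * sizeof(double);
    MPI_File_write_at_all(fh, offset, field, ncells, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(field);
    MPI_Finalize();
    return 0;
}
```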

GPU Computing!

CPUs and GPUs work in a collaborative fashion, connected by a PCI Express channel.

CPU: multi-core processor, typically 4-6 cores; powerful and general purpose.
GPU: many-core processor, typically hundreds of cores; great for highly parallel code, within memory constraints (a minimal offload sketch follows below).
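A minimal sketch of that division of labor, assuming a CUDA-capable GPU and the cuBLAS library: the CPU prepares data, ships it across PCI Express, the GPU runs the highly parallel kernel (here a dense matrix multiply), and results return to the host for general-purpose work. Illustrative only, not ANSYS solver code.

```c
/* CPU/GPU collaboration sketch using the CUDA runtime and cuBLAS. */
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void)
{
    const int n = 2048;
    size_t bytes = (size_t)n * n * sizeof(double);
    double *A = malloc(bytes), *B = malloc(bytes), *C = malloc(bytes);
    for (size_t i = 0; i < (size_t)n * n; ++i) { A[i] = 1.0; B[i] = 2.0; }

    /* Allocate on the GPU and copy inputs over the PCI Express channel. */
    double *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);

    /* Highly parallel compute kernel runs on the many-core GPU. */
    cublasHandle_t h;
    cublasCreate(&h);
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    /* Results come back to the CPU, which continues general-purpose work. */
    cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost);

    cublasDestroy(h);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(A); free(B); free(C);
    return 0;
}
```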

ANSYS Mechanical SMP GPU Speedup

[Chart: solver-kernel speedups vs. overall speedups, Tesla C2050 and Intel Xeon 5560]

Reference: "Accelerate FEA Simulations with a GPU", Jeff Beisheim, ANSYS, NAFEMS World Congress, May 2011, Boston, MA, USA

New: GPU Acceleration for DANSYS

[Chart: R14 Distributed ANSYS total simulation speedups for R13 benchmarks]

Windows workstation: two Intel Xeon 5560 processors (2.8 GHz, 8 cores total), 32 GB RAM, NVIDIA Tesla C2070, Windows 7, TCC driver mode


ANSYS Mechanical Multi-Node GPU

Solder joint benchmark (4 MDOF, creep strain analysis)

Linux cluster: each node contains 12 Intel Xeon 5600-series cores, 96 GB RAM, an NVIDIA Tesla M2070, and InfiniBand

[Chart: R14 Distributed ANSYS with/without GPU - total speedup; model regions: solder balls, mold, PCB]

Results courtesy of MicroConsult Engineering, GmbH

GPU Acceleration for CFD

Radiation view-factor calculation (ANSYS FLUENT 14)

First capability for physics outside the core linear solver, with modest memory requirement: view factors, ray tracing, reaction rates, etc.
R&D focus remains on linear solvers and smoothers, but there is potential beyond these.

Agenda
Performance / Software Milestones
  Scalability, GPUs
Current HPC Practice
  Customer case studies
Deployment Trends
  ANSYS vision

HPC for Turbocharger Design

8M to 12M element models (ANSYS CFX)

Previous practice (8 HPC nodes):
Full-stage compressor runs: 36-48 hours
Turbine simulations: up to 72 hours

Current practice (160 nodes, 32 nodes per simulation):
Full-stage compressor: 4 hours
Turbine simulations: 5-6 hours
Simultaneous consideration of 5 design ideas
Ability to address design uncertainty (clearance tolerance)

ANSYS HPC technology is enabling Cummins to use ...
http://www.ansys.com/About+ANSYS/ANSYS+Advantage+Magazine/Current+Issue

HPC for High Fidelity at EuroCFD

Model sizes up to 100M cells (ANSYS FLUENT)
2011 cluster of 700 cores (expansions pending)
64-256 cores per simulation

[Chart: spatial-temporal accuracy vs. complexity of physical phenomena - model size and turnaround grow from 3M cells (6 days) through 10M cells (5 days) and 25M cells (4 days) to 50M cells (2 days) over roughly 2005-2011, with increasing use of transient analysis, optimisation/DOE, dynamic mesh, LES, combustion, aeroacoustics, fluid-structure interaction, supersonic flow, multiphase, radiation, compressibility, and conduction/convection]

HPC for High Fidelity at MicroConsult GmbH

Solder joint failure analysis
Thermal stress: 7.8 MDOF
Creep strain: 5.5 MDOF

Simulation time reduced from 2 weeks to 1 day
From 8-26 cores (past) to 128 cores (present)

"HPC is an important competitive advantage for companies looking to optimize the performance of their products and reduce time to market."

HPC on the Desktop

Cognity Limited: steerable conductors for oil recovery

http://www.ansys.com/About+ANSYS/ANSYS+Advantage+Magazine/Current+Issue

Desktop HPC at NVIDIA

Case study on the value of hardware refresh and software best practice
Deflection and bending of 3D glasses
ANSYS Mechanical, 1M DOF models

Optimization of:
Solver selection (direct vs. iterative)
Machine memory (in-core execution)
Multicore (8-way) parallel with GPU acceleration

Before/after: 77x speedup, from 60 hours per simulation to 47 minutes

Most importantly: HPC tuning added scope for design exploration and optimization.

Agenda
Performance / Software Milestones
  Scalability, GPUs
Current HPC Practice
  Customer case studies
Deployment Trends
  ANSYS vision

Remote Access to Simulation / Cloud

Cloud computing puts renewed focus on an ongoing challenge: how can we optimize remote access to HPC for ANSYS users?

Critical emerging issue for:
Public cloud (outsourcing of HPC)
Private cloud (internal remote access / elastic provisioning)

Levels of Remote Access

Level 0: Local Computing
Pre / Solve / Post on the local desktop system
Files stored locally under individual control
Inherent capacity limitations; also limits collaboration and data management

Level 1: Remote Batch
Pre / Post on the local desktop system; batch solve conducted on a remote HPC resource
Bottlenecks related to file transfer and limitations of local hardware (e.g., can't post-process large files)
Remote access and job management are also challenging

Level 2: Remote Interactive Workflow

Mobile user doing full remote simulation through web-browser access:
Schedule / provision HW resources
Launch the application
Launch a remote visualization tool for interactive use
Access data (EKM)
Monitor job progress

Remote HPC resource: Pre / Solve / Post / EKM, remote 3D visualization server, data storage

Full simulation process conducted via remote access (Pre / Solve / Post)
Files reside at the HPC resource for efficiency, enhanced data management, and collaboration
Feasible and implemented today by best-in-class users

Simulation Data Management

Access results over the full product lifecycle. Re-use of previous simulations. Long-term archival storage.

[Architecture diagram - ANSYS EKM components:]
Web browser (IE, Firefox, etc.) and desktop application (ANSYS Workbench, VB, etc.) clients, behind a firewall
Application server (JBoss, etc.)
Compute cluster - executes simulations and extracts data using a batch system such as RSM, LSF, or SGE
Content management repository (Jackrabbit, etc.)
Relational database (MySQL, Oracle, DB2, etc.) - stores metadata
File server (http, ftp) - repository of all archived files and applications

Interface to PLM / PDM

[Diagram: EKM DataLink connects ANSYS EKM to PLM/PDM systems (Windchill, Smarteam, MatrixOne, Teamcenter), exchanging functional specs, geometry, attributes, and simulation reports]

Pull functional requirements or geometry from PLM to ANSYS EKM
Check update status from EKM
Return simulation results to PLM, associated to the master model

Public Cloud

Cloud computing could provide cost-effective access to scaled-out, elastic infrastructure:
Scale up: extreme problem size, data storage and backup
Surge capacity for intermittent workloads or users
Cost-effective balance of CAPEX vs. OPEX

But challenges remain for:
IP protection, export compliance, and data sharing
Optimized use of on-premise vs. remote systems
Remote workflow (visualization, file transfer)

ANSYS focus:
Improved remote simulation workflow (private cloud)
Enable seamless use of third-party hosted clouds
"Bring Your Own License" model

Agenda
Performance / Software Milestones
  Scalability, GPUs
Current HPC Practice
  Customer case studies
Deployment Trends
  ANSYS vision
Summary / Next steps

Summary / Take-Home Points

High-performance computing can add great value to your use of ANSYS.
What could you learn from a 10M (or 100M) cell / DOF model?

ANSYS continues to focus on software development for HPC.
Mission critical: maintaining scaling as the hardware revolution continues.
We believe this focus will enable ANSYS to maximize your overall return on investment.

Optimizing IT deployment is a critical challenge.
Getting started or scaling out
Desktop / remote access, data management and archiving, job submission and resource management

Next Steps
ANSYS is committed to understanding your IT environment and deployment challenges.
ANSYS (and our partners) can provide solutions today, and you can help drive our product strategy.
Questions / comments: hpcinfo@ansys.com

Thank You
