FS16 Saratech 04 PerformanceTuning

FEMAP/NX NASTRAN PERFORMANCE TUNING
Chris Teague - Saratech
(949) 481-3267 | www.saratechinc.com

NX Nastran Hardware Performance History
Running Nastran in 1984:
Cray Y-MP, 32 Bits! (X-MP was only 24 Bits)
Four Vector Processors (167 MHz)
256 MB of RAM (Note the MB, PC was 256K)
333 Mflops per processor
$3-$4 Million, plus special room
Comparison in 2016:
64 Bits
Dual Core (1.85 GHz)
2GB (2048 MB) of RAM
340 Mflops (Single Thread)/
613 Mflops (Multi Thread)
iPhone 6s, $689
NX Nastran currently not ported to iOS
Saratech proprietary and confidential Slide Number: 2

NX Nastran Hardware Performance History
Rack Server in 2016:
Dell R930, 64 Bits
Max 4 Processors at a Max 18 Cores Each
(72 Cores Total), or up to 3.2 GHz
1.5 TB of RAM
1.8 Gflops per processor
High Speed PCIe based SSD disk drive
(2.8 GB/s Read / 2.2 GB/s Write speed)
$85K with 6.5 TB PCIe SSD, 1.5 TB RAM,
4 x 3.2 GHz 4-Core Xeon processors
Blade Array System
Up to 30 Blades, each configured like a single server
So how much faster do our Nastran jobs run with this
huge increase in computing Power?
NX Nastran Performance Tuning Tips
What is LP-64 vs ILP-64?

Hardware and OS Selection
NX Nastran Scratch Drive
I/O Performance, OS Settings
Buffer Size
Hyperthreading
Element Iterative Solver
SMP vs DMP
GPGPU
NX NASTRAN LP-64 vs ILP-64
There are two 64 bit versions of NX Nastran:
LP-64
Standard version when running through FEMAP
4-Byte Words
8 GB RAM limit
ILP-64
Optional version when running through FEMAP
8-Byte Words
20 TB RAM limit, which is really the hardware RAM limit
of the machine you are running on
When running NX Nastran on the
command line, the L executables
are ILP. w executables will bring
up a file browser.
In some cases, ILP-64 may offer
improved accuracy

NX NASTRAN LP-64 vs ILP-64
In general, the standard LP-64 version of NX Nastran is faster for models
that do not need more than 8 GB of RAM allocated to the solver.
For larger models that need more than 8 GB of RAM for the solver, you
will need to use the ILP-64 version and have enough RAM available.
For performance reasons, you don't want to allocate more than about
50% of RAM to NX Nastran. The remaining RAM is needed for the OS and I/O
caching, which is a huge help to NX Nastran performance.
This means that if you need to use the ILP-64
version of NX Nastran, you will want at least
16 GB of RAM. Larger models may require
more. Sometimes LOTS more!
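
The rule of thumb above can be written as a small calculation. This is only a sketch of the slide's guidance (8 GB LP-64 ceiling, roughly 50% of RAM for the solver); the function names are made up for illustration and are not part of NX Nastran or FEMAP.

```python
LP64_LIMIT_GB = 8.0          # LP-64 solver memory ceiling quoted on the slide
MAX_SOLVER_FRACTION = 0.5    # keep about half of RAM for the OS and I/O cache


def pick_executable(solver_mem_gb: float) -> str:
    """Return the build the slide's rule of thumb points to (hypothetical helper)."""
    return "LP-64" if solver_mem_gb <= LP64_LIMIT_GB else "ILP-64"


def min_physical_ram_gb(solver_mem_gb: float) -> float:
    """Physical RAM needed so the solver gets at most about 50% of the machine."""
    return solver_mem_gb / MAX_SOLVER_FRACTION


for mem in (4.0, 8.0, 24.0):
    print(f"mem={mem:g} GB -> {pick_executable(mem)}, "
          f"at least {min_physical_ram_gb(mem):g} GB of RAM")
```

A job that needs just over 8 GB for the solver therefore lands on ILP-64 and a machine with at least 16 GB of RAM, which is the figure quoted above.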

Processors
Faster processors are good (faster I/O speed is
just as important, if not more so)
A large L2 or L3 processor cache can improve
performance (Xeon can help here)
Multi-core is good, but usually don't choose more cores over
fewer cores with a faster clock speed
Intel Xeon E7-8880 v3, 2.3 GHz, 45M, 18 Core
Intel Xeon E7-8893 v3, 3.2 GHz, 45M, 4 Core
Memory
As much as budget allows, and the fastest available
Saratech ran a large job with mem=24 GB on a
system with 64 GB of RAM. Nastran used up all
available RAM for 2-3 hours, the extra being used
for I/O Caching. See Task Manager graph:

Disk
SATA based SSD are significantly faster than
mechanical drives
PCIe based SSD devices are faster still,
and are available in laptop, desktop, and
server models.
Example: SanDisk SX350-3200, 3.2 TB, 2.8 GB/s
read, 2.2 GB/s write speed (Servers)
Intel 750 Series PCIe SSD, 1.2 TB, 2.5 GB/s read,
1.2 GB/s write speed (Workstations)
Operating System
Generally Linux is faster than Windows on the
same hardware due to the superior I/O on Linux
Because of this, most HPC cluster systems run Linux
Windows is more popular on the desktop due to the wide variety of applications that
run on Windows.
Priorities for getting the most performance for the least money:
Maximum number of *fast* cores with large cache
Add as much RAM as possible, and go for the fastest RAM allowed
Maximize I/O bandwidth and disk speed
Add GPU processing for some large dynamics problems (More on this later)
I always recommend at least two disks, and 3 if possible:
Disk 1: Fast drive for OS & Applications
Disk 2: Very fast drive for NASTRAN & FEMAP scratch space (Keep empty when not
running NX Nastran & FEMAP)
Disk 3: Large drive for data storage
NASTRAN does so much disk I/O that it is better to give it its own drive for scratch files,
and to make sure that drive is as fast as possible: a PCIe SSD, or even a RAID of SSDs. We don't
want to let the OS/Application data needs slow down our NASTRAN job.
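
Since the scratch drive should be kept empty between runs, a quick check before launching a job can help. The sketch below uses only the Python standard library; the function name and the returned keys are illustrative assumptions, and the folder you pass in is whatever you use as your scratch location.

```python
import shutil
from pathlib import Path


def scratch_report(scratch_dir: str) -> dict:
    """Summarize free space and leftover files in a scratch folder."""
    root = Path(scratch_dir)
    usage = shutil.disk_usage(root)          # total/used/free bytes of the drive
    leftovers = [p for p in root.rglob("*") if p.is_file()]
    return {
        "free_gb": usage.free / 1024**3,
        "total_gb": usage.total / 1024**3,
        "leftover_files": len(leftovers),
        "leftover_gb": sum(p.stat().st_size for p in leftovers) / 1024**3,
    }

# Example call (the path is a placeholder for your own scratch folder):
# print(scratch_report(r"e:\scratch"))
```

If `leftover_files` is not zero when no job is running, old scratch files are probably taking space that the next run will want.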

Nastran scratch folder should point to a
fast disk, or a RAID array (RAID0)
Local disk drives are preferred
Using a network mounted NFS or SMB
(Windows Shared Drive) connection is
generally going to have significant
performance penalties
Even laptops can have two drives; try
mSATA cards, or even PCIe in
newer laptops
(Pictured: SanDisk Fusion ioMemory SX350-3200 and Samsung 850 EVO M.2 SSD)

You can set the NX Nastran scratch drive in the rc file

The nastran rc file for FEMAP can be found in FEMAPv113/nastran/conf,
where 113 is the version of FEMAP that you have installed
Sample from my laptop:
Auth=28000@LocalHost
Sdir=e:\scratch
program=FEMAP
scr=yes
buffsize=32769
memory=.45*physical
smem=20.0X
The E drive is a 512GB SSD mSATA card

NX Nastran & FEMAP Scratch Drive in FEMAP Preferences
(Figure: FEMAP Preferences dialog, showing the FEMAP scratch drive setting and the NX Nastran scratch drive setting)

OS Settings: I/O Cache
Reading from and writing to disk drives is
much slower than RAM, even with SSD
Data that has just been written is likely to be read
back soon
Keeping information in memory instead of
on disk will reduce disk seek times
Make use of unallocated memory as a disk
I/O cache

OS Settings: Enabling Disk I/O Cache
Read cache is enabled by default on
Linux and Windows
Enable write cache on Linux using
hdparm command or equivalent
On Windows, use Device Manager
property settings to enable write-cache
on the Nastran scratch drive in the
Policies tab

Buffer Size
The NX Nastran buffer size is the size of each I/O unit

The default size in NX Nastran 9 is 8193 and works well for small models
(<100K DOF)
For larger models (>400K DOF), increasing the buffer size from the default to
32769 may help. This is the default in NX Nastran 10
This can be done by editing the nastran rc file and changing the line to:
Buffsize=32769
The nastran rc file for FEMAP can be found in FEMAPv113/nastran/conf,
where 113 is the version of FEMAP that you have installed
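
If you change this setting on several machines, the edit can be scripted. The sketch below works on the text of an rc file and assumes one `keyword=value` pair per line with case-insensitive keywords; that assumption is based only on the sample shown earlier, and `set_rc_keyword` is a hypothetical helper, not a Siemens tool. Back up the original file before writing anything to it.

```python
def set_rc_keyword(rc_text: str, key: str, value: str) -> str:
    """Return rc_text with key=value set, replacing an existing line if present."""
    lines = rc_text.splitlines()
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        # Assumption: keywords are matched without regard to case.
        if sep and name.strip().lower() == key.lower():
            lines[i] = f"{key}={value}"
            break
    else:
        # Keyword not found anywhere, so append it as a new line.
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


sample = "sdir=e:\\scratch\nscr=yes\nbuffsize=8193\n"
print(set_rc_keyword(sample, "buffsize", "32769"))
```

Working on the text rather than the file itself keeps the helper easy to check before you point it at a real installation.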

NX Nastran Settings: Memory
Starting with NX Nastran 10, the new default memory settings in the rcf
file are:
Memory=0.45*physical (45% of total RAM installed in the workstation)
Smem=20.0X (20% of Memory in line above)
Buffpool=20.0X (Same as Smem)
These settings are more appropriate for large models and machines with
more RAM
Inspect the F04 file to see if you have optimum settings for your model
Note: Unless SMEM is large enough to contain all scratch files, it is
better to set it to zero
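
To see what these defaults mean on a given machine, the percentages can be worked through directly. This sketch follows the slide's reading of the settings (Memory is 45% of physical RAM, Smem and Buffpool are each 20% of Memory); the function name and the approximation of open core as "Memory minus Smem minus Buffpool" are assumptions for illustration, and the smaller executive and master areas are ignored.

```python
def default_memory_split(physical_gb: float,
                         memory_fraction: float = 0.45,
                         smem_percent: float = 20.0,
                         buffpool_percent: float = 20.0) -> dict:
    """Estimate the NX Nastran 10 default allocations for a machine, in GB."""
    memory = memory_fraction * physical_gb
    smem = memory * smem_percent / 100.0
    buffpool = memory * buffpool_percent / 100.0
    return {
        "memory_gb": memory,
        "smem_gb": smem,
        "buffpool_gb": buffpool,
        # Rough open core estimate; ignores executive and master areas.
        "approx_opencore_gb": memory - smem - buffpool,
        "left_for_os_and_cache_gb": physical_gb - memory,
    }


for ram in (16, 64):
    split = default_memory_split(ram)
    print(ram, {k: round(v, 2) for k, v in split.items()})
```

On a 64 GB workstation this gives about 28.8 GB to NX Nastran, of which roughly 5.8 GB each goes to Smem and Buffpool, leaving about 35 GB for the OS and I/O caching.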

NX NASTRAN MEMORY LAYOUT

NX NASTRAN MEMORY
The f04 file will give a summary of the memory that was allocated. The
allocations will be the areas shown on the previous slide.
Here is an example from a TET10 model with around 650,000 elements:
** PHYSICAL FILES LARGER THAN 2GB ARE SUPPORTED ON THIS PLATFORM
0 ** MASTER DIRECTORIES ARE LOADED IN MEMORY.
USER OPENCORE (HICORE) = 433308364 WORDS
EXECUTIVE SYSTEM WORK AREA = 316925 WORDS
MASTER(RAM) = 70822 WORDS
SCRATCH(MEM) AREA = 3276900 WORDS ( 100 BUFFERS)
BUFFER POOL AREA (GINO/EXEC) = 1671729 WORDS ( 51 BUFFERS)
TOTAL NX NASTRAN MEMORY LIMIT = 438644740 WORDS
This model was run with mem=1673Mb

Remember, LP-64 is 4 bytes per word, and ILP-64 is 8 bytes per word
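
The word counts in the excerpt can be checked against the mem= value with a few lines of arithmetic. The sketch below parses the lines shown above, confirms that the individual areas add up to the total, and converts words to MB. It assumes the run used the LP-64 build (4 bytes per word) and that 1 MB means 2^20 bytes; the helper names are illustrative.

```python
import re

F04_EXCERPT = """\
USER OPENCORE (HICORE) = 433308364 WORDS
EXECUTIVE SYSTEM WORK AREA = 316925 WORDS
MASTER(RAM) = 70822 WORDS
SCRATCH(MEM) AREA = 3276900 WORDS ( 100 BUFFERS)
BUFFER POOL AREA (GINO/EXEC) = 1671729 WORDS ( 51 BUFFERS)
TOTAL NX NASTRAN MEMORY LIMIT = 438644740 WORDS
"""

BYTES_PER_WORD = {"LP-64": 4, "ILP-64": 8}


def parse_words(text: str) -> dict:
    """Map each 'NAME = n WORDS' line to its integer word count."""
    pattern = re.compile(r"^\s*(.+?)\s*=\s*(\d+)\s+WORDS", re.MULTILINE)
    return {name: int(count) for name, count in pattern.findall(text)}


def words_to_mb(words: int, build: str = "LP-64") -> float:
    """Convert a word count to MB (2**20 bytes) for the given build."""
    return words * BYTES_PER_WORD[build] / 1024**2


areas = parse_words(F04_EXCERPT)
total = areas.pop("TOTAL NX NASTRAN MEMORY LIMIT")
assert sum(areas.values()) == total      # the five areas add up to the limit
print(f"total = {words_to_mb(total):.0f} MB")
```

With 4-byte words the total comes to about 1673 MB, matching the mem=1673Mb setting for this run. The same word count under ILP-64 would be twice as many bytes.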

HOW MUCH MEMORY IS ENOUGH?
Look in the f04 file for USER OPENCORE:
Compare to the HIWATER usage toward the end of the f04 file:
If HIWATER is getting close to or over HICORE, then likely the job would
benefit from more memory (mem=x)
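
The comparison can be expressed as a simple headroom check. The 10% threshold below is an arbitrary assumption standing in for "getting close to", and the HIWATER value in the example is made up for illustration; read the real numbers from your own f04 file.

```python
def memory_headroom(hicore_words: int, hiwater_words: int) -> float:
    """Fraction of open core still unused at the high-water mark.

    Negative when HIWATER exceeds HICORE.
    """
    return 1.0 - hiwater_words / hicore_words


def needs_more_memory(hicore_words: int, hiwater_words: int,
                      min_headroom: float = 0.10) -> bool:
    """True when HIWATER is within min_headroom of HICORE (assumed threshold)."""
    return memory_headroom(hicore_words, hiwater_words) < min_headroom


hicore = 433308364    # USER OPENCORE (HICORE) from the f04 example
hiwater = 420000000   # hypothetical HIWATER value, not from a real run
print(f"headroom = {memory_headroom(hicore, hiwater):.1%}, "
      f"raise mem? {needs_more_memory(hicore, hiwater)}")
```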

SETTING MEMORY SIZE IN FEMAP
FEMAP uses Mb units, and memory can
be set in the NASTRAN Executive and
Solution Options form. 0 is the default,
which will use NASTRAN's default from the
rcf file
For Windows, don't allocate more than
about 50% of the physical memory of
the machine to avoid performance issues
(swapping). Less may be better since the
other memory is used for I/O caching by
Windows
NX Nastran 10 default of 45% is pretty
good for most cases until you get to
workstations/servers with a large amount
of RAM

HYPERTHREADING
Some modern Intel CPUs support Hyperthreading.
Hyperthreading is like a virtual CPU, where one CPU core can run two
threads.
There can be a performance advantage on some desktop
applications, but it's very small.
Nastran, like other Windows programs, sees the virtual CPU as a real
CPU, since that is what Intel intended.
Since NX NASTRAN is very CPU intensive, it expects the virtual CPU
to perform like a real CPU, but it won't.
NX NASTRAN will usually perform better if you turn off
Hyperthreading. This is typically done in the BIOS.
Some Xeon processors do not have Hyperthreading for this reason

Element Iterative Solver
For models that are
mostly solid elements,
the Element Iterative
Solver can offer
significant performance
improvements. (2-3x)
It does not help shell
or bar elements, and
will be ignored in
dynamics solutions
Set this in the Solution
form

NX Nastran Linear Contact Solutions
Specify the proper search distance
Large Search distances typically
involve more active contacts for
the first few iterations

Multiple CPUs SMP vs DMP
Shared Memory Parallel (SMP) is a single
machine with multiple processors that share
common memory and a common I/O system
(disks).
Distributed Memory Parallel (DMP)
is a set of multiple machines or a
cluster with one or more processors
communicating over a network.
Each machine has its own memory
and its own I/O system
(Figure: SMP and DMP architecture diagrams)
DMP vs. SMP
SMP: Shared Memory Parallel
Common Memory Pool, Common I/O Pool
Desktop/Laptop hardware
Speedup tapers off at 8 or so cores
No extra license needed
DMP: Distributed Memory Parallel
Multiple machines with one or more processors communicating
over a network (Desktop/Cluster)
Each machine has its own memory and disk I/O
Uses Message Passing Interface (MPI), which must be installed in
the OS
Highly Scalable
Extra license needed (can now be supported with a FEMAP license)
DMP Solutions
101 Linear Statics
103 Normal Modes
105/108/111/112 Buckling, Direct/Modal Frequency,
Modal Transient response
200 Design Optimization
Multiple CPUs SMP Setup in FEMAP
If you would like to use multiple CPUs to solve a
NASTRAN run, FEMAP can set that right above
the Solver Memory.
If you are running NASTRAN on your desktop
machine, it is recommended to leave one CPU
available for other applications if you want to
continue to use the machine for other work
This can also be done in the input file with:
NASTRAN PARALLEL=x
PARALLEL is a command line option also, and
can be set in the rc file if you would like to have a
default number of processors
There is no extra license needed for SMP
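
One way to pick a value for PARALLEL is to combine two points from these slides: leave one CPU free on a desktop you are still working on, and expect SMP speedup to taper off at around 8 cores. The sketch below encodes that as a rule of thumb. The cap of 8 and the function names are assumptions for illustration, and the core count should be physical cores, not Hyperthreading logical CPUs.

```python
def smp_parallel(physical_cores: int, interactive: bool = True, cap: int = 8) -> int:
    """Suggest a PARALLEL value from the physical core count (rule of thumb)."""
    # Leave one core for other applications on a machine that is in use.
    usable = physical_cores - 1 if interactive else physical_cores
    # Never go below 1, and do not exceed the assumed SMP taper point.
    return max(1, min(usable, cap))


def nastran_statement(physical_cores: int, interactive: bool = True) -> str:
    """Format the input file statement shown above with the suggested value."""
    return f"NASTRAN PARALLEL={smp_parallel(physical_cores, interactive)}"


print(nastran_statement(4))                       # quad-core desktop in use
print(nastran_statement(18, interactive=False))   # dedicated 18-core server
```

Treat the result as a starting point and compare run times for your own models, since the best value depends on the solution type and on I/O speed.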

AMD PROFESSIONAL GRAPHICS ADVANTAGE
INNOVATION: Simultaneous render & compute; Up to six 4K displays; Intelligent power technologies
PERFORMANCE: Application optimizations; Latest API support; PCIe 3.0 support
RELIABILITY: 100+ app certifications; Rock-solid drivers; Three year warranty
Image courtesy of Siemens PLM Software
AMD Professional Graphics for NX | August 2015

AMD FIREPRO W-SERIES GRAPHICS PRODUCT STACK
AMD FirePro W-Series Recommended for NX/FEMAP

UHE: AMD FirePro W9100 (16GB GDDR5, 275W); AMD FirePro W8100 (8GB GDDR5, 220W)
HE: AMD FirePro W7100 (8GB GDDR5, 150W)
Midrange: AMD FirePro W5100 (4GB GDDR5, <75W); AMD FirePro W4100 (2GB GDDR5, LP, <50W)
2D/3D Entry: AMD FirePro W2100 (2GB DDR3, LP, 26W)

The Right Solution for your PLM Workflow
Simulation, NX Nastran: AMD FirePro W9100, AMD FirePro W8100
Large Assemblies and Rendering: AMD FirePro W7100
Design and Validation: AMD FirePro W5100
Drafting and Modeling: AMD FirePro W4100
Visualize, Review and Mark-up: AMD FirePro W2100
Images courtesy of Siemens PLM Software

NX NASTRAN
High performance GPUs and OpenCL accelerate modal frequency response calculations in NX Nastran.
This solution makes it possible to compute a large number of modes over a wide frequency range,
economically and efficiently.
Results of the AMD FirePro OpenCL acceleration for NX Nastran Modal Frequency Response:
Up to 25x faster than serial
Up to 4x faster than the top of the line 24-core CPU run time
Ref.: Siemens 2012 NX CAE Symposium Presentation: Accelerating Modal Frequency Response in
NX Nastran with AMD GPUs by Hoffnung and Reymond
OpenCL-accelerated solution
System Configuration: Supermicro H8DGi-F Dual Opteron Motherboard, 24 core Magny-Cours, with AMD FirePro W8000

SCALABLE PROFESSIONAL GPU SOLUTIONS
AMD provides a wide range of
products for a wide range of
software solutions:
Servers
Desktop Workstations
Mobile Workstations & Thin Clients

Using GPGPU with NX Nastran (OpenCL)
For modal frequency response (SOL 111) with
more than 5000 modes, and if you have a fast
GPU card, such as the AMD FirePro W9100, it
may help to turn on the GPGPU acceleration in
the NASTRAN Executive and Solution Options
form
NVIDIA Tesla K40 and Intel Xeon Phi 7120D are
also supported by NX Nastran

FEMAP Performance Graphics
Performance graphics vs. regular graphics comparison

Model: 6 million nodes / elements
Action: full model display / group / full model display

Graphics Preferences - Options
Hardware Acceleration: Turning this off
disables hardware acceleration, which helps if you
are having significant graphics
problems and want to find out if
the graphics driver is the cause
Performance Graphics (11.1 and
Higher): Uses a new graphics
architecture to improve
performance of initial draw and
dynamic rotation. Needs
OpenGL 4.2 or higher.
Memory Optimization: Should be
off unless your models are very
big and swapping is occurring. If
that is the case, turning this on
can improve drawing speed. If
not, it will slow things down.
Multi-Model Memory: This will
use more memory to help make
the transition time between
switching models faster.
Auto Regenerate: This will force
a redraw after every command.
It's slower, but keeps the
graphics up to date during
modifications.
Graphics Preferences - OpenGL
Enabling the Performance
Graphics option can
dramatically improve
performance on models
with a large number of:
Solids
Points
Nodes
Solid and Shell Elements
Set Max VBO MB (Memory)
to no more than 75% of
your graphics card memory
Sample is shown for a
graphics card with 2GB of
VRAM
Min VBO B is set to 1024 by
default and this should
work well with most
graphics cards
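
The 75% guideline is easy to work out for any card. This is a minimal sketch of that arithmetic; the function name is hypothetical, and it assumes 1 GB of VRAM is 1024 MB.

```python
def max_vbo_mb(vram_gb: float, fraction: float = 0.75) -> int:
    """Upper bound for Max VBO MB, using the 75% of VRAM guideline."""
    return int(vram_gb * 1024 * fraction)


for vram in (2, 4, 8):
    print(f"{vram} GB card -> Max VBO MB no more than {max_vbo_mb(vram)}")
```

For the 2 GB card in the sample, this gives an upper bound of 1536 MB.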
Graphics Preferences - Dynamic Rotation
Include in Dynamic Rotation
options - switching off any of these
options should improve
performance.
Some key options:
Element Symbols - if you
have a lot of lumped masses
and springs
Mesh Size - if you have a
large number of curves with
mesh sizes on
Labels and Undeformed -
switching these off helps
performance.
Elements as Free Edges -
this has a slight delay in
starting and finishing
dynamic rotation but dynamic
rotation is much quicker. For
some models e.g. a mesh on
a sphere, there is no free
edge and you will see nothing
as the model rotates.
Graphics Card Performance Considerations
Desktop area resolution should be taken into
consideration when using Femap. Having a
very fine screen resolution can increase the
time animations need to generate and the time
individual windows need to refresh. Something
to consider for Ultra HD (4K/2160P) monitors
with resolutions of 3840x2160.
If Femap appears to be having graphics errors,
it could be the driver for your graphics card.
Update the drivers for your graphics card
often!
Drivers from the manufacturers of the
graphics card chipset tend to be more stable
than the drivers from the maker of the
graphics card. (e.g. use an ATI or NVIDIA
driver vs. an ASUS driver)
You should also set your graphics card
performance settings to the default settings.
In some cases, setting a card for optimum
performance for an application may cause
Femap to crash.
Database Preferences
The database memory limit is set to
20% of available system RAM by
default. When FEMAP needs more,
it will just swap to the scratch disk,
slowing things down. Increasing
this number will leave less
available RAM for other FEMAP
operations besides the database.
In some cases it may be better to
lower this number.
The Max Cached Label must be set
to an ID higher than any entity in
the model.
The Open/Save method may
improve read/write performance if
you are experiencing slow
performance. Clicking the
Read/Write Test button will
automatically run a test and
determine the best setting for your
hardware. It takes about 1.2 GB of
disk space and a few minutes of
time
