
FEMAP/NX NASTRAN PERFORMANCE TUNING

Chris Teague - Saratech

(949) 481-3267 | www.saratechinc.com


NX Nastran Hardware Performance History
Running Nastran in 1984:
Cray Y-MP, 32 Bits! (X-MP was only 24 Bits)
Four Vector Processors (167 MHz)
256 MB of RAM (Note the MB; a PC had 256 KB)
333 Mflops per processor
$3-$4 Million, plus special room
Comparison in 2016:
64 Bits
Dual Core (1.85 GHz)
2 GB (2048 MB) of RAM
340 Mflops (Single Thread) / 613 Mflops (Multi Thread)
iPhone 6s, $689
NX Nastran is currently not ported to iOS

Saratech proprietary and confidential Slide Number: 2


NX Nastran Hardware Performance History
Rack Server in 2016:
Dell R930, 64 Bits
Max 4 Processors with up to 18 Cores Each (72 Cores Total), or clock speeds up to 3.2 GHz
1.5 TB of RAM
1.8 Gflops per processor
High Speed PCIe based SSD disk drive (2.8 GB/s Read / 2.2 GB/s Write speed)
$85K with 6.5 TB PCIe SSD, 1.5 TB RAM, 4 x 3.2 GHz 4-Core Xeon processors
Blade Array System
Up to 30 Blades, each configured like a single server
So how much faster do our Nastran jobs run with this huge increase in computing power?

NX Nastran Performance Tuning Tips

What is LP-64 vs ILP-64?
Hardware and OS Selection
NX Nastran Scratch Drive
I/O Performance, OS Settings
Buffer Size
Hyperthreading
Element Iterative Solver
SMP vs DMP
GPGPU

NX NASTRAN LP-64 vs ILP-64
There are two 64-bit versions of NX Nastran:
LP-64
Standard version when running through FEMAP
4-Byte Words
8 GB RAM limit
ILP-64
Optional version when running through FEMAP
8-Byte Words
20 TB RAM limit, which in practice is the hardware RAM limit of the machine you are running on
When running NX Nastran on the command line, the L executables are the ILP-64 versions. The w executables will bring up a file browser.
In some cases, ILP-64 may offer improved accuracy



NX NASTRAN LP-64 vs ILP-64

In general, the standard LP-64 version of NX Nastran is faster for models that do not need more than 8 GB of RAM allocated to the Solver.
For larger models that need more than 8 GB of RAM for the Solver, you will need to use the ILP-64 version and have enough RAM available.
For performance reasons, you don't want to allocate more than about 50% of RAM to NX Nastran. The remaining RAM is needed for the OS and I/O Caching, which is a huge help to NX Nastran performance.
This means that if you need to use the ILP-64 version of NX Nastran, you will want at least 16 GB of RAM. Larger models may require more. Sometimes LOTS more!
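The sizing rule above can be written down as a small helper. This is only a sketch under the assumptions stated on these slides (an 8 GB solver limit for LP-64 and roughly 50% of RAM given to the solver); the function name `pick_nastran_build` and its parameters are hypothetical and not part of NX Nastran or FEMAP.

```python
LP64_LIMIT_GB = 8.0       # LP-64 solver memory limit quoted on the slides
MAX_RAM_FRACTION = 0.5    # guideline: leave about half of RAM for the OS and I/O cache


def pick_nastran_build(solver_mem_gb: float, physical_ram_gb: float) -> dict:
    """Suggest LP-64 or ILP-64 and check the ~50% RAM guideline."""
    build = "LP-64" if solver_mem_gb <= LP64_LIMIT_GB else "ILP-64"
    # RAM needed so that the solver allocation stays at or below 50% of the total
    min_ram_gb = solver_mem_gb / MAX_RAM_FRACTION
    return {
        "build": build,
        "min_recommended_ram_gb": min_ram_gb,
        "ram_ok": physical_ram_gb >= min_ram_gb,
    }


if __name__ == "__main__":
    # A 12 GB solver allocation on a 16 GB machine needs ILP-64 and more RAM.
    print(pick_nastran_build(12, 16))
```

With these assumptions, any job just past the 8 GB limit already calls for about 16 GB of installed RAM, which is where the "at least 16 GB" figure comes from.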



Hardware and OS Selection
Processors
Faster processors are good (though faster I/O speed is just as important, if not more so)
Large L2 or L3 processor cache can improve performance (Xeon can help here)
Multi-Core is good, but don't choose more cores over fewer cores with a faster clock speed (usually)
Intel Xeon E7-8880 v3, 2.3 GHz, 45 MB cache, 18 Core
Intel Xeon E7-8893 v3, 3.2 GHz, 45 MB cache, 4 Core

Memory
As much as budget allows, and the fastest available
Saratech ran a large job with mem=24 GB on a system with 64 GB of RAM. Nastran used up all available RAM for 2-3 hours, the extra being used for I/O Caching. See Task Manager graph:



Hardware and OS Selection
Disk
SATA based SSDs are significantly faster than mechanical drives
PCIe based SSD devices are faster still, and are available in laptop, desktop, and server models.
Example: SanDisk SX350-3200, 3.2 TB, 2.8 GB/s Read, 2.2 GB/s Write speed (Servers)
Intel 750 Series, 1.2 TB, 2.5 GB/s Read, 1.2 GB/s Write speed (Workstations)
(Image: Intel 750 Series PCIe SSD)
Operating System
Generally Linux is faster than Windows on the same hardware due to the superior I/O on Linux
Because of this, most HPC cluster systems run Linux
Windows is more popular on the desktop due to the wide variety of applications that run on Windows.

Hardware and OS Selection

Priorities for getting the most performance for the least money:
Maximum number of *fast* cores with large cache
Add as much RAM as possible, and go for the fastest RAM allowed
Maximize I/O bandwidth and disk speed
Add GPU processing for some large dynamics problems (More on this later)
I always recommend at least two disks, and 3 if possible:
Disk 1: Fast drive for OS & Applications
Disk 2: Very fast drive for NASTRAN & FEMAP scratch space (keep empty when not running NX Nastran & FEMAP)
Disk 3: Large drive for data storage
NASTRAN does so much disk I/O that it is better to give it its own drive for scratch files, and to make that drive as fast as possible: PCIe SSD, or even a RAID of SSDs. We don't want the OS and application data needs to slow down our NASTRAN job.



NX Nastran Scratch Drive

Nastran scratch folder should point to a fast disk, or a RAID array (RAID0)
Local disk drives are preferred
Using a network mounted NFS or SMB (Windows Shared Drive) connection is generally going to carry significant performance penalties
Even laptops can have two drives; try mSATA cards, or even PCIe in newer laptops
(Images: SanDisk Fusion ioMemory SX350-3200; Samsung 850 EVO M.2 SSD)


NX Nastran Scratch Drive

You can set the NX Nastran scratch drive in the rc file
The nastran rc file for FEMAP can be found in FEMAPv113/nastran/conf, where 113 is the version of FEMAP that you have installed
Sample from my laptop:
Auth=28000@LocalHost
Sdir=e:\scratch
program=FEMAP
scr=yes
buffsize=32769
memory=.45*physical
smem=20.0X
(Image: Samsung 850 EVO M.2 SSD)

The E drive is a 512GB SSD mSATA card
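To check which settings an rc file actually contains, the keyword=value layout of the sample above can be read with a few lines of Python. This is a minimal sketch, not a Siemens tool: `parse_rc` is a hypothetical helper, and it assumes that keywords are case-insensitive and that lines starting with `$` are comments.

```python
def parse_rc(text: str) -> dict:
    """Parse keyword=value lines; keywords are lower-cased for lookup."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("$"):   # assumption: '$' starts a comment
            continue
        key, sep, value = line.partition("=")
        if sep:                                # skip lines without '='
            settings[key.strip().lower()] = value.strip()
    return settings


# The sample rc file from the slide, as a raw string so the backslash is kept
SAMPLE_RC = r"""
Auth=28000@LocalHost
Sdir=e:\scratch
program=FEMAP
scr=yes
buffsize=32769
memory=.45*physical
smem=20.0X
"""

settings = parse_rc(SAMPLE_RC)
print("scratch directory:", settings["sdir"])
print("buffer size:", settings["buffsize"])
```

Reading the file this way makes it easy to confirm that the scratch directory points at the fast drive before launching a long job.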



NX Nastran & FEMAP Scratch Drive in FEMAP Preferences
(Screenshots: FEMAP scratch drive setting; NX Nastran scratch drive setting)



OS Settings: I/O Cache

Reading from and writing to disk drives is much slower than RAM access, even with SSDs
Data that has just been written is likely to be read back soon
Keeping information in memory instead of on disk will reduce disk seek times
Make use of unallocated memory as a disk I/O cache



OS Settings: Enabling Disk I/O Cache

Read cache is enabled by default on Linux and Windows
Enable write cache on Linux using the hdparm command or equivalent
On Windows, use the Device Manager property settings to enable write caching on the Nastran scratch drive in the Policies tab



Buffer Size

The NX Nastran buffer size is the size of each I/O unit
The default size in NX Nastran 9 is 8193, which works well for small models (<100K DOF)
For larger models (>400K DOF), increasing the buffer size to 32769 may help. This is the default in NX Nastran 10
This can be done by editing the buffsize line in the nastran rc file to read:
Buffsize=32769
The nastran rc file for FEMAP can be found in FEMAPv113/nastran/conf,
where 113 is the version of FEMAP that you have installed
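To get a feel for what these buffer sizes mean on disk, the values can be converted to bytes. The sketch below assumes that buffsize is given in words (the f04 listing later in these slides reports 3276900 words for 100 scratch buffers, which is 32769 words each) and uses the word sizes quoted earlier for LP-64 and ILP-64; `buffer_bytes` is a hypothetical helper.

```python
BYTES_PER_WORD = {"LP-64": 4, "ILP-64": 8}   # word sizes quoted on the slides


def buffer_bytes(buffsize_words: int, build: str = "LP-64") -> int:
    """Size of one I/O buffer in bytes, assuming buffsize is in words."""
    return buffsize_words * BYTES_PER_WORD[build]


for words in (8193, 32769):
    lp = buffer_bytes(words, "LP-64")
    ilp = buffer_bytes(words, "ILP-64")
    print(f"buffsize={words}: {lp} bytes (LP-64), {ilp} bytes (ILP-64)")
```

Under these assumptions the larger setting moves about 128 KB per I/O unit on LP-64 instead of about 32 KB, which means fewer, larger transfers for big models.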



NX Nastran Settings: Memory

Starting with NX Nastran 10, the new default memory settings in the rcf file are:
Memory=0.45*physical (45% of total RAM installed in the workstation)
Smem=20.0X (20% of Memory in line above)
Buffpool=20.0X (Same as Smem)
These settings are more appropriate for large models and machines with more RAM
Inspect the f04 file to see if you have optimum settings for your model
Note: Unless SMEM is large enough to contain all scratch files, it is better to set it to zero
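As a worked example of these defaults, the sketch below computes the resulting allocations for a given amount of installed RAM. It follows the slide's reading of the settings (Memory is 45% of physical RAM, Smem and Buffpool are each 20% of Memory); the function `nx10_default_memory` is hypothetical and only illustrates the arithmetic.

```python
def nx10_default_memory(physical_gb: float,
                        mem_fraction: float = 0.45,
                        smem_pct: float = 20.0) -> dict:
    """Allocations implied by the NX Nastran 10 defaults as described above."""
    memory = mem_fraction * physical_gb      # Memory=0.45*physical
    smem = memory * smem_pct / 100.0         # Smem=20.0X, read as 20% of Memory
    return {
        "memory_gb": memory,
        "smem_gb": smem,
        "buffpool_gb": smem,                 # Buffpool uses the same value as Smem
        "left_for_os_and_cache_gb": physical_gb - memory,
    }


if __name__ == "__main__":
    for name, gb in nx10_default_memory(64).items():
        print(f"{name}: {gb:.2f}")
```

On a 64 GB workstation this gives roughly 28.8 GB to the solver and leaves about 35 GB for the OS and I/O caching.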



NX NASTRAN MEMORY LAYOUT



NX NASTRAN MEMORY

The f04 file will give a summary of the memory that was allocated. The allocations will be the areas shown on the previous slide.
Here is an example from a TET10 model with around 650,000 elements:
** PHYSICAL FILES LARGER THAN 2GB ARE SUPPORTED ON THIS PLATFORM
0 ** MASTER DIRECTORIES ARE LOADED IN MEMORY.
USER OPENCORE (HICORE) = 433308364 WORDS
EXECUTIVE SYSTEM WORK AREA = 316925 WORDS
MASTER(RAM) = 70822 WORDS
SCRATCH(MEM) AREA = 3276900 WORDS ( 100 BUFFERS)
BUFFER POOL AREA (GINO/EXEC) = 1671729 WORDS ( 51 BUFFERS)
TOTAL NX NASTRAN MEMORY LIMIT = 438644740 WORDS

This model was run with mem=1673Mb


Remember, LP-64 is 4 bytes per word, and ILP-64 is 8 bytes per word
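The numbers in this listing can be checked with a short script: the individual areas should add up to the total, and the total in words should match the mem= value once converted to megabytes. This sketch assumes the run used LP-64 (4 bytes per word) and that a megabyte here means 2^20 bytes; the parsing helper `parse_words` is hypothetical.

```python
import re

# Memory summary lines copied from the f04 example above
F04_SUMMARY = """\
USER OPENCORE (HICORE) = 433308364 WORDS
EXECUTIVE SYSTEM WORK AREA = 316925 WORDS
MASTER(RAM) = 70822 WORDS
SCRATCH(MEM) AREA = 3276900 WORDS ( 100 BUFFERS)
BUFFER POOL AREA (GINO/EXEC) = 1671729 WORDS ( 51 BUFFERS)
TOTAL NX NASTRAN MEMORY LIMIT = 438644740 WORDS
"""


def parse_words(text: str) -> dict:
    """Map each 'NAME = N WORDS' line to its word count."""
    areas = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*=\s*(\d+)\s+WORDS", line)
        if match:
            areas[match.group(1)] = int(match.group(2))
    return areas


areas = parse_words(F04_SUMMARY)
total = areas.pop("TOTAL NX NASTRAN MEMORY LIMIT")
parts_sum = sum(areas.values())

BYTES_PER_WORD = 4                      # assumption: LP-64 run
total_mb = total * BYTES_PER_WORD / 2**20

print("areas add up to total:", parts_sum == total)
print("total in MB:", round(total_mb))
```

The five areas sum exactly to the reported limit, and 438644740 words at 4 bytes each comes to about 1673 MB, matching the mem= setting of the run.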



HOW MUCH MEMORY IS ENOUGH?

Look in the f04 file for USER OPENCORE:

Compare to the HIWATER usage toward the end of the f04 file:

If HIWATER is getting close to or over HICORE, then the job would likely benefit from more memory (mem=x)
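The comparison can be made explicit with a small check. The slides do not say how close is "close", so the 90% threshold below is an assumption, as are the helper name `memory_headroom` and the HIWATER values used in the example; only the HICORE value comes from the f04 listing shown earlier.

```python
def memory_headroom(hicore_words: int, hiwater_words: int,
                    warn_ratio: float = 0.9) -> str:
    """Classify HIWATER against HICORE: 'ok', 'close', or 'over'."""
    ratio = hiwater_words / hicore_words
    if ratio >= 1.0:
        return "over"      # the job needed more than was available
    if ratio >= warn_ratio:
        return "close"     # consider raising mem=
    return "ok"


HICORE = 433308364         # USER OPENCORE (HICORE) from the f04 example

# Hypothetical HIWATER values, for illustration only
for hiwater in (200_000_000, 400_000_000, 433_308_364):
    print(hiwater, memory_headroom(HICORE, hiwater))
```

A result of "close" or "over" suggests rerunning with a larger mem= value, provided the machine has the RAM to spare.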



SETTING MEMORY SIZE IN FEMAP
FEMAP uses MB units, and memory can be set in the NASTRAN Executive and Solution Options form. The default of 0 uses NASTRAN's default from the rcf file
For Windows, don't allocate more than about 50% of the physical memory of the machine, to avoid performance issues (swapping). Less may be better, since the remaining memory is used for I/O Caching by Windows
The NX Nastran 10 default of 45% is pretty good for most cases until you get to workstations/servers with a large amount of RAM



HYPERTHREADING
Some modern Intel CPUs support Hyperthreading.
Hyperthreading is like a virtual CPU, where one CPU core can run two threads.
There can be a performance advantage on some desktop applications, but it's very small.
Nastran, like other Windows programs, sees the virtual CPU as a real CPU, since that is what Intel intended.
Since NX NASTRAN is very CPU intensive, it expects the virtual CPU to perform like a real CPU, but it won't.
NX NASTRAN will usually perform better if you turn off Hyperthreading. This is typically done in the BIOS.
Some Xeon processors do not have Hyperthreading for this reason



Element Iterative Solver
For models that are mostly solid elements, the Element Iterative Solver can offer significant performance improvements (2-3x).
It does not help shell or bar elements, and will be ignored in dynamics solutions.
Set this in the Solution form.



NX Nastran Linear Contact Solutions
Specify the proper search distance
Large search distances typically involve more active contacts for the first few iterations



Multiple CPUs - SMP vs DMP

Shared Memory Parallel (SMP) is a single machine with multiple processors that share common memory and a common I/O system (disks).

Distributed Memory Parallel (DMP) is a set of multiple machines, or a cluster, with one or more processors communicating over a network. Each machine has its own memory and its own I/O system.

(Figures: SMP and DMP architecture diagrams)

DMP vs. SMP
SMP - Shared Memory Parallel
Common Memory Pool, Common I/O Pool
Desktop/Laptop hardware
Tapers off at 8 or so cores
No extra license needed
DMP - Distributed Memory Parallel
Multiple machines with one or more processors communicating over a network (Desktop/Cluster)
Each machine has its own memory and disk I/O
Uses Message Passing Interface (MPI), which must be installed in the OS
Highly Scalable
Extra license needed (can now be supported with a Femap license)
DMP Solutions
101 Linear Statics
103 Normal Modes
105/108/111/112 Buckling, Direct/Modal Frequency, Modal Transient Response
200 Design Optimization

Multiple CPUs - SMP Setup in FEMAP
If you would like to use multiple CPUs to solve a NASTRAN run, FEMAP lets you set that right above the Solver Memory.
If you are running NASTRAN on your desktop machine, it is recommended to leave one CPU available for other applications if you want to continue to use the machine for other work
This can also be done in the input file with:
NASTRAN PARALLEL=x
PARALLEL is also a command line option, and can be set in the rc file if you would like to have a default number of processors
There is no extra license needed for SMP
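The guidance on core counts can be combined into one small rule of thumb. The sketch below leaves one core free on an interactive desktop and caps the count at 8, reflecting the earlier remark that SMP scaling tapers off at 8 or so cores; `suggest_parallel` and its parameters are hypothetical, and the count should be physical cores rather than Hyperthreading logical CPUs.

```python
def suggest_parallel(physical_cores: int, interactive: bool = True,
                     smp_cap: int = 8) -> int:
    """Suggest a value for PARALLEL from the number of physical cores."""
    # Leave one core for other applications on a machine that is still in use
    usable = physical_cores - 1 if interactive else physical_cores
    # Never go below 1, and do not exceed the point where SMP tapers off
    return max(1, min(usable, smp_cap))


if __name__ == "__main__":
    print(f"NASTRAN PARALLEL={suggest_parallel(4)}")
    print(f"NASTRAN PARALLEL={suggest_parallel(18, interactive=False)}")
```

The cap of 8 is only a starting point; the best value for a given model and machine is worth confirming with a timing run.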



AMD PROFESSIONAL GRAPHICS ADVANTAGE

INNOVATION
Simultaneous render & compute
Up to six 4K displays
Intelligent power technologies

PERFORMANCE
Application optimizations
Latest API support
PCIe 3.0 support

RELIABILITY
100+ app certifications
Rock-solid drivers
Three year warranty

Image courtesy of Siemens PLM Software

27 AMD Professional Graphics for NX | August 2015 27


AMD FIREPRO W-SERIES GRAPHICS PRODUCT STACK

AMD FirePro W-Series Recommended for NX/FEMAP


UHE
W9100: 16 GB GDDR5, 275 W
W8100: 8 GB GDDR5, 220 W
HE
W7100: 8 GB GDDR5, 150 W
Midrange
W5100: 4 GB GDDR5, <75 W
W4100: 2 GB GDDR5, LP, <50 W
2D/3D Entry
W2100: 2 GB DDR3, LP, 26 W



The Right Solution for your PLM Workflow
Simulation (NX Nastran): AMD FirePro W9100, AMD FirePro W8100
Large Assemblies and Rendering: AMD FirePro W7100
Design and Validation: AMD FirePro W5100
Drafting and Modeling: AMD FirePro W4100
Visualize, Review and Mark-up: AMD FirePro W2100
Images courtesy of Siemens PLM Software



NX NASTRAN

High performance GPUs and OpenCL accelerate modal frequency response calculations in NX Nastran.
This solution makes it possible to compute a large number of modes over a wide frequency range, economically and efficiently.
Results of the AMD FirePro OpenCL acceleration for NX Nastran Modal Frequency Response:
Up to 25x faster than serial
Up to 4x faster than the top of the line 24-core CPU run time
Ref.: Siemens 2012 NX CAE Symposium Presentation: "Accelerating Modal Frequency Response in NX Nastran with AMD GPUs" by Hoffnung and Reymond

OpenCL-accelerated solution
System Configuration: Supermicro H8DGi-F Dual Opteron Motherboard 24 core Magny-Cours with AMD FirePro W8000



SCALABLE PROFESSIONAL GPU SOLUTIONS
AMD provides a wide range of products for a wide range of software solutions:
Servers
Desktop Workstations
Mobile Workstations & Thin Clients



Using GPGPU with NX Nastran (OpenCL)
For modal frequency response (SOL 111) with more than 5000 modes, and if you have a fast GPU card such as the AMD FirePro W9100, it may help to turn on the GPGPU acceleration in the NASTRAN Executive and Solution Options form
NVIDIA Tesla K40 and Intel Xeon Phi 7120D are also supported by NX Nastran



FEMAP Performance Graphics

Performance graphics vs. regular graphics comparison


Model: 6 million nodes / elements
Action: full model display / group / full model display



Graphics Preferences - Options
Hardware Acceleration: Turning this off will disable the hardware driver. Use it if you are having significant graphics problems and want to find out if the graphics driver is the cause.
Performance Graphics (11.1 and higher): Uses a new graphics architecture to improve performance of initial draw and dynamic rotation. Needs OpenGL 4.2 or higher.
Memory Optimization: Should be off unless your models are very big and swapping is occurring. If that is the case, turning this on can improve drawing speed. If not, it will slow things down.
Multi-Model Memory: This will use more memory to make switching between models faster.
Auto Regenerate: This will force a redraw after every command. It's slower, but keeps the graphics up to date during modifications.

Graphics Preferences - OpenGL
Enabling the Performance Graphics option can dramatically improve performance on models with a large number of:
Solids
Points
Nodes
Solid and Shell Elements
Set Max VBO MB (Memory) to no more than 75% of your graphics card memory. The sample shown is for a graphics card with 2 GB of VRAM
Min VBO B is set to 1024 by default, and this should work well with most graphics cards
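The 75% guideline is simple arithmetic, shown here for the 2 GB card mentioned above. The helper `max_vbo_mb` is hypothetical and only illustrates the calculation; the value itself is entered by hand in the FEMAP preferences.

```python
def max_vbo_mb(vram_mb: int, fraction: float = 0.75) -> int:
    """Upper bound for Max VBO MB, as a fraction of graphics card memory."""
    return int(vram_mb * fraction)


if __name__ == "__main__":
    # 2 GB card from the slide: 2048 MB of VRAM
    print(max_vbo_mb(2048))
```

For a 2 GB (2048 MB) card this gives an upper bound of 1536 MB.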
Graphics Preferences - Dynamic Rotation
Include in Dynamic Rotation options: switching off any of these options should improve performance.
Some key options:
Element Symbols - if you have a lot of lumped masses and springs
Mesh Size - if you have a large number of curves with mesh sizes on
Labels and Undeformed - switching these off helps performance.
Elements as Free Edges - this has a slight delay in starting and finishing dynamic rotation, but dynamic rotation is much quicker. For some models, e.g. a mesh on a sphere, there is no free edge and you will see nothing as the model rotates.

Graphics Card Performance Considerations
Desktop area resolution should be taken into consideration when using Femap. A very fine screen resolution can increase the time animations need to generate and the time individual windows need to refresh. This is something to consider for Ultra HD (4K/2160p) monitors with resolutions of 3840x2160.
If Femap appears to be having graphics errors, it could be the driver for your graphics card. Update the drivers for your graphics card often!
Drivers from the manufacturer of the graphics card chipset tend to be more stable than the drivers from the maker of the graphics card (e.g. use an ATI or NVIDIA driver rather than an ASUS driver).
You should also set your graphics card performance settings to the default settings. In some cases, setting a card for optimum performance for an application may cause Femap to crash.

Database Preferences
The database memory limit is set to 20% of available system RAM by default. When FEMAP needs more, it will swap to the scratch disk, slowing things down. Increasing this number will leave less available RAM for other FEMAP operations besides the database. In some cases it may be better to lower this number.
The Max Cached Label must be set to an ID higher than any entity in the model.
Changing the Open/Save method may improve read/write performance if you are experiencing slow performance. Clicking the Read/Write Test button will automatically run a test and determine the best setting for your hardware. It takes about 1.2 GB of disk space and a few minutes of time.

