HPC is an abbreviation for high-performance computing and introduces the concept of solving problems in parallel in order to reduce the simulation time. The computational power available to common users is rapidly evolving, giving the opportunity to solve complex simulations within reasonable time. Distributing the computational effort across several cores, and thereby reducing the computational time significantly, is a key advantage for quickly producing accurate results.
Introduction
Several parameters are significant when investigating the speedup of simulations performed in ANSYS. Hardware factors such as memory speed, CPU clock speed, disk I/O speed and interconnects all contribute to the simulation time; however, these factors will not be discussed in this blog. The focus here is to illustrate the possible speedup from using parallel solvers or GPUs when performing either CFD or FE analysis.
When running a simulation on several cores, the problem is divided into N pieces (domains), each of which is solved independently. The pieces communicate at their shared boundaries, and when the solver has finished, the solution is reassembled into one single result file. Since the pieces have to communicate, doubling the number of cores will not always reduce the simulation time by a factor of two. The benefit of increasing the number of cores can be visualized in a scalability chart, which plots the reduction in simulation time vs. the number of cores used.
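A scalability chart is built from measured wall-clock times. The sketch below is a minimal illustration, assuming you have recorded the serial run time and the run time on each core count yourself; the timings are hypothetical placeholders, not ANSYS measurements. Speedup is the serial time divided by the parallel time, and parallel efficiency is the speedup divided by the number of cores, so an efficiency of 1.0 corresponds to the ideal linear line in the charts.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(N) = T(1) / T(N)."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Parallel efficiency E(N) = S(N) / N; 1.0 means perfect linear scaling."""
    return speedup(t_serial, t_parallel) / cores


# Hypothetical wall-clock times in seconds, keyed by number of cores.
timings = {1: 1000.0, 2: 520.0, 4: 270.0, 8: 145.0}

t1 = timings[1]
for cores, t in sorted(timings.items()):
    print(f"{cores:3d} cores: speedup {speedup(t1, t):5.2f}, "
          f"efficiency {efficiency(t1, t, cores):4.2f}")
```

Efficiency dropping below 1.0 as cores are added is exactly the communication cost described above showing up in the numbers.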
Scalability of a CFD airfoil simulation
By comparing the simulation time with the number of cores used, a scalability chart can be generated. The chart below is one example, generated from a CFD simulation of an airfoil with a mesh containing 9,930,000 nodes. As the graph shows, the speedup is linear up to 33 cores. This indicates perfectly linear scaling down to a node-to-core ratio of about 300,000 nodes per core. The communication between the parallelized parts reduces the benefit of adding more cores only to a small extent.
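The node-to-core ratio quoted above is simply the mesh size divided by the core count. A small sketch of that arithmetic using the airfoil figures from the text; the inverse helper `max_cores_for_ratio` is a hypothetical convenience, and treating roughly 300,000 nodes per core as a threshold is an assumption that has only been shown for this model on this hardware.

```python
def nodes_per_core(nodes: int, cores: int) -> float:
    """Average number of mesh nodes assigned to each core."""
    return nodes / cores


def max_cores_for_ratio(nodes: int, min_nodes_per_core: int) -> int:
    """Largest core count that still leaves at least min_nodes_per_core on each core."""
    return nodes // min_nodes_per_core


airfoil_nodes = 9_930_000
print(round(nodes_per_core(airfoil_nodes, 33)))     # 300909
print(max_cores_for_ratio(airfoil_nodes, 300_000))  # 33
```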
[Figure: Speedup vs. number of cores for the airfoil simulation, measured performance compared with ideal linear scaling. Hardware: Intel Xeon Nehalem X5550, 2.67 GHz, 24 GB memory per node.]
Scalability of a CFD Indycar simulation
The chart below gives another example of scalability in CFD simulations. The mesh of the fluid domain surrounding the Indycar contains 483,360 nodes. As the graph indicates, good scaling is achieved in this CFD simulation as well: results are produced 6 times faster when running on 8 cores than in a serial run.
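To put the Indycar result in the same terms as a scalability chart: a speedup of 6 on 8 cores is a parallel efficiency of 6/8 = 75 %. The sketch below also computes the experimentally determined serial fraction (the Karp-Flatt metric), e = (1/S - 1/N) / (1 - 1/N). This metric is not part of the original study; it is brought in here only as one simple way to summarize how much of the run behaves as non-parallel work, with communication overhead lumped into that fraction.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def karp_flatt(speedup: float, cores: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    Communication overhead is counted as part of this 'serial' fraction.
    """
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


s, n = 6.0, 8  # Indycar: 6x faster on 8 cores than the serial run
print(f"efficiency: {parallel_efficiency(s, n):.2f}")  # efficiency: 0.75
print(f"serial fraction: {karp_flatt(s, n):.3f}")      # serial fraction: 0.048
```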
[Figure: Speedup vs. number of cores for the Indycar simulation, measured performance compared with ideal linear scaling. Hardware: Intel Xeon Nehalem X5550, 2.67 GHz, 24 GB memory per node.]
Speeding up FE analysis by the use of GPU (Graphics Processing Unit)
Unfortunately, FE analyses do not scale with the number of cores as well as CFD simulations do. However, the GPU offers an extensive computational resource and can be used to speed up the process significantly in many applications.
[Figure: CPU and GPU connected through a PCI Express channel.]
A multicore CPU, typically with 4 to 8 cores, is a powerful unit for general-purpose computations. The GPU, on the other hand, typically contains hundreds of cores and is well suited to highly parallel code, within its memory constraints. The GPU is a massive source of computational power in modern computers, capable of handling vector operations with performance several orders of magnitude higher than a conventional CPU. However, the GPU is not a replacement for the CPU; rather, the two resources work together in a collaborative fashion to perform the ANSYS simulation.
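Because the GPU accelerates only the part of the work that is offloaded to it, while the CPU still runs the rest, the overall gain is bounded by Amdahl's law. A sketch, assuming a hypothetical fraction `p` of the runtime is offloaded and that portion runs `s` times faster on the GPU; transfer time over the PCI Express channel is ignored, which makes the estimate optimistic.

```python
def overall_speedup(offloaded_fraction: float, gpu_speedup: float) -> float:
    """Amdahl's law: only the offloaded fraction of the runtime is accelerated.

    Ignores CPU-GPU transfer time, so the result is an upper bound.
    """
    p = offloaded_fraction
    return 1.0 / ((1.0 - p) + p / gpu_speedup)


# Hypothetical: 80 % of the solver time offloaded, that part running 10x faster.
print(f"{overall_speedup(0.8, 10.0):.2f}")  # 3.57
```

Even with an infinitely fast GPU, this hypothetical case could not exceed a 5x overall speedup, since the 20 % left on the CPU still has to run.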
Currently, ANSYS supports all NVIDIA Tesla cards and the Quadro 6000 for GPU acceleration.
The results presented here come from a benchmark study conducted by engineers in the NVIDIA performance lab. The model V13sp-5 is from the standard ANSYS benchmark set and is one of the best models for representing real-world customer practice. It was derived from the turbomachinery industry and comprises a typical nonlinear static analysis with large deflection, but with only a single equilibrium iteration out of a full solution that would require 25 iterations. As the graphs show, a significant benefit can be achieved in this setup.
ANSYS HPC Offering
The same ANSYS HPC licenses can be used for ANSYS FEA or ANSYS CFD. They can be purchased individually, where each HPC license lets the user run an FEA or CFD solver on one additional core, or in packs of 8.
ANSYS HPC Pack
ANSYS HPC Packs enable the parallelism you need in order to run high-fidelity simulations that provide enhanced insight. Packs work as illustrated. Each simulation job consumes one or more HPC Pack licenses. As you add Packs, the amount of parallelism enabled increases rapidly: one pack allows 8-way, two packs allow 32-way, three packs allow 128-way, and so on up to 5 packs for extremely high fidelity using 2048 processes. Packs cannot be split, and a job always consumes at least one full pack.
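The pack figures quoted above follow a simple pattern: each additional pack multiplies the allowed parallelism by four, i.e. 2 × 4^n processes for n packs. The value for four packs (512) is inferred from that pattern rather than stated in the text. A sketch under that assumption, with the 5-pack maximum taken from the text, that computes the parallelism per pack count and the minimum number of whole packs a job needs:

```python
MAX_PACKS = 5  # upper limit quoted in the text


def processes_enabled(packs: int) -> int:
    """Parallel processes enabled by a number of HPC Packs (assumed 2 * 4**packs)."""
    if not 1 <= packs <= MAX_PACKS:
        raise ValueError(f"packs must be between 1 and {MAX_PACKS}")
    return 2 * 4 ** packs


def packs_needed(processes: int) -> int:
    """Smallest number of whole packs enabling at least `processes` (packs can't be split)."""
    for packs in range(1, MAX_PACKS + 1):
        if processes_enabled(packs) >= processes:
            return packs
    raise ValueError(f"{processes} processes exceeds the {MAX_PACKS}-pack maximum")


print([processes_enabled(p) for p in range(1, MAX_PACKS + 1)])  # [8, 32, 128, 512, 2048]
print(packs_needed(2))   # 1
print(packs_needed(33))  # 3
```

Because packs cannot be split, even a 2-process job consumes one full pack, and a job just above a threshold (33 processes) jumps to the next pack.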