

Benchmark Results of Ansys Fluent & Ansys CFX on Intel Xeon Gold 6230 and AMD EPYC 7702

XTREME-D Inc. cooperated with Ansys Japan K.K. on benchmarking Ansys Fluent and Ansys CFX. The runs were executed on two HPC clusters – one with Intel CPUs (hereafter G2), and one with AMD CPUs (hereafter G3). These two HPC clusters are managed by XTREME-D Inc. and are available to our customers as a service.

In conclusion, the AMD EPYC 7702 shows better price-performance than the Intel Xeon Gold 6230 in our environment, based on our assumed costs.

Specifications of HPC clusters with Intel CPU and AMD CPU

The following table shows the specifications of the HPC clusters used for this benchmark.

Cluster Name      | G2 Cluster                                            | G3 Cluster
Cores per node    | 40 [cores/node]                                       | 128 [cores/node]
CPUs              | Intel Xeon Gold 6230 (Cascade Lake, 2.1 GHz, 20 cores) x2 | AMD EPYC 7702 (2.0 GHz, 64 cores) x2
Memory            | 192 [GB]                                              | 503 [GB]
Interconnect      | Intel Omni-Path 100 Gbps                              | Mellanox HDR 100 Gbps
Power consumption | 250 [W/node]                                          | 400 [W/node]
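As a quick sanity check on the table, the nominal per-core power can be derived by dividing node power by cores per node (a sketch using the spec-sheet figures above, not measured draw):

```shell
#!/bin/sh
# Per-core power from the spec table: nominal node power / cores per node.
# These are nameplate figures, not measured consumption.
g2_w_per_core=$(awk 'BEGIN { printf "%.3f", 250 / 40 }')
g3_w_per_core=$(awk 'BEGIN { printf "%.3f", 400 / 128 }')
echo "G2 (Xeon Gold 6230): ${g2_w_per_core} W/core"
echo "G3 (EPYC 7702):      ${g3_w_per_core} W/core"
```

Despite the higher per-node figure, the G3 cluster's much higher core density gives it roughly half the nominal power per core.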

Ansys Fluent Benchmark

Version of Ansys Fluent

The version of Ansys Fluent used was 20.1.0.

Model Used

Ansys Japan provided a model, f1_racecar_140m. The sizes of the model are shown below.

  • 314763 tetrahedral cells
  • 329511 tetrahedral cells
  • 139070249 mixed cells

Execution Command

The execution commands are shown below. The calculation was done in single precision. Intel MPI was used, as it is slightly faster than IBM MPI. Also note that G2 uses Omni-Path and G3 uses InfiniBand, so the options that specify the interconnect differ slightly.

G2 Cluster

fluent 3d -ssh -t${NCPUS} -cnf=${HOSTS} -mpi=intel -pib.infinipath -g -driver null -cflush -i rcd f1_racecar_140m

G3 Cluster

fluent 3d -ssh -t${NCPUS} -cnf=${HOSTS} -mpi=intel -pinfiniband -g -driver null -cflush -i rcd f1_racecar_140m
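The two commands differ only in the interconnect flag, so a single wrapper can serve both clusters. The sketch below is hypothetical (not from the article): the CLUSTER, NCPUS, and HOSTS variables are assumed to come from the job scheduler, and the composed command is echoed rather than executed.

```shell
#!/bin/sh
# Hypothetical launch wrapper: select the interconnect flag per cluster,
# then print the fluent command that would be run.
# CLUSTER, NCPUS, and HOSTS are assumed scheduler-provided; defaults are for demo.
CLUSTER=${CLUSTER:-G3}
NCPUS=${NCPUS:-128}
HOSTS=${HOSTS:-hosts.txt}
case "$CLUSTER" in
  G2) FABRIC_OPT="-pib.infinipath" ;;  # Omni-Path on G2
  G3) FABRIC_OPT="-pinfiniband" ;;     # InfiniBand on G3
  *)  echo "unknown cluster: $CLUSTER" >&2; exit 1 ;;
esac
CMD="fluent 3d -ssh -t${NCPUS} -cnf=${HOSTS} -mpi=intel ${FABRIC_OPT} -g -driver null -cflush -i rcd f1_racecar_140m"
echo "$CMD"   # replace echo with eval "$CMD" to actually launch
```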

Calculation Time Evaluation Method

In order to evaluate only the calculation time, excluding the time spent reading the input file and writing the calculated result, (benchmark' (iterate 50)) is added to the Fluent journal file, as shown in the example below. The measured time covers 50 iterations.

rcd f1_racecar_140m
(benchmark' (iterate 50))
(print-case-timer)
wd f1_racecar_140m_0050.dat.gz
exit

With this, we obtain the following elapsed-time in the log file when the calculation is completed normally.

cortex=11.43333333333333, solver=166433.314899
elapsed-time: 1042.958387136459
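The elapsed time can be pulled out of the transcript with a one-line awk filter. This is a sketch: the file name log.txt is hypothetical, and the here-document simply reproduces the sample lines above.

```shell
#!/bin/sh
# Extract the 50-iteration elapsed time from a Fluent transcript.
# log.txt is a hypothetical name; its contents mimic the sample above.
cat > log.txt <<'EOF'
cortex=11.43333333333333, solver=166433.314899
elapsed-time: 1042.958387136459
EOF
elapsed=$(awk -F': ' '/^elapsed-time:/ { print $2 }' log.txt)
per_iter=$(awk "BEGIN { printf \"%.2f\", ${elapsed} / 50 }")
echo "elapsed seconds: ${elapsed}"
echo "seconds per iteration: ${per_iter}"
```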

Result

The results are shown below. The following three graphs plot the same data with different horizontal axes: the number of cores can be converted into the number of nodes, and the number of nodes into cost. The Xeon Gold 6230 looks faster when the horizontal axis is the number of cores, but once the axis is converted to nodes and then to cost, AMD's EPYC 7702 is more cost-effective: for the same cost, the AMD cluster is faster. Achieving a given speed also takes fewer AMD CPUs than Intel CPUs, so when the AMD cluster is expanded, it will require fewer interconnect switches than the Intel cluster would for a similar expansion. Considering hardware alone, without the software fee, AMD shows better price-performance in this comparison. Cost here refers only to the cost of servers; it does not include network switches, other components, or software.
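The cores-to-nodes step of that axis conversion is straightforward to sketch. The total core count below is an arbitrary example, and node counts are rounded up since partial nodes cannot be rented (the article's actual cost figures are not reproduced here):

```shell
#!/bin/sh
# Sketch of the cores -> nodes conversion used to re-plot the results.
# total_cores is an arbitrary example value; cores-per-node come from the spec table.
g2_cores_per_node=40
g3_cores_per_node=128
total_cores=1280
# ceiling division: partial nodes must be counted as whole nodes
g2_nodes=$(( (total_cores + g2_cores_per_node - 1) / g2_cores_per_node ))
g3_nodes=$(( (total_cores + g3_cores_per_node - 1) / g3_cores_per_node ))
echo "G2 nodes for ${total_cores} cores: ${g2_nodes}"
echo "G3 nodes for ${total_cores} cores: ${g3_nodes}"
```

Multiplying each node count by that cluster's per-node cost then yields the cost axis used in the third graph.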


Ansys CFX Benchmark

Version of Ansys CFX

The version of Ansys CFX used was 2020 R1 Solver.

Model Used

Ansys Japan provided a model named perf_Airfoil_100M_R14. The sizes of the model are shown below.

  • Total Number of Nodes = 104533000
  • Total Number of Elements = 103779720
  • Total Number of Hexahedrons = 103779720
  • Total Number of Faces = 10938532

Execution Command

The execution command is shown below. The calculation was done in single precision. Intel MPI was used. The command is the same for both the G2 and G3 clusters.

cfx5solve -def ${INPUT_FILENAME}.def -par -par-dist ${cfx_hosts} -part ${NCPUS} -large -start-method "Intel MPI Distributed Parallel" -batch
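The ${cfx_hosts} argument to -par-dist is a comma-separated host*nprocs list. One way to build it from a scheduler machine file is sketched below; the file name and hostnames are hypothetical:

```shell
#!/bin/sh
# Sketch: build the host*nprocs list that cfx5solve -par-dist expects
# from a machine file listing one host per line. Hostnames are hypothetical.
cat > machinefile <<'EOF'
node001
node002
EOF
PROCS_PER_NODE=40   # e.g. all 40 cores of a G2 node
cfx_hosts=$(awk -v n="$PROCS_PER_NODE" '{ printf "%s%s*%d", sep, $1, n; sep="," }' machinefile)
echo "$cfx_hosts"
```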

Since G2 uses Omni-Path and G3 uses InfiniBand, the start-methods.ccl files were edited as follows.

start-methods.ccl File for G2 Cluster

$ cat -n /<Ansys Install Dir>/v201/CFX/etc/start-methods.ccl | grep "I_MPI_FABRICS shm:"
166 Appfile Host Entry = -n %{hostprocs} -host %{hostname} -wdir "%{workdir_host}" -genv I_MPI_FABRICS shm:ofa -env LD_PRELOAD %{impidir}/lib/libmpi_mt.so -env %{ldpathvar} %{rldpath} %{executable} %{arguments}

start-methods.ccl File for G3 Cluster

$ cat -n /<Ansys Install Dir>/v201/CFX/etc/start-methods.ccl | grep "I_MPI_FABRICS shm:"
166 Appfile Host Entry = -n %{hostprocs} -host %{hostname} -wdir "%{workdir_host}" -genv I_MPI_FABRICS shm:dapl -env LD_PRELOAD %{impidir}/lib/libmpi_mt.so -env %{ldpathvar} %{rldpath} %{executable} %{arguments}

Calculation Time Evaluation Method

In order to evaluate calculation time only, excluding the time of reading the input file and writing the calculation result, “CFD Solver wall clock seconds” included in the CFX log file was used. An example is shown below.

CFD Solver finished: Fri Jul 31 01:53:05 2020
CFD Solver wall clock seconds: 8.2920E+01

The number of iterations was set to five.
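As with the Fluent logs, the wall-clock value can be extracted with awk, which also converts the scientific notation to a plain decimal. This is a sketch: cfx.out is a hypothetical file name whose contents mimic the sample above.

```shell
#!/bin/sh
# Extract "CFD Solver wall clock seconds" from a CFX output file.
# cfx.out is a hypothetical name; its contents mimic the sample above.
cat > cfx.out <<'EOF'
CFD Solver finished: Fri Jul 31 01:53:05 2020
CFD Solver wall clock seconds: 8.2920E+01
EOF
# awk coerces the E-notation field to a number when formatting it
wall=$(awk '/CFD Solver wall clock seconds:/ { printf "%.2f", $NF }' cfx.out)
echo "wall clock: ${wall} s"
```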

Result

The results are shown in the three graphs below. The same conclusions were obtained as for Ansys Fluent.


Summary

We used Ansys Fluent and Ansys CFX to benchmark HPC clusters made of Intel CPU (Xeon Gold 6230) and AMD CPU (EPYC 7702). The results are as follows:

  • The Intel CPU (Xeon Gold 6230) has a faster core.
  • However, when the horizontal axis is converted to a cost ratio calculated from the cost per node, the AMD CPU (EPYC 7702) gives a better result per unit cost.

Since AMD CPUs require fewer nodes than Intel CPUs to achieve the same speed, fewer InfiniBand switches are required for a new HPC deployment. Looking at the hardware alone, without considering the cost of software, AMD is the more cost-effective CPU for this model, based on our assumed costs.

XTREME-D supports both Intel and AMD CPUs on our Private and Dedicated Plans.

 

Ansys® and any and all ANSYS, Inc. product names are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or other countries.