November 9, 2019 by Adam Jundt
We often see requests for memory configurations ranging from 2 to 12 DIMMs per socket. Some of our customers would like to have 128/256/512/1024GB of RAM with the latest generations of Intel Xeon Scalable processors (Skylake and Cascade Lake). The Xeon Scalable processors have six memory channels per socket and support up to DDR4-2933 DIMMs. To populate every memory channel evenly on a dual-socket system, the DIMM count has to be a multiple of 12, which gives capacities of 96/192/384/768/1536GB (12 DIMMs of 8/16/32/64/128GB).
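The balanced capacities listed above follow from simple arithmetic. Here is a minimal sketch, assuming a dual-socket system with identical DIMMs in every channel; the function name is made up for illustration:

```python
CHANNELS_PER_SOCKET = 6  # Xeon Scalable (Skylake / Cascade Lake)
SOCKETS = 2              # assumption: dual-socket system


def balanced_capacity_gb(dimm_gb, dimms_per_channel=1):
    """Total RAM when every channel holds the same number of identical DIMMs."""
    return dimm_gb * dimms_per_channel * CHANNELS_PER_SOCKET * SOCKETS


# Common DIMM sizes, one DIMM per channel (12 DIMMs total).
for dimm_gb in (8, 16, 32, 64, 128):
    print(f"12x {dimm_gb}GB = {balanced_capacity_gb(dimm_gb)}GB")
```

Any capacity that is not in this list (or a multiple of it with two DIMMs per channel) leaves some channels empty or unevenly loaded.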
If we instead populate a dual-socket system with 256GB of memory using 8x 32GB DIMMs, only 4 of the 6 memory channels per socket are in use. This configuration will still work, but memory bandwidth will suffer. To find out by how much, we ran a simple memory benchmark, STREAM, with several memory configurations on a couple of motherboards. Our findings are presented here.
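For a rough idea of what is at stake, theoretical peak bandwidth scales with the number of populated channels. The following is a back-of-the-envelope sketch, assuming DDR4-2933 (2933 MT/s) and a 64-bit (8-byte) data bus per channel. The function name is hypothetical, and measured STREAM numbers will land below these theoretical figures.

```python
def peak_bandwidth_gbs(channels_per_socket, sockets=2,
                       mega_transfers=2933, bus_bytes=8):
    """Theoretical peak in GB/s: transfers per second * bytes per transfer * channels."""
    per_channel_gbs = mega_transfers * 1e6 * bus_bytes / 1e9
    return per_channel_gbs * channels_per_socket * sockets


full = peak_bandwidth_gbs(6)
for channels in (2, 4, 6):
    bw = peak_bandwidth_gbs(channels)
    print(f"{channels} channels/socket: {bw:.1f} GB/s "
          f"({bw / full:.0%} of a fully populated system)")
```

Under these assumptions, using 4 of 6 channels caps the theoretical peak at two thirds of the fully populated figure.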
About the Benchmark
We used the STREAM benchmark to test memory performance. STREAM is a simple synthetic benchmark program that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels. The benchmark was created and is maintained by John McCalpin of the University of Virginia [1].
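To make the MB/s figures concrete, the sketch below shows how a rate is derived for each of the four STREAM kernels: count the bytes read and written per array element, multiply by the array length, and divide by the best time. The helper name and the timing value are hypothetical; the byte counts assume 8-byte double-precision elements, as in the reference stream.c.

```python
BYTES_PER_ELEMENT = 8  # double precision

# Number of arrays each kernel touches per element (reads + writes).
ARRAYS_TOUCHED = {
    "Copy": 2,   # c[i] = a[i]
    "Scale": 2,  # b[i] = q * c[i]
    "Add": 3,    # c[i] = a[i] + b[i]
    "Triad": 3,  # a[i] = b[i] + q * c[i]
}


def stream_rate_mbs(kernel, n_elements, best_seconds):
    """Bandwidth in MB/s (1 MB = 10^6 bytes) for one pass over the arrays."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n_elements
    return 1.0e-6 * bytes_moved / best_seconds


# Hypothetical timing, for illustration only (not a measured result).
print(f"Triad: {stream_rate_mbs('Triad', 80_000_000, 0.02):.1f} MB/s")
```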
I used the slightly modified version of STREAM from NERSC’s Trinity benchmarks and made no other changes to the source. CentOS 7 was installed with its default configuration, and the benchmark was compiled with:
gcc -fopenmp -O2 -fpic -mcmodel=large -D_OPENMP -DNTIMES=100 -DN=80000000 stream.c -o stream.exe
Here NTIMES is the number of times each test is run and N is the number of elements in each array. There are three arrays of 8-byte doubles, so each array comes to about 640MB (roughly 610MiB), well above the L3 cache size.
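The array sizing can be checked with a few lines of arithmetic. This sketch assumes 8-byte double elements and the 24.75MB L3 cache of a Xeon Gold 6240 on each of two sockets. The 4x threshold reflects the STREAM guideline that each array should be at least four times the combined last-level cache.

```python
N = 80_000_000            # -DN=80000000 from the compile line
BYTES_PER_ELEMENT = 8     # double precision
NUM_ARRAYS = 3            # a, b and c
L3_MB_PER_SOCKET = 24.75  # assumption: Xeon Gold 6240
SOCKETS = 2


def array_mb(n=N):
    """Size of one STREAM array in MB (1 MB = 10^6 bytes)."""
    return n * BYTES_PER_ELEMENT / 1e6


total_l3_mb = L3_MB_PER_SOCKET * SOCKETS
print(f"per array:   {array_mb():.0f} MB")
print(f"all arrays:  {NUM_ARRAYS * array_mb():.0f} MB")
print(f"4x total L3: {4 * total_l3_mb:.0f} MB")
print("large enough:", array_mb() >= 4 * total_l3_mb)
```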
Cascade Lake Setup
The system used was a Tyan Thunder HX FT77DB7109. The motherboard supports dual Intel Cascade Lake processors and up to 24 DIMMs. The system was configured with 2x Xeon Gold 6240 processors (18 core, 2.60GHz), and 32GB DDR4-2933 ECC DIMMs.
Results – Cascade Lake
The total DIMM count was varied from 4 DIMMs (2 per socket) to 8 and then 12. Benchmarks were run via:
export OMP_NUM_THREADS=36 && ./stream.ex