High-Performance Linpack

HPL Benchmark on my laptop

It is Top500 season, so I ran HPL on my laptop using Intel's latest oneAPI release, version 2021.1.10.2261.

The laptop's CPU specifications, as reported by lscpu:

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           158
Model name:                      Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Stepping:                        9
CPU MHz:                         874.469
CPU max MHz:                     3800.0000
CPU min MHz:                     800.0000
BogoMIPS:                        5599.85
Virtualization:                  VT-x
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        1 MiB
L3 cache:                        6 MiB

I am using Linux Mint 20 on an Asus ROG laptop.
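
For context, the lscpu figures allow a rough estimate of the theoretical double-precision peak. The sketch below assumes 16 FLOPs per cycle per core (two 256-bit FMA units, 4 doubles each, 2 operations per FMA); that number is my assumption for this CPU generation and is not something lscpu reports. The helper name peak_gflops is only illustrative.

```python
# Rough theoretical double-precision peak for the CPU listed by lscpu above.
# Assumption (not from lscpu): each core retires 16 DP FLOPs per cycle
# (two 256-bit FMA units x 4 doubles x 2 operations per FMA).
FLOPS_PER_CYCLE_PER_CORE = 16  # assumed AVX2 + FMA figure
CORES = 4                      # "Core(s) per socket" from lscpu


def peak_gflops(freq_ghz: float, cores: int = CORES) -> float:
    """Peak GFLOPS = cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * freq_ghz * FLOPS_PER_CYCLE_PER_CORE


if __name__ == "__main__":
    # Nominal clock from the model name and "CPU max MHz" from lscpu.
    for label, ghz in [("nominal 2.8 GHz", 2.8), ("max 3.8 GHz", 3.8)]:
        print(f"{label}: {peak_gflops(ghz):.1f} GFLOPS")
```

Under the same assumption, the 3.391 GHz clock that the benchmark itself reports gives about 217 GFLOPS, so the best measured rate of 153.6 GFlops would be roughly 71% of peak. The clock actually sustained under load may differ, so treat this as an estimate only.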

Here are the benchmark results:

$ ./runme_xeon64
This is a SAMPLE run script for running a shared-memory version of
Intel(R) Distribution for LINPACK* Benchmark. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
*Other names and brands may be claimed as the property of others.
Fri 13 Nov 2020 08:29:57 IST
Sample data file lininput_xeon64.

Current date/time: Fri Nov 13 08:29:57 2020

CPU frequency:    3.391 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4

Parameters are set to:

Number of tests: 12
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000
Leading dimension of array                  : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000
Number of trials to run                     : 4     2     2     2     2     2     2     2     2     2     1     1
Data alignment value (in Kbytes)            : 4     4     4     4     4     4     4     4     4     4     4     1

Maximum memory requested that can be used=7200601024, at the size=30000

=================== Timing linear equation system solver ===================

Size   LDA    Align.  Time(s)   GFlops    Residual      Residual(norm)  Check
1000   1000   4       0.007     96.3645   1.022959e-12  3.033181e-02    pass
1000   1000   4       0.006     103.2200  1.022959e-12  3.033181e-02    pass
1000   1000   4       0.006     104.5280  1.022959e-12  3.033181e-02    pass
1000   1000   4       0.007     96.2256   1.022959e-12  3.033181e-02    pass
2000   2000   4       0.054     99.1910   5.619838e-12  4.375464e-02    pass
2000   2000   4       0.053     99.9669   5.619838e-12  4.375464e-02    pass
5000   5008   4       0.634     131.5344  2.548040e-11  3.392018e-02    pass
5000   5008   4       0.636     131.2024  2.548040e-11  3.392018e-02    pass
10000  10000  4       4.641     143.6870  1.054555e-10  3.553909e-02    pass
10000  10000  4       4.506     147.9811  1.054555e-10  3.553909e-02    pass
15000  15000  4       14.650    153.6162  2.368669e-10  3.581348e-02    pass
15000  15000  4       15.110    148.9348  2.368669e-10  3.581348e-02    pass
18000  18008  4       26.769    145.2679  3.162348e-10  3.349350e-02    pass
18000  18008  4       27.580    140.9929  3.162348e-10  3.349350e-02    pass
20000  20016  4       38.582    138.2543  3.807211e-10  3.257923e-02    pass
20000  20016  4       40.702    131.0518  3.807211e-10  3.257923e-02    pass
22000  22008  4       52.958    134.0617  4.590843e-10  3.258820e-02    pass
22000  22008  4       53.794    131.9777  4.590843e-10  3.258820e-02    pass
25000  25000  4       79.499    131.0447  5.770316e-10  3.184866e-02    pass
25000  25000  4       80.791    128.9492  5.770316e-10  3.184866e-02    pass
26000  26000  4       91.586    127.9534  6.257559e-10  3.196386e-02    pass
26000  26000  4       92.436    126.7756  6.257559e-10  3.196386e-02    pass
27000  27000  4       104.169   125.9827  5.721172e-10  2.712944e-02    pass
30000  30000  1       143.041   125.8508  7.350489e-10  2.829664e-02    pass

Performance Summary (GFlops)

Size   LDA    Align.  Average   Maximal
1000   1000   4       100.0845  104.5280
2000   2000   4       99.5789   99.9669
5000   5008   4       131.3684  131.5344
10000  10000  4       145.8340  147.9811
15000  15000  4       151.2755  153.6162
18000  18008  4       143.1304  145.2679
20000  20016  4       134.6530  138.2543
22000  22008  4       133.0197  134.0617
25000  25000  4       129.9969  131.0447
26000  26000  4       127.3645  127.9534
27000  27000  4       125.9827  125.9827
30000  30000  1       125.8508  125.8508

Residual checks PASSED

End of tests
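
As a sanity check on the numbers above, the GFlops column can be reproduced from the problem size and the run time. This minimal sketch assumes the conventional LINPACK operation count of 2/3 N^3 + 2 N^2 for solving a dense system of order N; the function names are mine and are not part of the Intel package. It also computes the size of the matrix alone (8-byte doubles), which comes out slightly below the 7200601024 bytes the benchmark says it requests at N=30000.

```python
def hpl_flops(n: int) -> float:
    """Operation count conventionally credited to an LU solve of order n."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops(n: int, seconds: float) -> float:
    """Sustained rate in GFlops for a solve of order n taking `seconds`."""
    return hpl_flops(n) / seconds / 1e9


def matrix_bytes(n: int, lda: int) -> int:
    """Storage for an lda x n column-major matrix of 8-byte doubles."""
    return 8 * lda * n


if __name__ == "__main__":
    # (size, time in seconds, GFlops reported in the table above)
    rows = [(10000, 4.641, 143.6870), (30000, 143.041, 125.8508)]
    for n, t, reported in rows:
        print(f"N={n}: recomputed {gflops(n, t):.2f}, reported {reported:.2f}")
    print(f"N=30000 matrix alone: {matrix_bytes(30000, 30000)} bytes")
```

The recomputed rates agree with the reported ones to within the rounding of the printed times, which suggests the benchmark uses this operation count.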

Below are three screen captures showing the load on the computer during a run (the pictures were taken during an earlier test): top (upper image), netdata (middle image) and gkrellm (lower image).

Guy Tel-Zur's Blog
