Showing posts from 2020

High-Performance Linpack

HPL Benchmark on my laptop

It's Top500 season, so I tested HPL on my laptop using Intel's latest oneAPI, version 2021.1.10.2261. The laptop specifications are obtained from lscpu:

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           158
Model name:                      Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Stepping:                        9
CPU MHz:                         874.469
CPU max MHz:                     3800.0000
CPU min MHz:                     800.0000
BogoMIPS:


The following note addresses questions regarding the number of NOPs required in a 5-stage pipelined MIPS processor. In this example, lw is the first instruction, followed by an add (R-type) instruction with a RAW data dependency.

Table 1

In Table 1 we have an old MIPS that requires 3 NOPs, because the add instruction can proceed only after lw has updated the architectural state (cycle 5). (D) refers to a NOP instead of Decode.

Table 2

In Table 2 we assume that our MIPS can write data in the first half of the clock cycle and read data in the second half of the clock cycle, so the number of NOPs can be reduced to 2.

Table 3

Finally, in Table 3, as in the previous case, we assume the processor can write data in the first half of the clock cycle and read data in the second half of the clock cycle, and in addition it supports forwarding. This time the second instruction completes with just 1 NOP del
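
The three NOP counts discussed in this note can be reproduced with a small Python sketch. It is only an illustration under simplifying assumptions: the classic IF/ID/EX/MEM/WB stage order, lw fetched in cycle 1, and the dependent add fetched right behind it. The function name nops_after_load and its two flags are hypothetical and introduced here just for the example.

```python
# Cycle in which each stage runs for an instruction fetched in cycle 1
# (assumed classic 5-stage order: IF, ID, EX, MEM, WB).
IF, ID, EX, MEM, WB = 1, 2, 3, 4, 5


def nops_after_load(split_cycle_regfile: bool, forwarding: bool) -> int:
    """NOPs needed between lw and an immediately dependent R-type instruction.

    Hypothetical helper: a simplified model, not a full pipeline simulator.
    """
    if forwarding:
        # The loaded value is available at the end of lw's MEM stage,
        # so the consumer's EX stage must fall in the following cycle.
        earliest = MEM + 1
        unstalled = EX + 1  # consumer is fetched one cycle after lw
    else:
        # Without forwarding the consumer reads the register file in ID.
        # With a split-cycle register file it may read in the same cycle
        # as lw's WB; otherwise it has to wait for the next cycle.
        earliest = WB if split_cycle_regfile else WB + 1
        unstalled = ID + 1
    return max(0, earliest - unstalled)


if __name__ == "__main__":
    cases = [
        ("Table 1: no split-cycle register file, no forwarding", False, False),
        ("Table 2: split-cycle register file", True, False),
        ("Table 3: split-cycle register file + forwarding", True, True),
    ]
    for label, split, fwd in cases:
        print(f"{label}: {nops_after_load(split, fwd)} NOP(s)")
```

Running the script prints 3, 2, and 1 NOP(s) for the three cases, matching the counts stated for Tables 1 to 3.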