Article ID: 62
Created On: 1/28/2010
Modified: 1/28/2010
Using the Linpack Benchmark to Stress Test Your Computer for Bad Hardware (CPU, RAM, ...)

Intro

Linpack is an HPC benchmark that solves a large dense system of linear equations and then verifies the solution by computing its residual, so silently flaky hardware is exposed through invalid results. The solve step can be optimized to push the CPU and memory in a system very hard. These two capabilities make it an excellent hardware knockout tool.

You can roll your own, but Intel provides a nice pre-compiled binary and optimized input files at http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/

These instructions should apply to the Mac and Windows versions as well, though they were developed using the Linux version.

Setup

  • Uncompress the files and go into the linpack_X.Y.Z/benchmarks/linpack directory.
  • Unload any unneeded drivers and stop all unneeded applications.
  • In order to achieve maximum coverage you want linpack to use as much memory as possible. To calculate the run size:
    • Take the total memory in the system and subtract 500 MB or so for the OS (I had to subtract about 4 GB on my 32 GB test machine). Under Linux, run "free -b" and subtract about an additional 1 GB from the 'total' column.
    • Take this value in bytes (XXXX MB * 1024 * 1024), divide by eight (each matrix element is an 8-byte double), and take the square root. For better performance, find the multiple of 16 just below the calculated value and add 8 to it. This gives you your job size:
 Using a python shell (substitute the total memory in MB for XXXX; the result is the job size):
>>> import math
>>> math.floor(math.sqrt((XXXX - 500) * 1024 * 1024 / 8) / 16) * 16 + 8
  • For example, on a 32 GB machine:
$ free -b
             total         used         free   shared      buffers       cached
Mem:   29579198464   6443679744  23135518720        0   3863703552   1220497408
-/+ buffers/cache:   1359478784  28219719680
Swap:   2080366592            0   2080366592

>>> import math
>>> math.floor(math.sqrt((29579198464 - 1024*1024*1024) / 8) / 16) * 16 + 8
59688
  • Construct a new lininput_xeon64 input file, using the number calculated above in place of the sample problem size and leading dimension, as shown:
Sample Intel(R) LINPACK data file (lininput_xeon64)
Intel(R) LINPACK data
1 # number of tests
59688 # problem sizes
59688 # leading dimensions
1 # times to run a test
4 # alignment values (in KB)
  • Turn off swap with
$ swapoff -a 
  • Run ./runme_xeon64 to launch Linpack. It should detect the number of sockets/cores and automatically fire up the proper number of threads.
  • Monitor the first run to ensure it does not consume too much memory. Recalibrate the problem size and leading dimensions if needed.
  • After the first run completes, work out how many repetitions are needed to fill a roughly 24 hour run, then change the 'times to run a test' entry to that number.
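The setup arithmetic above can be collected into a small script. This is only a sketch: the function names (job_size, render_input, repeats_for) are made up for illustration, the 1 GB default reserve follows the Linux advice above, and the 1800 second run time in the usage section is a placeholder rather than a measured value. It requires Python 3.8 or later for math.isqrt.

```python
import math

BYTES_PER_DOUBLE = 8  # each matrix element is an 8-byte double


def job_size(total_bytes, reserve_bytes=1024 ** 3):
    """Approximate Linpack problem size for the given memory.

    Follows the recipe above: subtract a reserve for the OS, divide by
    eight, take the square root, round down to a multiple of 16, add 8.
    """
    usable = total_bytes - reserve_bytes
    if usable <= 0:
        raise ValueError("reserve is larger than total memory")
    n = math.isqrt(usable // BYTES_PER_DOUBLE)
    return (n // 16) * 16 + 8


def repeats_for(target_hours, seconds_per_run):
    """Number of repetitions needed to fill target_hours of run time."""
    return max(1, math.ceil(target_hours * 3600 / seconds_per_run))


def render_input(n, repeats=1, alignment_kb=4):
    """Text of an input file in the same layout as the sample above."""
    return "\n".join([
        "Sample Intel(R) LINPACK data file (lininput_xeon64)",
        "Intel(R) LINPACK data",
        "1 # number of tests",
        f"{n} # problem sizes",
        f"{n} # leading dimensions",
        f"{repeats} # times to run a test",
        f"{alignment_kb} # alignment values (in KB)",
    ]) + "\n"


if __name__ == "__main__":
    # 'total' column from the free -b example above
    n = job_size(29579198464)
    # 1800 s per run is a hypothetical figure; time your own first run
    print(render_input(n, repeats_for(24, 1800)), end="")
```

Because the recipe adds 8 after rounding down, the result can land slightly above the raw square root, so the advice to watch memory use on the first run still applies.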

Running/Reporting

  • Run the ./runme_xeon64 test set again and allow it to churn.
  • Inspect the results file lin_xeon64.txt for any errors
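For a long run, a quick keyword scan can help point at lines worth reading first. The sketch below makes an assumption that the article does not confirm: that trouble in lin_xeon64.txt shows up as text containing "fail", "error" or "nan". The helper name suspicious_lines is hypothetical, and the scan is not a substitute for reading the file.

```python
import re
from pathlib import Path

# Assumed keywords; adjust after looking at what your version actually prints.
SUSPECT = re.compile(r"\b(fail\w*|error\w*|nan)\b", re.IGNORECASE)


def suspicious_lines(text):
    """Return (line_number, line) pairs whose text matches SUSPECT."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if SUSPECT.search(line)
    ]


if __name__ == "__main__":
    results = Path("lin_xeon64.txt")
    if results.exists():
        for number, line in suspicious_lines(results.read_text(errors="replace")):
            print(f"{number}: {line}")
```

An empty result only means none of the assumed keywords appeared, so it is still worth reading the residual values in the file itself.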