Test 0 [Address test, walking 1 bit]
This test changes one bit at a time in memory to see if it goes to a different memory location.
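Starting from a base address, the test steps through successive offsets to obtain new addresses. A single 1 bit is moved across the 32 bit positions to produce a sequence of patterns; this sequence is written to the new addresses and then read back and checked.
It can detect NPSFs and CFs:
Neighborhood Pattern Sensitive Fault (NPSF): the content of a memory cell, or the ability to change that content, is affected by the contents of other memory cells.
Coupling Fault (CF): a change in the value of one memory cell causes the value of another memory cell to change.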
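The walking-1 idea can be sketched as follows. This is a minimal illustration, not the tool's actual code: memory is modelled as a Python list of 32-bit words, and the names `walking_one_patterns` and `walking_one_test` are hypothetical.

```python
def walking_one_patterns(width=32):
    """Return the patterns 0x1, 0x2, 0x4, ... each with a single 1 bit."""
    return [1 << bit for bit in range(width)]


def walking_one_test(memory, width=32):
    """Write a walking-1 pattern to successive words, then read back.

    `memory` is a list standing in for real memory (an assumption of this
    sketch). Returns (index, expected, actual) tuples for every mismatch.
    """
    patterns = walking_one_patterns(width)
    # Write phase: word i receives the pattern whose 1 bit is at i % width.
    for i in range(len(memory)):
        memory[i] = patterns[i % width]
    # Read phase: every word must still hold the pattern written to it.
    errors = []
    for i in range(len(memory)):
        expected = patterns[i % width]
        if memory[i] != expected:
            errors.append((i, expected, memory[i]))
    return errors
```

On a healthy list such as `[0] * 64` the function returns an empty error list; a coupling fault would show up as a word whose value changed after a different word was written.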
Test 1 [Address test, own address]
Each address is written with its own address and then is checked for consistency.
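The value of each address is written into the memory at that address and then checked. This can reveal addresses that cannot be accessed.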
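A minimal sketch of the own-address check, assuming memory is a list of 32-bit words and that word `i` lives at byte address `base + i * word_size`. The function name and parameters are illustrative only.

```python
def own_address_test(memory, base=0, word_size=4):
    """Write each word's own address into it, then verify every word.

    `base` and `word_size` are assumptions of this sketch: word i is
    treated as living at byte address base + i * word_size.
    Returns (index, expected, actual) tuples for every mismatch.
    """
    def address_of(i):
        return (base + i * word_size) & 0xFFFFFFFF

    # Write phase: all words are written before any word is checked, so an
    # aliased address line would let a later write clobber an earlier one.
    for i in range(len(memory)):
        memory[i] = address_of(i)
    # Check phase.
    return [
        (i, address_of(i), memory[i])
        for i in range(len(memory))
        if memory[i] != address_of(i)
    ]
```

Because every location holds a unique value, two addresses that map to the same physical cell cannot both pass the check.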
Test 2 [Moving inversions, ones&zeros]
This test uses the moving inversions algorithm with patterns of all ones and zeros.
This test does not take long and should quickly find all "hard" errors and some more subtle errors.
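An all-ones pattern is written, read back, and compared; then an all-zeros pattern is written, read back, and compared.
The inversion shows up as: P1 = all ones; P2 = all zeros; P2 = ~P1 (P = pattern).
It can detect SAFs:
Stuck-At Fault (SAF): the value in a memory cell is fixed at 0 (SA0, Stuck-At-0) or at 1 (SA1, Stuck-At-1) and cannot be changed.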
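The sketch below shows one common formulation of moving inversions (fill with the pattern, sweep upward checking and writing the complement, then sweep downward checking and restoring), together with a hypothetical stuck-at-0 fault model to show how an SAF is caught. Both the list-based memory and the `StuckAtZeroMemory` class are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


class StuckAtZeroMemory(list):
    """Hypothetical fault model: one bit of one word always stores 0."""

    def __init__(self, size, bad_index, bad_bit):
        super().__init__([0] * size)
        self.bad_index = bad_index
        self.bad_mask = 1 << bad_bit

    def __setitem__(self, index, value):
        if index == self.bad_index:
            value &= ~self.bad_mask  # the faulty bit can never become 1
        super().__setitem__(index, value)


def moving_inversions(memory, pattern):
    """Run one moving-inversions pass; return (index, expected, actual)."""
    errors = []
    inverse = ~pattern & MASK32
    n = len(memory)
    for i in range(n):                # fill with the pattern
        memory[i] = pattern
    for i in range(n):                # ascending: check, write complement
        if memory[i] != pattern:
            errors.append((i, pattern, memory[i]))
        memory[i] = inverse
    for i in reversed(range(n)):      # descending: check, restore pattern
        if memory[i] != inverse:
            errors.append((i, inverse, memory[i]))
        memory[i] = pattern
    return errors
```

Running `moving_inversions(StuckAtZeroMemory(8, 5, 0), 0xFFFFFFFF)` reports word 5, because bit 0 of that word never holds the 1 that was written.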
Test 3 [Moving inversions, 8 bit pattern]
This is the same as test 0 but uses an 8-bit-wide pattern of "walking" ones and zeros.
This test will better detect subtle errors in "wide" memory chips.
A finer-grained walking 1 bit: the 1 bit moves within 8 bits to form a sub-pattern, and four sub-patterns make up the 32-bit pattern.
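The construction of the 32-bit word from four copies of an 8-bit sub-pattern can be sketched like this. The helper names are hypothetical, and treating the walking-zeros patterns as the bitwise complement of the walking-ones patterns is an assumption of the sketch.

```python
MASK32 = 0xFFFFFFFF


def replicate_byte(byte):
    """Copy one 8-bit value into all four bytes of a 32-bit word."""
    byte &= 0xFF
    return byte | (byte << 8) | (byte << 16) | (byte << 24)


def walking_ones_8bit():
    """Eight 32-bit patterns: a 1 bit walking inside each byte."""
    return [replicate_byte(1 << bit) for bit in range(8)]


def walking_zeros_8bit():
    """Complements of the walking-ones patterns (a 0 bit walking)."""
    return [~word & MASK32 for word in walking_ones_8bit()]
```

Each resulting word, for example `0x01010101` or `0xFEFEFEFE`, can then be used as the data pattern for a moving-inversions pass.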
Test 4 [Moving inversions, random pattern]
This test uses the same algorithm as test 3 but the data pattern is a random number.
This test is particularly effective in finding difficult-to-detect, data-sensitive errors.
A randomly generated pattern is written and then read back and compared; this is a data-sensitive test case.
Test 5 [Block move, 64 moves]
This test moves blocks of memory. Memory is initialized with shifting patterns that are inverted every 8 bytes.
Then these blocks of memory are moved around. After the moves are completed the data patterns are checked.
A specific pattern, 10101010 01010101 (inverted), is constructed and written successively into contiguous memory, then read back and compared.
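A rough sketch of the block-move idea follows. The exact pattern layout and the way blocks are moved are assumptions made for illustration: here the pattern is held for two 32-bit words (8 bytes), inverted for the next two, and then rotated left by one bit, and each "move" simply swaps the two halves of the buffer.

```python
MASK32 = 0xFFFFFFFF


def rotate_left32(value, count=1):
    """Rotate a 32-bit value left by `count` bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32


def make_block_pattern(n_words, seed=0x00000001):
    """Shifting pattern that is inverted every 8 bytes (two words)."""
    words = []
    pattern = seed & MASK32
    while len(words) < n_words:
        inverse = ~pattern & MASK32
        words.extend([pattern, pattern, inverse, inverse])
        pattern = rotate_left32(pattern)
    return words[:n_words]


def block_move_test(memory, moves=64):
    """Fill memory, swap its halves `moves` times, then verify contents."""
    n = len(memory)
    if n % 2:
        raise ValueError("this sketch needs an even number of words")
    half = n // 2
    expected = make_block_pattern(n)
    memory[:] = expected
    for _ in range(moves):
        # One "move": exchange the lower and upper blocks.
        memory[:half], memory[half:] = memory[half:], memory[:half]
    if moves % 2:
        # An odd number of swaps leaves the halves exchanged.
        expected = expected[half:] + expected[:half]
    return [(i, expected[i], memory[i])
            for i in range(n) if memory[i] != expected[i]]
```

The point of the moves is to exercise bulk memory-to-memory transfers; the final comparison only confirms that the data survived them.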
Test 6 [Moving inversions, 32 bit pattern]
This is a variation of the moving inversions algorithm that shifts the data pattern left one bit for each successive address.
The starting bit position is shifted left for each pass. To use all possible data patterns 32 passes are required.
This test is quite effective at detecting data sensitive errors but the execution time is long.
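Within a region of memory, write and read-back comparisons are performed many times. The patterns written are the initial pattern and the patterns produced by repeatedly shifting it left, with the low bit filled with the specified sval (0 or 1).
This checks whether repeatedly reading and writing the same location within a short time causes data errors.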
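The per-address pattern generation can be sketched as below. The reset to the initial pattern after 32 shifts and the parameter names `initial` and `offset` are assumptions of this sketch; only `sval` comes from the description above.

```python
MASK32 = 0xFFFFFFFF


def shift_with_fill(pattern, sval):
    """Shift left one bit, filling the vacated low bit with sval (0 or 1)."""
    return ((pattern << 1) | sval) & MASK32


def shifting_patterns(n_words, initial=0x00000001, sval=0, offset=0):
    """Return the pattern for each successive address.

    `offset` is the starting bit position for this pass (0..31).
    """
    pattern = initial
    for _ in range(offset):
        pattern = shift_with_fill(pattern, sval)
    k = offset
    out = []
    for _ in range(n_words):
        out.append(pattern)
        k += 1
        if k >= 32:
            # Assumed behaviour: wrap back to the initial pattern.
            k = 0
            pattern = initial
        else:
            pattern = shift_with_fill(pattern, sval)
    return out


def shifting_pattern_pass(memory, initial=0x00000001, sval=0, offset=0):
    """Write the shifting patterns, read them back, report mismatches."""
    patterns = shifting_patterns(len(memory), initial, sval, offset)
    for i, pattern in enumerate(patterns):
        memory[i] = pattern
    return [(i, p, memory[i]) for i, p in enumerate(patterns)
            if memory[i] != p]
```

Calling `shifting_pattern_pass` once for each `offset` from 0 to 31 corresponds to the 32 passes needed to place every pattern at every address.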
Test 7 [Random number sequence]
This test writes a series of random numbers into memory (1MB).
The initial pattern is checked and then complemented and checked again on the next pass.
However, unlike the moving inversions test, writing and checking can only be done in the forward direction.
This checks that consecutive sequential writes work correctly.
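A minimal sketch of the random-sequence test, assuming memory is a list of 32-bit words and using Python's `random.Random` as a stand-in for the tool's generator. Reseeding the generator to reproduce the sequence is what restricts the test to the forward direction.

```python
import random

MASK32 = 0xFFFFFFFF


def random_sequence_test(memory, seed=12345):
    """Write a seeded random sequence, check it, complement it, recheck.

    The seed value and function name are illustrative assumptions.
    Returns (index, expected, actual) tuples for every mismatch.
    """
    errors = []
    n = len(memory)

    rng = random.Random(seed)
    for i in range(n):                       # initial write
        memory[i] = rng.getrandbits(32)

    rng = random.Random(seed)                # replay the same sequence
    for i in range(n):                       # check, then write complement
        expected = rng.getrandbits(32)
        if memory[i] != expected:
            errors.append((i, expected, memory[i]))
        memory[i] = ~expected & MASK32

    rng = random.Random(seed)
    for i in range(n):                       # check the complement
        expected = ~rng.getrandbits(32) & MASK32
        if memory[i] != expected:
            errors.append((i, expected, memory[i]))
    return errors
```

Only the seed has to be remembered between passes, so no second copy of the data is needed for comparison.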
Test 8 [Modulo 20, random pattern]
Using the Modulo-X algorithm should uncover errors that are not detected by moving inversions due to cache and buffering interference with the algorithm.
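A random pattern P1 is generated, with P2 = ~P1. All of memory is filled with P2, and then P1 is written at locations X, 2X, ..., nX.
This checks whether caching affects data correctness.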
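The strided write order described above can be sketched as follows. The `offset` parameter, which selects which residue class modulo X receives P1, is an assumption added for illustration, as is the list-based memory model.

```python
MASK32 = 0xFFFFFFFF


def modulo_x_test(memory, pattern, modulus=20, offset=0):
    """Fill with ~pattern, write pattern at every `modulus`-th word, verify.

    `offset` must be smaller than `modulus`.
    Returns (index, expected, actual) tuples for every mismatch.
    """
    inverse = ~pattern & MASK32
    n = len(memory)
    for i in range(n):                      # fill everything with P2
        memory[i] = inverse
    for i in range(offset, n, modulus):     # write P1 at the strided words
        memory[i] = pattern
    errors = []
    for i in range(n):
        expected = pattern if i % modulus == offset else inverse
        if memory[i] != expected:
            errors.append((i, expected, memory[i]))
    return errors
```

Repeating the call for each `offset` from 0 to `modulus - 1` gives every word a turn at holding P1 while its neighbours hold P2.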
Test 9 [Bit fade test, 2 patterns]
The bit fade test initializes all of memory with a pattern and then sleeps for 1 minute.
Then memory is examined to see if any memory bits have changed.
This is a retention test that checks whether bits decay over time.
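A sketch of the fill, wait, and recheck cycle is shown below. Using all ones and all zeros as the two patterns is an assumption, and the `sleep` parameter is a hypothetical hook that lets the delay be shortened or replaced when experimenting.

```python
import time


def bit_fade_test(memory, patterns=(0xFFFFFFFF, 0x00000000),
                  delay_seconds=60.0, sleep=time.sleep):
    """For each pattern: fill memory, wait, then look for changed bits.

    Returns (index, expected, actual) tuples for every mismatch.
    """
    errors = []
    for pattern in patterns:
        for i in range(len(memory)):
            memory[i] = pattern
        sleep(delay_seconds)  # leave the data untouched for the delay
        for i, word in enumerate(memory):
            if word != pattern:
                errors.append((i, pattern, word))
    return errors
```

With `delay_seconds=0` the function degenerates into a plain write and read-back check, which is convenient for trying out the logic without waiting a minute per pattern.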
Test 10 [Memory stress]
A random pattern is generated and a large kernel is launched to set all memory to the pattern.
A new read-and-write kernel is launched immediately after the previous write kernel to check whether there are any errors in memory and to set the memory to the complement.
This process is repeated 1000 times for one pattern.
The kernel is written so as to achieve the maximum bandwidth between the global memory and the GPU.
This is a stress test.
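The check-and-complement loop can be sketched on the host side as below. This is a plain Python illustration of the logic only; it does not launch GPU kernels or attempt to saturate memory bandwidth, and the function name and list-based memory are assumptions.

```python
MASK32 = 0xFFFFFFFF


def stress_test(memory, pattern, iterations=1000):
    """Fill with pattern, then repeatedly verify and write the complement.

    Returns (iteration, index, expected, actual) tuples for mismatches.
    """
    errors = []
    n = len(memory)
    for i in range(n):                 # stands in for the write kernel
        memory[i] = pattern
    expected = pattern & MASK32
    for iteration in range(iterations):
        complement = ~expected & MASK32
        for i in range(n):             # stands in for the read/write kernel
            if memory[i] != expected:
                errors.append((iteration, i, expected, memory[i]))
            memory[i] = complement
        expected = complement
    return errors
```

Each iteration both verifies the previous write and produces the next one, so the memory is never idle between checks.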