Memory benchmarks: Addendum primum

September 15, 2015 by Gabriel Horvath

You might have noticed that I missed one case in my previous blog post: performing multiple independent random reads within a single thread. I have therefore extended the code as follows:

// mask keeps every index in range (array length assumed to be a power of two)
switch (inThreadParallelismLevel) {
    case 1: {
        for (int i = 0; i < steps; i++) {
            total0 += array[(11587L * i) & mask];
        }
        break;
    }
    case 2: {
        // Two independent reads per iteration
        for (int i = 0; i < steps; i++) {
            total0 += array[(24317L * i) & mask];
            total1 += array[(14407L * i) & mask];
        }
        break;
    }
    // ...
}
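To try the loop above on its own, here is a minimal C# sketch that wraps the first two cases in a method. The names `RandomReads`, `Run`, `BytesPerSecond` and `Measure` are hypothetical, and the sketch assumes the array length is a power of two so that `mask = array.Length - 1` keeps every index in range. Because the multipliers are odd, `(c * i) & mask` is a bijection on `0..mask`, so the first `array.Length` iterations touch every element exactly once in a scattered order. The bandwidth helper counts 8 bytes per read, which is an assumption: each cache miss actually transfers a whole cache line.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness around the in-thread parallelism loop.
// Assumes array.Length is a power of two so that `& mask` keeps indices in range.
public static class RandomReads
{
    // Runs `steps` iterations with `level` independent reads per iteration
    // and returns the sum of the values read, so the JIT cannot drop the loads.
    public static long Run(long[] array, int steps, int level)
    {
        long mask = array.Length - 1;
        long total0 = 0, total1 = 0;
        switch (level)
        {
            case 1:
                for (int i = 0; i < steps; i++)
                {
                    total0 += array[(11587L * i) & mask];
                }
                break;
            case 2:
                for (int i = 0; i < steps; i++)
                {
                    // Neither load depends on the other, so both misses
                    // can be outstanding at the same time.
                    total0 += array[(24317L * i) & mask];
                    total1 += array[(14407L * i) & mask];
                }
                break;
            default:
                throw new ArgumentOutOfRangeException(nameof(level));
        }
        return total0 + total1;
    }

    // Bandwidth in bytes per second, counting 8 bytes (one long) per read.
    public static double BytesPerSecond(int steps, int level, TimeSpan elapsed)
        => (double)steps * level * sizeof(long) / elapsed.TotalSeconds;

    // Times a single call to Run and returns the measured bandwidth.
    public static double Measure(long[] array, int steps, int level, out long total)
    {
        var stopwatch = Stopwatch.StartNew();
        total = Run(array, steps, level);
        stopwatch.Stop();
        return BytesPerSecond(steps, level, stopwatch.Elapsed);
    }
}
```

For a meaningful measurement the array should be much larger than the last-level cache (for example `new long[1 << 26]`, i.e. 512 MiB), and one warm-up call to `Run` before `Measure` keeps JIT compilation out of the timing.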

We saw that random-read performance scaled nicely with the number of threads (roughly linearly up to 5 threads on the 6-core CPU). So the question is whether we can achieve the same by performing multiple independent random reads within a single thread.
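For comparison, the cross-thread case can be sketched in the same style: each thread runs its own single-read loop over a shared array, and the per-thread sums are combined at the end. This is a hypothetical sketch, not the benchmark's actual code; the class name `ThreadedRandomReads` and the per-thread multiplier `11587 + 2 * id` are illustrative assumptions, and it again assumes a power-of-two array length.

```csharp
using System;
using System.Threading;

// Hypothetical cross-thread variant: `threadCount` threads each issue
// one random read per iteration over the same shared array.
public static class ThreadedRandomReads
{
    public static long Run(long[] array, int steps, int threadCount)
    {
        long mask = array.Length - 1;
        var totals = new long[threadCount];
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int id = t; // per-iteration copy captured by the lambda
            // Odd multiplier per thread, so each thread walks a different permutation.
            long multiplier = 11587L + 2L * id;
            threads[t] = new Thread(() =>
            {
                long total = 0;
                for (int i = 0; i < steps; i++)
                {
                    total += array[(multiplier * i) & mask];
                }
                totals[id] = total; // single write per thread, after the loop
            });
            threads[t].Start();
        }
        foreach (var thread in threads) thread.Join();

        long sum = 0;
        foreach (long v in totals) sum += v;
        return sum;
    }
}
```

Accumulating into a local and writing `totals[id]` only once keeps the threads from repeatedly writing to neighbouring slots of the same cache line inside the timed loop.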

Rather disappointingly, this is not the case. Here are the results for the 8-core machine; the graph shows memory bandwidth versus the number of concurrent requests within a single thread:

[Figure: memory bandwidth versus number of concurrent requests within a single thread, for multiple random reads and chained reads]

So, somewhat surprisingly, there is no advantage at all in running multiple independent reads within a thread: the scaling observed for chained reads does not happen here. Admittedly, the random reads are already running about 4 times faster than the chained reads, so there is less potential for scaling. More importantly, though, the scaling we saw across threads/cores is not reproduced within a single thread. It therefore looks as though random reads only scale when the memory requests originate from different cores rather than from a single one.
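It may help to contrast this with chained reads. Assuming "chained" means pointer chasing, where the index of the next load is the value just loaded, a sketch could look like the following (the names `ChainedReads`, `BuildCycle` and `Run` are hypothetical). Within one chain every load has to wait for the previous one, so running several independent chains in a thread gives the CPU misses it can overlap; the independent random reads above compute their addresses from `i` alone, so even a single stream of them carries no such dependency. The cycle is built with Sattolo's algorithm so that each chain visits every slot before repeating.

```csharp
using System;

// Hypothetical pointer-chasing benchmark: next[p] holds the index of the
// following element, so each load depends on the result of the previous one.
public static class ChainedReads
{
    // Sattolo's algorithm: a random permutation consisting of a single cycle,
    // so following next[] from any start visits every slot exactly once.
    public static long[] BuildCycle(int length, int seed)
    {
        var next = new long[length];
        for (int i = 0; i < length; i++) next[i] = i;
        var rng = new Random(seed);
        for (int i = length - 1; i > 0; i--)
        {
            int j = rng.Next(i); // 0 <= j < i (strictly below i, unlike Fisher-Yates)
            (next[i], next[j]) = (next[j], next[i]);
        }
        return next;
    }

    // Walks `chains` independent chains in one thread, starting at evenly
    // spaced positions. Loads within a chain are serialised; loads from
    // different chains do not depend on each other.
    public static long Run(long[] next, int steps, int chains)
    {
        var pos = new long[chains];
        for (int c = 0; c < chains; c++) pos[c] = c * (next.Length / chains);
        for (int i = 0; i < steps; i++)
        {
            for (int c = 0; c < chains; c++)
            {
                pos[c] = next[pos[c]];
            }
        }
        long checksum = 0;
        foreach (long p in pos) checksum += p; // use the results so the loads are kept
        return checksum;
    }
}
```

Since every chain walks one full cycle, after `next.Length` steps each chain is back at its starting index, which makes the checksum easy to verify independently of the random seed.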

Filed under performance Tagged with C#

Software Transactions

Copyright © Gabriel Zs. K. Horvath
