Memory benchmarks: Addendum primum

You might have noticed that I missed one case in my previous blog post: performing multiple independent random reads within a single thread. The code was expanded:

switch (inThreadParallelismLevel) {
    case 1: {
        for (int i = 0; i < steps; i++) {
        total0 += array[(11587L*i) &amp; mask]; 
 case 2: {
    for (int i = 0; i < steps; i++) {
        total0 += array[(24317L * i) &amp; mask];
        total1 += array[(14407L * i) &amp; mask];

We saw that the random reads performance scaled nicely with number of threads up (kind of linearly up to 5 threads on the 6 cores CPU). So the question is whether we can achieve the same when performing  multiple independent random reads within a thread.

Rather disappointingly this is not the case, here are the results for the 8 cores machine, the graph represents the memory bandwidth versus the number of concurrent requests within the single thread:


So, a bit of a surprise, there is absolutely no advantage here in running multiple reads within a thread. The scaling observed in the case of chained reads doesn’t happen here. Mind you it is going about 4 times faster here, so there is less scaling potential.  More importantly the scaling across threads/cores is not present here. So it looks like this will only scale if the memory read requests originate from different cores rather than from a single one.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Awelon Blue

Thoughts on Programming Experience Design

Joe Duffy's Blog

Adventures in the High-tech Underbelly

Design Matters

Furniture design blog from Design Matters author George Walker


A Blog for Woodworkers by Gary Rogowski


Woodworking, life and all things between

%d bloggers like this: