Performance measure using hash program #208
Comments
Are you running this in Verilog simulation? Could you attach the command line you used and perhaps the diffs of what you changed to increase the core count, just so I can be sure I understand your configuration correctly? Thanks!
Yes, I'm using Verilog simulation.
Got it. It's been a while since I worked on this project, so I need to refresh my memory, but I'm afraid I can't think of a reason this should be off the top of my head. I'll need to look into this.
Maybe a quick experiment: what happens with 2 cores? What are the corresponding cycle counts in each case (like is it close to a clean integer multiple)? My first question would be whether this is some artifact of the test configuration that is not synchronizing correctly (and thus misreporting the count), or if you are actually running into some kind of memory saturation issue where the performance is decreasing because of cache thrashing.
Can you clarify what you mean by "performance dropped once"? (One time? Once you were above a certain number of cores?)
Sorry, by "performance dropped once" I meant that the more cores are used, the greater the performance degradation (cycles per hash gets higher).
Oops, I see the problem :)
The cycles-per-hash calculation assumes a fixed total of 256 hashes: there are 16 vector lanes and four threads, and each thread does four hashes (16 * 4 * 4 = 256). When you increase the number of cores, the total number of hashes being done increases, but the calculation still assumes it is fixed at 256. The latency for each thread is going to increase because there's more memory contention, but this calculation is not accounting for the fact that the throughput has increased. One easy fix might be to add another global variable gTotalThreads and do a __sync_fetch_and_add at the top:
Then use that to compute the total number of operations done:
(Looking at this now, it should probably use constants for the number of iterations each thread performs and the number of vector lanes, for clarity, instead of hard-coding the numbers.) I hope that helps.
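The fix described in the comment above could look roughly like the following. This is only a minimal sketch, not the code from the repository: `gTotalThreads` and `__sync_fetch_and_add` come from the comment, while the constant names and the helper functions `register_thread`, `total_hashes`, and `cycles_per_hash` are hypothetical names chosen for illustration. The values 16 and 4 are the lane count and per-thread hash count quoted in the thread.

```c
/* Values quoted in the thread; the names are assumptions for illustration. */
#define NUM_VECTOR_LANES 16   /* "16 vector lanes" */
#define HASHES_PER_THREAD 4   /* "each thread does four hashes" */

/* Number of hardware threads that have started the benchmark. */
volatile int gTotalThreads = 0;

/* Hypothetical helper: each thread calls this once at the top of its
 * entry point. The atomic add (a GCC/Clang builtin) keeps the count
 * correct when threads on several cores start concurrently. */
void register_thread(void)
{
    __sync_fetch_and_add(&gTotalThreads, 1);
}

/* Total hashes actually performed, scaled by the number of threads
 * instead of the hard-coded 16 * 4 * 4 = 256. */
int total_hashes(int total_threads)
{
    return NUM_VECTOR_LANES * HASHES_PER_THREAD * total_threads;
}

/* Cycles per hash for the whole machine, given the measured cycle count. */
double cycles_per_hash(unsigned long cycles, int total_threads)
{
    return (double)cycles / total_hashes(total_threads);
}
```

With one core (4 threads) `total_hashes(4)` gives the original 256, and with 8 cores (32 threads) it gives 2048, so the reported figure reflects the increased throughput. The sketch assumes every thread has called `register_thread` before `gTotalThreads` is read for the final calculation.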
Hi, I want to measure Nyuzi performance when it is configured with more than one core (4 threads per core).
When I use the hash benchmark program and configure Nyuzi with 8 cores, the output "cycles per hash" is much higher than with a single core. Why?
And if I want to measure multicore performance, what should I do?
Thanks.