Here are some results of running my membandwidth tool on various computers and compiler versions. If you have interesting results, let me know so I can include them in this table (please state the processor type and speed, memory speed, compiler version and any optimization-relevant flags).
System | Compiler, options | OS | memcpy(1k) | memset(1k) | blocksum8(1k) | memcpy(8M) | memset(8M) | blocksum8(8M) | Comments |
AMD Athlon XP2200+, PC2700 DDR | GCC-2.95.2 (-O3) | Linux 2.4 | 4147 MB/s | 5172 MB/s | 5947 MB/s | 419 MB/s | 600 MB/s | 1335 MB/s | |
AMD Athlon XP2200+, PC2700 DDR | Intel-6.0.1 (-O3 -xiMK -tpp6) | Linux 2.4 | 4408 MB/s | 5669 MB/s | 7822 MB/s | 423 MB/s | 601 MB/s | 1595 MB/s | |
AMD Athlon X2 4600+ EE, PC2-6400 DDR2 | GCC 4.1 (-O3) | Linux 2.6 | 12068 MB/s | 21225 MB/s | 10367 MB/s | 3016 MB/s | 6646 MB/s | 2589 MB/s | |
Intel P4 3.2 GHz | Intel-8 (-O3 -QaxKN) | WinXP | 9540 MB/s | 12175 MB/s | 11785 MB/s | 1865 MB/s | 4011 MB/s | 3829 MB/s | |
Intel P4 3.2 GHz | MSVC.NET Release | WinXP | 5411 MB/s | 6041 / 8147 MB/s | 9998 MB/s | 1280 MB/s | 1374 MB/s | 2736 MB/s | 1 |
Intel P4 3.2 GHz | GCC 3.2 (-O3) | WinXP/Cygwin | 6576 / 5000 / 4852 MB/s | 8258 / 6881 / 6670 MB/s | 8820 / 10088 / 9760 MB/s | 1623 / 1247 / 1101 MB/s | 1779 / 1348 / 1502 MB/s | 3511 / 2644 / 2524 MB/s | 2 |
Intel P4 3.2 GHz | MINGW 3.4.5 (-O3) | WinXP | 5044 MB/s | 7919 MB/s | 9892 MB/s | 1233 MB/s | 1499 MB/s | 2479 MB/s | 3 |
AMD Opteron 280, 2.3 GHz | Intel-8 (-O3) | WinXP | 2195 MB/s | 2522 MB/s | 8052 MB/s | 864 MB/s | 1319 MB/s | 2697 MB/s | 4 |
AMD Opteron 280, 2.3 GHz | Intel-8 (-O3 -QxW) | WinXP | 2195 MB/s | 2520 MB/s | 8881 MB/s | 880 MB/s | 1464 MB/s | 2334 MB/s | 4 |
AMD Opteron 280, 2.3 GHz | MSVC.NET Release | WinXP | 7864 MB/s | 13818 MB/s | 11118 MB/s | 1204 MB/s | 1988 MB/s | 2622 MB/s | 5 |
AMD Opteron 280, 2.3 GHz | MINGW 3.4.5 (-O3) | WinXP | 7997 MB/s | 13270 MB/s | 8105 MB/s | 1220 MB/s | 1721 MB/s | 2462 MB/s | 5 |
Iyonix PC (Intel XScale 600MHz) | GCC-3.5.4, UnixLib (-O3) | RISC OS | 103.4 MB/s | 101.4 MB/s | 741.7 MB/s | 28.8 MB/s | 98.4 MB/s | 126.4 MB/s | 6 |
1) | Note the huge difference for memset() in cache (6041 vs. 8147 MB/s). It could be reproduced consistently by renaming the binary and copying it elsewhere: the slow version was run directly from the Release directory, the fast one was a renamed copy in another location. And no, debug info isn't the reason either... |
2) | These are really weird: there are huge fluctuations in this series (each cell lists the results of three runs). As far as I can tell from the Task Manager, nothing much was running during any of these measurements, yet the bandwidths differ enormously between runs. Either something's broken in Cygwin's CRT, or Windows... "frowns" on Cygwin. |
3) | There were some fluctuations in these measurements as well, but nowhere near as bad as in the Cygwin case. |
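4) | The Intel compiler performs very badly on this AMD processor, especially for memcpy() and memset() in cache. Generating SSE2 code (-QxW) speeds up the in-cache blocksum8 a little (8052 to 8881 MB/s) but actually slows down blocksum8(8M), so the results are still highly disappointing. Coincidence? I think not... |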
5) | MSVC.NET and GCC (MinGW) perform a lot better on the AMD CPU than the Intel compiler. Both produce very similar results; the only difference worth noting is the in-cache blocksum8, where I suppose MSVC.NET manages to squeeze in some SSE acceleration. |
6) | Yes, those really are decimal points you're seeing. I didn't bother with them for the other measurements because the program isn't that precise to begin with, but some of these values are low enough to actually warrant a decimal point, sad but true. |
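The table doesn't spell out how an MB/s figure is obtained. As a rough illustration only (this is not membandwidth's source), here is a minimal C sketch that times repeated memcpy()/memset() calls on a 1 KB buffer, which fits in cache, and an 8 MB buffer, which presumably exceeds every cache level on the machines above. It assumes POSIX clock_gettime(), counts each byte copied or set once per call, and takes 1 MB as 2^20 bytes. The names `bandwidth_mbs`, `KERNEL_MEMCPY` and `KERNEL_MEMSET` are hypothetical, and blocksum8 is left out because its exact definition isn't given here. Keeping the best of several trials is one common way to damp run-to-run fluctuations like those in notes 2 and 3.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Which library routine to time (hypothetical names). */
enum kernel { KERNEL_MEMCPY, KERNEL_MEMSET };

/* Written after every call so the compiler cannot drop the copies as dead stores. */
static volatile unsigned char sink;

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time `reps` calls of memcpy() or memset() on a `size`-byte buffer and
   return the bandwidth in MB/s (assumption: 1 MB = 2^20 bytes, each byte
   counted once per call). The best of `trials` runs is returned.
   Returns -1.0 if allocation fails, 0.0 if the timer never advanced. */
static double bandwidth_mbs(enum kernel k, size_t size, size_t reps, int trials)
{
    assert(size > 0 && reps > 0 && trials > 0);
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, size);  /* touch every page before timing */
    memset(dst, 0x00, size);

    double best = 0.0;
    for (int t = 0; t < trials; t++) {
        double start = now_seconds();
        for (size_t r = 0; r < reps; r++) {
            if (k == KERNEL_MEMCPY)
                memcpy(dst, src, size);
            else
                memset(dst, (int)(r & 0xff), size);
            sink = dst[r % size];  /* observe the result of this call */
        }
        double elapsed = now_seconds() - start;
        if (elapsed > 0.0) {
            double mbs = (double)size * (double)reps / elapsed / (1024.0 * 1024.0);
            if (mbs > best)
                best = mbs;
        }
    }
    free(src);
    free(dst);
    return best;
}
```

For example, `bandwidth_mbs(KERNEL_MEMCPY, 1024, 100000, 5)` approximates an in-cache memcpy(1k) figure and `bandwidth_mbs(KERNEL_MEMCPY, 8u << 20, 20, 5)` an out-of-cache memcpy(8M) one. Since memcpy() both reads and writes every byte, counting the bytes once or twice is a convention choice; this sketch counts them once, and which convention membandwidth uses isn't stated here.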