MMX-accelerated MemCopy test (Win32 only)
| ||
I've somehow managed to assemble and import some MMX assembly code, but it's just copied-and-pasted so if this test doesn't work it'll be kinda tough! Could people give this a whirl and post their output? I'd be particularly interested to know if it exits politely for CPUs with no MMX support. Save any work before running (at your own risk), though it should be fine!

I also have TinyPTC working with MMX acceleration, so if this test works I'll tidy it all up and release it (once Mark fixes a minor bug-ette in importing pre-compiled libs!).

My output:

------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
1130 ms
------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
199 ms

(One million iterations goes from 11 seconds to 2 seconds!) |
| ||
Plain: 581ms MMX: 111ms |
| ||
Plain:628ms MMX:101ms (my CPU does not report MMX capability (SSE2/3DNOW!)) |
| ||
Plain: 540 ms MMX: 92 ms |
| ||
Plain: 266 ms MMX: 231 ms |
| ||
Thanks all... interesting that it seems a pretty much consistent 5x speed increase.

"(my CPU does not report MMX capability (SSE2/3DNOW!))"

It must report it via the CPUID flags the code checks, as it wouldn't have run otherwise. (A quick Google shows your Athlon64 3000 does support MMX anyway.)

Lapgod: what CPU make/model do you have? Doesn't look like it's accelerated at all with MMX there, yet it ran really fast! |
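For anyone wondering what "the CPUID flags the code checks" amounts to: on x86, CPUID leaf 1 returns a feature word in EDX, and MMX is bit 23 of that word (SSE is bit 25, SSE2 bit 26). The assembly used in this test isn't posted, so the following is only a minimal sketch of the bit test itself. Python can't execute CPUID directly, so the EDX values below are made up for illustration, and the function names are hypothetical.

```python
# Bit positions in the EDX register returned by CPUID leaf 1 (x86).
MMX_BIT = 23
SSE_BIT = 25
SSE2_BIT = 26


def has_feature(edx, bit):
    """Return True if the given feature bit is set in a CPUID.1:EDX value."""
    return bool((edx >> bit) & 1)


def describe(edx):
    """Summarise the three SIMD feature flags found in an EDX value."""
    flags = (("MMX", MMX_BIT), ("SSE", SSE_BIT), ("SSE2", SSE2_BIT))
    return {name: has_feature(edx, bit) for name, bit in flags}


# Hypothetical EDX values, NOT read from real hardware.
EDX_WITH_MMX = (1 << MMX_BIT) | (1 << SSE_BIT) | (1 << SSE2_BIT)
EDX_WITHOUT_MMX = 0

if __name__ == "__main__":
    print(describe(EDX_WITH_MMX))
    print(describe(EDX_WITHOUT_MMX))
```

A test that "exits politely" would presumably do this check first and skip the MMX path when the bit is clear, instead of hitting an illegal-instruction fault.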
| ||
Strange, I get 176ms for both. p4-3.2ghz |
| ||
Interesting -- I'm guessing Lapgod has a P4 as well now. This mumbo-jumbo seems to suggest the P4 doesn't (or didn't?) make full use of MMX:

http://www.tommesani.com/P4MMX.html

"Summing up, the P4 can issue only one MMX instruction per cycle, and the latency is at best twice that on the older Pentium III processor. In pathological conditions, this adds up to bring P4's SIMD performance down to about one third P-III's. Until the P4 ramps up into the 2+ GHz frequency range, its integer SIMD execution speed will simply lag behind the venerable P6 core."

Maybe Intel never quite got it figured out! |
| ||
I'm getting 262 plain and 278 mmx on my other computer, also P4 (1.8ghz) |
| ||
Desktop (Northwood):

------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
196 ms
------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
150 ms

Laptop (Prescott):

------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
228 ms
------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
139 ms

Cheers, Stu |
| ||
On my laptop.. Plain: 207ms MMX: 111ms Laptop is a Pentium M 1.6ghz (512mb RAM) And, as I've mentioned before, my desktop is an Athlon 64 3000+ (1gb RAM). |
| ||
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
196 ms
------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
188 ms

Desktop, P4 2.5 GHz (512 MB RAM) |
| ||
Plain: 196 ms MMX: 175 ms |
| ||
Plain: 535 ms MMX: 90 ms my cpu Athlon64 3200 |
| ||
Plain: 728 MMX: 128 AMD Barton ~2800XP+ (slightly underclocked 3000XP+) |
| ||
Plain: 130ms MMX: 106ms |
| ||
Plain: 1733 MMX: 286 (1733?!! *kicks computer*) |
| ||
Pentium 4.. Plain: 170 ms MMX: 169 ms |
| ||
Pentium4 2.4Ghz (overclocked to 3.0Ghz) Plain: 177ms MMX: 147ms |
| ||
Plain: 160 MMX: 159 |
| ||
Thanks all. Pretty interesting how the P4 doesn't gain much at all from Intel's own MMX calls (yet is fast enough that it doesn't matter) while the Athlons gain a solid 5x speed increase to near enough match the P4s. Ya learn something every day. Still hoping to see it failing politely for someone with a non-MMX CPU! |
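To put numbers on the Athlon vs. P4 difference, here is a rough sketch that computes the speed-up (plain time divided by MMX time) for a handful of results posted earlier in this thread. The grouping into "AMD" and "P4" and the choice of samples are my own, so treat it as an illustration, not a proper benchmark analysis.

```python
# (label, plain_ms, mmx_ms) taken from posts above; sample selection is arbitrary.
AMD_RESULTS = [
    ("Athlon64 3200", 535, 90),
    ("AMD Barton ~2800XP+", 728, 128),
]
P4_RESULTS = [
    ("P4 3.2GHz", 176, 176),
    ("Pentium 4", 170, 169),
    ("P4 2.4GHz @ 3.0GHz", 177, 147),
]


def speedup(plain_ms, mmx_ms):
    """How many times faster the MMX copy ran than the plain copy."""
    return plain_ms / mmx_ms


def mean_speedup(results):
    """Average speed-up over a list of (label, plain_ms, mmx_ms) tuples."""
    ratios = [speedup(plain, mmx) for _, plain, mmx in results]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for label, plain, mmx in AMD_RESULTS + P4_RESULTS:
        print(f"{label}: {speedup(plain, mmx):.2f}x")
    print(f"AMD mean: {mean_speedup(AMD_RESULTS):.2f}x")
    print(f"P4 mean:  {mean_speedup(P4_RESULTS):.2f}x")
```

On these samples the Athlons come out a little above 5x, while the P4s stay close to 1x.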
| ||
On my infamously old computer:

------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
4801 ms
------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
1143 ms

Hit ENTER to exit...

Ryan |
| ||
Plain: 204 MMX: 154 on a P4 |
| ||
Plain: 482 MMX : 83 AMD64 3200+ here. About the whole MMX thing, you plannin on releasin a module or something which users could use a flag in their code to use to activate the module for operations if the user's system supports it for some speed gain? |
| ||
Plain: 957 ms MMX: 166 ms AMD Athlon XP1700+ 1.48GHz 512 MB RAM WinXP Pro SP2 |
| ||
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
483 ms
------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
82 ms

Hit ENTER to exit...

AMD ATHLON64 3200, 512MB RAM, WinXP Home SP2

And it doesn't go from 11 sec to 2 secs. It goes from 1.1 sec to 0.2 secs, because 1000 ms = 1 sec. |
| ||
Plain: 195 MMX: 195 |
| ||
Greetings Puppies,

------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
478 ms
------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
76 ms

Hit ENTER to exit...

AMD 64 3400+ @2.41Ghz, 1Gb Ram, Win XP Pro

Peace, Jes |
| ||
"About the whole MMX thing, you plannin on releasin a module or something which users could use a flag in their code to use to activate the module for operations if the user's system supports it for some speed gain?"

The only thing such a module could contain for now is a faster MemCopy (and MMX/CPUID detector), just because I'm not an assembly programmer -- I simply copied and pasted it, then compiled it into a lib. I do know of a few more bits and pieces that might work as well. I'll release it anyway, just so anyone who might be wondering how to call assembly code from Max can look at a simple example.

It's taken from TinyPTC, the 2D pixel-blasting library, which should be good for the low level (non-interactive) demo coders out there -- that works nicely with Max so I'm going to do a little mod soon.

I'd still like to find an example of it failing nicely on non-MMX machines, just so I know it doesn't crash out...

"And it doesn't go from 11 sec to 2 secs. It goes from 1.1 sec to 0.2 secs, because 1000 ms = 1 sec."

I know that... but I said one million iterations goes from 11 seconds to 2 seconds, just to illustrate the kind of difference this makes*. This test does one hundred thousand iterations... so, :P

* Admittedly, 1 million 10MB copies isn't exactly an everyday requirement for most people... |
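The arithmetic behind the "11 seconds to 2 seconds" figure, as a quick sketch: it takes the 1130 ms and 199 ms measured for 100000 iterations in the first post and scales them to one million. It assumes the time grows linearly with the iteration count, which the test itself doesn't verify, and the helper name is made up.

```python
def scale_time_ms(measured_ms, measured_iters, target_iters):
    """Scale a measured time to a different iteration count.

    Assumes run time is proportional to the number of iterations.
    """
    return measured_ms * target_iters / measured_iters


MEASURED_ITERS = 100_000   # iterations in the posted test
TARGET_ITERS = 1_000_000   # the "one million iterations" case

plain_s = scale_time_ms(1130, MEASURED_ITERS, TARGET_ITERS) / 1000
mmx_s = scale_time_ms(199, MEASURED_ITERS, TARGET_ITERS) / 1000

if __name__ == "__main__":
    print(f"plain: {plain_s:.2f} s, MMX: {mmx_s:.2f} s")
```

So roughly 11.3 s against 1.99 s for a million iterations, and 1.13 s against 0.2 s for the 100000 the test actually runs.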
| ||
Sorry, my fault. I misunderstood that statement. Again, sorry. |
| ||
I am looking forward to seeing the source, as I like to tinker in assembly. |
| ||
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
208 ms
------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
237 ms

specs in sig |
| ||
Desktop - P4 Prescott 540: Plain: 776ms MMX: 191ms. And this is the last time I buy an Intel CPU. |
| ||
AMD Athlon 2000+ 850 ms 151 ms |
| ||
549ms 91ms AMD64 3200+ yay! |
| ||
881 ms 133 ms Athlon 2800+ |
| ||
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
520 ms
------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
91 ms

Hit ENTER to exit...

Ran on the machine w/the Athlon in my sig below. |
| ||
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
1029 ms
------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
183 ms

Hit ENTER to exit...

Athlon XP 1600+, 384 MB, Geforce4Ti, WIN 2000 sp4 |
| ||
It's an Intel 1.9GHz P4, not HT, old socket 478 or whatever, with 1024MB PC800 RDRAM |
| ||
AMD Athlon XP +2000 1,67ghz 881ms 149ms |
| ||
P4 2.8 HT Plain 200ms MMX 162ms |
| ||
AMD Athlon 64 3400+ @ 2,4Ghz Plain 468 MMX 76 |
| ||
AMD ath1800 Plain 991ms MMX 168ms |
| ||
what crap processors you all have ;-)

Intel P-M 1MB 2nd lvl cache (old P-M mobile processor) @ 1,5ghz

------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
plain 140 ms
mmx 118 ms |
| ||
AMD DURON 1GHZ plain : 1116 mmx : 190 |
| ||
plain : 195 mmx : 194 On my work PC : Intel P4 2.4 GHz |
| ||
824 vs 143, AMD XP2100, 512 DDR, Radeon9600xt |