MMX-accelerated MemCopy test (Win32 only)

BlitzSupport(Posted 2005) [#1]
I've somehow managed to assemble and import some MMX assembly code, but it's just copied-and-pasted so if this test doesn't work it'll be kinda tough! Could people give this a whirl and post their output? I'd be particularly interested to know if it exits politely for CPUs with no MMX support. Save any work before running (at your own risk), though it should be fine!

I also have TinyPTC working with MMX acceleration, so if this test works I'll tidy it all up and release it (once Mark fixes a minor bug-ette in importing pre-compiled libs!).

My output:

------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------

------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
1130 ms

------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
199 ms

(One million iterations goes from 11 seconds to 2 seconds!)
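
The test program's source is not posted in this thread, so the following is only a rough sketch of the shape such a comparison takes: time the same number of copies through two routines and print the elapsed milliseconds. The names `time_copies`, `plain_copy` and `bulk_copy`, the buffer size and the iteration count are all made up for illustration, and the two Python routines are stand-ins, not the MMX assembly.

```python
import time

def time_copies(copy_fn, src, dst, iterations):
    """Return elapsed milliseconds for `iterations` calls of copy_fn(src, dst)."""
    start = time.perf_counter()
    for _ in range(iterations):
        copy_fn(src, dst)
    return (time.perf_counter() - start) * 1000.0

def plain_copy(src, dst):
    # Stand-in for the plain routine: copy one byte at a time.
    for i in range(len(src)):
        dst[i] = src[i]

def bulk_copy(src, dst):
    # Stand-in for the accelerated routine: one bulk slice assignment.
    dst[:] = src

if __name__ == "__main__":
    size = 64 * 1024   # hypothetical buffer size, far smaller than the real test
    iterations = 20    # hypothetical iteration count
    src = bytearray(range(256)) * (size // 256)
    for name, fn in (("plain", plain_copy), ("bulk", bulk_copy)):
        dst = bytearray(size)
        ms = time_copies(fn, src, dst, iterations)
        assert dst == src  # both routines must produce identical copies
        print(f"{name}: {ms:.0f} ms")
```

Checking that both routines produce the same destination buffer before comparing times is the important part; a faster copy that copies the wrong bytes is not a win.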


N(Posted 2005) [#2]
Plain: 581ms
MMX: 111ms


Perturbatio(Posted 2005) [#3]
Plain:628ms
MMX:101ms

(my CPU does not report MMX capability (SSE2/3DNOW!))


AdrianT(Posted 2005) [#4]
Plain: 540 ms
MMX: 92 ms


OldNESJunkie(Posted 2005) [#5]
Plain: 266 ms
MMX: 231 ms


BlitzSupport(Posted 2005) [#6]
Thanks all... interesting that it seems a pretty much consistent 5x speed increase.

"(my CPU does not report MMX capability (SSE2/3DNOW!))"


It must report it via the CPU ID flags the code checks, as it wouldn't have run otherwise. (A quick Google shows your Athlon64 3000 does support MMX anyway.)
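
For reference, the standard x86 feature check runs the CPUID instruction with EAX = 1 and looks at bit 23 of EDX for MMX (bits 25 and 26 are SSE and SSE2). The assembly used in this test is not shown in the thread, so the sketch below is not that code; it only decodes a feature word that has already been read, and both sample EDX values are hypothetical.

```python
# Bit positions in EDX after executing CPUID with EAX = 1.
MMX_BIT = 23
SSE_BIT = 25
SSE2_BIT = 26

def has_feature(edx, bit):
    """True if the given feature bit is set in the EDX feature word."""
    return ((edx >> bit) & 1) == 1

def describe(edx):
    """List the SIMD features (of the three checked here) that EDX reports."""
    names = {"MMX": MMX_BIT, "SSE": SSE_BIT, "SSE2": SSE2_BIT}
    return [name for name, bit in names.items() if has_feature(edx, bit)]

# Hypothetical feature words, chosen only to exercise both paths.
sample_with_mmx = (1 << MMX_BIT) | (1 << SSE_BIT) | (1 << SSE2_BIT)
sample_without_mmx = 0x000001BF  # no bit 23 set

for edx in (sample_with_mmx, sample_without_mmx):
    if has_feature(edx, MMX_BIT):
        print(hex(edx), "MMX available:", describe(edx))
    else:
        print(hex(edx), "no MMX, exit politely")
```

A CPU that supports SSE2 will also have the MMX bit set, which fits the point above: if the flag were missing, the MMX path would not have run at all.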

Lapgod: what CPU make/model do you have? Doesn't look like it's accelerated at all with MMX there, yet ran really fast!


Snarkbait(Posted 2005) [#7]
Strange, I get 176ms for both. p4-3.2ghz


BlitzSupport(Posted 2005) [#8]
Interesting -- I'm guessing Lapgod has a P4 as well now.

This mumbo-jumbo seems to suggest the P4 doesn't (or didn't?) make full use of MMX: http://www.tommesani.com/P4MMX.html

"Summing up, the P4 can issue only one MMX instruction per cycle, and the latency is at best twice that on the older Pentium III processor. In pathological conditions, this adds up to bring P4's SIMD performance down to about one third P-III's. Until the P4 ramps up into the 2+ GHz frequency range, its integer SIMD execution speed will simply lag behind the venerable P6 core."

Maybe Intel never quite got it figured out!


Snarkbait(Posted 2005) [#9]
I'm getting 262 plain and 278 mmx on my other computer, also P4 (1.8ghz)


StuC(Posted 2005) [#10]
Desktop (Northwood):
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------

------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
196 ms

------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
150 ms


Laptop (Prescott):
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------

------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
228 ms

------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
139 ms


Cheers,

Stu


N(Posted 2005) [#11]
On my laptop..

Plain: 207ms
MMX: 111ms

Laptop is a Pentium M 1.6ghz (512mb RAM)

And, as I've mentioned before, my desktop is an Athlon 64 3000+ (1gb RAM).


Sweenie(Posted 2005) [#12]
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------

------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
196 ms

------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
188 ms


Desktop, P4 2.5 Ghz (512 Mb RAM)


Azathoth(Posted 2005) [#13]
Plain: 196 ms
MMX: 175 ms


GregBUG(Posted 2005) [#14]
Plain: 535 ms
MMX: 90 ms

my cpu Athlon64 3200


Difference(Posted 2005) [#15]
Plain: 728
MMX: 128

AMD Barton ~2800XP+ (slightly underclocked 3000XP+)


Robert(Posted 2005) [#16]
Plain: 130ms
MMX: 106ms


LineOf7s(Posted 2005) [#17]
Plain: 1733
MMX: 286

(1733?!! *kicks computer*)


Snader(Posted 2005) [#18]
Pentium 4..

Plain : 170 ms

MMx: 169 ms


Matthew Smith(Posted 2005) [#19]
Pentium4 2.4Ghz (overclocked to 3.0Ghz)

Plain: 177ms
MMX: 147ms


Sarge(Posted 2005) [#20]
Plain: 160
MMX: 159


BlitzSupport(Posted 2005) [#21]
Thanks all. Pretty interesting how the P4 doesn't gain much at all from Intel's own MMX instructions (yet is fast enough that it doesn't matter) while the Athlons gain a solid 5x speed increase to near enough match the P4s. Ya learn something every day.
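
To put numbers on that, here is a small sketch that computes plain/MMX ratios from some of the results posted above where the CPU was stated. The figures are copied from the thread; the grouping into two families and the helper name `speedup` are just for illustration.

```python
def speedup(plain_ms, mmx_ms):
    """How many times faster the MMX copy was than the plain copy."""
    return plain_ms / mmx_ms

# (label, plain ms, MMX ms), taken from posts earlier in this thread.
athlon_results = [
    ("Athlon 64 3000+ (N)", 581, 111),
    ("Athlon64 3200 (GregBUG)", 535, 90),
    ("AMD Barton (Difference)", 728, 128),
]
p4_results = [
    ("P4 3.2GHz (Snarkbait)", 176, 176),
    ("P4 1.8GHz (Snarkbait)", 262, 278),
    ("P4 2.5GHz (Sweenie)", 196, 188),
    ("P4 2.4GHz @ 3.0GHz (Matthew Smith)", 177, 147),
]

for family, results in (("Athlon", athlon_results), ("P4", p4_results)):
    for label, plain_ms, mmx_ms in results:
        print(f"{family:7s} {label:36s} {speedup(plain_ms, mmx_ms):.2f}x")
```

A ratio below 1.0 means the MMX version was slower on that machine, as with the 1.8GHz P4.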

Still hoping to see it failing politely for someone with a non-MMX CPU!


Ryan Moody(Posted 2005) [#22]
On my infamously old computer:

------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------

------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
4801 ms

------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
1143 ms

Hit ENTER to exit...


Ryan


taumel(Posted 2005) [#23]
Plain: 204
MMX: 154

on a P4


Genexi2(Posted 2005) [#24]
Plain: 482
MMX : 83

AMD64 3200+ here.

About the whole MMX thing, are you planning on releasing a module or something, so users could set a flag in their code to switch it on for operations, for some speed gain, if the user's system supports it?


semar(Posted 2005) [#25]
Plain: 957 ms
MMX : 166 ms

AMD Athlon XP1700+ 1.48Ghz 512 MRam WinXP Pro SP2


klepto2(Posted 2005) [#26]
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------

------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
483 ms

------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
82 ms

Hit ENTER to exit...

AMD ATHLON64 3200 512MB Ram WinXP Home SP2

And it doesn't go from 11 sec to 2 secs
It goes from 1.1 sec to 0.2 secs because 1000 ms = 1 sec.


fredborg(Posted 2005) [#27]
Plain: 195
MMX: 195


SoggyP(Posted 2005) [#28]
Greetings Puppies,


------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------

------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
478 ms

------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
76 ms

Hit ENTER to exit...

AMD 64 3400+ @2.41Ghz, 1Gb Ram, Win XP Pro

Peace,

Jes


BlitzSupport(Posted 2005) [#29]

"About the whole MMX thing, are you planning on releasing a module or something, so users could set a flag in their code to switch it on for operations, for some speed gain, if the user's system supports it?"


The only thing such a module could contain for now is a faster MemCopy (and MMX/CPUID detector), just because I'm not an assembly programmer -- I simply copied and pasted it, then compiled it into a lib. I do know of a few more bits and pieces that might work as well. I'll release it anyway, just so anyone who might be wondering how to call assembly code from Max can look at a simple example.

It's taken from TinyPTC, the 2D pixel-blasting library, which should be good for the low level (non-interactive) demo coders out there -- that works nicely with Max so I'm going to do a little mod soon.

I'd still like to find an example of it failing nicely on non-MMX machines, just so I know it doesn't crash out...


"And it doesn't go from 11 sec to 2 secs
It goes from 1.1 sec to 0.2 secs because 1000 ms = 1 sec."

I know that... but I said one million iterations goes from 11 seconds to 2 seconds, just to illustrate the kind of difference this makes*. This test does one hundred thousand iterations... so, :P

* Admittedly, 1 million 10MB copies isn't exactly an everyday requirement for most people...
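
The two sets of figures are the same measurement at different iteration counts. A minimal sketch of the arithmetic, assuming the time scales linearly with the number of copies (the helper name `scale_ms` is made up here):

```python
def scale_ms(measured_ms, measured_iterations, target_iterations):
    """Extrapolate a timing to a different iteration count, assuming linear scaling."""
    return measured_ms * target_iterations / measured_iterations

# Measured in the first post: 100,000 iterations.
plain_ms, mmx_ms = 1130, 199

# Extrapolated to one million iterations, converted to seconds.
plain_s = scale_ms(plain_ms, 100_000, 1_000_000) / 1000
mmx_s = scale_ms(mmx_ms, 100_000, 1_000_000) / 1000

print(f"100,000 iterations:   {plain_ms / 1000:.2f} s vs {mmx_ms / 1000:.2f} s")
print(f"1,000,000 iterations: {plain_s:.2f} s vs {mmx_s:.2f} s")
```

So 1.1 s versus 0.2 s is the measured result, and roughly 11 s versus 2 s is the same ratio at ten times the work.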


klepto2(Posted 2005) [#30]
Sorry, my fault. I misunderstood that statement.

Again, sorry.


srvaldez(Posted 2005) [#31]
I am looking forward to seeing the source, as I like to tinker in assembly.


DaY(Posted 2005) [#32]
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------

------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
208 ms

------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
237 ms


specs in sig


FlameDuck(Posted 2005) [#33]
Desktop - P4 Prescott 540:
Plain: 776ms
MMX: 191ms

And this is the last time I've bought an Intel CPU.


RGR(Posted 2005) [#34]
AMD Athlon 2000+
850 ms
151 ms


Regular K(Posted 2005) [#35]
549ms
91ms

AMD64 3200+

yay!


xlsior(Posted 2005) [#36]
881 ms
133 ms

Athlon 2800+


Dragon57(Posted 2005) [#37]
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------

------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
520 ms

------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
91 ms

Hit ENTER to exit...

Ran on the machine w/the Athlon in my sig below.


degac(Posted 2005) [#38]
------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------

------------------------------------------------------------------------
Testing plain MemCopy...
------------------------------------------------------------------------
1029 ms

------------------------------------------------------------------------
Testing MMX MemCopy...
------------------------------------------------------------------------
183 ms

Hit ENTER to exit...

Athlon XP 1600+, 384 MB, Geforce4Ti, WIN 2000 sp4


OldNESJunkie(Posted 2005) [#39]
It's an Intel 1.9GHz P4, not HT, old socket 478 or whatever, with 1024MB PC800 RDRAM


nawi(Posted 2005) [#40]
AMD Athlon XP +2000 1,67ghz

881ms
149ms


Eikon(Posted 2005) [#41]
P4 2.8 HT

Plain 200ms
MMX 162ms


regaa(Posted 2005) [#42]
AMD Athlon 64 3400+ @ 2,4Ghz

Plain 468
MMX 76


Chip&Chop(Posted 2005) [#43]
AMD ath1800
Plain 991ms
MMX 168ms


Dreamora(Posted 2005) [#44]
what crap processors you all have ;-)

Intel P-M 1MB 2nd lvl cache (old P-M mobile processor) @ 1,5ghz

------------------------------------------------------------------------
Plain MemCopy vs. MMX MemCopy (100000 x 10MB copies)
------------------------------------------------------------------------
plain 140 ms
mmx 118 ms


Panno(Posted 2005) [#45]
AMD DURON 1GHZ

plain : 1116
mmx : 190


JPL(Posted 2005) [#46]
plain : 195
mmx : 194

On my work PC : Intel P4 2.4 GHz


Grisu(Posted 2005) [#47]
824 vs 143, AMD XP2100, 512 DDR, Radeon9600xt