AMD athlon slow?
| ||
This test seems to show different results on AMD vs. Intel and PPC. Can others verify and/or explain these results? Also, can someone post numbers for a newer AMD processor? Is there anything wrong with the test method? My timings below seem to scale the same way on each computer, except that the AMD is about 2x slower processing a simple list than an array.

Each cell shows the timing, followed by its ratio to that machine's "array internal" time:

                  Intel Core 2       Intel Core 2       PPC 800 MHz,   AMD 2.13 GHz
                  Duo 4GHz,          Duo 2.16 GHz,      Mac 10.3.9     Athlon (old)
                  Windows Vista      Mac 10.5.6                        Windows XP
objects           1,000,000          1,000,000          250,000        250,000
array internal    41                 41                 46             47
TList internal    42  1.02           43  1.05           65  1.41       66  1.40
list internal     40  0.98           40  0.98           40  0.87       87  1.85
array external    41  1.00           40  0.98           47  1.02       46  0.98

I even coded a test adding an enumerator to the type so I could use EachIn. That resulted in the same timings. |
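The ratios quoted next to the timings above look like each timing divided by the same machine's "array internal" time. A minimal Python sketch of that normalisation, using the old Athlon's numbers from the post; the function name `relative_to_baseline` is made up for illustration and is not part of the original test:

```python
def relative_to_baseline(timings, baseline="array internal"):
    """Return each timing divided by the baseline timing, rounded to 2 places."""
    base = timings[baseline]
    return {name: round(ms / base, 2) for name, ms in timings.items()}


# Timings for the old AMD Athlon, copied from the post above.
amd = {
    "array internal": 47,
    "TList internal": 66,
    "list internal": 87,
    "array external": 46,
}

if __name__ == "__main__":
    for name, ratio in relative_to_baseline(amd).items():
        print(f"{name}: {ratio:.2f}")
```

Comparing ratios rather than raw timings makes machines with different clock speeds and object counts easier to line up side by side.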
| ||
For my system:
array internal: 13, TList internal: 19, list internal: 16, array external: 12 |
| ||
Here's mine.

At 500,000:
array internal: 30, TList internal: 34, list internal: 31, array external: 29

At 1,000,000:
array internal: 67, TList internal: 62, list internal: 37, array external: 35

Thanks for this! It will definitely speed up my physics engine! |
| ||
On my work's new Dell machine (Intel Core 2 Duo ~2.33 GHz, Intel Q35 Express):
array internal: 19, TList internal: 64, list internal: 20, array external: 20

My ancient (4 years+) Athlon 64 3000+ laptop (~800 MHz, ATI Radeon 9600):
array internal: 33, TList internal: 52, list internal: 42, array external: 33

Considering the age difference, that's not so bad. Both on Windows XP. |
| ||
Thanks guys. So Nate, you have an Intel then, right? Take this test with a grain of salt though... it's not a good test at all for comparing the actual number of ms. What I was looking for is more the percent gain/loss between the methods. Maybe something is being optimized on AMDs for the array loop, or vice versa. |
| ||
Intel Core 2 Duo 2.4 GHz:
array internal: 19, TList internal: 21, list internal: 36, array external: 18 |
| ||
xlsior, did you run that just once? Are multiple runs consistent with these numbers? |
| ||
Yeah, I have an Intel Core 2 Duo 2.6 GHz. |
| ||
"xlsior, did you run that just once? Are multiple runs consistent with these numbers?"

I ran it a bunch of times. The numbers do fluctuate quite a bit, but arrays are pretty much always faster than lists for me (which mirrors tests I've done in the past). I just ran the test 10 times in a row (using 1,000,000): arrays won 9 times, lists 1.

Something else to consider other than just AMD vs. Intel: CPU cache. Different chip models have different amounts of on-chip cache, and if your data happens to fit inside the CPU cache it's much faster than having to fetch parts of it from RAM. In general the Intels have more cache than most AMD models. The Core 2 Duo that I have has 4 MB of cache, but there are other models that have less. |
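Since single runs fluctuate, one way to get steadier numbers is to repeat the measurement and look at the minimum and median rather than one sample. The sketch below is in Python, not BlitzMax, and uses a stand-in workload (`sum_list`) rather than the benchmark from this thread; `time_runs` and `summarize` are hypothetical helper names:

```python
import statistics
import time


def time_runs(func, runs=10):
    """Call func() `runs` times and return each elapsed time in milliseconds."""
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        results.append((time.perf_counter() - start) * 1000.0)
    return results


def summarize(samples):
    """Return (min, median, max) of the samples."""
    return min(samples), statistics.median(samples), max(samples)


def sum_list(n=100_000):
    # Stand-in workload only; it is NOT the test discussed in the thread.
    return sum(list(range(n)))


if __name__ == "__main__":
    lo, mid, hi = summarize(time_runs(sum_list))
    print(f"min {lo:.2f} ms, median {mid:.2f} ms, max {hi:.2f} ms")
```

The minimum is usually the least disturbed by other processes, while the spread between minimum and maximum gives a rough idea of how much a single posted number can be trusted.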
| ||
500,000:
array internal: 12, TList internal: 13, list internal: 12, array external: 12

1,000,000:
array internal: 24, TList internal: 26, list internal: 24, array external: 25

I have the 4 MB L2 cache as well. |
| ||
500,000, tested on an Athlon 64 3500+ (2.2 GHz):
array internal: 18, TList internal: 22, list internal: 21, array external: 18 |
| ||
Linking:untitled1
Executing:untitled1
array internal: 14, TList internal: 14, list internal: 13, array external: 14

That's on an Intel Core Duo 5200 at 3.05 GHz, and on a P4 at 3 GHz I get:

Linking:untitled1.exe
Executing:untitled1.exe
array internal: 42, TList internal: 49, list internal: 53, array external: 46 |
| ||
I think xlsior got it: AMD chips have much less L1, L2 (and L3) cache than Intel. You effectively have to pre-fetch, which I'm not sure is possible with BMax. AMD will generally have better memory bandwidth, and that will scale better with multi-core than Intel, but for any cache-friendly application Intel wins. Look into the SPECfp and SPECint benchmarks.

Try getting the AMD CPU driver and the AMD Dual-Core Optimizer from their website. It probably won't do much in your case, but it's worth a shot. I think they fixed some synchronization problems. Please let us know if you figure this out. |
| ||
array internal: 38, TList internal: 120, list internal: 50, array external: 40

Strange TList result. AMD Athlon 64 X2 6100+

@Iprice: "My ancient (4years+) Athlon 64 3000+ laptop ~ 800Mhz"

Yet I bet the processor back then was expensive as hell =p A 2800 cost me $500 at TigerDirect 4~5 years ago. |
| ||
@HrdNutz, yeah, that makes sense, except that xlsior's *was* an Intel with a big cache... his results are out of the norm compared with the rest of the thread... they look more like the AMD results. |
| ||
It was indeed expensive, but it's not caused me any problems, unlike my desktop which I got at exactly the same time. I thought laptops were supposedly less reliable - not in my case. My desktop has gone through 2 motherboards, 3 PSUs and a couple of GFX cards. My laptop is waaaaay underspecced for games nowadays, but perfect still for my programming needs. It plays HalfLife 2 and Doom3 lovely though :) |
| ||
"@HrdNutz, yeah that makes sense except that xlsior's *was* intel with a big cache... his results are out of the norm according to the rest of the thread... it looks more like the AMD results."

In case it makes a difference: I'm running the 64-bit version of Vista. |
| ||
I don't think so, as so am I... hmmm. |
| ||
Try disabling Cool'n'Quiet on the AMD (if it's enabled in the BIOS) and see if that makes any difference. |
| ||
There are also variations among the Core 2 Duo lines, of course. Mine's an E6600 Conroe @ 65nm. |
| ||
500'000:
array internal: 4, TList internal: 9, list internal: 6, array external: 4

1'000'000:
array internal: 9, TList internal: 17, list internal: 12, array external: 9

Core i7 920 (2.83 GHz), 6 GB triple-channel

As for your question: one of the major problems with old AMDs is their almost nonexistent L2 cache. 256 KB/512 KB is nice for small things, but at 500k entries that's 2 MB of pure pointer data already, so you get a lot of cache misses and data transfers from RAM, which cost a lot of time.

Arrays work because they are aligned, so there are far fewer cache misses, which is the real breaking point here. Every cache miss means a transfer from RAM to the CPU.

The small L2 cache also forces the CPU to swap in many blocks when requesting the actual data of the entries. For the array, those entries are again aligned in RAM to a much higher degree, so there are fewer cache misses on the actual data as well.

The internal local just makes this problem worse by allocating new variables over and over again. |
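The "2 MB of pure pointer data" figure can be checked with a little arithmetic. The Python sketch below assumes 4-byte pointers (a 32-bit build, which is an assumption and not stated in the post) and counts only one pointer per entry, so the real working set, including the objects themselves and any list link nodes, would be larger. The helper names are hypothetical:

```python
KIB = 1024
MIB = 1024 * KIB


def pointer_bytes(entries, pointer_size=4):
    """Bytes needed for one pointer per entry (pointer_size=4 assumes 32-bit)."""
    return entries * pointer_size


def fits_in_cache(entries, cache_bytes, pointer_size=4):
    """True if the pointer data alone would fit in a cache of cache_bytes."""
    return pointer_bytes(entries, pointer_size) <= cache_bytes


if __name__ == "__main__":
    caches = (("512 KB L2", 512 * KIB), ("4 MB L2", 4 * MIB))
    for entries in (500_000, 1_000_000):
        size = pointer_bytes(entries)
        for label, cache in caches:
            verdict = "fits" if fits_in_cache(entries, cache) else "does not fit"
            print(f"{entries:,} entries = {size / MIB:.2f} MiB of pointers: "
                  f"{verdict} in {label}")
```

Under these assumptions the pointer data for 500,000 entries is about four times the size of a 512 KB L2 cache but well inside a 4 MB one, which is consistent with the cache explanation offered in this thread, though it does not prove it.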
| ||
Core 2 Duo 7200 (2.53 GHz), XP Pro

500'000:
array internal: 25, TList internal: 26, list internal: 24, array external: 25

1'000'000:
array internal: 53, TList internal: 53, list internal: 51, array external: 54