1,000,000 primes in 6.9 seconds!

BlitzMax Forums/BlitzMax Programming/1,000,000 primes in 6.9 seconds!

Nate the Great(Posted 2010) [#1]
so I was reading some wiki articles on number theory and prime numbers and I had an idea for a faster prime number program. This one finds the first 1,000,000 prime numbers in 6.9 2.3 0.4 seconds on my computer which is not much faster than average. Don't be fooled, it finishes calculating the numbers before it displays them. My IDE freezes when bmax is printing the numbers and then unfreezes several seconds later after it has printed all the numbers so I have left the thing that displays the numbers out but you can just uncomment it if you want to see them.

edit: new faster method
Global num:Int = 15485864
Global sqrnum:Int = Sqr(num)+1
Global PrimeFlags:Byte[num]
Global Primes:Int[num]
Local tim:Int
Local mill:Int = MilliSecs()
For Local x:Int = 2 Until sqrnum
	If primeflags[x] = 0 Then
		tim = x
		Repeat
			tim:+x
			primeflags[tim] = True
		Until x+tim => num
	EndIf
Next 
Local count:Int
For Local y:Int = 2 Until num
	If primeflags[y] = False Then
		primes[count] = y
		count:+1
		
	EndIf
Next
mill = MilliSecs()-mill
Print mill/1000.0 +" seconds"
Rem
For Local i:Int = 0 Until count
	Print primes[i]
Next
EndRem
Print count + " primes found!"




heres the old code:

Local PR:Int = 3
Local primelist:Int[1000001]
Local prcnt:Int = 1
primelist[0] = 2

Local n:Int = 0
Local flag:Byte = False

Local c:Int = MilliSecs()
Repeat
	n = 0
	flag = False
	While primelist[n]*primelist[n] <= PR
		If ((pr Mod primelist[n])=0) Then
			flag = True
			Exit
		EndIf
		n:+1
	Wend
	If Not flag
		primelist[prcnt] = pr
		prcnt:+1
	EndIf
	pr:+2
Until prcnt = 1000000
c = MilliSecs()-c

'For i:Int = 0 Until 1000000
'	Print primelist[i]
'Next

Print "First 1,000,000 primes in "+c/1000.0+" seconds!"


UPDATED

if you want to see the numbers without risking freezing your computer, you could just display the first 1000 or the last 1000

be sure NOT to run in DEBUG MODE

Last edited 2010

Last edited 2010

Last edited 2010

Last edited 2010

Last edited 2010


xlsior(Posted 2010) [#2]
Just out of curiosity, what processor do you have in your PC?
It takes 12.91 seconds on my computer (Core 2 Duo E6600 @ 2.4GHz, windows 7 x64)


Nate the Great(Posted 2010) [#3]
core 2 duo t8300 @ 2.4 ghz & 2.4 ghz
this is what my control panel says... although I thought it was 2.6 ghz until now...

and its win vista 32 bit

make sure you are not in debug and you have no other programs running... i consistently get 6.9 something, never more or less than that for the first 2 digits

ok just did it in debug and got 14.5 seconds.. maybe thats youre problem?

youre processor http://ark.intel.com/Product.aspx?id=27250
mine http://ark.intel.com/Product.aspx?id=33099

they look almost identicle to my untrained eye

edit: looks like the time it takes to do this is more affected by read/write speeds for your ram considering it reads a ton of prime numbers off of your ram for every number it tests.

Last edited 2010


Dabhand(Posted 2010) [#4]

First 1,000,000 primes in 5.90299988 seconds!



On bootcamped XP!

Intel i3 CPU 550@...

Dabz


PowerPC603(Posted 2010) [#5]
I got 15.5 seconds by running this code on my laptop. (non-debug mode)
With debug mode on, I got 23.97 seconds.

Win Vista 32 bit, processor: Core 2 Duo T7200 @ 2GHz.

Last edited 2010


Pengwin(Posted 2010) [#6]
Just for comparison, I ran this on my iMac (2.66 GHz Intel Core 2 Duo) with OSX 10.5.8

I got 6.339 seconds for non-debug and 15.796 seconds for debug mode.


Czar Flavius(Posted 2010) [#7]
If you want a challenge make a multithreaded one!


zzz(Posted 2010) [#8]
The oiginal code ran in 8.71700001 seconds on a Core 2 Duo T6400. Changing the line
If (pr/(primelist[n]*1.0)) = Int(pr/primelist[n]) Then

to
If ((pr Mod primelist[n])=0) Then

made it run in 2.86400008 seconds instead :p

Last edited 2010

Last edited 2010


Arabia(Posted 2010) [#9]
It took 2hrs and 23minutes on my Vic 20 :(


Nate the Great(Posted 2010) [#10]
@zzz thanks! I cant believe I overlooked that! thanks

edit: updated!

Last edited 2010


Hummelpups(Posted 2010) [#11]
:D

First 1,000,000 primes in 1.79700005 seconds!


Nate the Great(Posted 2010) [#12]
hmmm translated it to ti-basic and it takes at least several hours if not days on my calculator. I didnt want to run my batteries down!


Shortwind(Posted 2010) [#13]
:D

Last edited 2010


Me.32(Posted 2010) [#14]
translated to C:
<5.809s> on my Slow 2GHz Core 2 Duo iMac :D

BMax:
<5.918s> on the same machine

iMac, Intel Core 2 Dou 2.0 GHZ

<update>
C:
<5.726s> with LLVM 1.5 on same Machine :D

Last edited 2010


Shortwind(Posted 2010) [#15]
I guess my computer sucks. I've got a (dual) 2.5ghz cpu, and in both win 7 (32bit) and ubuntu (32bit) it runs in 10 seconds, over and over again. hmm, time to upgrade I guess.

;(


PowerPC603(Posted 2010) [#16]
After the code-change (suggested by zzz), it now runs in 5.97 seconds instead of 15.5 seconds on my laptop.


Shortwind(Posted 2010) [#17]
Nate, or someone else, here is the sieve of eratosthene. Super fast.

Done in 0.962000012 seconds.
Total primes generated: 1000000


This is the time on my computer.

Can someone help me clean up the code. It's needs to be better able to find a specific number of primes.

But I think you'll agree this is much faster.
After the prime arrary is built, it could be turned into a bit table and take up far less ram.

The end for loop is to print sections of the prime array to verify the numbers are actually primes.

Have fun.

P.S. Don't program when your drunk! ha ha

SuperStrict

Const stopme:Int=100000000
Local flags:Int[stopme+1]
Local i:Int
Local theprimes:Int[1000000]

For i=0 To stopme
	flags[i]=-1
Next

Local z:Int=0
Local endprime:Int=15485863
Local test:Int=2
Local x:Int=0
Local count:Int=0

z=MilliSecs()

For Local test:Int=2 To endprime

	If flags[test]<>-1 Then
		Continue
	EndIf
	
	x=test
	If flags[test]=-1 Then
		theprimes[count]=test
		count=count+1
		flags[test]=test
	EndIf

	Repeat
		flags[test+x]=test
		x=x+test
	Until x>=endprime

Next

z=MilliSecs()-z

Print "Done in "+z/1000.0+" seconds."
Print "Total primes generated: "+count


For i=25 To 50
	Print theprimes[i]
Next



Last edited 2010

Last edited 2010

Last edited 2010

Last edited 2010

Last edited 2010


Shortwind(Posted 2010) [#18]
Hmm. my bit array for the first 1,000,000 primes is still 125,000 bytes long.... :)


Nate the Great(Posted 2010) [#19]
thats interesting, I dont see how using that method would be twice as fast as mine... but i get 1.17 seconds with yours which is twice as fast as mine and 7 times as fast as the original! :p I wonder what the fastest way ever though of is to find primes... I cant seem to find many methods for finding prime numbers.


Matty(Posted 2010) [#20]
Isn't there something called the "Sieve of Eratosthenes " or something - very quick to find primes....

http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes


Shortwind(Posted 2010) [#21]
Um, LOL, Matty, that's what I programmed. Not very well, but it does work.

Nate:

The sieve is super fast because it's doing what the computer does best:
count... (no division, just simple memory copying and very simple condition loops...)


Using elipitical curves is the fastest method for large numbers, but not small numbers.

Sieve of Eratosthenes is the fastest for most primes, but of course you have to have the memory to build the array.

For my program (badly written) it only needs:
Total Loops: 47,086,551

Your program needs:
Total Loops: 504,508,950

The only real fast way, is to create a table (bit database) of precalculated primes and just do a lookup. That is really the fastest way if your program is going to need to access primes very frequently. (Factoring for example.)

If your interested start here:
http://primes.utm.edu/prove/

And if you want to make your head hurt go here:
http://polymathprojects.org/category/finding-primes/

And of course you could also start with the obvious:
http://en.wikipedia.org/wiki/Integer_factorization

:D

Last edited 2010


Matty(Posted 2010) [#22]
Sorry Shortwind, I didn't read all the posts (my bad...)


Jesse(Posted 2010) [#23]
I doubt this will make much difference to anyone here as everyone seem to have pretty fast computers but by removing the second if statement in the for loop and removing one unnecessary calculation in the last demo, I was able to reduce the time on my wife's ancient imac 1.2ghz from 13 secs to 7.5 secs. I tried it in my netbook and it was prety much unchanged.
SuperStrict

Const stopme:Int=100000000
Local flags:Int[stopme+1]
Local i:Int
Local theprimes:Int[1000000]

For i=0 To stopme
	flags[i]=-1
Next

Local z:Int=0
Local endprime:Int=15485863
Local test:Int=2
Local x:Int=0
Local count:Int=0

z=MilliSecs()

For Local test:Int=2 To endprime

	If flags[test]<>-1 Then
		Continue
	EndIf
	
	x=test
	theprimes[count]=test
	count=count+1
	flags[test]=test

	Repeat
		x:+test
		flags[x]=test
	Until x>=endprime

Next

z=MilliSecs()-z

Print "Done in "+z/1000.0+" seconds."
Print "Total primes generated: "+count


For i=25 To 50
	Print theprimes[i]
Next


Last edited 2010


Shortwind(Posted 2010) [#24]
LOL, thanks Jesse. I didn't see that until you pointed it out. :D

It's better now, but I still code for crap on the fly. Ha! Ha!

:D


zzz(Posted 2010) [#25]
Thats nice :p Feels a little weird that 10x more iterations only doubles the time taken. Also, theres a sieve of atkin (sp?) around, which, if i dont remember wrong uses some clever way to calculate prime candidates instead of just working through the entire 2..n range like we are doing here.


zzz(Posted 2010) [#26]
Wrote a simple sieve of atkin based on the wikipedia pseudocode;



The original function runs in about 2,9 seconds, the "other sieve" :o takes about 1,4 seconds on average and this one does the work in just under 0,8 for me :D

Supposedly this one can be optimized dramatically also.


Shortwind(Posted 2010) [#27]
I completely forgot about this one. This sieve only needs 16,911,909 loops to find 1,000,862 primes. Nice! And doesn't require the all the extra memory that sieve of Eratosthenes. Cool.

On one final note it should be known that all primes greater than 6 are of the form:

6k + or - 1


:D

Last edited 2010


Nate the Great(Posted 2010) [#28]
0.4 seconds!

I cant believe how much faster its gotten. This is kind of a mix of the 4 ideas presented above by zzz, jessie and shortwind and me. I don't know what you would call it since I just combined ideas but its a sieve I guess.

Global num:Int = 15485864
Global sqrnum:Int = Sqr(num)+1
Global PrimeFlags:Byte[num]
Global Primes:Int[num]
Local tim:Int
Local mill:Int = MilliSecs()
For Local x:Int = 2 Until sqrnum
	If primeflags[x] = 0 Then
		tim = x
		Repeat
			tim:+x
			primeflags[tim] = True
		Until x+tim => num
	EndIf
Next 
Local count:Int
For Local y:Int = 2 Until num
	If primeflags[y] = False Then
		primes[count] = y
		count:+1
		
	EndIf
Next
mill = MilliSecs()-mill
Print mill/1000.0 +" seconds"
Rem
For Local i:Int = 0 Until count
	Print primes[i]
Next
EndRem
Print count + " primes found!"


Last edited 2010

Last edited 2010

Last edited 2010


zzz(Posted 2010) [#29]
0.52 seconds here, Nate is on top again :p
EDIT: just figured out how i can make some massive improvements on the atkin sieve ;D

Last edited 2010


ima747(Posted 2010) [#30]
8 core 2010 2.4ghz Mac pro (under OS X, under vista in bootcamp first post variant was actually a tiny bit slower, tough oddly the same install through parallels was the same as the mac... very wierd)

First post variant (current form with the first optimization update) 2.48600006 solidly

nate's release 2 posts from here up 0.186000004 to 0.189999998

running stuff in the background but that shouldn't matter...

would REALLY love to see a multi-threaded variant... 8 cores, with hyper-threading wants to know... :0)

Last edited 2010

Last edited 2010


ima747(Posted 2010) [#31]
nevermind this post, was just OS randomness

Last edited 2010


zzz(Posted 2010) [#32]
@im747, couldnt see any difference in the generated assembly so i guess its just a 'placebo' effect :)


zzz(Posted 2010) [#33]
Got the aktin sieve down to 0.58 seconds with some basic optimizations, Nates latest version is still slightly faster :p



Last edited 2010


zzz(Posted 2010) [#34]
Slight improvement on nates latest code, keep finishing at around 0,49 seconds for me. Will probably run slower on older machines.




Nate the Great(Posted 2010) [#35]
first 10,000,000 primes in 6.0 seconds

SuperStrict
Framework brl.standardio
Import brl.math

Global num:Int = 179424674
Global sqrnum:Int = Sqr(num)+1
Global PrimeFlags:Byte[num]
Global Primes:Int[num]
Local tim:Int
Local mill:Int = MilliSecs()
For Local x:Int = 2 Until sqrnum
	If primeflags[x] = 0 Then
		tim = x
		Repeat
			tim:+x
			primeflags[tim] = True
		Until x+tim => num
	EndIf
Next 
Local count:Int=0
primes[count]=2
count:+1
For Local y:Int = 3 Until num Step 2
	primes[count]=y
	count:+(primeflags[y]=False)
Next
mill = MilliSecs()-mill
Print mill/1000.0 +" seconds"
Rem
For Local i:Int = 0 Until count
	Print primes[i]
Next
EndRem
Print count + " primes found!"


thanks for the optimization zzz


ima747(Posted 2010) [#36]
Sweet

2.87800002 seconds
10000000 primes found!


zzz(Posted 2010) [#37]
Nice :P Finished in 7.13600016 seconds for me


zzz(Posted 2010) [#38]


Modified Nates a bit, finishes in around 6,85 for me now.


zzz(Posted 2010) [#39]


A real mess, i tend to not delete any code i write when "scribbling" like this, instead putting comments all over the place :P

Anyways, nates latest sieve inspired me to write this one, the speed increase is pretty much only from reduced memory overhead. Churns out a million primes in ~0,145 seconds and 10 million in ~4,9 seconds on my core2duo t6400 (2ghz).

EDIT: cleaned up the code
EDIT2: small change suggested by Czar
EDIT3: changed the prime counting code a little, now doing ~0,11s/1m primes for me.

Last edited 2010

Last edited 2010

Last edited 2010


Nate the Great(Posted 2010) [#40]
Hey zzz, what are the masks for? I dont see how the masks are used to optimize it?


Czar Flavius(Posted 2010) [#41]
Will making the masks constant improve speed more?


zzz(Posted 2010) [#42]
I used the div/mod mask to get rid of division. cpus calculate the remainder as a byproduct when doing division so the div/idiv opcodes gets used for either operation, and theyre really slow.. And with a fixed known value its pretty straightforward to use bitwise operators instead.
EDIT: that doesnt really have much impact on the code, at least not in a positive way. the main speed increase comes from reducing the memory footprint by using bits for flags instead of bytes. The extra fiddling is outweighted by the reduced memory overhead by a lot :) (and thats also a big reason to why it isnt much faster on the 10m prime test)

The third 'mask' is just a quickie way to mark of all even numbers, 16 numbers at a time :o

@Czar: that works indeed, got it down to ~0,12seconds/1m primes for me :)

sample code:


Last edited 2010

Last edited 2010

Last edited 2010

Last edited 2010

Last edited 2010


zzz(Posted 2010) [#43]
Its now doing 1m primes in under 0,1 seconds for me :P, getting steady 0,097/0,099 second results. 10m primes in ~4,5 seconds.



Last edited 2010


Shortwind(Posted 2010) [#44]
This is very exciting watching this thread over the last few days.

zzz, Nate, super fun!

Still analyzing your last bit of code zzz. A pretty original line of thought there!


:D


Nate the Great(Posted 2010) [#45]
wow zzz, 3.9 seconds, 10,000,000 primes found. I believe we have reached a sort of road block for efficiency of counting primes but zzz may prove me wrong. (it may be faster in asm hint hint haha) Anyway, I hadn't touched bmax for a while so this helped me really get into it and pick it back up. thanks!


zzz(Posted 2010) [#46]
@Shortwind, i realized it was the way to go when i messed around with nates latest creation. changing the flags array to ints instead of bytes made it way slower, so figured it would work the other way around too.

@Nate, try doing 1m primes instead, it will be way faster since the 10m prime test re-introduces the memory issue i tried to get rid of.. :)

I dont think ill mess around much more with this. Assembly wont really help since its mostly a memory issue now, the algo touches memory all over the place over and over again so it is most likely racking up a huge number of chache/page mispredictions. A single L2 misprediction can cost over 200 cycles accordingly to google, and for reference i think an ordinary integer addition is a one cycle ordeal.

One could probably rewrite the code to work around this in some way, but i cant really come up with any ideas atm.


zzz(Posted 2010) [#47]
Expanded a bit on the preliminary culling, the code gives a pretty solid average of 0,075 seconds / 1m primes :D



Last edited 2010


xcessive(Posted 2010) [#48]
With the above code: 0.009200... seconds.

i7 extreme OC'd :D


Imphenzia(Posted 2010) [#49]
0.128000006 seconds
1000000 primes found!

i7 950 @ original 3.06GHz (non-debug)


Grey Alien(Posted 2010) [#50]
This thread is fantastic, well done all! Fascinating.


Grisu(Posted 2010) [#51]
0.0540000014
1000000 primes found

i7 920 @ 2.98 GHZ (non-debug)


zzz(Posted 2010) [#52]
Had a bunch of school-related stuff to finish today, so figured the sensible thing to do was to write yet another sieve to post here. Same as the last one pretty much. Continuing on the memory footprint philosophy i threw out even numbers completely.

EDIT: on my computer it does 30m primes in just under 10 seconds, where 10m takes just over 2 seconds, so it scales fairly well this time. Set limit to 573259392 if you want to try :)

EDIT2: updated the prime number counting code so now it should run considerably faster on the 1m prime test too. For me its slightly more than twice as fast as the last one i posted.



Last edited 2010

Last edited 2010


zzz(Posted 2010) [#53]
Am I the only one who still have some interest in this? Nate, Jesse, Shortwind? :) You guys probably have better things to do (im studying ~2days a week atm so i dont), just wondering if I should continue to post updates etc.

I could try and clarify whats going on in the code a bit if anyone wants me to. I havent done that much more than obfuscating the code from the original sieve so its not hard to understand the logic in any way.

Last edited 2010


Shortwind(Posted 2010) [#54]
Sorry, zzz, haven't lost interest, just been busy.

I believe the code we have in this thread is very usefull, and a good learning experience. What I'd suggest now is that you modify your code to quickly, with reasonable memory requirements, find the X'th prime number, the next prime, and the previous prime. (If you feel so inclined.)

In other words, if I tell your program that I want the 1,578th prime number, how fast can your program return that number?

If I give your program the number 95,456, how fast can it tell me the next or previous prime number?

And finally, (you'll have to start a new thread), what is the fastest method in BlitzMax for "factoring" a number into it's prime elements?

:D


zzz(Posted 2010) [#55]
That functionality would come as a bonus if i could get a segmented sieve to work properly (and fast enough). Youd still have a bit of overhead but it would be able to quite quickly calculate enough primes to be able to answer those questions :)

Ive been thinking about upping the performance of the sieve by packing values.. The last piece of code excludes even numbers completely, so:

a) I get a smaller memory footprint since the algorithm needs one bit per two values in the 1-limit range when culling prime candidates.

b) The culling code can be made much more efficient. Initially i hade the problem of prime multiples not existing in the flag array. For example the first multiple of 5 is 10, which is even so it doesnt exist as far as the sieve is concerned. This one was pretty easy to figure out and had the nice side-effect of making the culling loop twice as fast since i could skip every second multiple completely.

Ive been tinkering a bit with excluding multiples of 2,3,5 and 7 in the same way, but this makes the 1-limit range non-linear as opposed to just excluding odd numbers.. It would give the flag array a value density of around 440% (ie, 4,4values per bit) tho if i figured it out, which would give a very nice performance boost :D

After i nail those two ill get a factorization thread going :P

Last edited 2010


zzz(Posted 2010) [#56]
Proof of concept for a segmentet sieve. Seems to work properly so time to trim the code a bit i guess. Also opens up for multithreading which is always nice




zzz(Posted 2010) [#57]
Messed around a bit today with this and managed to get a segmented sieve going. The code needs some major cleaning up which i dont feel like doing now so heres some pretty numbers to look at instead:

edit: changed my mind, heres the code instead. Bit of warning that while it churns out primes pretty fast (100000000th prime took 12sec for me) it gobbles up lots of memory.. The TARGET constant is what prime the code will find so mess around with that.

edit2: small optimization. its currently doing 10m primes in just over 0,7 seconds for me :P



Last edited 2010

Last edited 2010

Last edited 2010


zzz(Posted 2011) [#58]
A bit of thread necromancy here.

I started refactoring the above code into something useful but lost interest soon afterwards.. Remembered the topic and figured id post it here anyways if someone is interested.

Feel free to try it and post some execution times :)




Peter(Posted 2011) [#59]
The 100000000th prime is 2038074743. ( total running time: 8.94799995 seconds )

from my company notebook (Intel Core 2 Duo, 2GHz, 2 GB RAM)

The 100000000th prime is 2038074743. ( total running time: 7.98600006 seconds )

from my Intel Core 2 6400 2,13 GHz, 2 GB RAM (growing old now)

-- Peter

Last edited 2011


_JIM(Posted 2011) [#60]
The 100000000th prime is 2038074743. ( total running time: 5.26900005 seconds )


work pc: i3 3.06GHz


ima747(Posted 2011) [#61]
The 100000000th prime is 2038074743. ( total running time: 6.32000017 seconds )


Mac Pro (mid 2010) 8 core (2x4 2.4ghz Xenon) with a few things in the background (shouldn't make much of a difference, extra cores and such)

Wonder if memory/bus speed would start having an impact with that high a load generated in such a short time...


virtlands(Posted 2012) [#62]
Hi Nate the Grate!

It works just great on my laptop:

1.48599994 seconds
1000000 primes found!

My computer type is:
Intel Core 2 CPU T5200 @ 1.6 Ghz speed
(Centrino Duo) HP Pavilion DV6245us
4GB memory, 32-bit, Windows 7


PowerPC603(Posted 2012) [#63]
The 100000000th prime is 2038074743. ( total running time: 3.68799996 seconds )


64-bit Quad Core i5-3450 @3.1GHz
12GB RAM
64-bit Windows 7 Home Premium SP 1
AMD Radeon HD7770 1GB


Using the edited code in first post:
0.194999993 seconds
1000000 primes found!


Last edited 2012