MMO server and multi-threading

Chroma(Posted 2007) [#1]
I'm assuming that MMO-type servers with about 500 people online at once use multi-threading. Does that sound correct?

What's your opinion?

And what do you think is the max number of people you could have online at once in the same area of a 3D world, using BMax UDP?


Dreamora(Posted 2007) [#2]
No, that's wrong.

You use one server per zone and the like.

Threads are great, but if one zone dies, the whole realm would die, which is definitely unacceptable.

There is also no real reason they should share the data: the zones are distinct and so are the players, so there is nothing to share actually.


FlameDuck(Posted 2007) [#3]
"Does that sound correct?"
Yes.

"And what do you think the max number of people you could have online at once in the same area of a 3D world, using BMax UDP?"
Depends on a lot of things, mainly your bandwidth and how tight your loops are. Probably not more than about 50-80 though, even on the best of days on a really powerful server and with some tight UDP code.

"Threads are great but if one zone dies, the whole realm would die which is definitely unacceptable."
How do you figure that? If one thread should crash, it only affects the player dealing with that thread. Unless it's the main thread of course, in which case all child threads (players) are disconnected.

"nothing to share actually."
Except "other players" for starters.


Chroma(Posted 2007) [#4]
So you're saying that for each zone in an MMO, there's a single computer dedicated to that zone? And the client is just transferred between computers...

Sounds expensive. :( Otherwise you'd have to have one computer handle the whole world, which would probably lag out horribly.


FlameDuck(Posted 2007) [#5]
It depends largely on how you define "computer" and "zone". But in abstract terms, sure.


Canardian(Posted 2007) [#6]
From what I heard, EverQuest and World of Warcraft had 1 server per zone or per handful of zones. When a zone was getting too laggy, they moved other zones away from that server (a good example was Molten Core + Blackwing Lair on weekends in WoW).

What a server costs really means nothing if you have 300,000 players who pay $15 each month: that's $4.5 million per month for basically doing nothing :)


Winni(Posted 2007) [#7]
There's a nice chapter on that subject in the book "Game Programming with Python" by Sean Riley. You will find all the background information that you need there. Even if you are not going to use Python itself, the book will be very useful.

In C# or Java apps, you will mostly find the "concurrent threaded server" model being implemented, meaning that there will be a worker thread for every connected client. The main reason for this is that the applications are using blocking I/O sockets.
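
For comparison, here is a minimal sketch of that thread-per-client model. It is written in Python rather than C# or Java purely for brevity, and everything in it (the names `handle_client` and `serve`, the 1024-byte buffer, the upper-casing reply) is an assumption made for illustration, not taken from the book or from any real server.

```python
import socket
import threading


def handle_client(conn: socket.socket) -> None:
    """Worker thread body: blocks on recv() for exactly one client."""
    with conn:
        while True:
            data = conn.recv(1024)      # blocking call; only this thread waits
            if not data:                # an empty read means the peer closed
                break
            conn.sendall(data.upper())  # placeholder for real command handling


def serve(listener: socket.socket) -> None:
    """Accept loop: hands every new connection to its own worker thread."""
    while True:
        conn, _addr = listener.accept()  # blocks until a client connects
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()
```

With 500 clients this design keeps 500 mostly idle threads around, which is the overhead a single-threaded design avoids.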

In BlitzMax, however, you could implement a "concurrent asynchronous server", which Sean Riley describes as the "reactor pattern". It is a single-process, single-threaded server. Which basically is something like this baby here that I started a while ago:

'*
'* Digital Nomad News Server
'* 
'* Author: Winfried Maus
'*
'* 
'*====================================================================================================================================	

SuperStrict

Const NNTP_PORT:Int = 119
Const MAX_CMD_PARAMS:Int = 10



GetServer().Create()
GetServer().Run()
End


Function GetServer:NomadServer()
	Global Nomad:NomadServer = New NomadServer
	
	Return Nomad
End Function


'*====================================================================================================================================	
'* This class defines the worker instance For successful
'* NNTP connections.
'* 

	Type NNTPServer
		Field NNTPSocket:TSocket
		Field NNTPStream:TSocketStream
		
		Field dbConnstring:String = "/Users/Winni/DigitalNomad.dat"

		Field currentGroup:String = ""
		Field currentArticleNum:Long = 0
		Field lastKnownArticleNum:Long = 0
		Field article:String

		Field posting:Int = False
		Field ihavetransfer:Int = False
		Field postingallowed:Int = True
		Field serverpostingallowed:Int = True
		Field articleToPost:String = ""

		Field cmd:String
		Field cmdParam:String[] = New String[MAX_CMD_PARAMS]
		Field cmdParamCount:Int = 0

		Field running:Int
		Field TimeOutTime:Int = GetTimeNumeric()
 	
		Field localConnection:Int = False


		Method wmsendMessage( message:String ) 
			WriteLine( NNTPStream, message )
		End Method


		Method cmdHelp() 
			wmsendMessage( "100 List of recognized commands follows." )
			wmsendMessage( "Help" )
			wmsendMessage( "List" )
			wmsendMessage( "Group" )
			wmsendMessage( "Post" )
			wmsendMessage( "Slave" )
			wmsendMessage( "Mode reader | stream" )
			wmsendMessage( "XOver" )
			wmsendMessage( "XHDR" )
			wmsendMessage( "Article" )
			wmsendMessage( "Stat" )
			wmsendMessage( "Head" )
			wmsendMessage( "Body" )
			wmsendMessage( "Last" )
			wmsendMessage( "Next" )
			wmsendMessage( "Ihave" )
			wmsendMessage( "Newgroups" )
			wmsendMessage( "Newnews" )
			wmsendMessage( "Date" )
			wmsendMessage( "Quit" )

			'Show administrative commands only on Local connection
			If localConnection 
				wmsendMessage( "XShutDown" )
				wmsendMessage( "XDeleteMessage msgid articleid" )
				wmsendMessage( "XCreateGroup groupname [y | n]" )
				wmsendMessage( "XDeleteGroup groupname firstarticle lastarticle" )
				wmsendMessage( "XPosting server | groupname y | n" )
			End If

			wmsendMessage( "." )
		End Method


		Method clearParams() 
			For Local nPos:Int = 0 To cmdParam.Length - 1
				cmdParam[nPos] = ""
			Next

			cmd = ""
			cmdParamCount = -1
		End Method

        
		Method MessageHandler( message:String ) 
			clearParams()

			While message.Length > 0 And cmdParamCount < MAX_CMD_PARAMS
				Local spcpos:Int = message.Find(" ")

				If spcpos = -1 
					spcpos = Len( message )
				End If

				If cmdParamCount = -1 
					cmd = Upper( Left( message, spcpos ) )
				Else 
					cmdParam[cmdParamCount] = Upper( Left( message, spcpos) )
				End If

				If spcpos + 1 <= message.Length 
					message = Right( message, Len( message ) - spcpos - 1 )
				Else 
					message = ""
				End If

				cmdParamCount :+ 1
			Wend

			Select cmd 
				Case "HELP"
					cmdHelp()
				Case "QUIT"
					running = False
				Case "XSHUTDOWN"
					If localConnection
						GetServer().ShutDown()
					Else
						wmsendMessage( "500 Local connection required." )
					End If
				Default
					wmsendMessage( "500 Command not recognized." )
			End Select
		End Method

		Method Create ( pSocket:TSocket, pStream:TSocketStream )
			NNTPSocket = pSocket
			NNTPStream = pStream
			
			
			If SocketRemoteIP( NNTPSocket ) = SocketLocalIP( NNTPSocket )
				localConnection = True
			End If
			
			
			running = True

			If serverpostingallowed 
				wmsendMessage( "200 Welcome to the Digital Nomad News Server (BlitzMax version)." )
			Else 
				wmsendMessage( "201 Welcome to the Digital Nomad News Server (BlitzMax version), read only." )
			End If
		End Method
		
		Method GetTimeNumeric:Int()
			Local Now:String = CurrentTime()
			
			Return Left(Now,2).ToInt() * 3600 + Mid(Now,4,2).ToInt() * 60 + Right(Now,2).ToInt()
		End Method
		
		Method Run()
			Local Now:Int = GetTimeNumeric()
			
			'* Seconds since the last activity. GetTimeNumeric() wraps to 0 at
			'* midnight, so a negative difference means the day rolled over.
			Local Idle:Int = Now - TimeOutTime
			If Idle < 0
				Idle :+ 86400
			End If
			
			'* Check incoming stream
			If Eof( NNTPStream )
				running = False
			Else
				If SocketReadAvail( NNTPSocket ) > 0
					MessageHandler( ReadLine( NNTPStream ) )
					TimeOutTime = GetTimeNumeric()
				Else
					'* Idle for more than 5 minutes, close this connection.
					If Idle >= 300
						running = False
						wmsendMessage( "500 Timeout: Connection idle for more than five minutes." )
					End If
				End If
			EndIf
			
			If Not running
				'* Only say goodbye if the peer is still there; writing to a dead stream can fail.
				'* 205 is the NNTP "closing connection" reply code.
				If Not Eof( NNTPStream )
					wmsendMessage( "205 Bye." )
				End If
				CloseStream( NNTPStream )
				CloseSocket( NNTPSocket )
			End If
		End Method
	End Type

'*====================================================================================================================================	
'* Main Application Class
'*

	Type NomadServer
		Field ShuttingDown:Int = 0

		Field ServerSocket:TSocket = New TSocket

		Field WorkerSocket:TSocket
		Field WorkerStream:TSocketStream

		Field NNTPobj:NNTPServer
		Field NNTPObjects:TList = New TList


		Method Create()
			ServerSocket = CreateTCPSocket()

			If Not BindSocket( ServerSocket, NNTP_PORT )
				Print "Bindsocket failed. Now quitting."
				End
			EndIf

			SocketListen( ServerSocket )
			Print "Listening."
		End Method


		Method ShutDown()
			ShuttingDown = 1
		End Method

		Method Run()

			While ShuttingDown < 3
				'* When still running normally, accept new incoming connections.
				If ShuttingDown = 0
					WorkerSocket = SocketAccept( ServerSocket )

					If WorkerSocket <> Null
						WorkerStream = CreateSocketStream( WorkerSocket, True )

						NNTPobj = New NNTPServer
						NNTPobj.Create( WorkerSocket, WorkerStream )
						NNTPObjects.AddLast NNTPobj 

						WorkerSocket = Null
						WorkerStream = Null
					EndIf
				End If
				
				'* When shutting down is initialized, do this in a separate loop to make sure that all connections will be closed properly.
				If ShuttingDown = 1
					For NNTPobj=EachIn NNTPObjects
						NNTPobj.running = False
					Next
					ShuttingDown = 2
				End If

				'* Iterate through all open connections.
				For NNTPobj=EachIn NNTPObjects
					NNTPobj.Run()
					If Not NNTPobj.running
						ListRemove( NNTPObjects, NNTPobj )
					End If
				Next

				If ShuttingDown = 2
					ShuttingDown = 3
				End If
			Wend
		End Method
	End Type

'*====================================================================================================================================	
'* EOF


It's a bare skeleton for an NNTP server and doesn't do much yet. I have a fully working "concurrent threaded server" version written in C# on (the leftovers of) my homepage.
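
The same reactor idea, stripped of the NNTP details, might look like the following Python sketch. It uses the standard `selectors` module to wait on all sockets at once instead of polling each connection in turn. The `Reactor` class, its method names and the upper-casing reply are hypothetical and only there to show the shape of the loop.

```python
import selectors
import socket


class Reactor:
    """Single-threaded event loop: one selector watches every client socket."""

    def __init__(self) -> None:
        self.sel = selectors.DefaultSelector()

    def add_client(self, conn: socket.socket) -> None:
        conn.setblocking(False)
        # Store the callback as the selector key's data so poll_once can dispatch.
        self.sel.register(conn, selectors.EVENT_READ, self.on_readable)

    def on_readable(self, conn: socket.socket) -> None:
        data = conn.recv(1024)
        if data:
            conn.sendall(data.upper())  # tiny reply; a real server would buffer writes
        else:
            # Empty read: the peer closed, so drop the connection.
            self.sel.unregister(conn)
            conn.close()

    def poll_once(self, timeout: float = 0.1) -> int:
        """Dispatch every socket that is ready; return how many were handled."""
        events = self.sel.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)
        return len(events)
```

A main loop would simply call `poll_once()` repeatedly. Because the selector sleeps until something is readable, the loop does not spin the CPU while all clients are idle.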

Personally, I no longer think that in this domain (network servers) the multithreaded approach has any benefits over the single-threaded approach. Multi-threading itself does not solve any scalability problems, but instead it creates a whole set of new issues: CPU usage and "ghost" threads being just two examples.

As for the rest of your question, you are talking about something like "Second Life" here. I have read in the German c't magazine that they have around 50 people on one server at a time. They also use one server per zone in their "world"; if you leave a zone, they use object persistence, transfer the object to a new server and "revive" it there. At least that is what Miguel de Icaza wrote in his blog a while ago. Second Life actually uses Mono as the server framework, that's why Miguel wrote about it.
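
To make the "persist, transfer, revive" step concrete, here is a rough Python sketch. It is not how Second Life does it; the `Avatar` fields, the `ZoneServer` class and the JSON wire format are all assumptions chosen to keep the example short.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Avatar:
    name: str
    x: float
    y: float
    inventory: list


class ZoneServer:
    """Hypothetical zone process that owns the avatars currently inside it."""

    def __init__(self, zone_id: str) -> None:
        self.zone_id = zone_id
        self.avatars = {}

    def export_avatar(self, name: str) -> bytes:
        """Remove the avatar from this zone and serialise it for transfer."""
        avatar = self.avatars.pop(name)
        return json.dumps(asdict(avatar)).encode("utf-8")

    def import_avatar(self, payload: bytes, spawn_x: float, spawn_y: float) -> Avatar:
        """Rebuild ("revive") a transferred avatar at this zone's entry point."""
        avatar = Avatar(**json.loads(payload.decode("utf-8")))
        avatar.x, avatar.y = spawn_x, spawn_y
        self.avatars[avatar.name] = avatar
        return avatar
```

In a real cluster the bytes returned by `export_avatar` would travel over a socket to the next zone's machine, and the client would be told to reconnect there.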

Anyway, I think it gives an idea of the size of the server farm that one can expect for a full-blown MMO project. It is simply too big for a low-budget one-man shop...


Chroma(Posted 2007) [#8]
Very true, the number of servers required for a large MMO would typically be beyond the scope of an indie developer.

The last thing you would want to do in an MMO would be to put a limit on the number of people allowed to play. Which really makes the low-budget indie MMO not possible. :(

This isn't my latest project; I know better than to tackle an MMO. But a multiplayer game kinda like an MMO that has a max capacity of 64 people sounds doable.


LarsG(Posted 2007) [#9]
I would guess that if you are good enough to make such a big project as an indie developer, getting or renting the hardware to run the project wouldn't be such a big issue..

I'm sure there are a lot of people who would lend you the money to buy or rent a startup server farm, if you've already got an MMO to show them..


Winni(Posted 2007) [#10]
You probably would find people to finance that project - but the problem with venture capitalists is that in the end they want to own your business. The other problem with VCs is that they might drop you overnight. This happened during the x-mas holidays at a company where I worked back in 1999 - believe me, it was not very amusing for us employees.

As for the 64-person project: Wouldn't it be cool to have something like that on a peer-to-peer network basis without a dedicated server? Or maybe a game that uses a similar technical foundation to the Freenet project? No central servers, but most parts of the game world would still be alive as long as some people are online. I find that quite fascinating. Whether it makes sense, now that's a different story. ;-)


Dreamora(Posted 2007) [#11]
Would be fascinating ... A year or so ago I was thinking of a partly / mainly user-driven MMO where each player is able to create zones himself, host them on his own system and "connect" them to a main server hosted by me.

The editors needed to build the zones and create new items would have been part of the game they bought.
There wouldn't have been any fees.

But I dropped the idea ... the market for iso-like MMOs isn't large enough and is flooded with free MMOs, and true 3D is too complicated to attract the masses needed to make such a system with a shared world really work.
And creating the needed toolset is just too expensive and time-consuming to give it away for free. Advertising would be the only way to go, but I do not really trust in "post funding" ...

Nonetheless, I still think the idea in general is fascinating.
And I think it actually makes sense.

How many people pay to host crap like CounterStrike servers? *yeah I know, so-called pro gamers don't see it as such ... but that's a whole different story*
And now think of what kind of world a single rented server with those capabilities would give to this type of game.

When I think back to how large the UO freeshard world was years ago, on a crappy 450 MHz P2 with 128 MB of RAM and a 10M connection, with 200 concurrent players online without lag and massive numbers of animals and other NPCs ...


plash(Posted 2007) [#12]
There's a free UO shard called UOGamers (now with 1200 active players daily). A few years ago they tested a new server, and at that time over 4500 clients were on simultaneously (I wonder how much it lagged :P).

www.uogamers.com


Paul "Taiphoz"(Posted 2007) [#13]
Looking at the number one MMO in the market right now, it uses 3 servers per domain, where a domain is the server you play on.

For example, on Doombringer: 1 server for Kalimdor, 1 server for the Eastern Kingdoms and 1 server for Outland. Each of those areas has a vast number of zones. It's not 1 server per zone, it's 1 server per continent.

I don't think it's out of the realm of the indie developer at all. If an indie was able to, or had the guts and time to, make a game the same size as just one of the main islands in World of Warcraft, they would make an absolute fortune, as long as the price was right.

I also don't agree about the number of players you could get in one area; it all depends on how you manage your zone. For example, World of Warcraft uses a portal system: it's only ever rendering or dealing with the things in your current area. Each zone might be split up into 30 or more areas, and that's how it manages to handle so many people in one zone.
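
One common way to "only deal with the things in your current area" on the server side is a grid index, sketched below in Python. This is not WoW's actual mechanism, which is not public; `CELL_SIZE`, `AreaIndex` and the nine-cell neighbourhood are assumptions for illustration.

```python
from collections import defaultdict

CELL_SIZE = 50.0  # hypothetical edge length of one area, in world units


def cell_of(x: float, y: float) -> tuple:
    """Map a world position to the grid cell (area) that contains it."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))


class AreaIndex:
    """Tracks which players are in which cell so updates go only to neighbours."""

    def __init__(self) -> None:
        self.cells = defaultdict(set)  # cell -> set of player names
        self.where = {}                # player name -> current cell

    def move(self, player: str, x: float, y: float) -> None:
        new_cell = cell_of(x, y)
        old_cell = self.where.get(player)
        if old_cell == new_cell:
            return
        if old_cell is not None:
            self.cells[old_cell].discard(player)
        self.cells[new_cell].add(player)
        self.where[player] = new_cell

    def audience(self, player: str) -> set:
        """Players in the same cell or the eight surrounding cells."""
        cx, cy = self.where[player]
        found = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found |= self.cells.get((cx + dx, cy + dy), set())
        found.discard(player)
        return found
```

A movement packet from one player is then forwarded only to `audience(player)` rather than to everyone in the zone.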

Now put them all into the exact same zone and the exact same area, and you run into trouble. For example, again with World of Warcraft, if 100 or more people turn up in Ironforge at the same time, then 90% of them will be lagged out, or lagged to the point that they need to wait a while for their computer to get all the new player data.

Again, there are a number of ways to combat this: you could instance all cities and add a server dedicated to instances. Oh, WoW has 4 servers, 1 for each land mass and 1 for all instances; forgot about that one.

I think the point I am trying to make is that it is possible, it is indeed feasible, and if you have the coding talent to pull off even a small island MMO then you should def go for it. Get some good artists and take your shot at it; other people have done it on more than one occasion now and they are managing to make a nice living out of it.


Dreamora(Posted 2007) [#14]
The split-up has no real influence on the server. It does influence the network traffic and the client's performance hunger, though.

The only things that have a direct influence on the server are:

- latency (the reaction time needed to let the player believe it is realtime; the faster it needs to react, the higher the CPU usage)
- AI (quite some CPU time is burnt here)
- calculations (combat formulas, abilities etc. etc. This is the largest part of the whole CPU time spent)


TaskMaster(Posted 2007) [#15]
Yavin, I find it hard to believe that a WoW server is only 3 machines. Do you have a link or something showing this info?


Paul "Taiphoz"(Posted 2007) [#16]
@TaskMaster - Get Wireshark and run it while playing WoW; you can see your connection being redirected to new servers as you zone in and out of locations.

1 for instances.
1 for Eastern Kingdoms.
1 for Kalimdor.
1 for Outland, the newish expansion.

You only see a loading screen in WoW when you travel to one of the above locations, because that's when it needs to move you to the other server.

You don't have to run Wireshark though; simply play the game. I've been in Outland when the instance servers have gone down, or I've been in Kalimdor and seen the Outland and Eastern servers go down while the instance and Kalimdor servers remained online.

4 servers per domain, nothing more. Actually, I think the instance servers cover more than one realm, as is evident from the cross-realm battlegrounds.


TaskMaster(Posted 2007) [#17]
That does not tell you that it is one machine, it can be a whole server farm with shared memory space...

I really doubt it is one machine for an entire continent.


Raph(Posted 2007) [#18]
Typical MMO architecture works like this:

Clients connect to some form of a userserver (after going through an auth process, which may be its own server), which handles traffic for a set of users. Typically, multiple userservers are run for each world server. You may get shuffled between userservers for a variety of reasons. You may get clustered because of geographical proximity in the world, for example.

User connections are multiplexed through to the world servers which actually run the simulation. World servers may or may not require zoning between them depending on how much proxying they do along the boundaries. Most world server architectures use static geographical load balancing, but some have used various forms of dynamic load balancing, and are not doing load purely based on geographical proximity.

World servers are typically managed by some sort of nanny process. It's not unusual to launch world servers for instances, for world servers to go down and be relaunched while not damaging the cluster as a whole, etc.
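
A nanny of that kind can be very small. The Python sketch below is an assumption about its general shape, not a description of any shipping MMO: `Nanny`, `spawn` and the zone names are hypothetical, and `spawn` is expected to return anything with a `poll()` method that behaves like `subprocess.Popen.poll()` (returns `None` while the process is alive).

```python
class Nanny:
    """Supervises world-server processes and relaunches the ones that exit."""

    def __init__(self, spawn) -> None:
        self.spawn = spawn   # callable(zone) -> process-like object with poll()
        self.workers = {}    # zone -> running process
        self.restarts = {}   # zone -> how often it had to be relaunched

    def start(self, zone: str) -> None:
        self.workers[zone] = self.spawn(zone)
        self.restarts.setdefault(zone, 0)

    def check(self) -> list:
        """Relaunch every world server whose process has exited."""
        restarted = []
        for zone, proc in list(self.workers.items()):
            if proc.poll() is not None:  # an exit code means the process is gone
                self.workers[zone] = self.spawn(zone)
                self.restarts[zone] += 1
                restarted.append(zone)
        return restarted
```

In practice `spawn` could wrap `subprocess.Popen` with the command line of the world server, and `check()` would run every few seconds. One crashed zone is then relaunched without touching the others.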

There may also be DB processes -- there's almost certainly DB servers.

In terms of server CPU usage, there's really nothing stopping you from running an OS plus a few thousand concurrent connections as long as you can handle the sockets and aren't running much else on the box. But it's very heavily dependent on the amount of additional processing required for your world -- scripting, pathfinding, AI, etc -- and on the n^2 problem of densely packed crowds. Even old text MUDs regularly ran hundreds of players on one world -- and no, modern MMO servers are *not* necessarily more intensive in terms of their CPU demands. (Though of course the overall architecture is more complex.)
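
The n^2 problem is easy to put numbers on. In the Python sketch below the function names and the per-update byte count are assumptions; the only thing taken from the discussion is that every player in a packed crowd has to hear about every other player on each heartbeat.

```python
def updates_per_tick(players: int) -> int:
    """Each of n players receives one update about each of the other n - 1."""
    return players * (players - 1)


def outbound_bytes_per_second(players: int, bytes_per_update: int, heartbeat_ms: int) -> float:
    """Server upload needed if the whole crowd is mutually visible."""
    ticks_per_second = 1000 / heartbeat_ms
    return updates_per_tick(players) * bytes_per_update * ticks_per_second
```

For example, `updates_per_tick(10)` is 90 and `updates_per_tick(100)` is 9900: ten times the crowd costs about a hundred times the traffic. That is why spreading players over separate areas matters more than raw CPU speed.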

The largest demands these days tend to actually be pathfinding and "awareness" (radial searches, radial notifications), unless you are running a crazy fast heartbeat. Typical heartbeats are on the order of 100-250ms.

Typical clusters are well over a dozen machines. I am pretty sure WoW does not run on just four per shard. But the machines can also be multicore, of course.


BadJim(Posted 2007) [#19]
Zones are sized to reduce the number of players that can see each other, in order to reduce outgoing bandwidth consumption.

How many zones per server depends on your game. If you use Everquest size zones on an Everquest style game, considering it was working a decade ago, you will be able to get a lot of zones on a modern server. There is no good reason to use one server per zone in this case, but you may find you need more than one server for all of them.


Raph(Posted 2007) [#20]
Zones are typically not used anymore as the basis for bandwidth culling; there's smarter ways to do that.


Dreamora(Posted 2007) [#21]
Yes, WoW shows quite well how it should be done: PVS within cities and radius-based network ghosting outside (which even Tribes 2 had a decade ago), including subzoning for AI and other things.


Banshee(Posted 2007) [#22]
I don't know if my server tool for sim racing counts as an MMO. It runs 7 race servers or 'zones' with around 100 racers at any one time, and even manages a good 50-60 during work hours; hardly WoW level, but fairly ambitious by Blitz multiplayer standards.

What I will say is that it has a memory leak, and after extensive debugging today I am sorry to say the fault appears, at this time, to most likely lie in Win32 BlitzMax. I'm not sure where yet, but memalloc() shows my app using a steady 130 KB, yet the software's memory use is rising at a rate of roughly 10 MB per hour. I make extensive use of sockets and a 3rd party SQL library, so it's possible the fault lies there, I guess - but I'd expect that to show up in memalloc().

The previous version written in Blitz3D did not suffer from this at all, and given the choice, having completed a migration to BlitzMax, I'd actually like to go back to Blitz3D again.

I've found BMax too unstable on the whole. Maybe it's down to the way I'm using it - but flushing a stream that has disconnected in Blitz3D will return a fail, while in BlitzMax it crashes the entire software - which is a bit harsh if the stream has died with a dirty disconnect since I last checked.

All in all, I regret developing an MMO style program in BlitzMax.

In terms of players, the limiting factor is the size of the network connection. A regular low end dedicated box will typically have a 100 Mbit connection, giving you roughly 12 MB of upload per second if my memory serves (I've not calculated, going from the RAM installed in my brain :) ).
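
For anyone who wants to run the numbers: a 100 Mbit link carries 100,000,000 / 8 = 12,500,000 bytes per second before protocol overhead. The small Python sketch below turns that into a rough player budget. The function names, the 50% headroom and the per-player byte rate in the usage line are assumptions, not measurements.

```python
def link_capacity_bytes_per_sec(megabits: float) -> float:
    """Raw capacity of a link, ignoring protocol overhead (1 Mbit = 1,000,000 bits)."""
    return megabits * 1_000_000 / 8


def max_players(link_megabits: float, bytes_per_player_per_sec: float, headroom: float = 0.5) -> int:
    """How many players fit if only a fraction of the link should be used."""
    usable = link_capacity_bytes_per_sec(link_megabits) * headroom
    return int(usable // bytes_per_player_per_sec)
```

`max_players(100, 5000)` gives 1250, assuming each player costs the server 5000 bytes per second of upload and half the link is kept free to protect latency.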

As long as your total network usage is sufficiently under that so your ping latency is not too bad then it really does not matter how many players are connected to a 'zone'.

Some games use zones as a way of shaping what data to send, but there are other ways. Zones are a simple method, but not necessarily the best for every game type.

What is most important to a developer is how much bandwidth goes through a single server box, and this is very dependent upon the multiplayer code that your software requires to operate.

Again, smart coding can make all the difference. It's not just a matter of how much data you put into a packet; what counts is also how frequently your data needs to be sent, as well as how many players the data must be sent to.

Some primitive coders use a packet per second system - which strikes me as a crude method of immediately implementing around 200ms of lag - but hey, it's good enough for some 'professionals' :) - it doesn't take much use of the box upstairs to start sending packets when the controller state changes, and so begins the route of your data throughput becoming dynamic.
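
A sketch of the "send when the controller state changes" idea, in Python. The class name, the `send` callback and the keepalive interval are assumptions; the keepalive in particular is an addition of this sketch so that a silent client is not mistaken for a dead one.

```python
class ChangeDrivenSender:
    """Sends a packet only when the state changed or a keepalive is due."""

    def __init__(self, send, keepalive_ms: int = 1000) -> None:
        self.send = send                # callable(state), e.g. a UDP send wrapper
        self.keepalive_ms = keepalive_ms
        self.last_state = None
        self.last_sent_ms = None

    def update(self, now_ms: int, state) -> bool:
        """Call once per frame; returns True if a packet went out."""
        changed = state != self.last_state
        stale = self.last_sent_ms is None or now_ms - self.last_sent_ms >= self.keepalive_ms
        if changed or stale:
            self.send(state)
            self.last_state = state
            self.last_sent_ms = now_ms
            return True
        return False
```

With a steady steering input this sends one packet per keepalive interval; the moment the input changes, the new state goes out on that same frame instead of waiting for the next timer slot.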

Anyway, my advice for MMOs at the moment: as I say, I'm currently debugging, and at the moment BlitzMax is reporting data that suggests BlitzMax itself has a memory leak somewhere. I'll know more when I'm done debugging; it might turn out to be my mistake yet - or a 3rd party module - but if I was starting a new project tomorrow, I'd use Blitz3D (with a 2D canvas) as my MMO server.


Winni(Posted 2007) [#23]
If BlitzMax itself is causing the memory leak, then it should be possible to narrow down the responsible function(s) in Blitz. However, Blitz does not manage the memory of wrapped C libraries, so it can neither report on their memory usage nor does its garbage collector work for them. So unless proven guilty, my guess would be that your third party library is having that leak, not Blitz.

Disconnected streams: Since BlitzMax has exception handling, I'd expect BlitzMax to fire an exception if you write to a disconnected stream. If your application does not catch that exception, by all means, it HAS to abort with something that looks like a crash. And in any case, your code should check if the connection is still valid BEFORE writing to the stream. Basically, that is the same practice one would also use in Java or C#.
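
The pattern is the same in any language with exceptions. Here is a hedged Python illustration rather than BlitzMax code: `safe_send` is a hypothetical helper, and it treats any `OSError` from the socket as "connection gone".

```python
import socket


def safe_send(conn: socket.socket, payload: bytes) -> bool:
    """Try to send; report failure instead of letting the error kill the server."""
    try:
        conn.sendall(payload)
        return True
    except OSError:
        # Covers broken pipe, connection reset and already-closed sockets.
        return False
```

The caller can then drop the connection from its list when `safe_send` returns `False`, so one dirty disconnect does not take the other players down with it.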

Do you make sure that your sockets are always properly closed and that no reference to them exists anymore? If not, they will go on consuming memory.

Furthermore, there is also a chance that Windows itself does not properly handle no longer used sockets. (It had a bad reputation for that in the past.)

Finding memory leaks is a pain in the butt, so I can really understand that you are, hm, not very happy with the situation. Especially when you know that it is not your own code that causes the problem.


Banshee(Posted 2007) [#24]
I've done some testing with a more basic application using just the socket library, and I have found the strangest symptom with regards to the memory leak.

My software runs on a dedicated box in a datacentre, and I connect to it via Remote Desktop (RDC). When I sit and watch task manager over RDC, the memory usage continues to spiral; when I disconnect and reconnect, the memory usage is reset.

This occurs on two applications, the second of which does not use any 3rd party libraries; it's purely using the stream command set.

It is a very peculiar bug, but it does seem to be related to Blitz itself. I'm still not fully sure of the circumstances in which it occurs, and it's a complicated process to set up a server to run tests to debug it - but it seems from my testing that the problem is related to RDC or Windows Server in some way. Other software running on the box does not suffer from this, and my software does not appear to be affected when running locally (although that's with 1 player rather than 100s).


Winni(Posted 2007) [#25]
The Microsoft RDP protocol transports bitmaps over the network, or at least the differential changes to the last complete bitmap, and stores them in caches. Is it really just the Blitz app that is showing increased memory usage, or is it the entire Windows box?

Another thought: Your app is not running as a service, is it? There is a chance that Windows only closes all the handles and sockets when you log off or when the process that opened them is closed. This could indicate that your application is not closing the streams and the sockets properly, or that your app requests a close, but it does not happen until the process itself is terminated.

Maybe it is also something related to the OS version that you are using. Windows versions do not behave consistently. Some bugs only appear on certain builds of a specific OS version. You can experience a problem on Windows 2000 SP4, for example, but not on Server 2003 SP1 or XP.

By the way, XP has a one-connection "Terminal Server Light" built in. Assuming that you run XP on your development machine: When you connect from another computer via RDP to your XP box running your app, do you then also have the same problem, or does it just happen on the Windows Server?


Banshee(Posted 2007) [#26]
The app runs on a machine with Windows Server 2003; it's got all the latest updates as of this morning. I connect to it from a machine with XP and SP2.

Thing is, I'm not actually creating and deleting any sockets, as my software connects to various server apps (zone apps, shall we say).

There's some basic type handling where I scrub them (by removing from list), and I've made GCcollect manual and even put in a 200x1 pixel window that I refresh every 5ms with a flip, just to be sure it's doing garbage collection - although it doesn't need a graphics window.

Aside from that I'm a bit confused, because other than the odd linked list, which I can assure you are not growing (I put a command in to count them to be sure), I'm not deleting anything. My memalloc() isn't growing either; I'm using a steady 110-130 KB.

That is about as much as I can figure out. I'm not creating & deleting streams, so it's either types or stream flushing that is bugged [or behaving in a manner that I personally didn't expect] on Windows 2003 Server.

I don't know what else to do, but I am facing the prospect of a rewrite.

I just managed to do 19 hours without a crash - it's a record.