Download & Recover

BlitzMax Forums/BlitzMax Programming/Download & Recover

degac(Posted 2014) [#1]
Hi
I'm trying to figure out how to resolve a problem.
I'm using some functions to download a file from a site. It works perfectly and it's quite fast.

But I have this problem: if for some reason or error the program stops, the download (of course) is interrupted too. The file is saved, but 'broken'.
I created an output file to save the read 'position', so technically I know 'where' the download was interrupted.

The only problem is that SeekStream() (the function I use) doesn't work on streams that can't be seeked, so everything starts from 0.

Any ideas or solutions?

(I was thinking of using an external program like Wget, which *should* support resuming from the break point - I think.)


Brucey(Posted 2014) [#2]
There is a header you can pass to the server that will ask it to resume the file from a certain point.
Not sure how you go about setting up everything manually, but libcurl has built-in support for doing stuff like this.
Also, your server needs to support it too, or it won't work anyway.
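The header Brucey mentions is the standard HTTP "Range" header. A minimal Python sketch (not BlitzMax; the file name and helper are my own assumptions, not code from this thread) of the client side: the size of the partial file already on disk becomes the resume offset.

```python
import os

def resume_headers(partial_path):
    """Build request headers that ask the server to resume a download.

    If a partial file already exists on disk, ask for only the bytes we
    are still missing via the HTTP 'Range' header; otherwise request
    the whole file (no extra headers).
    """
    headers = {}
    if os.path.exists(partial_path):
        offset = os.path.getsize(partial_path)
        if offset > 0:
            # 'bytes=N-' means: send everything from byte N to the end.
            headers["Range"] = "bytes=%d-" % offset
    return headers
```

With a 1024-byte partial file on disk, this would produce {"Range": "bytes=1024-"}; whether the server honours it is, as Brucey says, up to the server.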


degac(Posted 2014) [#3]
Ok, so I will focus on libcurl... thanks.
So recovery depends on the server side? The servers are not mine, in any case.


degac(Posted 2014) [#4]
Ok, I found a solution!
It uses Lib_Curl (thanks to Brucey's mod).
After some internet research I discovered how to recover an interrupted download.


I tested it with small files (300-500 KB) because my connection here is capped at 100 MB/day... I will test with BIGGER files (maybe some ISO!) to check that everything works.

I presume FileSize handles big files...


Derron(Posted 2014) [#5]
It does not matter what download program you use:

server A has a file
client B wants the file

B requests the file from A with no "startPosition" requested: A sends from position 0.
If B pauses the download, or crashes, or whatever, it just has to request the file from A again, but this time it can alter the position. If A "understands" what B requests, B will just receive the "rest" of the file.


Server-side you would do something along the lines of the following snippet (coming straight from the download script I use to track files while serving canonical URLs like mydomain.de/file/filexyz.zip):
		//check if http_range is sent by browser (or download manager)
		if(isset($_SERVER['HTTP_RANGE'])) {
			list($sizeUnit, $rangeOriginal) = explode('=', $_SERVER['HTTP_RANGE'], 2);
			if ($sizeUnit == 'bytes') {
				list($range, $extraRanges) = explode(',', $rangeOriginal, 2);
			}else{
				$range = '';
				header('HTTP/1.1 416 Requested Range Not Satisfiable');
				exit;
			}
		}else{
			$range = '';
		}

		//figure out download piece from range (if set)
		$seekStart = '';
		$seekEnd   = '';
		if ($range != '') {
			list($seekStart, $seekEnd) = explode('-', $range, 2);
		}

		//set start and end based on range (if set), else set defaults
		//also check for invalid ranges.
		$seekEnd   = (empty($seekEnd)) ? ($fileSize - 1) : min(abs(intval($seekEnd)),($fileSize - 1));
		$seekStart = (empty($seekStart) || $seekEnd < abs(intval($seekStart))) ? 0 : max(abs(intval($seekStart)),0);

		//Only send partial content header if downloading a piece of the file (IE workaround)
		if( $seekStart > 0 || $seekEnd < ($fileSize - 1) ) {
			header('HTTP/1.1 206 Partial Content');
			header('Content-Range: bytes '.$seekStart.'-'.$seekEnd.'/'.$fileSize);
			header('Content-Length: '.($seekEnd - $seekStart + 1));
		}else{
			header("Content-Length: $fileSize");
		}

		header('Accept-Ranges: bytes');

		fseek($fileHandler, $seekStart);



So what you might see there: your request to the server needs to contain the "Range" header (PHP exposes it as $_SERVER['HTTP_RANGE']), which says which part of the file/request you want.
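For readers who don't speak PHP, the parsing above can be expressed as one small pure function. This is a rough Python translation, not code from the thread; like the original it honours only the 'bytes' unit and the first range of the header.

```python
def parse_range(range_header, file_size):
    """Parse an HTTP 'Range' header value into (seek_start, seek_end).

    Mirrors the PHP snippet: a missing header means the whole file,
    a non-'bytes' unit returns None (the PHP code answers 416 there),
    and out-of-bounds values fall back to sane defaults.
    """
    if not range_header:
        return (0, file_size - 1)
    unit, _, ranges = range_header.partition("=")
    if unit.strip() != "bytes":
        return None  # caller should send 416 Requested Range Not Satisfiable
    first = ranges.split(",", 1)[0]          # only the first range is honoured
    start_s, _, end_s = first.partition("-")
    seek_end = file_size - 1 if not end_s else min(abs(int(end_s)), file_size - 1)
    if not start_s or seek_end < abs(int(start_s)):
        seek_start = 0
    else:
        seek_start = max(abs(int(start_s)), 0)
    return (seek_start, seek_end)
```

So a resuming client sending "bytes=100-" against a 1000-byte file would be served bytes 100 through 999.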


bye
Ron


degac(Posted 2014) [#6]
Hi
I know there are 'negotiation headers', but it seems that libcurl handles them automatically (of course the server MUST accept & support resumed downloads).
On my site this is true (without any changes by me).
I will test on other websites.

Of course, if the server doesn't handle this, there's very little to do other than re-download the file.
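One way to detect the "server doesn't handle this" case before appending to the partial file is to look at the response: a cooperating server answers 206 Partial Content with a Content-Range that starts at the requested offset, while a plain 200 means it ignored the Range header and is sending the whole file again. A hypothetical Python helper (the names are mine, not from the thread):

```python
def server_resumed(status, headers, expected_offset):
    """Check whether the server honoured our Range request.

    Returns True only for a 206 response whose Content-Range starts at
    the byte offset we asked for; anything else means we must restart
    the download from position 0.
    """
    if status != 206:
        return False
    content_range = headers.get("Content-Range", "")
    # Expected shape: "bytes <start>-<end>/<total>"
    if not content_range.startswith("bytes "):
        return False
    start = content_range[len("bytes "):].split("-", 1)[0]
    return start.isdigit() and int(start) == expected_offset
```

If this returns False, the safe fallback is exactly what degac describes: truncate the partial file and force a redownload.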

Bye


Derron(Posted 2014) [#7]
That handling is of course done by the "server" (apache2, nginx, ...).

But as soon as you route your files through scripts (e.g. ones that dynamically inject affiliate code, or code to identify a specific user), you have to include a snippet like the one above in that script's code.


bye
Ron


degac(Posted 2014) [#8]
Oh yes, thanks - your script is very useful for this.

But my idea is to create a 'general download manager' in BlitzMax, so I have no idea 'where' the files are downloaded from (i.e. BlitzBasic.com, xyz.com, google.com, adobe.com, etc.)
If the server supports resume, good; otherwise - at this point - I will force a redownload.

Thanks to all!


Brucey(Posted 2014) [#9]
general download manager

How general? Over HTTP it's easy. FTP gets a bit more difficult. For HTTPS and SFTP you'll need to use the libcurlssl module - which includes support for certificates, etc.


degac(Posted 2014) [#10]
With 'general' I mean not locked to my website.
And I don't want to touch FTP things or HTTPS if possible!