Download & Recover

BlitzMax Forums/BlitzMax Programming/Download & Recover

degac(Posted 2014) [#1]
Hi
I'm trying to figure out how to resolve a problem.
I'm using some functions to download a file from a site. It works perfectly and it's quite fast.

But I have this problem: if for some reason or error the program stops, the download (of course) is interrupted too. The file is saved, but 'broken'.
I created an output file to save the read 'position', so technically I know 'where' the download was interrupted.

The only problem is that SeekStream() (the function I use) doesn't work on streams that can't be seeked, so everything starts from 0.

Any ideas or solutions?

(I was thinking of using an external program like Wget, which *should* support resuming from the break point - I think.)


Brucey(Posted 2014) [#2]
There is a header you can pass to the server that will ask it to resume the file from a certain point.
Not sure how you go about setting up everything manually, but libcurl has built-in support for doing stuff like this.
Also, your server needs to support it too, or it won't work anyway.
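The header Brucey mentions is the standard HTTP "Range" header. A minimal Python sketch (not BlitzMax; the file name and helper are my own assumptions, not code from this thread) of the client side: the size of the partial file already on disk becomes the resume offset.

```python
import os

def resume_headers(partial_path):
    """Build request headers that ask the server to resume a download.

    If a partial file already exists on disk, ask for only the bytes we
    are still missing via the HTTP 'Range' header; otherwise request
    the whole file (no extra headers).
    """
    headers = {}
    if os.path.exists(partial_path):
        offset = os.path.getsize(partial_path)
        if offset > 0:
            # 'bytes=N-' means: send everything from byte N to the end.
            headers["Range"] = "bytes=%d-" % offset
    return headers
```

With a 1024-byte partial file on disk, this would produce {"Range": "bytes=1024-"}; whether the server honours it is, as Brucey says, up to the server.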


degac(Posted 2014) [#3]
Ok, so I will focus on libcurl... thanks.
So recovery depends on the server side? The servers are not mine, in any case.


degac(Posted 2014) [#4]
Ok, I found a solution!
It uses Lib_Curl (thanks to Brucey's mod).
After some internet research I discovered how to recover an interrupted download.


I tested it with small files (300-500 KB) because my connection here is capped at 100 MB/day... I will test with BIGGER files (maybe some ISO!) to check that everything works.

I presume FileSize handles big files...


Derron(Posted 2014) [#5]
It does not matter what download program you use:

server A has a file
client B wants the file

B requests the file from A with no "startPosition" requested: A sends from position 0.
If B pauses the download, or crashes, or whatever, it just has to request the file from A again, but this time it can alter the position. If A "understands" what B requests, B will just receive the "rest" of the file.


Server-side you would do something along the lines of the following snippet (coming straight from the download script I use to track files while serving canonical URLs like mydomain.de/file/filexyz.zip):
		//check if http_range is sent by browser (or download manager)
		if(isset($_SERVER['HTTP_RANGE'])) {
			list($sizeUnit, $rangeOriginal) = explode('=', $_SERVER['HTTP_RANGE'], 2);
			if ($sizeUnit == 'bytes') {
				list($range, $extraRanges) = explode(',', $rangeOriginal, 2);
			}else{
				$range = '';
				header('HTTP/1.1 416 Requested Range Not Satisfiable');
				exit;
			}
		}else{
			$range = '';
		}

		//figure out download piece from range (if set)
		$seekStart = '';
		$seekEnd   = '';
		if ($range != '') {
			list($seekStart, $seekEnd) = explode('-', $range, 2);
		}

		//set start and end based on range (if set), else set defaults
		//also check for invalid ranges.
		$seekEnd   = (empty($seekEnd)) ? ($fileSize - 1) : min(abs(intval($seekEnd)),($fileSize - 1));
		$seekStart = (empty($seekStart) || $seekEnd < abs(intval($seekStart))) ? 0 : max(abs(intval($seekStart)),0);

		//Only send partial content header if downloading a piece of the file (IE workaround)
		if( $seekStart > 0 || $seekEnd < ($fileSize - 1) ) {
			header('HTTP/1.1 206 Partial Content');
			header('Content-Range: bytes '.$seekStart.'-'.$seekEnd.'/'.$fileSize);
			header('Content-Length: '.($seekEnd - $seekStart + 1));
		}else{
			header("Content-Length: $fileSize");
		}

		header('Accept-Ranges: bytes');

		fseek($fileHandler, $seekStart);



So what you might see there: your request to the server needs to contain the "Range" header (PHP exposes it as $_SERVER['HTTP_RANGE']), which says which part of the file/request you want.
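For readers who don't speak PHP, the parsing above can be expressed as one small pure function. This is a rough Python translation, not code from the thread; like the original it honours only the 'bytes' unit and the first range of the header.

```python
def parse_range(range_header, file_size):
    """Parse an HTTP 'Range' header value into (seek_start, seek_end).

    Mirrors the PHP snippet: a missing header means the whole file,
    a non-'bytes' unit returns None (the PHP code answers 416 there),
    and out-of-bounds values fall back to sane defaults.
    """
    if not range_header:
        return (0, file_size - 1)
    unit, _, ranges = range_header.partition("=")
    if unit.strip() != "bytes":
        return None  # caller should send 416 Requested Range Not Satisfiable
    first = ranges.split(",", 1)[0]          # only the first range is honoured
    start_s, _, end_s = first.partition("-")
    seek_end = file_size - 1 if not end_s else min(abs(int(end_s)), file_size - 1)
    if not start_s or seek_end < abs(int(start_s)):
        seek_start = 0
    else:
        seek_start = max(abs(int(start_s)), 0)
    return (seek_start, seek_end)
```

So a resuming client sending "bytes=100-" against a 1000-byte file would be served bytes 100 through 999.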


bye
Ron


degac(Posted 2014) [#6]
Hi
I know there are 'negotiation headers', but it seems that libcurl handles them automatically (of course the server MUST accept & support resumed downloads).
On my site this is true (without any changes by me).
I will test on other websites.

Of course, if the server doesn't handle this, there's very little to do other than re-download the file.
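One way to detect the "server doesn't handle this" case before appending to the partial file is to look at the response: a cooperating server answers 206 Partial Content with a Content-Range that starts at the requested offset, while a plain 200 means it ignored the Range header and is sending the whole file again. A hypothetical Python helper (the names are mine, not from the thread):

```python
def server_resumed(status, headers, expected_offset):
    """Check whether the server honoured our Range request.

    Returns True only for a 206 response whose Content-Range starts at
    the byte offset we asked for; anything else means we must restart
    the download from position 0.
    """
    if status != 206:
        return False
    content_range = headers.get("Content-Range", "")
    # Expected shape: "bytes <start>-<end>/<total>"
    if not content_range.startswith("bytes "):
        return False
    start = content_range[len("bytes "):].split("-", 1)[0]
    return start.isdigit() and int(start) == expected_offset
```

If this returns False, the safe fallback is exactly what degac describes: truncate the partial file and force a redownload.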

Bye


Derron(Posted 2014) [#7]
That handling is of course done by the "server" (apache2, nginx, ...).

But as soon as you route your files through scripts (e.g. ones that dynamically inject affiliate code, or code to identify a specific user), you have to include a snippet like the one above in that script's code.


bye
Ron


degac(Posted 2014) [#8]
Oh yes, thanks - your script is very useful for this.

But my idea is to create a 'general download manager' in BlitzMax, so I have no idea 'where' the files are downloaded from (i.e. BlitzBasic.com, xyz.com, google.com, adobe.com, etc.)
If the server supports resume, good; otherwise - at this point - I will force a redownload.

Thanks to all!


Brucey(Posted 2014) [#9]
general download manager

How general? Over HTTP it's easy. FTP gets a bit more difficult. For HTTPS and SFTP you'll need to use the libcurlssl module - which includes support for certificates, etc.


degac(Posted 2014) [#10]
With 'general' I mean not locked to my website.
And I don't want to touch FTP things or HTTPS if possible!