HTTP Redirection


_PJ_	(Posted 2015) [#1]

For some reason, (apparently only) Amazon content resources always return "301 Moved Permanently" responses. The odd thing for me is that the "Location: " header is ALWAYS identical to the initial destination, potentially resulting in an infinite loop...

I am very new to all this HTTP/HTML stuff, but I know I used to be able to download the actual HTML pages. Obviously "official" web browsers seem capable of identifying the correct redirect location, since they find the right page every time, as does the Blitz+ HtmlView gadget.

Does anyone have any idea what's going on or how to fix it?
Example:

Const Test$="http://www.imdb.com/find?q=alien+resurrection"

Local con=OpenTCPStream("imdb.com",80)
Local Response$
Local Request$

While (con)
	If (Request="")
		Request=Send()
		WriteLine(con,Send())
		DebugLog Request
	End If
	Response$=ReadLine(con)
	If (Response="")
		con=False
		Exit
	Else	
		DebugLog Response 
	End If
Wend

WaitKey()
End

Function Send$();Wraps GET request line
	Return "GET "+Test+" HTTP/1.1"+Chr(13)+Chr(10)+"Host: imdb.com"+Chr(13)+Chr(10)+"User-Agent: AnyUA"+Chr(13)+Chr(10)
End Function

// Note: I have tried many other Amazon-related queries and all have the same issue...
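
One way to confirm the symptom described above from the raw response is to compare the status code and the Location header against the URL that was requested. This is only a minimal sketch, written in Python rather than Blitz so it is easy to run. The helper names `parse_head` and `is_self_redirect` are hypothetical, and the sample response is hand-written for illustration, not captured from a real server.

```python
def parse_head(raw: str):
    """Split a raw HTTP response head into (status_code, headers)."""
    lines = raw.split("\r\n")
    # The status line looks like: HTTP/1.1 301 Moved Permanently
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def is_self_redirect(requested_url: str, raw: str) -> bool:
    """True when a 3xx response points straight back at the requested URL."""
    status, headers = parse_head(raw)
    return 300 <= status < 400 and headers.get("location") == requested_url


# Hand-written sample (an assumption, not real server output).
url = "http://www.imdb.com/find?q=alien+resurrection"
sample = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: " + url + "\r\n"
    "\r\n"
)
print(is_self_redirect(url, sample))
```

If this reports a self-redirect, following the Location header again with the same request will not help; something in the request itself has to change.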

BlitzSupport	(Posted 2015) [#2]

If I run the code below with your IMDB URL, it seems to infinitely redirect, as you say, despite handling the redirect 'properly', as far as I'm aware.

However, I noticed that the output says the response is 'chunked', so it's not plain text -- not sure if this relates to the actual problem, just had a quick try, but might be worth looking into:

BlitzGet MaxDeluxe

I have briefly encountered chunked encoding (from my own webhosting, actually), and used the following (PureBasic) code to 'decode' it -- this was very specific to my use-case, but there are some comments relating to how to process chunked return-data (scroll to "EXAMPLE RESPONSE"):

Procedure GetIP_Thread (nothing)

	; Only valid for chunked response from http://hi-toro.com/ip.php, working in March 2014!

	Define www = OpenNetworkConnection (Host$, 80)
	
	If www
	
		Define downloadsize = 65536

		Define *get = AllocateMemory (downloadsize)
	
		SendNetworkString (www, "GET " + File$ + " HTTP/1.1" + CRLF$)
		SendNetworkString (www, "Host: " + Host$ + CRLF$)
		SendNetworkString (www, "User-Agent: " + App$ + CRLF$)
		SendNetworkString (www, "Accept: text/plain" + CRLF$)
		SendNetworkString (www,  CRLF$)
	
		Define recvd = 0
		Define gotnd = #False ; netdata is zero until data starts coming in...
		
		Define netdata
		
		Repeat
		
			netdata = NetworkClientEvent (www)
			
			If netdata = #PB_NetworkEvent_Data

				gotnd = #True ; Can read events until zero after we have an event!
				recvd = ReceiveNetworkData (www, *get + recvd, downloadsize - recvd)

			EndIf
			
			Delay (100)
			
		Until gotnd = #True And netdata = 0 ; Had a response and no more data
		
		Define response$ = PeekS (*get)

		; Debug response$
		
		; EXAMPLE RESPONSE, comes as a single string (with newline characters):
		
		; HTTP/1.1 200 OK								; <----- [START OF HEADER]
		; Date: Tue, 20 Sep 2011 14:13:34 GMT
		; Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
		; X-Powered-By: PHP/5.2.9
		; Transfer-Encoding: chunked
		; Content-Type: text/html
		; 												; <----- [END OF HEADER (blank line)]
		; e												; <----- [CHUNKED RESPONSE: BYTES IN HEX, E (14) HERE]
		; 99.100.101.102								; <----- [EXAMPLE IP ADDRESS HERE!]
		; 0												; <----- [ZERO DENOTES END OF DATA CHUNK]
		; 												; <----- [BLANK LINE INDICATES END OF ALL DATA]
		
		; SKIP HEADER...
		
		Define eol
		Define thisline$
		Define response$
		
		Repeat
			eol = FindString (response$, CRLF$, 1)
			thisline$ = Left (response$, eol - 1)
			response$ = Right (response$, Len (response$) - (eol + 1))
		Until thisline$ = ""
		
		; FIND THE LINE CONTAINING THE IP (second line in this case, but should really read
		; the number of bytes, $e, and then the data. Still, it comes separated with newlines,
		; so what the hell!)...
		
		Define count = 0
		
		Repeat

			count = count + 1

			eol = FindString (response$, CRLF$, 1)
			thisline$ = Left (response$, eol - 1)
			response$ = Right (response$, Len (response$) - (eol + 1))

			If count = 2
				IP$ = thisline$
			EndIf

		Until thisline$ = ""
	
		CloseNetworkConnection (www)
	
	Else
		IP$ = "Disconnected"
	EndIf
	
EndProcedure

(Open in a wide text editor window so the comments line up properly!)
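
As the comments in the code above admit, a chunked body should really be decoded by reading each hex size line and then exactly that many bytes, rather than by counting lines. A minimal sketch of that approach follows, in Python rather than PureBasic so it is easy to run. The function name `decode_chunked` is hypothetical, the input is assumed to be the complete body with the headers already stripped, and trailers after the final chunk are ignored.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode a complete chunked body (headers already removed)."""
    out = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a size line: hex digits, then CRLF.
        eol = body.index(b"\r\n", pos)
        # Anything after ';' on the size line is a chunk extension; drop it.
        size_text = body[pos:eol].split(b";", 1)[0].strip().decode("ascii")
        size = int(size_text, 16)
        if size == 0:  # a zero-length chunk marks the end of the data
            break
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


# The example response body from the comments above: size "e" (14), then the data.
example = b"e\r\n99.100.101.102\r\n0\r\n\r\n"
print(decode_chunked(example).decode("ascii"))
```

Note that `body.index` raises `ValueError` on a truncated body, so this sketch only suits a response that has been received in full.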

Apologies in advance if this turns out to be unrelated...

_PJ_	(Posted 2015) [#3]

Hi James, thanks so much for looking at this - sorry it's taken me a while to get back, but I managed to solve the issue!

After all that, it was (as usual) just a stupid blunder on my part. I had trimmed the "www" from the actual HOST field of the GET request.

Essentially, just changing to:

Function Send$();Wraps GET request line
	Return "GET "+Test+" HTTP/1.1"+Chr(13)+Chr(10)+"Host: www.imdb.com"+Chr(13)+Chr(10)+"User-Agent: AnyUA"+Chr(13)+Chr(10)
End Function

was all that was required. It explains why the Test$ string (which had the "www" part) was always redirected: the request was effectively being made against http://imdb.com, which I imagine automatically redirects to www.imdb.com, and because the ACTUAL Send$ function was never including "www" in the Host name, it was just going round in circles!
(Hope that makes sense)
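
One way to avoid this kind of mismatch is to derive the Host header from the URL itself, so the two can never drift apart. The following is only a sketch in Python rather than Blitz. The function name `build_get` is hypothetical, and the "Connection: close" header and the use of a path-only request line (instead of the full URL used above) are assumptions of this sketch, not part of the original code.

```python
from urllib.parse import urlsplit


def build_get(url: str, user_agent: str = "AnyUA") -> bytes:
    """Build a GET request whose Host header is taken from the URL."""
    parts = urlsplit(url)
    host = parts.netloc  # e.g. "www.imdb.com", including the "www"
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Connection: close",  # assumption: ask the server to close when done
        "",
        "",  # the two empty strings produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get("http://www.imdb.com/find?q=alien+resurrection")
print(request.decode("ascii"))
```

The same host string can also be passed to the connect call (OpenTCPStream in the code above), so the socket target and the Host header stay in step.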

My code was initially based on the old BlitzGet, but unfortunately I do not have BlitzMax (yet), so I wasn't able to use your code above. However, whilst that "chunked" stuff wasn't entirely clear to me, it doesn't seem to affect much, and the original BlitzGet code seems to handle it correctly, thankfully!