HTTP Redirection

_PJ_(Posted 2015) [#1]
For some reason, (apparently only) Amazon content resources always return "301 Moved Permanently" responses. The odd thing for me is that the "Location: " header is ALWAYS identical to the initial destination, potentially resulting in an infinite loop...

I am very new to all this HTTP/HTML stuff, but I know I used to be able to download the actual HTML pages. Obviously "official" web browsers seem capable of identifying the correct redirect location, since they find the right page every time, as does the Blitz+ HtmlView gadget.

Does anyone have any idea what's going on or how to fix it?
Example:
Const Test$="http://www.imdb.com/find?q=alien+resurrection"

Local con=OpenTCPStream("imdb.com",80)
Local Response$
Local Request$

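While (con)
	If (Request="")
		; Send the request once; WriteLine appends the final CRLF that ends the header block
		Request=Send()
		WriteLine(con,Request)
		DebugLog Request
	End If
	Response$=ReadLine(con)
	If (Response="")
		; A blank line marks the end of the response headers
		CloseTCPStream(con)
		con=False
		Exit
	Else	
		DebugLog Response 
	End If
Wend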

WaitKey()
End

Function Send$();Wraps GET request line
	Return "GET "+Test+" HTTP/1.1"+Chr(13)+Chr(10)+"Host: imdb.com"+Chr(13)+Chr(10)+"User-Agent: AnyUA"+Chr(13)+Chr(10)
End Function


// Note: I have tried many other Amazon-related queries and all have the same issue...


BlitzSupport(Posted 2015) [#2]
If I run the code below with your IMDB URL, it seems to infinitely redirect, as you say, despite handling the redirect 'properly', as far as I'm aware.
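
For reference, handling the redirect 'properly' here roughly means reading the status code and the "Location: " header, then requesting that URL next. The following is only a minimal sketch of that step, written in Python rather than Blitz so it can be run standalone. It is not the BlitzGet code, and the names parse_response_head and next_url are hypothetical. It assumes the response header lines have already been read into a list of strings.

```python
def parse_response_head(lines):
    """Return (status_code, headers) from the status line and header lines."""
    # Status line looks like: HTTP/1.1 301 Moved Permanently
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def next_url(requested_url, lines):
    """Return the redirect target, or None if the response is not a redirect.

    Raises RuntimeError when the server redirects to the URL just requested,
    which would otherwise loop forever.
    """
    status_code, headers = parse_response_head(lines)
    if status_code not in (301, 302, 303, 307, 308):
        return None
    target = headers.get("location")
    if target is None:
        return None
    if target == requested_url:
        raise RuntimeError("redirect points back at the requested URL")
    return target
```

A hop counter (give up after a handful of redirects) would be a sensible extra guard, since a loop can also run through several different URLs.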

However, I noticed that the output says the response is 'chunked', so it's not plain text -- not sure if this relates to the actual problem, just had a quick try, but might be worth looking into:

BlitzGet MaxDeluxe

I have briefly encountered chunked encoding (from my own webhosting, actually), and used the following (PureBasic) code to 'decode' it -- this was very specific to my use-case, but there are some comments relating to how to process chunked return-data (scroll to "EXAMPLE RESPONSE"):



(Open in a wide text editor window so the comments line up properly!)

Apologies in advance if this turns out to be unrelated...
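
In case it helps with the 'chunked' part: with "Transfer-Encoding: chunked", the body arrives as a series of chunks, each preceded by a line giving its size in hexadecimal, and a chunk of size 0 marks the end. Below is a minimal sketch of decoding that, in Python rather than Blitz or PureBasic. It is not the code referred to above, and decode_chunked is a hypothetical name. It assumes the whole body (everything after the blank line that ends the headers) is already in memory, and it ignores any trailer headers after the last chunk.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body that is already fully in memory."""
    out = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a hex size line, optionally followed by
        # ";extension" parameters, terminated by CRLF.
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it
    return bytes(out)
```

For example, decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n") gives b"Wikipedia".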


_PJ_(Posted 2015) [#3]
Hi James, thanks so much for looking at this, and sorry it's taken me a while to get back, but I managed to solve the issue!

After all that, it was (as usual) just a stupid blunder on my part. I had trimmed the "www" from the actual HOST field of the GET request.

Essentially, just changing to:

Function Send$();Wraps GET request line
	Return "GET "+Test+" HTTP/1.1"+Chr(13)+Chr(10)+"Host: www.imdb.com"+Chr(13)+Chr(10)+"User-Agent: AnyUA"+Chr(13)+Chr(10)
End Function


That was all that was required. It explains why the Test$ URL (which included the "www" part) was always redirected: the request was effectively being treated as one for http://imdb.com, which I imagine automatically redirects to www.imdb.com. Because the ACTUAL Send$ function never included "www" in the Host name, it was just going round in circles!
(Hope that makes sense)
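
One way to avoid this kind of mismatch is to derive the Host value (and the host to connect to) from the URL itself instead of typing it twice. A minimal sketch of the idea follows, in Python rather than Blitz. The name build_get_request is hypothetical, and it puts only the path and query in the request line rather than the full URL that my Send$ function uses, which is a design choice of this sketch.

```python
from urllib.parse import urlsplit


def build_get_request(url, user_agent="AnyUA"):
    """Return (host, request_text) so the Host header always matches the URL."""
    parts = urlsplit(url)
    host = parts.netloc  # e.g. "www.imdb.com", taken straight from the URL
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"User-Agent: {user_agent}\r\n"
        "\r\n"  # blank line ends the header block
    )
    return host, request
```

The returned host is also the name to pass when opening the connection, so the connection target and the Host header cannot drift apart.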

My code was initially based on the old BlitzGet but unfortunately I do not have BlitzMax (yet), so I wasn't able to use your code above. However, whilst that "chunked" stuff wasn't entirely clear to me, it doesn't seem to affect much and the original BlitzGet code seems to handle it correctly, thankfully!