HTTP Redirection
For some reason, Amazon content resources (apparently only those) always return "301 Moved Permanently" responses. The odd thing for me is that the "Location:" header is ALWAYS identical to the initial destination, which can result in an infinite loop... I am very new to all this HTTP/HTML stuff, but I know I used to be able to download the actual HTML pages. "Official" web browsers obviously manage to find the correct redirect location, since they reach the right page every time, and so does the Blitz+ HtmlView gadget. Does anyone have any idea what's going on or how to fix it?

Example:

    Const Test$="http://www.imdb.com/find?q=alien+resurrection"

    Local con=OpenTCPStream("imdb.com",80)
    Local Response$
    Local Request$

    While (con)
        If (Request="")
            Request=Send()
            WriteLine(con,Request) ; WriteLine appends the final CR/LF that ends the request headers
            DebugLog Request
        End If
        Response$=ReadLine(con)
        If (Response="")
            ; An empty line marks the end of the response headers
            CloseTCPStream con
            con=False
            Exit
        Else
            DebugLog Response
        End If
    Wend

    WaitKey()
    End

    Function Send$() ; Wraps GET request line
        Return "GET "+Test+" HTTP/1.1"+Chr(13)+Chr(10)+"Host: imdb.com"+Chr(13)+Chr(10)+"User-Agent: AnyUA"+Chr(13)+Chr(10)
    End Function

Note: I have tried many other Amazon-related queries and all have the same issues...
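A client that follows 301 responses blindly will spin forever when the Location header points back at the URL it has just requested, which is exactly the symptom described in the first post. As a minimal sketch (in Python rather than Blitz; `follow_redirects` and the `fetch` callback are hypothetical names, and `fetch` is assumed to send a request and return the status code plus the response headers), a redirect follower can remember every URL it has visited and stop as soon as one repeats:

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urljoin

# Assumed (hypothetical) fetch signature: fetch(url) -> (status_code, headers)
Fetch = Callable[[str], Tuple[int, Dict[str, str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url: str, fetch: Fetch, max_hops: int = 10) -> str:
    """Follow HTTP redirects and return the final URL.

    Raises RuntimeError on a redirect loop (a URL seen twice) or when
    more than max_hops redirects are needed.
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            raise RuntimeError(f"redirect loop detected at {url}")
        seen.add(url)
        status, headers = fetch(url)
        if status not in REDIRECT_CODES:
            return url
        # Header names are case-insensitive, so compare them in lower case
        lowered = {name.lower(): value for name, value in headers.items()}
        location = lowered.get("location")
        if location is None:
            raise RuntimeError(f"{status} response without a Location header")
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, location)
    raise RuntimeError(f"more than {max_hops} redirects")


if __name__ == "__main__":
    def looping_fetch(url):
        # Mimics the behaviour in the question: a 301 pointing back at the same URL
        return 301, {"Location": url}

    try:
        follow_redirects("http://www.imdb.com/find?q=alien+resurrection", looping_fetch)
    except RuntimeError as err:
        print(err)
```

Detecting the loop does not fix it, of course; it only turns a hang into an error message, which makes it obvious that the problem lies in the request itself rather than in the redirect handling.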
If I run the code below with your IMDB URL, it seems to redirect infinitely, as you say, even though it handles the redirect 'properly', as far as I'm aware. However, I noticed that the output says the response is 'chunked' (Transfer-Encoding: chunked), so the body isn't sent as one plain block of text. I'm not sure if this relates to the actual problem, as I only had a quick try, but it might be worth looking into:

BlitzGet MaxDeluxe

I have briefly encountered chunked encoding (from my own web hosting, actually), and used the following (PureBasic) code to 'decode' it. It was very specific to my use case, but there are some comments on how to process chunked return data (scroll to "EXAMPLE RESPONSE"):

(Open in a wide text editor window so the comments line up properly!)

Apologies in advance if this turns out to be unrelated...
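With "Transfer-Encoding: chunked", the body arrives as a series of chunks: each one starts with its size in hexadecimal on a line of its own, followed by that many bytes of data and a CR/LF, and a chunk of size 0 marks the end (optionally followed by trailer headers). A minimal Python sketch of a decoder, assuming the entire raw body after the blank line that ends the headers has already been read into memory (`decode_chunked` is a hypothetical name, not the PureBasic code mentioned in the post):

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body.

    raw is everything after the blank line that ends the headers.
    Chunk extensions (";name=value" after the size) are ignored, and
    any trailer headers after the final 0-size chunk are skipped.
    """
    body = bytearray()
    pos = 0
    while True:
        # The size line runs up to the next CR/LF
        line_end = raw.index(b"\r\n", pos)
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field, 16)  # chunk sizes are hexadecimal
        pos = line_end + 2
        if size == 0:
            break  # last chunk; trailers (if any) follow
        body += raw[pos:pos + size]
        pos += size
        # Every chunk's data is terminated by CR/LF
        if raw[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2
    return bytes(body)


if __name__ == "__main__":
    print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```

Reading the chunk sizes first matters because the data inside a chunk may itself contain CR/LF pairs, so a decoder that simply reads line by line (as the Blitz code in the question does) cannot tell chunk boundaries from ordinary line breaks.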
Hi James, thanks so much for looking at this, and sorry it's taken me a while to get back, but I managed to solve the issue! After all that, it was (as usual) just a stupid blunder on my part: I had trimmed the "www" from the Host field of the GET request. Just changing the function to:

    Function Send$() ; Wraps GET request line
        Return "GET "+Test+" HTTP/1.1"+Chr(13)+Chr(10)+"Host: www.imdb.com"+Chr(13)+Chr(10)+"User-Agent: AnyUA"+Chr(13)+Chr(10)
    End Function

was all that was required. That explains why the Test$ URL (which did include "www") was always being redirected: the request was really being answered for http://imdb.com, which I imagine automatically redirects to www.imdb.com, and because the Send$ function never included "www" in the Host name, it just went round in circles! (Hope that makes sense.)

My code was initially based on the old BlitzGet, but unfortunately I do not have BlitzMax (yet), so I wasn't able to use your code above. Whilst that "chunked" stuff wasn't entirely clear to me, it doesn't seem to affect much, and thankfully the original BlitzGet code seems to handle it correctly!
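The underlying lesson is to derive the Host header from the same URL that is being requested, instead of typing the host in twice, so the two can never drift apart. A minimal Python sketch of that idea (`build_get_request` is a hypothetical name, and "AnyUA" is just the placeholder User-Agent from the original code):

```python
from urllib.parse import urlsplit


def build_get_request(url: str, user_agent: str = "AnyUA") -> str:
    """Build a raw HTTP/1.1 GET request whose Host header matches the URL."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no host in {url!r}")
    # Include the port in Host only when it is not the default for http
    if parts.port not in (None, 80):
        host = f"{host}:{parts.port}"
    # Origin-form request target: path plus query, never empty
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        # Ask the server to close the connection after one response,
        # so a simple client can read until the stream ends
        "Connection: close",
    ]
    # Each header line ends in CR/LF, and an empty line ends the header block
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_get_request("http://www.imdb.com/find?q=alien+resurrection"))
```

Unlike the Blitz code, this sends only the path and query on the request line (origin-form, which is what browsers send when talking to a server directly) instead of the full URL; both forms are valid HTTP/1.1, but with origin-form the Host header is the only place the server learns the host name, so it has to be right. The "Connection: close" header is an extra assumption, not something from the thread.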