Loading a webpage to a string

Thareh(Posted 2013) [#1]
Hi!
I'm writing an HTML parser thingy and I'm trying to optimize it at the moment.
The current bottleneck is my "Webpage -> String" routine which is kinda messy:



Anyone know a faster way?

Thanks!

Last edited 2013



Brucey(Posted 2013) [#2]
How about LoadText(URL)?
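
For anyone reading along, a minimal sketch of that suggestion. It assumes the standard BRL modules are available (no Framework statement) and that the page is reachable over plain HTTP through the "http::" stream prefix. FetchPage and the example.com address are made-up placeholders, not anything from the original routine.

```blitzmax
SuperStrict

' Hypothetical helper: load a whole page (or local file) into a String.
' A URL starting with "http::" is opened through BlitzMax's HTTP stream support.
Function FetchPage:String(url:String)
	Try
		Return LoadText(url)
	Catch ex:Object
		' LoadText throws if the stream cannot be opened; return "" in that case.
		Return ""
	End Try
End Function

' Placeholder address, for illustration only.
Local html:String = FetchPage("http::www.example.com/index.html")
Print "Fetched " + html.length + " characters"
```

Returning an empty string on failure is just one choice; letting the exception propagate may suit a parser better.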


Thareh(Posted 2013) [#3]
Thank you Brucey!
LoadText was a tiny bit faster and so much simpler.

Any more ideas or is this as fast as it gets? :)


Brucey(Posted 2013) [#4]
Well, you can only fetch data as fast as your source serves it.

If it was a LOT of text, you could try asking the server to return it gzip-compressed (.gz), but then you'd have to decompress it on your side. That's only worth it if it really is a lot of text, and only works if the server supports it.


Banshee(Posted 2013) [#5]
If the bottleneck is multiple requests for small amounts of content, then you could be getting throttled. This often happens to bots. If it is not your server, you should read its robots.txt for any limits on how fast you may crawl, so that you don't cause crashes.
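
A rough sketch of spacing requests out, assuming a fixed pause between fetches is acceptable. FetchAll and its pauseMs parameter are hypothetical names; the right pause depends on whatever limit the site actually publishes.

```blitzmax
SuperStrict

' Hypothetical helper: fetch each URL in turn, pausing between requests
' so the server is not hit with back-to-back fetches.
Function FetchAll:String[](urls:String[], pauseMs:Int)
	Local pages:String[] = New String[urls.length]
	For Local i:Int = 0 Until urls.length
		pages[i] = LoadText(urls[i])
		' No need to wait after the last request.
		If i < urls.length - 1 Then Delay pauseMs
	Next
	Return pages
End Function
```

A call might look like FetchAll(["http::www.example.com/a.html", "http::www.example.com/b.html"], 2000), where both addresses and the 2000 ms pause are placeholders.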

If it is your server, then you could optimise the delivery and format of the information you are sending to yourself, for example by sizing each response to roughly the amount of data your application can process in the time one round trip to the server takes.

If you are doing something that takes time but you need instant access to the data, then the best way is to scrape over time into a database so you can look up the data at any point. An example is price scraping, where you don't care if you take down a competitor's systems and you just want the process to have their published data readily available. Have your script gradually refresh the data on a cron job or timed interval. In effect, you pre-fetch the information before it is actually required.
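
The same idea can be sketched in memory, as a stand-in for the database and cron job described above. This is an illustration under assumptions, not a full solution: TCachedPage, GetPage and the 60-second MAX_AGE_MS are invented for the example, and the cache is lost when the program exits.

```blitzmax
SuperStrict

' Hypothetical cache entry: the page text plus the MilliSecs() value when it was fetched.
Type TCachedPage
	Field text:String
	Field fetchedAt:Int
End Type

Const MAX_AGE_MS:Int = 60000 ' assumed refresh interval
Global pageCache:TMap = CreateMap()

' Return the cached copy if it is fresh enough, otherwise fetch and store a new one.
Function GetPage:String(url:String)
	Local entry:TCachedPage = TCachedPage(pageCache.ValueForKey(url))
	If entry <> Null
		If MilliSecs() - entry.fetchedAt < MAX_AGE_MS Then Return entry.text
	End If
	entry = New TCachedPage
	entry.text = LoadText(url)
	entry.fetchedAt = MilliSecs()
	pageCache.Insert(url, entry)
	Return entry.text
End Function
```

Repeated calls to GetPage with the same URL inside the interval return the stored text without touching the network, so lookups stay fast while the refresh happens only occasionally.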