HTMLView - Save Web Page


Chalky(Posted 2015) [#1]
I am trying to capture and save to disk the HTML source of a webpage displayed in an HTMLView. I searched the forum and found this:



which was posted by skidracer 7 years ago (I changed Notify to DebugLog) in response to a thread titled "HTMLView - how to get source". However, when I run the code, I get this:



which is not the page HTML source.

The online manual for HTMLViewRun has no information about the "window." or "document." parameters - can anyone tell me where I can find details about their format and possible usage?


markcw(Posted 2015) [#2]
What you can do is get the path to the file with HtmlViewCurrentUrl. Then you use the brl.filesystem and brl.stream commands to parse the source for what you want and save it. Saving is very simple: look at the WriteFile example in the docs. For reading, look at OpenFile and the related commands. You should use OpenFile when parsing HTTP, otherwise OpenStream. You can use ReadString to get the whole file contents (with FileSize), or ReadLine to read one line at a time until Eof.
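A minimal sketch of that read-and-save approach, assuming the page can be fetched with a plain request through BlitzMax's "http::" stream prefix. The URL and the output file name are placeholders for illustration, and this does not cover a page that is generated from submitted form data:

```blitzmax
SuperStrict

' No Framework line, so the standard BRL modules (streams, HTTP streams,
' filesystem, standard I/O) are imported automatically.

' Assumed values for illustration only.
Local url:String = "http::www.blitzmax.com"   ' stream prefix is "http::", not "http://"
Local outPath:String = "page.html"            ' hypothetical output file

' Open the remote page as a read stream.
Local src:TStream = ReadStream(url)
If src = Null Then RuntimeError "Failed to open " + url

' Create the local file to write into.
Local dst:TStream = WriteFile(outPath)
If dst = Null
	CloseStream src
	RuntimeError "Failed to create " + outPath
EndIf

' Copy the response line by line until the end of the stream.
While Not Eof(src)
	WriteLine(dst, ReadLine(src))
Wend

CloseStream src
CloseStream dst

Print "Saved " + outPath
```

The saved file can then be parsed with the usual ReadLine loop. Line endings in the copy may differ from the original, since ReadLine strips them and WriteLine adds its own.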


Chalky(Posted 2015) [#3]
Thanks for taking the time to reply, munch. I already know how to read/write files (I'm a programmer by profession) and am aware of the need to use OpenFile (rather than OpenStream).

I'm not sure how HtmlViewCurrentUrl will help as it surely returns the website url (running it displayed "http://www.blitzmax.com/" - which I knew anyway as I supplied it via the HtmlViewGo statement)?

The reason I need to be able to access the HTML source code is that no remote file exists at the time of access as the page is "auto-generated" at the remote end depending on parameters submitted via a form. My intention was therefore to use an HtmlView to access the page, then save the source so that I could parse the saved file.

I was hoping that there was some documentation somewhere on how to use the "window.xxxxx" and "document.xxxxx" parameters with an HtmlView. Maybe none exists (other than what I found on the MDN site, which I am unsure how to utilise within BlitzMax)?


grable(Posted 2015) [#4]
In short, you can't. There is no way to get anything out of the HTMLView without resorting to hacks (like putting it in the location bar).

That is unless you access the raw control itself...

Note that IHTMLElement is incomplete, since only outerHTML and get_parentElement are used here.

edit: I noticed that getting the actual HTML element (not just the BODY) doesn't work correctly.