Load PDF Pages?

ima747(Posted 2010) [#1]
I've been looking but can't find a module to handle loading PDFs. There's been mention of one a few years ago but I couldn't find anything in the code archives...

I need to load a PDF file and render its pages into pixmaps. I'm sure I'll need to do a number of steps to get that going, but short of finding a PDF lib and wrapping it myself (a bit beyond my current interest and expertise level...) I don't really know where to start...


Otus(Posted 2010) [#2]
Do you have to do it inside BlitzMax? Can you just get ImageMagick and do this:
System_ "convert file.pdf tmp.png"
Local img:TImage = LoadImage("tmp.png")

Edit: I see it's also available as a DLL, so it should fit your needs in any case.
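
A minimal sketch of how the command-line approach above could be stretched to user-chosen, multi-page PDFs. It assumes ImageMagick's "convert" and Ghostscript are installed and on the PATH (on Windows a full path to ImageMagick's convert may be needed, since a system tool shares that name). The function names, the scratch file name and the 150 DPI default are made up for illustration, and stopping at the first page that fails to load is an assumption about how an out-of-range page behaves.

```blitzmax
SuperStrict

' Build the ImageMagick command for one page. "[n]" selects a zero-based page
' and -density sets the render resolution in DPI.
Function BuildConvertCommand:String(pdfPath:String, page:Int, dpi:Int, outPath:String)
	Return "convert -density " + dpi + " ~q" + pdfPath + "[" + page + "]~q ~q" + outPath + "~q"
End Function

' Render one page to a pixmap via a scratch PNG. Returns Null on failure.
Function LoadPDFPage:TPixmap(pdfPath:String, page:Int, dpi:Int = 150)
	Local tmp:String = "tmp_page.png" ' hypothetical scratch file name
	DeleteFile(tmp) ' make sure a stale file is not mistaken for fresh output
	If system_(BuildConvertCommand(pdfPath, page, dpi, tmp)) <> 0 Then Return Null
	Local pix:TPixmap = LoadPixmap(tmp)
	DeleteFile(tmp)
	Return pix
End Function

' Collect pages until one fails to load (assumed to mean "past the last page").
Function LoadPDFPages:TList(pdfPath:String, maxPages:Int = 500)
	Local pages:TList = New TList
	For Local i:Int = 0 Until maxPages
		Local pix:TPixmap = LoadPDFPage(pdfPath, i)
		If pix = Null Then Exit
		pages.AddLast(pix)
	Next
	Return pages
End Function
```

Usage would be something like Local pages:TList = LoadPDFPages(userPath), where userPath is whatever the user picked. This still depends on an external install, so it does not count as a built-in reader.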


ima747(Posted 2010) [#3]
Sadly it's not one PDF; it's user-supplied PDFs, so it needs to be built in. And I need it cross-platform (Mac and Windows; Linux would be nice but isn't needed right now).


xlsior(Posted 2010) [#4]
There are a number of PDF writers for BlitzMax, but I've never come across a PDF *reader*

Which is unfortunate, because I could really use one of those as well... :-?


Space_guy(Posted 2010) [#5]
Yes, me too. But ImageMagick and Ghostscript work well enough on PC anyway.


Brucey(Posted 2010) [#6]
Apparently, if you have Ghostscript installed, GraphicsMagick can do it too.


Space_guy(Posted 2010) [#7]
Interesting Brucey. How would you go about it?


Space_guy(Posted 2010) [#8]
-r must be followed by <res> or <xres>x<yres>
magicktest: Postscript delegate failed (tmp\9.pdf) reported by C:/BlitzMax/mod/bah.mod/magick.mod/src/coders/pdf.c:377

I get this error trying to load a pdf file. Any ideas?


ima747(Posted 2010) [#9]
mupdf looks promising as a starting point... anyone with more skill interested? *nudges brucey with crossed fingers*

http://ccxvii.net/mupdf/


Otus(Posted 2010) [#10]
"mupdf looks promising as a starting point... anyone with more skill interested? *nudges brucey with crossed fingers*"

It seems to be cross-platform, which is good. Note that it uses GPL v3+, so it's only suitable for projects that use the same license.


xlsior(Posted 2010) [#11]
I do have ghostscript installed (it was a requirement for CutePDF), and if I run magick.mod\examples\example_04.bmx to return the supported formats, then I do see .PDF among them:

PDF: (Portable Document Format) : Readable = true, Writable = true, Multiframe = true



However, when trying to load a PDF file using example_05.bmx, I receive the following error:

-r must be followed by <res> or <xres>x<yres>
testing: Postscript delegate failed (C:\Misc\DISH_Player-DVR_522-625_User_Guide.pdf) reported by C:/Code/BlitzMax/mod/bah.mod/magick.mod/src/coders/pdf.c:377



(Same as Space_guy, apparently.) Any idea what needs to be done to render a multi-page PDF file?

<EDIT> It appears that the parameters for the filetypes are stored in the /config/delegates.mgk file -- there are a bunch of references for various types of PDF, which have things like this:

<!-- Read monochrome Postscript, EPS, and PDF -->
<delegate decode="gs-mono" stealth="True" command='"gs" -q -dBATCH -dMaxBitmap=50000000 -dNOPAUSE -sDEVICE=pbmraw -dTextAlphaBits=%u -dGraphicsAlphaBits=%u -g%s -r%s %s "-sOutputFile=%s" -- "%s" -c quit' />


So it does refer to the -r parameter in there, but something seems odd: according to the header, %s is "scene", and %x / %y are used for resolution... There seem to be too many %s variables in there, I'd think. I tried some changes to that .mgk file, but so far none of them appear to work...
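
The delegate line above is a template for a Ghostscript call, and the error text suggests that whatever gets substituted after -r is arriving empty. One way to take the delegate out of the picture is to call Ghostscript directly with the resolution spelled out. This is only a sketch under assumptions: the executable name passed in as gsExe depends on the install, the png16m output device and the 150 DPI default are choices made here rather than anything taken from delegates.mgk, and the function names and scratch file name are hypothetical.

```blitzmax
SuperStrict

' Build a Ghostscript command that renders one page of a PDF to a 24-bit PNG.
' gsExe is an assumption: usually "gs" on Mac/Linux, "gswin32c" on 32-bit Windows installs.
' page is 1-based here, matching Ghostscript's -dFirstPage / -dLastPage switches.
Function BuildGsCommand:String(gsExe:String, pdfPath:String, page:Int, dpi:Int, outPath:String)
	Local cmd:String = gsExe + " -q -dBATCH -dNOPAUSE -sDEVICE=png16m"
	cmd :+ " -dTextAlphaBits=4 -dGraphicsAlphaBits=4"
	cmd :+ " -r" + dpi ' explicit resolution, so -r is never left empty
	cmd :+ " -dFirstPage=" + page + " -dLastPage=" + page
	cmd :+ " ~q-sOutputFile=" + outPath + "~q ~q" + pdfPath + "~q"
	Return cmd
End Function

' Run the command and load the result. Returns Null if anything fails.
Function RenderPDFPage:TPixmap(gsExe:String, pdfPath:String, page:Int, dpi:Int = 150)
	Local tmp:String = "gs_page.png" ' hypothetical scratch file name
	DeleteFile(tmp) ' avoid picking up output left over from an earlier run
	If system_(BuildGsCommand(gsExe, pdfPath, page, dpi, tmp)) <> 0 Then Return Null
	Local pix:TPixmap = LoadPixmap(tmp)
	DeleteFile(tmp)
	Return pix
End Function
```

For example, RenderPDFPage("gs", "manual.pdf", 1) would try to render the first page at 150 DPI. Printing the string from BuildGsCommand and running it by hand in a console is a quick way to check whether Ghostscript itself is happy with the file.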


xlsior(Posted 2010) [#12]
Note: looking for that error message on google, it does appear that it's Ghostscript that's generating that "-r" error, so at least it's seeing it and trying to communicate?

something is going squirrely, though... :-/


AdamRedwoods(Posted 2010) [#13]
poppler -> cairo?


xlsior(Posted 2010) [#14]
Brucey: do you happen to have any other examples, for example on how to display *any* of the multi-page document types?

I found that by hardcoding some values in the delegates.mgk file I could get it to return a very tiny thumbnail of a semi-random portion within a PDF, but have absolutely no idea on how to control any of the specifics...


Space_guy(Posted 2010) [#15]
Has anyone made any progress loading PDFs with ImageMagick?


xlsior(Posted 2010) [#16]
No dice here.... Just tiny, semi-random blurry snippets, nothing that I managed to control. :-/


Foolish(Posted 2010) [#17]
PDF is a really challenging format to work in. However, the best non-Adobe library I have worked with is from PDFTron. It's commercial though. The open source libraries never really cut it for me when it comes to viewing.

The PDF format is like a Frankenstein's monster of a format. You can throw anything in there. When it comes to the high standards users expect of page fidelity, this makes it twice as hard to deal with.

And forget the Java-based libraries. They are way, way too slow.


xlsior(Posted 2010) [#18]
"The PDF format is like a Frankenstein's monster of a format."


No kidding...

All I'm really after is a way to extract plain vanilla text from a PDF file... which is still near impossible. The only thing I found so far are some command line tools like PDF2HTML that can extract it in HTML format, but I haven't found anything that I can include as a library instead of having to call a 3rd party command-line app...


Henri(Posted 2011) [#19]
I found out that if you edit the delegates.mgk file and remove the "-g%s" switch from one of the four PDF read options (usually gs-color), you get the full picture. (I'm not sure how to get a multi-page image into a TMimage object.)