Hi! I just used the current `master` for the common task of extracting all text from a PDF. It worked, so thanks for the nice library! It took a while to figure out how to do it, though, and it required more contortions than I expected. Perhaps you could add some API support for such a basic task? Here's what I ended up with. Is this what you expect users to do?
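
    import qualified Data.Text as T
    import qualified Data.Text.IO as T
    import Pdf.Document

    main = withPdfFile "file.pdf" $ \pdf -> do
      txt <- extract pdf =<< catalogPageNode =<< documentCatalog =<< document pdf
      T.putStrLn txt

    -- recursively walk the page tree, concatenating the text of each leaf page
    extract pdf = (T.concat <$>) . (traverse ((extract' =<<) . loadPageNode pdf) =<<) . pageNodeKids
      where
        extract' (PageTreeLeaf tn) = pageExtractText tn
        extract' (PageTreeNode tn) = extract pdf tn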
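For comparison, a flatter variant avoids the explicit recursion by indexing pages by number. This is only a minimal sketch. It assumes that `Pdf.Document` on `master` exports `pageNodeNKids` (assumed to return the total number of leaf pages under a node, including nested ones) and `pageNodePageByNum` (assumed to look up a page by zero-based index). `documentText` is a hypothetical helper name and is not part of the library.

```haskell
import Control.Monad (forM)
import qualified Data.Text as T
import qualified Data.Text.IO as T
import Pdf.Document

-- Hypothetical convenience helper: the text of every page, in page order.
documentText :: Pdf -> IO T.Text
documentText pdf = do
  root <- catalogPageNode =<< documentCatalog =<< document pdf
  -- assumption: total number of leaf pages, including pages in nested nodes
  n <- pageNodeNKids root
  -- assumption: pages are numbered from 0
  texts <- forM [0 .. n - 1] $ \i ->
    pageExtractText =<< pageNodePageByNum root i
  return (T.concat texts)

main :: IO ()
main = withPdfFile "file.pdf" $ \pdf -> T.putStrLn =<< documentText pdf
```

The trade-off is that the per-page lookup likely starts from the root each time. It is easier to read but probably does more node lookups than a single recursive walk, which visits each tree node once.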