data mining

Extracting Tables from a PDF

Today I finally find myself needing to extract information from files delivered in PDF format. I have heard good things about the tabulizer package, so we will try that out today. Whelp, it turned out that I needed a 64-bit installation of Java before I could install tabulizer. Also, I used the remotes::install_github(...) command from [the package's GitHub page](https://github.com/ropensci/tabulizer) to force the installation (there appear to be issues installing packages from CRAN when they have a Java dependency).
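For reference, here is a rough sketch of what that setup might look like in R. The repository name is taken from the GitHub URL above. The exact arguments I passed to install_github are not reproduced here, and the PDF path is a made-up placeholder, so treat this as an outline under those assumptions rather than a recipe.

```r
# Sanity checks before installing: tabulizer depends on Java (via rJava),
# and the Java architecture needs to match the R session's.
.Machine$sizeof.pointer == 8   # TRUE when R itself is a 64-bit build
system("java -version")        # prints the Java version found on the PATH

# Install from GitHub instead of CRAN.
# Assumption: the default arguments are enough on your machine.
install.packages("remotes")
remotes::install_github("ropensci/tabulizer")

library(tabulizer)

# "my_report.pdf" is a hypothetical file name used only for illustration.
tables <- extract_tables("my_report.pdf")

# extract_tables() returns a list with one element per table it detects.
str(tables[[1]])
```

If the install step fails, the first thing to check is whether the Java that R finds is the 64-bit one, since that was the sticking point for me.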
