PDF to WEBP tool



sedgetone
June 24th, 2023, 12:37
In an attempt to learn Python, I thought I'd motivate myself by doing something hobby related. I can't stress this enough: I AM NOT A CODER. What I've made works for me, and I hope some of you folks get some use out of it too. Don't worry if you haven't got Python installed. I've built a Windows executable, but you will still need Poppler installed and its bin folder in your PATH environment variable.

This is a simple little app that provides two functions:

1. Converts each full page of a PDF to a WEBP
2. Extracts images from a PDF and converts them to WEBP

Installation
Grab the app from my Google Drive. The Python script is there as well if you want to use/abuse that instead. I've put some comments at the top of the script listing the required libraries if you need to run it.
https://drive.google.com/drive/folders/1UWjs6WQU3TtFyHf2CSWzKykEvLujy7FB?usp=sharing

Make sure you download and set up Poppler:
https://github.com/oschwartz10612/poppler-windows/releases
1. Download the latest release zip file and extract it.
2. Copy the extracted poppler folder into C:\Program Files\.
3. Click the Windows search box and type environment, then select "Edit environment variables for your account" in the results.
4. In the upper box titled "User variables for <username>", select Path and press the Edit button.
5. Click New, then Browse to the poppler bin directory. For me it was C:\Program Files\poppler-23.05.0\Library\bin
6. Click OK and close the settings.

Check Poppler is working:
1. Click the Windows Start icon, type cmd, and open Command Prompt.
2. If Poppler is installed, you should be able to type pdfinfo.exe -h and run it. If it says that it can't find pdfinfo.exe, check that you've added the bin folder correctly to the Path variable.
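
For anyone who would rather script this check, here is a minimal sketch using only the Python standard library. The function name poppler_tools_missing and the list of tools are illustrative assumptions and are not taken from the app; only pdfinfo is mentioned in the steps above.

```python
import shutil

# Poppler command-line tools to look for. pdfinfo is the one used in the
# manual check; pdftoppm and pdfimages are assumed here for illustration.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm", "pdfimages")


def poppler_tools_missing(tools=POPPLER_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    # shutil.which searches PATH the same way the Command Prompt does,
    # including the .exe extension on Windows.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = poppler_tools_missing()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Poppler tools found on PATH.")
```

If anything is reported as missing, recheck the bin folder entry in the Path variable and open a fresh Command Prompt before trying again.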

Usage
Start with a low page count PDF. I used the Call of Cthulhu quick start guide as a suitable test; grab it off the Chaosium website. The app interface is pretty obvious: browse to select your PDF and the output directory will update to the folder where the file is located. You can of course set it to anywhere else you like. Quality does what it says on the tin and adjusts the quality of the WEBP compression. I've found I can wind this down to 30 and still get usable results, but it defaults to 70.
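
To show what a quality setting like this typically controls, here is a minimal sketch that converts one image to WEBP. It assumes the third-party Pillow library, which may or may not be what the app uses; the function names clamp_quality and to_webp are hypothetical.

```python
from pathlib import Path


def clamp_quality(value, default=70):
    """Keep quality within 0-100; fall back to default if it isn't a number."""
    try:
        return max(0, min(100, int(value)))
    except (TypeError, ValueError):
        return default


def to_webp(src, quality=70):
    """Save a WEBP copy of a JPG/PNG next to the source; return the new path."""
    # Pillow is an assumption here (pip install Pillow); it is imported
    # inside the function so clamp_quality works without it installed.
    from PIL import Image

    src = Path(src)
    dst = src.with_suffix(".webp")
    with Image.open(src) as im:
        # For lossy WEBP, lower quality means smaller files and more artifacts.
        im.save(dst, "WEBP", quality=clamp_quality(quality))
    return dst
```

Calling to_webp("page-01.jpg", quality=30) would write page-01.webp at the lower setting mentioned above; the file name is only an example.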

Extract Images should be pretty quick. If you watch the output directory, you'll see the JPGs or PNGs that get extracted are then converted to WEBP and the source images deleted. Convert Pages will take a lot more time to run. It will say Not Responding, but under the hood the hamster is busy working. When I say 'a lot more time', I mean go make a warm beverage of your choice. Maybe go out for a short walk. You should then see it spit out JPGs and convert them to WEBP in the output directory.
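
For the curious, the two functions can be roughly reproduced with Poppler's own command-line tools. This is a sketch under assumptions, not the app's actual code: it guesses that pdfimages and pdftoppm do the heavy lifting, the 150 DPI value is arbitrary, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def extract_images_cmd(pdf, out_dir):
    """Build a pdfimages command: -j keeps JPEGs as JPG, -png writes the rest as PNG."""
    prefix = Path(out_dir) / Path(pdf).stem
    return ["pdfimages", "-j", "-png", str(pdf), str(prefix)]


def convert_pages_cmd(pdf, out_dir, dpi=150):
    """Build a pdftoppm command that renders every page to a JPG."""
    prefix = Path(out_dir) / Path(pdf).stem
    return ["pdftoppm", "-jpeg", "-r", str(dpi), str(pdf), str(prefix)]


def run(cmd):
    """Run a Poppler command; raises CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


# Example (needs Poppler on PATH and a real PDF):
#   run(extract_images_cmd("quickstart.pdf", "out"))
#   run(convert_pages_cmd("quickstart.pdf", "out"))
# The resulting JPGs/PNGs would then be converted to WEBP and deleted.
```

Rendering whole pages is the slow part because every page is rasterised at the chosen DPI, which fits with Convert Pages taking much longer than Extract Images.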