How To Extract Images From A PDF



ColinBuckler
October 27th, 2017, 22:13
OK, so you are trying to create your own custom game with images from a PDF. Cutting and pasting them out of the PDF one by one can be a real pain in the you-know-what when there are several images.

You will be pleased to know there is a much easier way.....

(1) Download the attachment and extract the contents to a folder on your Windows PC.

(2) In the extracted folder you will find an "images" folder. Copy the PDF into this folder.

(3) Run "pdfextract.bat"; this will extract the images from the PDF into the "images" folder.

(4) Copy the images you require to another folder.
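The four steps can also be scripted. The Python below is a hypothetical sketch, not the attached batch file: the helper names are made up, it assumes pdfimages.exe sits in the folder above "images" (as in the attachment), and it copies the PDF in as z.pdf rather than renaming it. Unlike the batch file, it also clears leftover .pgm and .pbm files.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical helpers that mirror pdfextract.bat; they are not part of the attachment.
OLD_OUTPUT = (".jpg", ".ppm", ".pgm", ".pbm")  # the batch file only erases .jpg and .ppm

def clear_old_output(images_dir: Path, extensions=OLD_OUTPUT) -> int:
    """Delete files left over from a previous extraction; return how many were removed."""
    removed = 0
    for f in list(images_dir.iterdir()):
        if f.is_file() and f.suffix.lower() in extensions:
            f.unlink()
            removed += 1
    return removed

def build_command(exe: Path, pdf_name: str, image_root: str = "images") -> list:
    """Same arguments as the batch file: ..\\pdfimages.exe -j z.pdf images"""
    # -j saves JPEG-compressed images as .jpg; other images are written as .ppm/.pgm/.pbm
    return [str(exe), "-j", pdf_name, image_root]

def extract(source_pdf: Path, tool_dir: Path) -> None:
    images_dir = tool_dir / "images"
    images_dir.mkdir(exist_ok=True)
    clear_old_output(images_dir)
    shutil.copyfile(source_pdf, images_dir / "z.pdf")  # copy rather than rename
    exe = (tool_dir / "pdfimages.exe").resolve()
    # run inside "images", as the batch file's "cd images" does, so output lands there
    subprocess.run(build_command(exe, "z.pdf"), cwd=images_dir, check=True)
```

For example, extract(Path(r"C:\rpg\module.pdf"), Path(r"C:\pdfextract")) would, under those assumptions, leave the extracted images in C:\pdfextract\images. Copying instead of renaming keeps the source PDF where it was.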

Note1: This will extract the images at their original resolution.

Note2: Running "pdfextract.bat" will erase any .jpg or .ppm files in the "images" folder before extracting the new images. The batch file will also rename the PDF to z.pdf.

The pdfimages.exe file can be downloaded from https://www.xpdfreader.com/download.html as a part of the Xpdf tools suite .zip file.

Edit: the .zip attachment now contains the 32-bit version of Xpdf rather than the 64-bit version previously included.

LordEntrails
October 27th, 2017, 23:34
Nice, I've bookmarked this thread. Sure I will be needing it in the future.

Talen
October 28th, 2017, 06:02
Does this need to be run as an administrator? I downloaded the zip file here, extracted it, put my PDF in the images folder and ran the bat file. I see the screen blink like something ran, but nothing new is in the folder and the PDF isn't renamed.

ColinBuckler
October 28th, 2017, 09:50
Hi Talen,

It shouldn't; at least it doesn't on my Windows 10 machine here.

Have you copied the PDF into the images folder, which must be a subfolder directly beneath the batch file and executables?

I just downloaded the zip file and placed it in a temp folder, copied a 650+ page PDF into the images folder and ran the batch file. It extracted over 1,400 images in about 10 seconds.

ColinBuckler
October 28th, 2017, 12:06
Changed the zip file to include the 32 bit version rather than the 64 bit version.

Talen
October 28th, 2017, 13:19
All set. Thanks for the suggestion!

bugmenot
October 29th, 2017, 20:31
Great tool, thanks a lot!

I'm going to use this on a regular basis for preparing for my games and will probably find a million other uses for it.

donpaulo
March 18th, 2018, 11:22
Hmm, for some reason the program isn't saving any files. I am using the 64-bit one.

There is an image format folder, so I am using that. I can copy and load the PDF file fine.

When I choose Save Image, I get a popup with a JPEG pulldown menu. I click OK and get the standard Save Image folder popup, but cannot save.

damned
March 18th, 2018, 12:17
I'm just using the 32-bit one and having no issues... maybe give that a try?

donpaulo
March 18th, 2018, 12:20
Indeed I will do that

thanks mate

ColinBuckler
March 18th, 2018, 13:47
64-bit apps do bring some increases in performance and in system requirements over the 32-bit versions. In this case you will see limited benefit from the 64-bit version.

Using the 32-bit version, I have extracted 1,400 images in less than 1 minute. A 64-bit version may do it quicker, but does it really matter? It will take you a lot longer to review the extracted images!!!

Trenloe
March 18th, 2018, 14:36
I'm confused what your actual issue is here. The screenshot you show displays the "Save As" window. You need to enter a name for the image in the "File name" field and then you can click "Save". Is this the "but cannot save" issue you're having?

donpaulo
March 19th, 2018, 09:10
Installed the 32 bit version

having the same issue

I can load the PDF

It wants me to run as admin

I reload the program as admin

load the PDF

optimize the photo

Click on save image as

bzzt no love :(

donpaulo
March 19th, 2018, 09:11
Indeed, I get the popup.

When I run it as admin, I can use the Save button and select a file name, but there is no file in the folder after the program "runs".

damned
March 19th, 2018, 09:21
Are you running the pdfextract.bat file?

donpaulo
March 19th, 2018, 09:29
Having issues with that bit actually :(

New to this batch file thing.

damned
March 19th, 2018, 09:56
Download the attachment from the first post.
Extract it somewhere, e.g. c:\users\donpaulo\pdfextract
Copy your PDF into the images subfolder and rename it as z.pdf
Double-click the batch file

Trenloe
March 19th, 2018, 15:26
Use Windows File Explorer to look for a newly created text file in that directory or in the directory where the original PDF was. If there are security issues, a short message will be written to a text file.

Minty23185Fresh
June 11th, 2018, 18:46
Colin, as a test I used this tool on "A Great Upheaval", a .pdf I downloaded from the DMs Guild. I "successfully" extracted 101 "images" from the .pdf as .ppm files. What are .ppm files, and why are they used? (A quick Googling of .ppm says they are less than ideal image files.) Also, what do I open them with? Photoshop, paint.net and Paint will not open them. Inkscape does, but the rendering looks questionable to me.

ColinBuckler
June 11th, 2018, 20:26
Hi,

Good question... in the past, when ppm files appeared after an extraction, they were ever so small, so I assumed they were other data files and never tried to open them. I just deleted them, leaving me with the image files.

Do you have any other image files or just the ppm files?
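On the question of what .ppm files are: PPM (colour) and its siblings PGM (greyscale) and PBM (black-and-white) are Netpbm formats, which store uncompressed pixels behind a short text header: a magic number such as P6 or P5, then width, height and the maximum sample value. GIMP, for one, opens them. As a dependency-free sketch, Python's standard library is enough to turn an 8-bit binary PPM or PGM into a PNG that Photoshop or Paint will accept. The function names are hypothetical, and 16-bit, ASCII and PBM files are skipped.

```python
import struct
import zlib
from pathlib import Path

def _read_header(data: bytes):
    """Parse a binary PGM/PPM header; return (magic, width, height, maxval, pixel_offset)."""
    fields, i = [], 0
    while len(fields) < 4:
        while data[i:i + 1].isspace():
            i += 1
        if data[i:i + 1] == b"#":                 # comment: skip to end of line
            i = data.index(b"\n", i) + 1
            continue
        start = i
        while data[i:i + 1] and not data[i:i + 1].isspace():
            i += 1
        fields.append(data[start:i])
    # a single whitespace byte separates the header from the pixel bytes
    return fields[0], int(fields[1]), int(fields[2]), int(fields[3]), i + 1

def _chunk(kind: bytes, body: bytes) -> bytes:
    """PNG chunk: length, type, data, CRC-32 over type + data."""
    return struct.pack(">I", len(body)) + kind + body + struct.pack(">I", zlib.crc32(kind + body))

def netpbm_to_png(data: bytes) -> bytes:
    """Convert an 8-bit binary PPM (P6) or PGM (P5) to PNG bytes."""
    magic, width, height, maxval, offset = _read_header(data)
    if magic not in (b"P5", b"P6") or maxval != 255:
        raise ValueError("only 8-bit binary PGM/PPM is handled in this sketch")
    channels = 3 if magic == b"P6" else 1
    row_len = width * channels
    pixels = data[offset:offset + row_len * height]
    if len(pixels) != row_len * height:
        raise ValueError("truncated pixel data")
    # every PNG scanline starts with a filter-type byte; 0 means "no filter"
    raw = b"".join(b"\x00" + pixels[y * row_len:(y + 1) * row_len] for y in range(height))
    color_type = 2 if channels == 3 else 0      # 2 = RGB, 0 = greyscale
    ihdr = struct.pack(">IIBBBBB", width, height, 8, color_type, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))

def convert_folder(folder: Path) -> int:
    """Write a .png beside each convertible .ppm/.pgm in `folder`; return the count."""
    converted = 0
    for f in sorted(folder.iterdir()):
        if f.suffix.lower() in (".ppm", ".pgm"):
            try:
                f.with_suffix(".png").write_bytes(netpbm_to_png(f.read_bytes()))
                converted += 1
            except ValueError:
                pass                            # e.g. 16-bit files, which this sketch skips
    return converted
```

Running convert_folder(Path("images")) after an extraction would leave a .png next to each 8-bit .ppm/.pgm; the originals are not deleted.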

Minty23185Fresh
June 12th, 2018, 20:05
… Do you have any other image files or just the ppm files?

I did not have any image files. I ran it in a VM (Virtual Machine). I have no idea why the image files were not written to disk.

I tried again. In the VM, I invoked a DOS shell (cmd.exe) and then executed the batch file from within that. I did this to observe possible error messages. There were two messages, missing *.jpg and missing *.ppm files. (Which are unimportant, see below.) My test .pdf was large (~30 MB, >700 images). As near as I can tell, all images were extracted this time.

I also tried running the batch file without a VM but using a DOS command shell. Again all images seem to have been extracted. And the same two error messages were reported.

As another test, no VM, no DOS shell, just simply double-clicking the batch file from Windows Explorer. Another success.

So to summarize, everything seems to be working fine. The results of my first attempt must have been an aberration.

The following will be unimportant to most people reading this thread, except those with a particular "techie" inquisitive nature or those having difficulties.

The error messages were a conundrum until I started digging. They are unimportant: they come from the two erase commands in the .bat file, which complain when there are no .jpg or .ppm files to delete. In addition to the .jpg image files, .pbm, .pgm and .ppm files are created, but not one-to-one with the .jpg files.

I went to the "superuser.com" web site (part of Stack Exchange), so I feel comfortable with the information I got there. The .ppm, .pbm and .pgm files have to do with the image type within the .pdf. I'm not sure that I care enough to explore this further, but someone else might. One item of interest is the -list command line parameter, which gives an image-by-image extraction report. It might be helpful if an image you want from the .pdf doesn't extract.

Colin. Thanks for your help.
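On the leftover .pbm, .pgm and .ppm files: with -j, pdfimages saves JPEG-compressed images as .jpg and writes the rest as Netpbm files (.ppm for colour, .pgm for greyscale, .pbm for black-and-white), so each image becomes one or the other. A hypothetical helper can read the first bytes of each extracted file to report its kind and size, which makes the tiny files easier to spot before deleting them. It assumes the standard Netpbm header (magic number, width, height, then a maximum value except for PBM) and the JPEG start-of-image marker FF D8 FF.

```python
from pathlib import Path

# Magic numbers at the start of Netpbm files (P1-P3 are the rarer ASCII variants).
NETPBM = {b"P1": "PBM", b"P2": "PGM", b"P3": "PPM", b"P4": "PBM", b"P5": "PGM", b"P6": "PPM"}

def _header_fields(data: bytes, count: int) -> list:
    """Return the first `count` whitespace-separated header fields, skipping # comments."""
    fields, i = [], 0
    while len(fields) < count:
        while data[i:i + 1].isspace():
            i += 1
        if i >= len(data):
            raise ValueError("truncated Netpbm header")
        if data[i:i + 1] == b"#":          # a comment runs to the end of the line
            while i < len(data) and data[i:i + 1] not in (b"\n", b"\r"):
                i += 1
            continue
        start = i
        while i < len(data) and not data[i:i + 1].isspace() and data[i:i + 1] != b"#":
            i += 1
        fields.append(data[start:i])
    return fields

def describe_image(data: bytes) -> str:
    """Identify an extracted file from its first bytes, e.g. 'JPEG' or 'PPM 640x480'."""
    if data[:3] == b"\xff\xd8\xff":        # JPEG start-of-image marker
        return "JPEG"
    kind = NETPBM.get(data[:2])
    if kind is None:
        return "unknown"
    # PBM headers have no maximum-value field
    fields = _header_fields(data, 3 if kind == "PBM" else 4)
    return f"{kind} {int(fields[1])}x{int(fields[2])}"

def summarize_folder(folder: Path) -> dict:
    """Count the extracted files in `folder` by kind (JPEG, PPM, PGM, PBM, unknown)."""
    counts = {}
    for f in sorted(folder.iterdir()):
        if f.is_file():
            with open(f, "rb") as fh:
                kind = describe_image(fh.read(512)).split()[0]
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```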

ColinBuckler
June 12th, 2018, 23:57
Wow ... a significant and in depth amount of testing.....

It is painful to extract a lot of images from a PDF and the program does it (generally) quite well and very, very fast.

All I can take credit for is the batch file, which simplifies the process for the average user.

The true credit should go to the developers over at https://www.xpdfreader.com/ who developed the routine in the first instance.

Anyway, I hope the images extracted are of sufficient quality for use. If not, I am sure you are aware of a variety of programs to resize them; one I like is https://www.xnview.com/en/xnconvert/ as it's very good at batch resizing.

Minty23185Fresh
June 13th, 2018, 01:06
… Anyways I hope the images extracted are of sufficient quality for use...

Actually, Colin, after reading the information on the "superuser" site, it is my belief that one can't get better quality. The routines extract the image in the format in which it was embedded in the .pdf file. With many of the DriveThruRPG and DMs Guild PDFs, the WotC guidelines specify that the PDFs should be constructed using JPGs. If I understand all of this correctly, it doesn't get any better than that.

When I first saw the images extracted as JPGs I thought to myself, "too bad it doesn't extract them in an uncompressed format, extracting to JPGs might not be totally "lossless"". But after further reading my ignorance has been assuaged. I'll probably do a little more extraction comparisons, and report back.

But right now, I'd recommend the methodology you posted as the #1 avenue for extraction of images from PDFs. Especially since file size is irrelevant, as is password protection of PDF properties (like watermarks). The PDF I extracted from was a DriveThruRPG watermarked purchased adventure.

Keep in mind I'm not the #1 authority on this, but I have spent just about two weeks now searching the forums, trying various methodologies and trying to digest it all into one easy to eat package.

Swifty0x0
June 14th, 2018, 08:20
When I run it I get an I/O error message saying it can't open the pdf file. I've tried more than one pdf file, none could be opened. I ran the batch file and I also ran the exe from a command prompt, both as administrator. I've tried the pdf files with different names including z.pdf and they were in the images subfolder. I have Windows 8.1. I've also checked the pdf files for security issues, but they look normal.
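The "can't open" message means pdfimages could not open the file named on its command line. In the attached batch file that name is the relative path z.pdf, looked up in the "images" folder after "cd images". A hypothetical check in Python can rule out the simple causes: the file is missing, locked, or doesn't start with the %PDF- signature. It is only a sketch; a file that passes can still fail for other reasons.

```python
from pathlib import Path

def check_pdf(folder: Path, name: str = "z.pdf") -> str:
    """Return a short diagnosis for the PDF the batch file will pass to pdfimages."""
    path = folder / name
    if not path.exists():
        candidates = sorted(p.name for p in folder.glob("*.pdf"))
        return f"missing: {name} not in {folder}; PDFs found: {candidates}"
    try:
        with open(path, "rb") as fh:
            head = fh.read(1024)
    except OSError as exc:  # e.g. locked by another program or no permission
        return f"unreadable: {exc}"
    # %PDF- normally starts the file; many readers also accept it within the first 1 KB
    if b"%PDF-" not in head:
        return "not a PDF: no %PDF- signature in the first 1024 bytes"
    return "ok"
```

Calling check_pdf(Path("images")) from the folder holding the batch file checks the same z.pdf the batch file would use.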

damned
June 14th, 2018, 08:42
My last run batch looks like this:

@echo off
cd images
erase *.jpg
erase *.ppm
..\pdfimages.exe -j ..\Strange_Tales_of_the_Century.pdf images
cd ..
pause


Worked fine.

Swifty0x0
June 14th, 2018, 08:48
The batch file in the zip looks like this...

@echo off
cd images
erase *.jpg
erase *.ppm
rename *.pdf z.pdf
..\pdfimages.exe -j z.pdf images
rem erase z.pdf
cd ..

Is that right, or is there something missing or set wrong?

damned
June 14th, 2018, 11:14
Hi Swifty0x0

I had trouble with the rename part of the script so I ran it on specified files each time.

Nyghtmare
June 14th, 2018, 18:00
I just manually renamed the PDF that I wanted to extract the pictures from as "z.pdf" and left the batch file alone.

ColinBuckler
June 15th, 2018, 12:44
I was working on several files and tried scripting the batch file to process each PDF and write out to its own folder... but became fed up with how long it was taking me (I seem to recall a problem with the .exe not working in subfolders), so I changed the process to use a single file called z.pdf. After all, how often do you process a file compared to creating the module and editing the images (damn them secret doors!!!)

All I then had to do was simply copy the pdf into the folder and rename it. Run the batch and move the extracted files. Rinse and repeat....
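The one-file-at-a-time design follows from the batch file's "rename *.pdf z.pdf" line, which can only produce one z.pdf. Colin's original idea of giving each PDF its own output folder can be sketched in Python under a few assumptions: pdfimages.exe sits one level above "images" as in the attachment, and pdfimages treats its last argument as a file-name prefix that may include a folder path. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

def plan_jobs(images_dir: Path, exe: Path) -> list:
    """One (output folder, command) pair per PDF; files come out as <stem>-NNNN.jpg etc."""
    jobs = []
    for pdf in sorted(images_dir.glob("*.pdf")):
        out_dir = images_dir / pdf.stem
        # the last argument is a prefix, so pointing it into out_dir keeps each PDF separate
        cmd = [str(exe), "-j", str(pdf), str(out_dir / pdf.stem)]
        jobs.append((out_dir, cmd))
    return jobs

def run_all(images_dir: Path, exe: Path) -> None:
    """Create each output folder, then run pdfimages once per PDF; no renaming needed."""
    for out_dir, cmd in plan_jobs(images_dir, exe):
        out_dir.mkdir(exist_ok=True)
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list also sidesteps quoting problems with PDF names that contain spaces.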

Swifty0x0
June 17th, 2018, 05:09
Thanks for the replies guys. I ended up downloading a free version of PDFMate PDF Converter and that had no problem working on the same PDF files I couldn't get to work with the pdfimages.exe file.

patriot101
January 5th, 2020, 22:49
This works amazingly well. I got a bit greedy and lazy and tried to copy more than one pdf into the /images folder. It seemed to convert the first and stop. That is ok. I had a few PDFs and I just copied them in one at a time when I did the conversion. I looked at the batch file, and it made sense. I can't thank you enough for this utility!

What I love is that even where I have a 2 page PDF, it converts the file to a couple of jpegs, which is nice.

daboking
April 28th, 2020, 20:43
Sadly, this does not work with my L5R PDFs. I borrowed a friend's laptop with Acrobat and CAN remove the text over images, but that will be tedious. When I use this tool, though, it does not find any images. I hope I can find a solution, as I have spent weeks and weeks converting the books to MoreCore and would love to add the images to the modules I am creating.