How to Extract Images from PDF Files with pdfimages
This article is going to be a bit different from the ones I have published previously. You guys have learned a lot about the Linux command line, and now it is time to put some simple commands into practice. Practice, practice… That is the way to learn and get very good at anything.
Which tool is this article going to teach you? Have you ever heard of, or used, the pdfimages tool?
What is PDF?
PDF (Portable Document Format) is a wonderful format that makes sharing ideas and information very easy. It is an open standard for document exchange. Maybe you have wondered about the .pdf extension at the end of your favorite e-book's file name. This extension tells you that the file is a portable document. Sometimes, when you read an e-book, a travel guide or whatever, you are amazed by the photos you see there and want to share them with others. You have tried to copy and paste these images, and this method does not work. Are you looking for a solution? Unixmen is here to help!
What do you need?
Ok guys, before going further you need a tool called pdfimages. It extracts images from PDF files and has many useful options: it can write JPEG images as JPEG files, limit the extraction to a range between a first page and a last page, take the owner or user password for encrypted files, and so on.
The pdfimages tool is part of poppler-utils. Open a new terminal and type the same command as shown in Figure 1.
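Figure 1 is not reproduced here, so the following is only a minimal sketch of how you could check for the tool yourself. The function name check_pdfimages is made up for this example. On Debian or Ubuntu the package is typically installed with sudo apt-get install poppler-utils; other distributions use their own package manager.

```shell
# Hypothetical helper: report whether pdfimages is available on the PATH.
# pdfimages ships in the poppler-utils package.
check_pdfimages() {
    if command -v pdfimages >/dev/null 2>&1; then
        echo "pdfimages is installed"
    else
        echo "pdfimages is missing: install the poppler-utils package"
    fi
}

check_pdfimages
```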
Mine is already installed. The pdfimages tool offers help on its usage through four options (-h, -help, --help and -?). The one I like is --help. If you are the kind of person who does not like to remember a lot of options, or you use so many tools that there is no more space in your brain for them, then you can use the help options. They are very useful because they show you how to use the tool. I am sure that you guys can learn how to use the tool without reading this article, but there are a lot of newbies out there and this information is priceless for them.
Imagine someone who has used Windows all their life: can he or she complete the task without reading this article again and again? I think that this question does not need an answer. Ok guys, let's steal some images from those PDF files.
After the tool is installed and ready for use, change to the directory where your files are by using the cd command. My e-book is on my Desktop, so I will use cd to go there, as shown in Figure 2.
Some important things you need to know before using the tool:
- The name of the document you want to extract images from
- The starting page (the -f option, specified as an integer)
- The end page (the -l option, specified as an integer)
- The -j option (save JPEG images as JPEG files)
The -j option is very important. If you do not specify it, the pdfimages tool will extract JPEG images and save them in the .ppm format (Portable Pixmap). PPM files are uncompressed, so this costs your machine time and disk space: each image can easily be over a megabyte in size. If you want to extract only one image it is not a big deal, but what if you have a document with 130 images and you want to extract them all? Uff!
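The items in the list above can be combined in a single command. Here is a minimal sketch, not the exact command from the figures: the function name extract_range and the DRY_RUN variable are invented for this example, and book.pdf is a placeholder file name. It uses -f for the first page, -l for the last page and -j to keep JPEG images as JPEG files.

```shell
# Hypothetical helper: extract images from a page range of a PDF.
# Usage: extract_range FILE PREFIX FIRST_PAGE LAST_PAGE
# Set DRY_RUN=1 to print the command instead of running it.
extract_range() {
    pdf=$1
    prefix=$2
    first=$3
    last=$4
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "pdfimages -j -f $first -l $last $pdf $prefix"
    else
        # -j: save JPEG images as JPEG; -f/-l: first and last page to scan.
        pdfimages -j -f "$first" -l "$last" "$pdf" "$prefix"
    fi
}

# Preview the command for pages 10 to 20 of a placeholder file.
DRY_RUN=1 extract_range book.pdf firebug 10 20
```

Drop DRY_RUN=1 and use your own file name to run the extraction for real.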
As you can see from Figure 3, I used the --help option to find information about this tool. Now it is time for some action.
I want to extract all images from the e-book you see in Figure 4, so I don’t have to specify the start and end pages.
Figure 5 shows that I used the -j option to save JPEG images as JPEG files, and that I put 'firebug' at the end of the command. Every image name will start with 'firebug'. The complete image name consists of the word you put at the end of your command followed by a number, as you can see from Figure 5.
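Since the screenshot is not available here, this is a rough sketch of what a whole-document extraction could look like. The function name extract_all is made up for this example and book.pdf is a placeholder for your own e-book. With no -f or -l given, pdfimages goes through every page.

```shell
# Hypothetical helper: extract every image from a PDF.
# Usage: extract_all FILE PREFIX
extract_all() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    # -j saves JPEG images as .jpg; the last argument is the name prefix.
    pdfimages -j "$1" "$2"
}

# 'firebug' is the prefix used in this article; book.pdf is a placeholder.
extract_all book.pdf firebug || echo "Extraction skipped."
```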
As you can see from Figure 6, some of the images are in the ppm format and some are in jpeg. Thanks to the -j option, the images stored as JPEG inside the PDF are saved as .jpg files, while the others are still written as .ppm. Do not forget the -j option, it is very important!
One last thing I want to add is that all images have the same quality as in the PDF document. Amazing!
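If you want to check the mix of formats without opening a file manager, you can count the files per extension. This is only a sketch: the function name count_by_ext is invented for this example, and it assumes the usual naming of prefix, dash, number and extension (for example firebug-000.jpg) in the current directory.

```shell
# Hypothetical helper: count files named PREFIX-*.EXT in the current directory.
# Usage: count_by_ext PREFIX EXT
count_by_ext() {
    set -- "$1"-*."$2"
    # If nothing matched, the pattern stays unexpanded and names no real file.
    [ -e "$1" ] || set --
    echo "$#"
}

echo "jpg files: $(count_by_ext firebug jpg)"
echo "ppm files: $(count_by_ext firebug ppm)"
```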
Now it is time to share this method with others. If you like this guide, share it with your friends and family and leave your thoughts in the comments below.