PawPrint.net News

September 15th, 2010

Technobloggle

PDF Repair

The completely manual, last-ditch option to recover a PDF

I was recently given a damaged PDF file made up of scanned pages. Commercial repair tools only recovered the first page, but I managed to find a way to go a little further. This option is for advanced users, and I present it here in case it helps others.

First of all, a warning: I have dealt with damaged PDFs a few times, and if you need the text content of a PDF, this method will not work. Grab one of the many commercial PDF repair tools and try that instead. I have a few I like to use, and in general they recover a lot, but no automated approach is going to get you 100% of the way there.

Tools Needed

Photoshop (or perhaps another image editing program)
Textpad (or another good text/hex editor)

Knowing that a scanned page is, most of the time, stored as a JPEG inside the PDF, I started by changing the file extension from .pdf to .jpg to see whether it would open in Photoshop. It did, but only the first page. Now for the tricky bit.
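
The renaming trick only exposes the first image. As a rough check of how many images are still present in a damaged file, a minimal Python sketch (not part of the original method; the function names are hypothetical) can scan the raw bytes for the JPEG start-of-image signature FF D8 FF. This is plain signature scanning rather than PDF parsing, so treat the result as a hint: a JPEG that carries an embedded thumbnail, for example, can produce an extra hit.

```python
# Hypothetical helper (not from the original post): plain signature scanning.
# A JPEG stream starts with the SOI marker FF D8, immediately followed by
# another marker that also begins with FF (e.g. FF E0 for a JFIF APP0 segment).
SOI = b"\xff\xd8\xff"


def find_jpeg_starts(data: bytes) -> list[int]:
    """Return the byte offset of every JPEG start-of-image signature in data."""
    offsets = []
    pos = data.find(SOI)
    while pos != -1:
        offsets.append(pos)
        # Continue searching just past this hit.
        pos = data.find(SOI, pos + 1)
    return offsets


def find_jpeg_starts_in_file(path: str) -> list[int]:
    """Read a (possibly damaged) file as raw bytes and scan it."""
    with open(path, "rb") as f:
        return find_jpeg_starts(f.read())
```

Calling find_jpeg_starts_in_file("damaged.pdf") (a hypothetical file name) returns a list of offsets; more than one entry suggests more than one embedded image survived.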

I opened the PDF in Textpad and scanned down the file, searching for "endobj". You will see constructs that look like this:

endobj
13 0 obj
<< /Type /XObject /Subtype /Image /Width 1275 /Height 1649
/BitsPerComponent 8 /ColorSpace /DeviceRGB
/Filter /DCTDecode /Length 239648 >>
stream

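followed by a block of binary data. If you look closely at the start of that binary data, you'll see "JFIF" (the JPEG/JFIF file header). The /Filter /DCTDecode entry in the dictionary says the same thing: DCTDecode is the PDF filter for JPEG-compressed data.
The endobj marks the end of the previous object, and 13 0 obj starts a new one (object number 13, generation number 0).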
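
To see which objects in a damaged file are JPEG images, and where each one's data begins, a small scan of the object headers can help. The following is a minimal Python sketch, not a full PDF parser; the names list_jpeg_images, OBJ_RE, LENGTH_RE and IMAGE_RE are hypothetical. It looks for "N G obj << ... >> stream" headers whose dictionary contains /Subtype /Image and /DCTDecode, and reports the object number, generation number, declared /Length and the offset where the stream data starts. It assumes the dictionary contains no nested << >> and treats an indirect length such as /Length 14 0 R as unknown.

```python
import re

# Object header "N G obj", then a dictionary "<< ... >>", then the "stream"
# keyword and its end-of-line (CRLF or LF). The tempered pattern refuses to
# cross ">>", so nested dictionaries are NOT supported (an assumption that
# holds for simple image dictionaries like the one above).
OBJ_RE = re.compile(
    rb"(?<!\d)(\d+)\s+(\d+)\s+obj\s*<<((?:(?!>>).)*)>>\s*stream\r?\n",
    re.DOTALL,
)
# A direct length such as "/Length 239648"; an indirect one ("/Length 14 0 R")
# is deliberately not matched.
LENGTH_RE = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def list_jpeg_images(data: bytes) -> list[dict]:
    """List image objects whose stream uses the DCTDecode (JPEG) filter."""
    found = []
    for m in OBJ_RE.finditer(data):
        dictionary = m.group(3)
        if not IMAGE_RE.search(dictionary) or b"/DCTDecode" not in dictionary:
            continue
        length = LENGTH_RE.search(dictionary)
        found.append(
            {
                "obj": int(m.group(1)),
                "gen": int(m.group(2)),
                "length": int(length.group(1)) if length else None,
                # Offset of the first byte of stream data (just after "stream" + EOL).
                "data_start": m.end(),
            }
        )
    return found
```

When the file is intact and the length is direct, data[start:start + length] (with start = data_start) is the JPEG itself; in a truncated file the slice simply stops at the end of what remains.
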
So the method goes like this:

Working on a copy of the file (obviously), delete everything down to and including the next endobj, save the file and open it in Photoshop. With luck you get the next page; use Save As to write it out as a JPG, then repeat for each subsequent image.
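
The repeated cut-and-open steps above can be automated. What follows is a minimal Python sketch, not the method used in this post: it splits the raw bytes at each endobj, takes the data after each stream keyword up to endstream (or up to the end of the file if it was cut off), and keeps only chunks that begin with the JPEG start marker FF D8. The function names and the page_001.jpg naming scheme are hypothetical. It ignores /Length, and it would mis-split in the unlikely case that an image's bytes happen to contain the text endobj.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # JPEG start-of-image marker


def carve_jpegs(data: bytes) -> list[bytes]:
    """Split raw PDF bytes at every 'endobj' and return each stream body that
    starts with a JPEG SOI marker. A stream cut off by the end of the file is
    returned as far as it goes."""
    images = []
    for chunk in data.split(b"endobj"):
        kw = chunk.find(b"stream")
        if kw == -1:
            continue
        start = kw + len(b"stream")
        # In PDF syntax, "stream" is followed by CRLF or LF before the data.
        if chunk.startswith(b"\r\n", start):
            start += 2
        elif chunk.startswith(b"\n", start):
            start += 1
        end = chunk.find(b"endstream", start)
        body = chunk[start:] if end == -1 else chunk[start:end]
        if body.startswith(SOI):
            # Drop the end-of-line that precedes "endstream".
            images.append(body.rstrip(b"\r\n"))
    return images


def save_jpegs(pdf_path: str, out_dir: str) -> int:
    """Write each carved image to out_dir as page_001.jpg, page_002.jpg, ...
    and return how many were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    images = carve_jpegs(Path(pdf_path).read_bytes())
    for n, img in enumerate(images, start=1):
        (out / f"page_{n:03d}.jpg").write_bytes(img)
    return len(images)
```

Each output file still has to be checked in an image editor, exactly as with the manual approach; the script only saves the repeated editing and saving.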

In the case of my file, I only got one and a half more pages before the file abruptly ended, but that was better than nothing. I truly hope this helps others. Granted, it's an edge case for repairing a PDF file and will only get the images out of a PDF, but as it was for me, something is better than nothing.
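
A half-recovered page like this is typically a truncated JPEG: the data stops before the end-of-image marker FF D9. A minimal Python sketch for spotting that case follows; the helper names are hypothetical and the completeness test is only a heuristic. Appending the missing EOI marker can make some programs more willing to open the part of the image that survived, but whether it helps depends on the decoder, so treat it as something to try rather than a fix.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def is_complete_jpeg(img: bytes) -> bool:
    """Heuristic: data that starts with SOI and ends with EOI (ignoring a
    trailing CR/LF) is treated as complete; anything else as truncated."""
    return img.startswith(SOI) and img.rstrip(b"\r\n").endswith(EOI)


def close_truncated_jpeg(img: bytes) -> bytes:
    """Append a missing EOI marker to truncated JPEG data; complete data is
    returned unchanged."""
    if not img.startswith(SOI):
        raise ValueError("data does not start with a JPEG SOI marker")
    return img if is_complete_jpeg(img) else img + EOI
```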
