ColdFusion 8: Working with PDFs (Part 2)

Yesterday I blogged about two new PDF functions added in ColdFusion 8: isPDFFile and isPDFObject. Today I'm going to continue my discussion of the new PDF tools in ColdFusion 8 by introducing the CFPDF tag. This is one of the five new PDF-related tags added to ColdFusion 8. This one tag can do many things:

  • It can add a watermark to a PDF or remove one.
  • It can remove pages from a PDF. (Ever wanted to remove the legal crap from in front of a PDF? Or an ad?)
  • It can return information about a PDF.
  • It can merge multiple PDFs into one.
  • It can add/remove security from a PDF.
  • It can read a PDF. (Duh.)
  • It can set metadata on a PDF.
  • It can create thumbnails from a PDF.
  • It can write out to a PDF.

Let's start off with a simple example of reading a PDF:

<cfif isPDFFile("book.pdf")>

   <cfpdf action="read" source="book.pdf" name="mypdf">

   <cfdump var="#mypdf#">
   
</cfif>

I begin by checking to see if a file is a proper PDF. If it is, I then use the CFPDF tag to read the PDF into a variable named mypdf. At that point I can dump the PDF and see information about it. By the way, the same trick (reading and dumping) works for images as well.

I've displayed the dump to the left, and you can see it reveals quite a bit of information about my PDF. The PDF I used was one made from scratch using CFDOCUMENT, so some things like Author and Keywords are blank. But it did pick up the page size and security settings. It is too bad that CFDOCUMENT doesn't easily allow us to set the metadata, but guess what? We can use the CFPDF tag to correct that!

The setInfo action lets you pass in a struct of information. You can change the author, the subject, the title, and the keywords for a PDF. Let's look at a simple example:

<cfif isPDFFile("book.pdf")>

   <cfset data = {author="Raymond Camden", Subject="Paris Hilton", Title="The Wit and Wison of Paris Hilton", KeyWords="paris hilton,wisdom,wit"}>
   
   <cfpdf action="setinfo" source="book.pdf" info="#data#">

   <cfpdf action="getinfo" source="book.pdf" name="mdata">

   <cfdump var="#mdata#">   
   
</cfif>

I first create a simple struct of data. I then pass this struct to the CFPDF tag, noting the action of setinfo, the source for my PDF, and the struct of data. I then use getInfo to get the information back, and dump it. Now my PDF created from CFDOCUMENT has proper metadata in it.
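If you would rather leave the original file untouched, a variation along these lines should work. Treat it as a sketch, not tested code: the output file name book_tagged.pdf is made up for illustration, and it assumes the destination and overwrite attributes behave for setinfo the way they do for other CFPDF actions.

```cfml
<!--- Sketch only: book_tagged.pdf is a hypothetical output file name. --->
<cfset src = expandPath("book.pdf")>
<cfset dest = expandPath("book_tagged.pdf")>

<cfif isPDFFile(src)>

   <!--- Same kind of metadata struct as in the example above --->
   <cfset data = {Author="Raymond Camden", Title="My Book"}>

   <!--- Write the updated metadata to a copy instead of the source file --->
   <cfpdf action="setinfo" source="#src#" info="#data#" destination="#dest#" overwrite="yes">

   <!--- Read the metadata back from the copy to confirm it was saved --->
   <cfpdf action="getinfo" source="#dest#" name="mdata">

   <cfdump var="#mdata#">

</cfif>
```

Reading the info back from the destination file, rather than from the source, is a quick way to confirm the change actually made it to disk.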

Tomorrow I'll demonstrate adding watermarks to your PDF documents and removing them.


Comments

It disturbs me that when you type about Paris Hilton your chance of a spelling error increases by 1000%.

Look at the Title of your PDF, Wison huh?
# Posted By Nick | 7/10/07 5:11 PM
"The Wit and Wisdom of Paris Hilton"

Wouldn't that be an empty document?
# Posted By RobW | 7/10/07 8:16 PM
Well, it *might* have a watermark in it. Tune in tomorrow to see.
# Posted By Jeff Fleitz | 7/10/07 8:48 PM
I am trying to change the Title of some PDFs (which already have titles) and I am not having any luck. It seems like setInfo does not override current settings.

The strange thing, though, is that when I view the CFDUMP everything is correct, but not when I open the PDF in Acrobat and view the document properties.

Any ideas?
# Posted By mike | 9/11/07 12:42 AM
Did you _save_ the PDF object after you did setInfo?
# Posted By Raymond Camden | 9/11/07 6:03 AM
Ray - just wondering if you knew a way to copy the content of a PDF that has ContentExtraction=NotAllowed and CopyContent=Allowed? Basically, I would like to use CFPDF to programmatically copy the content of the PDF to a text file so I could then parse it into normalized data. I can copy and paste manually from the PDF, just not sure if I can use CFPDF or some other technique to do this in an automated way. Any advice or suggestions are greatly appreciated.
# Posted By Steve Eller | 1/9/08 3:39 PM
Steve, search my blog for my pdfutils.cfc. It has a utility which uses DDX to get the text from a PDF. You can't do it directly with CFPDF alone, but you can with CFPDF and DDX.
# Posted By Raymond Camden | 1/9/08 4:14 PM
Thanks Ray - I'll check that out.
# Posted By Steve Eller | 1/9/08 6:05 PM
After I do a SETINFO with CFPDF to update keywords, author and subject, the updated data is visible IN ACROBAT 5 but it does not display in newer versions of Acrobat or Acrobat Reader when you go to view document properties.

How can I correct this?

Thanks,
Jim
# Posted By Jim Nauta | 2/5/08 3:28 PM
Not sure. I had no problem with this at all. Maybe you can share your code.
# Posted By Raymond Camden | 2/5/08 4:24 PM
I set up a structure and enter the title, author, keywords, and subject. Then it is a simple cfpdf setinfo tag. The changed info shows up in Adobe Reader 5, but not in newer versions, where the older information still shows up. Is there a change in field names maybe?

<cfpdf action="SETINFO" source="#fldir##flnm#" info="#PDFInfo#" destination="#fldir##flnm#" overwrite="Yes" >
# Posted By Jim Nauta | 2/7/08 10:26 AM