Best common file format to use to create PDFs?

Discussion in 'Photoshop' started by Zak, May 30, 2006.

  1. Zak

    Zak Guest


    Hi Nils and others. I understand now that when I create a PDF from an
    image file, the format of the image file is not used inside the
    PDF. Instead some other format is used in the PDF (which Nils kindly
    suggests may be a specialized form of TIFF).

    It is this conversion from my image file format to the internal PDF
    format which I want to be done smoothly. I am on XP and I am
    wondering if it is better to start with a GIF or a JPG or BMP or
    whatever to feed into my PDF creation utility.

    I should say that I am starting with a hard copy of a document
    created on a word processor. I want to avoid artefacts, unnecessarily
    jagged lines, moire effects and all that stuff which might come from
    transforming from an "awkward" image format to a PDF.

    My PDFs will be for public distribution. I have preferred to scan to
    a GIF file rather than a TIFF because I have assumed that, when I
    circulate the basic image file among certain people, a GIF gives the
    best balance between image size and the best chance of them being able
    to see the file.

    To me TIFF feels a bit specialized. For example, I never see a web
    page with TIFF images but I see lots of pages with GIFs.

    Also there seem to be various compression options for a TIFF (Group 3
    or 4, LZW, JPEG, Deflate, none), which might make it even harder for
    me to know what to choose as a common format! Wikipedia says
    documents are often scanned to TIFF Group 4, but is that something
    which has the best chance of being seen on various PCs in various
    organisations that I might need to send it to?
     
    Zak, May 30, 2006
    #1

  2. Zak

    CSM1 Guest

    You can create a very clean PDF directly from a Microsoft Word Document
    (.doc).
    There are programs that act like a printer: you create a PDF just by
    "printing" the document to them.

    PDF Create! is one such program.

    Just search Google for "microsoft word print pdf" without the quotes.
    You will get lots of responses.
     
    CSM1, May 30, 2006
    #2

  3. Zak

    Zak Guest


    The documents are not written by me. They have been sent to me so they
    are in hard copy form and need scanning.
     
    Zak, May 30, 2006
    #3
  4. Why not let a program like PDF-Tools take care of the problems for you?
    It will scan directly to PDF without the intermediate image step; all
    you need to do is make your decisions regarding
    optimisation/compression.

    You can try it here within the PDF-XChange PRO package (not the Standard
    or Lite versions). Until it is licensed you will get demo watermarks in
    the top right/left corner of each page, which add about 4 KB to each page.

    http://www.docu-track.com/downloads/users/

    --
    Best Regards

    John Verbeeten
    Tracker Software Products
    PDF-XChange & SDK, Image-XChange SDK,
    PDF-Tools & SDK, TIFF-XChange & SDK, DocuTrack.
    Email :
    Support: http://www.docu-track.com/forum/index.php
    Web site : http://www.docu-track.com
     
    John V-Tracker, May 30, 2006
    #4
  5. Zak

    AES Guest

    If the documents are in single-sheet form and can be fed thru a
    sheet-feed scanner, the fairly new Fujitsu "ScanSnap" can automatically
    produce PDF output (or other formats, at user's option).

    It's a bit pricey (circa $400) but it's a pretty nice unit, small, fast,
    easy to use, can do both sides at once, auto-select for B&W or color,
    and so on.
     
    AES, May 30, 2006
    #5
  6. ["Followup-To:" header set to comp.periphs.scanners.]

    Zak is obviously not a programmer, let alone an experienced one. Using
    PDFlib from C isn't that difficult if you have some experience in C,
    though.

    And creating a PDF using only the spec would take a bunch of experienced
    programmers a while. The PDF spec is really, really complex. Its
    complexity is one reason why PDFlib and ps2pdf and OpenOffice's "print
    to PDF" functionality exist.

    Using tiff2ps -> ps2pdf shows that a grayscale TIFF ends up converted to
    a stream object that can be decoded by the FlateDecode filter.

    YPDFEngineMV, obviously.

    Depends on what you want. Get a good scan, and convert it to
    black-and-white if you can do that without losing important info;
    that'll make the PDF smaller. JPEG may introduce artifacts, so you
    probably don't want to use that. TIFF G4 and TIFF LZW are lossless, so
    you may want to use those.
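
    The advice above (go black-and-white, then compress losslessly) can be
    sketched in Python. This is a hypothetical illustration, not what any
    particular PDF tool does: it fakes an 8-bit grayscale scan, thresholds
    it to 1 bit per pixel at an assumed cut-off of 128, packs 8 pixels per
    byte (0 = black, 1 = white, each row padded to a whole byte), and
    compresses both versions with zlib, the Deflate format that PDF's
    FlateDecode filter reads. The standard library has no CCITT Group 4
    encoder, so Flate stands in for it here; the thread suggests Group 4
    would do better still on bilevel pages.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 100  # hypothetical page size in pixels


def make_scan(seed=0):
    """Fake 8-bit grayscale 'scan': light paper, dark bars, sensor noise."""
    rng = random.Random(seed)
    pixels = bytearray()
    for y in range(HEIGHT):
        for x in range(WIDTH):
            ink = (y % 20 < 3) and (x % 50 < 40)  # short dark bars stand in for text
            base = 30 if ink else 235
            pixels.append(base + rng.randint(-15, 15))  # stays within 15..250
    return bytes(pixels)


def to_bilevel(gray, width, threshold=128):
    """Pack 8-bit grayscale into 1-bit rows (0 = black, 1 = white)."""
    row_bytes = (width + 7) // 8  # each row padded to a whole byte
    out = bytearray()
    for start in range(0, len(gray), width):
        packed = bytearray(row_bytes)
        for x, value in enumerate(gray[start:start + width]):
            if value >= threshold:
                packed[x // 8] |= 0x80 >> (x % 8)  # most significant bit first
        out += packed
    return bytes(out)


gray = make_scan()
bilevel = to_bilevel(gray, WIDTH)

gray_flate = zlib.compress(gray, 9)
bilevel_flate = zlib.compress(bilevel, 9)

print("grayscale:", len(gray), "bytes raw,", len(gray_flate), "after Flate")
print("bilevel:  ", len(bilevel), "bytes raw,", len(bilevel_flate), "after Flate")

# Flate is lossless: decompressing gives back exactly the packed bits.
assert zlib.decompress(bilevel_flate) == bilevel
```

    Note that the thresholding step itself is lossy (the gray levels are
    thrown away); only the compression after it is lossless. That is why
    the advice is to go black-and-white only if you can do so without
    losing important info.
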

    Yuck. The original WordPerfect or whatever file would've been a much
    better place to start from. PDFs with just text in them tend to be
    smaller, display faster, and can look good at any zoom level. PDFs made
    from images take a longer time to display, are larger, and look terrible
    at high zoom levels.

    You're creating a PDF, not distributing a series of image files.

    This is because of Hysterical Raisins in the history of web browsers,
    and because of Unisys being asses. JPEG compresses better than TIFF-LZW
    for lossy color images, and smaller images are preferred, especially when
    you're on dialup. TIFF-LZW gives the best lossless compression for
    color images, but TIFF-LZW is usually used where losslessness is more
    important than file size (like in prepress.) Also, Unisys said they'd
    sue anyone who made a TIFF-LZW compressor unless they paid Unisys a
    license fee.[0] These things combined made it so that the earliest GUI
    browsers didn't support viewing TIFFs, just JPEGs and GIFs. And this
    has persisted to the present day... even though TIFF-G4 compresses
    better than *anything* else, and does so losslessly, iff your image is
    black-and-white.

    ....what? If somebody can't figure out how to view a Group4 TIFF,
    they're probably computer-illiterate. Anyway, aren't you making a PDF
    here? It doesn't matter what the original image format was if it's been
    PDFed. Acrobrat Reader can decode the image data within a PDF, as long
    as the PDF library/PDF writer/whatever that created that PDF wasn't
    smoking crack. Anyway, HTH,

    [0] Fortunately, their patent (on a *mathematical method*!) expired a
    couple of years ago, so all the Free stuff can write LZW now, which is a
    win for everybody.
     
    Dances With Crows, May 30, 2006
    #6
  7. Zak

    Aandi Inston Guest

    Never use JPEG for this purpose. GIF and BMP are not the normal
    choice.

    Yes. The image file format isn't stored in the PDF.

    Absolutely not JPEG. BMP has no advantage over TIFF, and GIF has
    disadvantages.

    I don't really follow your question. Since GIF and TIFF use lossless
    compression, they preserve quality and avoid artefacts and
    interference patterns, by definition.

    You may have the choice of whether to use lossless compression, or
    not, in making the PDF.

    If you are distributing the image file, that may be true. If you are
    preparing the PDF file from the image file, it is not relevant at all.

    That's because web browsers can display GIF and JPEG images as
    standard, so web graphics are in those formats. That doesn't make them
    in any sense "best".

    TIFF is the industry standard format for document scanning, by a very
    wide margin.

    These options are not relevant. The PDF file doesn't include the TIFF
    information, only the image from the TIFF file.
     
    Aandi Inston, May 30, 2006
    #7
  8. Zak

    Roger Guest

    As far as I can see there really is no best common file format to
    convert. If it'll convert it'll work. However the size of the
    original file will have a direct bearing on the size of the pdf.

    If you are doing something like creating a newsletter, flyer, or
    Internet distribution then why not use the original doc file?

    I handle several newsletters on line and in print.
    With Adobe Acrobat Pro, any Office document (and, I believe, any
    WordPerfect document) can be converted directly to a pdf. However, any
    images in the documents should be of the proper size and resolution for
    the end media. I've had Word docs sent to me that had the full original
    images with just the physical dimensions set. They were still the
    original one or two meg images, set to a dimension of 2 X 3 inches.
    These produced nice-looking pdfs, but ones of many megabytes. Having the
    images set to the proper resolution (300 ppi for print and about 100 ppi
    for screen) dropped the pdf to less than 100K.
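
    The numbers above can be checked with a little arithmetic: pixel count
    is physical size times ppi, and raw size is pixel count times bytes per
    pixel. This sketch assumes 24-bit RGB (3 bytes per pixel) and counts raw,
    uncompressed pixel data only, so it will not match the compressed pdf
    sizes quoted above. The 2000 x 3000 pixel source image is a hypothetical
    example, not one of the files described.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixels needed to fill a physical size at a given ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def raw_bytes(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed size; 3 bytes per pixel assumes 24-bit RGB."""
    return width_px * height_px * bytes_per_pixel


def effective_ppi(width_px, height_px, width_in, height_in):
    """Resolution an image actually has when merely scaled to a size."""
    return width_px / width_in, height_px / height_in


# A hypothetical 2000 x 3000 pixel photo placed in a 2 x 3 inch frame
# without resampling still carries all of its pixels:
print(effective_ppi(2000, 3000, 2, 3))  # (1000.0, 1000.0)
print(raw_bytes(2000, 3000))            # 18000000

# Resampled to the two target resolutions mentioned above:
for use, ppi in (("print", 300), ("screen", 100)):
    w, h = pixel_dimensions(2, 3, ppi)
    print(use, (w, h), raw_bytes(w, h))
# print (600, 900) 1620000
# screen (200, 300) 180000
```

    Going from 300 ppi to 100 ppi divides the pixel count by nine, since both
    dimensions shrink by a factor of three.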

    Also, not all pdf creators are created equal. About a year ago I tried
    using OpenOffice to convert a Word doc, and it produced one that was
    about 3 to 4 times the size of one made using Adobe Pro. This is fine for
    printed media, but may (or may not) be a royal pain in the back side
    for on-line viewing.

    For on-line I much prefer HTML rather than pdfs as the HTML will be
    faster to load and more compact. At least it will if it wasn't created
    by saving a Word doc as HTML or using FrontPage to create it. Those
    are huge. OTOH converting to a pdf is faster and much easier and I do
    use them when the pdfs are relatively small.

    Roger Halstead (K8RI & ARRL life member)
    (N833R, S# CD-2 Worlds oldest Debonair)
    www.rogerhalstead.com
     
    Roger, May 30, 2006
    #8
  9. Zak

    Aandi Inston Guest

    No. The size of the original will usually have no effect whatsoever,
    though some PDF creation methods are influenced by it.

    With Acrobat Pro or Acrobat Standard, any file you can print can be
    converted directly to a PDF.

    Or, you could use Acrobat options to reduce the resolution.
     
    Aandi Inston, May 30, 2006
    #9
  10. Zak

    tacit Guest

    Why don't you just start with the word processor file, and not with a
    hardcopy at all? Go straight from the word processor file to PDF.

    A GIF is almost the worst possible choice to use, because GIF images
    contain a very small number of colors (at most 256), and because of
    this they don't tend to downsample smoothly.
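
    The downsampling point can be illustrated with a toy example.
    Downsampling averages neighboring pixels, and the averages usually land
    between the original values; an image limited to a small palette has
    nowhere to put them. This sketch uses a plain 2 x 2 box average (real
    software typically uses better filters, such as bicubic) on a
    hypothetical 4 x 4 black-and-white image, then shows what happens if
    the result must be snapped back to a two-entry palette.

```python
def box_downsample(pixels, width, factor=2):
    """Average each factor x factor block of an 8-bit grayscale image."""
    height = len(pixels) // width
    out = []
    for by in range(0, height - height % factor, factor):
        for bx in range(0, width - width % factor, factor):
            block = [pixels[(by + dy) * width + bx + dx]
                     for dy in range(factor) for dx in range(factor)]
            out.append(sum(block) // len(block))
    return out


def snap_to_palette(values, palette):
    """Replace each value with the nearest palette entry (no dithering)."""
    return [min(palette, key=lambda p: abs(p - v)) for v in values]


# Hypothetical 4 x 4 image using only black (0) and white (255).
art = [
    0,   255, 255, 255,
    0,   0,   255, 255,
    255, 255, 0,   255,
    255, 255, 255, 0,
]

smooth = box_downsample(art, width=4)
print(smooth)                             # [63, 255, 255, 127]
print(snap_to_palette(smooth, [0, 255]))  # [0, 255, 255, 0]
```

    The values 63 and 127 record how much of each block the dark stroke
    covered; a two-color palette has to throw that away, which is where the
    jaggedness comes from. A 256-entry GIF palette softens this, but for
    color images 256 entries is still far fewer than the 16.7 million colors
    a 24-bit TIFF can hold.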

    Use TIFF. Anything that can read a PDF can read a PDF, period. It does
    not matter what you start with; once it is turned into a PDF, it is a
    PDF. However, a TIFF image will downsample and compress smoothly.

    That doesn't mean a GIF is the best format to use for general purposes,
    however.
    You do not need to choose any of these. You do not need to compress the
    TIFF at all.

    Scan a TIFF, make a PDF, send out the PDF, you're done. Or, better yet,
    do not use your scanner at all. Start with the word processor file, make
    a PDF--it'll be smaller and far higher quality. :)
     
    tacit, May 31, 2006
    #10
  11. Zak

    Zak Guest

    Snipped and trimmed to context.

    I think you are "echoing" what I have just been reading from Dances-
    With-Crows.

    I should explain the artifacts notion I was asking about. If I scan to
    a GIF which I understand is lossless, then it still has a certain
    number of "lines" and a certain block size or whatever it is that is
    inside a GIF. If these blocks and lines do not match up with those used
    by the image it is converted to inside the PDF then there may be
    additional irregularities introduced at those places of mismatch.

    It's a bit like memory and a system bus on a motherboard. If they are
    both 100 MHz then they sing in harmony. If the memory is 133 MHz (and
    can not fall back to 100 MHz) then they may give a slightly "off"
    performance.

    Are you not saying that it is important to choose the internal image
    inside the TIFF correctly? I think you are. Then I guess you would
    concur with Dances-With-Crows about using Group IV. Remember that I do
    want the option of sending the raw image to colleagues (rather than the
    shrink-wrapped and sealed PDF).

    Thank you for any extra info.
     
    Zak, May 31, 2006
    #11
  12. Zak

    tomm42 Guest

    When I create a document that is going to be a PDF I always use TIF
    files, mainly because InDesign handles TIF files well. I generally use
    LZW compression on my TIFs; it seems to make no difference.

    Once the PDF is created, the image files are, in my understanding,
    converted to JPEG files; at least, that is how they can be extracted.
    With the file already being downsampled for the web, it is very unlikely
    you will see JPEG artifacts coming from an original TIF. Multiple
    compressions or resampling from a JPEG is another story.

    Working from graphics or drawings, GIF may be applicable, but for
    photographs GIFs should be avoided.

    Tom
     
    tomm42, May 31, 2006
    #12
  13. Zak

    Zak Guest

    Unfortunately, some of the documents have been sent to me in hard copy
    form.

    Yes, I feared that once I had mastered the basics, my next task would be
    to identify whether my PDF creator is doing as good a job as I might
    want it to.
     
    Zak, May 31, 2006
    #13
  14. Zak

    Zak Guest

    DOWNSAMPLE. That's the word! I have just written one if not two
    paragraphs trying to explain what I mean, and then you come along and
    express the idea in a single word!

    OK, so TIFF it is going to be. And to swagger my newly gained
    knowledge I will add that it might be group 4 or LZW (and I nod very
    slowly as if I know what I am talking about - which I don't).
     
    Zak, May 31, 2006
    #14
  15. Zak

    Zak Guest


    Can I add an additional point to do with TIFFs?

    When I go into ACDSee and launch Twain, I am asked what format I want
    to scan to.

    I say TIFF and then I have an option where I can select Group 4. I
    am also asked to fill in the dpi value horizontally and vertically.
    I don't get asked this when I choose to scan to GIF or to JPEG.

    When I get into the actual Twain screen I choose the scanning
    resolution as usual.

    So, what values should go into those horizontal and vertical boxes
    for TIF? Do I need to put in the same value as I use for Twain's
    scanning resolution? (This can be awkward.)

    The software is slow to load, and if I put in 200 for these TIF values
    and scan at 266 or 300, does that lead to problems or loss of
    quality?

    I have tried 200, 300 and 600 in the H & V boxes (at scanning
    resolutions of 200, 240, 266, 300) and the 200, 300 or 600 seems to
    make no difference at all to the final size.

    I will have to look closely to see the quality.

    Can you or anyone else comment on this extra pair of values?
     
    Zak, May 31, 2006
    #15
  16. Zak

    tacit Guest

    Group 4 compression is the compression used by FAX machines. When you
    send a FAX, the vertical and horizontal resolutions are different; FAX
    machines use pixels that are not square.

    TIFF supports Group 4 primarily to facilitate software that receives
    FAXes on a computer, or computer programs designed to make scans and
    then send FAXes. Since that's not what you're doing, there's no reason
    to use CCITT Group 3 or Group 4 compression (which really only works
    well on simple bitmaps anyway).
     
    tacit, Jun 1, 2006
    #16
  17. ["Followup-To:" header set to comp.periphs.scanners.]
    Fax machines use Group3, not Group4. Group3 is less efficient than
    Group4.

    TIFF has supported having different horizontal and vertical resolutions
    since the format started up; this is not a fax-specific thing. Not many
    people use this TIFF capability, and some programs will barf if they
    read different values for TIFFTAG_XRESOLUTION and TIFFTAG_YRESOLUTION,
    but it's in the TIFF spec.

    Group4 is A) lossless B) more efficient than any other compression
    method for bilevel data. These qualities make Group4 an excellent
    choice for storing black-and-white images. Zak was scanning documents
    that consisted mostly of text, which is typically very high-contrast and
    works really well in black-and-white. So every page with just text (no
    graphics) on it could easily be turned into a Group4 TIFF with no loss
    of data. HTH,
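
    To make the point about the resolution fields concrete: in a TIFF file
    they are ordinary tags in the image file directory (IFD), numbers 282
    (XResolution) and 283 (YResolution), stored as fractions, plus tag 296
    (ResolutionUnit); libtiff exposes the first two as TIFFTAG_XRESOLUTION
    and TIFFTAG_YRESOLUTION. This standard-library Python sketch reads them
    from the first IFD. It handles only what it needs and is not a general
    TIFF parser; the helper minimal_tiff_header is hypothetical and builds a
    header with no pixel data at all, purely for the demonstration.

```python
import struct

# Tag numbers from the TIFF 6.0 specification.
X_RESOLUTION = 282     # RATIONAL: pixels per unit, horizontally
Y_RESOLUTION = 283     # RATIONAL: pixels per unit, vertically
RESOLUTION_UNIT = 296  # SHORT: 1 = none, 2 = inch (default), 3 = centimeter
UNIT_NAMES = {1: "none", 2: "inch", 3: "centimeter"}
RATIONAL, SHORT = 5, 3  # TIFF field type codes


def read_resolution(data):
    """Return (x_res, y_res, unit) from the first IFD of TIFF bytes."""
    order = {b"II": "<", b"MM": ">"}[data[:2]]  # little- or big-endian
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    found = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, _n, raw = struct.unpack(order + "HHI4s", data[start:start + 12])
        if tag in (X_RESOLUTION, Y_RESOLUTION) and ftype == RATIONAL:
            # A RATIONAL is 8 bytes, too big for the entry, so `raw` is an offset.
            (offset,) = struct.unpack(order + "I", raw)
            num, den = struct.unpack(order + "II", data[offset:offset + 8])
            found[tag] = num / den
        elif tag == RESOLUTION_UNIT and ftype == SHORT:
            (found[tag],) = struct.unpack(order + "H", raw[:2])  # left-justified
    unit = UNIT_NAMES.get(found.get(RESOLUTION_UNIT, 2), "unknown")
    return found.get(X_RESOLUTION), found.get(Y_RESOLUTION), unit


def minimal_tiff_header(x_dpi, y_dpi):
    """Hypothetical helper: little-endian header plus one IFD with only the
    three resolution tags. Not a viewable image (no pixel-data tags)."""
    n_entries = 3
    ifd_offset = 8
    rationals = ifd_offset + 2 + 12 * n_entries + 4  # after the next-IFD pointer
    out = bytearray(b"II" + struct.pack("<HI", 42, ifd_offset))
    out += struct.pack("<H", n_entries)
    out += struct.pack("<HHII", X_RESOLUTION, RATIONAL, 1, rationals)
    out += struct.pack("<HHII", Y_RESOLUTION, RATIONAL, 1, rationals + 8)
    out += struct.pack("<HHIHH", RESOLUTION_UNIT, SHORT, 1, 2, 0)  # 2 = inch
    out += struct.pack("<I", 0)  # no further IFDs
    out += struct.pack("<II", x_dpi, 1) + struct.pack("<II", y_dpi, 1)
    return bytes(out)


print(read_resolution(minimal_tiff_header(300, 300)))  # (300.0, 300.0, 'inch')
print(read_resolution(minimal_tiff_header(300, 150)))  # (300.0, 150.0, 'inch')
```

    Because these values are a few bytes of metadata kept apart from the
    pixel data, changing them without resampling would not be expected to
    change the file size. That would be consistent with the earlier
    observation that 200, 300 or 600 in the H & V boxes made no difference,
    with the pixel count coming from the scanning resolution, though this
    is an inference about ACDSee's dialog rather than something confirmed
    here.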
     
    Dances With Crows, Jun 1, 2006
    #17
