How to downsample and convert from grayscale to black & white?

Discussion in 'Photoshop Tutorials' started by Ramon F Herrera, Jun 5, 2004.

  1. I have a bunch of TIFF images that were scanned in grayscale
    mode at 600 dpi. Each one takes ~32MBytes of disk space, and
    the images are typical office documents (mostly text with
    a few logos), which are being processed by OCR.

    My main concern is: what is the best way to obtain as much text
    recognized as possible? I chose 600 dpi in order to get even
    the smallest type. The grayscale leaves a lot of "gray dust"
    in the areas where the original paper page was the purest white.
    Is there a Photoshop filter that will leave the white background
    really white? If such a filter exists and I apply it, will it
    affect the OCR recognition (positively or negatively)?

    Since I won't have access to the documents forever, I am trying
    to get the most complete file at scan time, but this may be
    overkill.

    Should I reduce the sampling to 300 dpi? Or perhaps I should stick
    with 600 dpi but scan in black and white?

    Finally, how do I change a 600dpi TIFF to 300 dpi?
    How do I change a grayscale to B&W? (both with Acrobat)

    My OCR software (ABBYY FineReader) takes the original file that
    I provide and makes a working copy which is the one that actually
    gets OCR'd. The copy that I provide is 32MBytes and the working
    copy is 100 KBytes. They achieve that by (1) converting from
    grayscale to B&W and (2) doing some compression (lossy or lossless?
    I don't know).

    Thanks in advance,

    -Ramon F. Herrera
    Ramon F Herrera, Jun 5, 2004
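
A rough size estimate helps put the 32 MByte and 100 KByte figures in context. The sketch below assumes a US Letter page (8.5 x 11 in) stored uncompressed, neither of which the post states, and `raw_size_bytes` is a hypothetical helper name, not part of any tool mentioned in the thread.

```python
import math

def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed raster size, with each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return bytes_per_row * height_px

# Assumed page size: US Letter, 8.5 x 11 inches.
gray_600 = raw_size_bytes(8.5, 11, 600, 8)  # 8-bit grayscale at 600 dpi
bw_600 = raw_size_bytes(8.5, 11, 600, 1)    # 1-bit black & white at 600 dpi
bw_300 = raw_size_bytes(8.5, 11, 300, 1)    # 1-bit black & white at 300 dpi

for label, size in [("gray 600", gray_600), ("b&w 600", bw_600), ("b&w 300", bw_300)]:
    print(f"{label}: {size / 1_000_000:.1f} million bytes")
```

Under these assumptions the 600 dpi grayscale page comes to about 33.7 million bytes, close to the ~32 MBytes reported. A 1-bit page is about 4.2 million bytes at 600 dpi and about 1 million at 300 dpi, so a 100 KByte working copy would need compression on top of the B&W conversion.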

  2. Ramon F Herrera

    arrooke1 Guest

    I have a bunch of TIFF images that were scanned in grayscale
    Scan as line copy (black & white) at 600 ppi. Adjust your exposure to obtain
    a suitable balance between background noise and image quality. If you have
    some images (fancy colour logos) on some pages, you can scan the image only,
    as greyscale, and place it into your line copy.
    arrooke1, Jun 5, 2004

  3. Ramon F Herrera

    Xalinai Guest

    It depends on your scanning software. Older software needed clean
    black and white scans and a resolution as high as possible. Modern
    software will work better on grayscale scans with a moderate
    dynamic range.
    If you try to clean the images for the scanning software, you sometimes
    end up with the software assuming a perfect scan and trying to
    interpret every stray pixel as text.
    If you feed the software the raw scan, it corrects contrast by
    itself, makes a better guess when deciding between paper structure and
    real text, and produces better results.
    FineReader works even with moderately compressed greyscale JPGs.
    That saves a lot of disk space and scanning time.

    Xalinai, Jun 5, 2004
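
For the 600 dpi to 300 dpi question, one way to halve the resolution while keeping the grayscale data Xalinai recommends is to average each 2x2 block of pixels. This is only a minimal sketch in plain Python on nested lists of 0-255 values; `downsample_2x` is a hypothetical name, and an image editor's resampling may use a different filter.

```python
def downsample_2x(pixels):
    """Halve resolution by averaging each 2x2 block of grayscale values.

    `pixels` is a non-empty list of equal-length rows of ints in 0..255.
    An odd trailing row or column is dropped.
    """
    height = len(pixels) - len(pixels) % 2
    width = len(pixels[0]) - len(pixels[0]) % 2
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            total = (pixels[y][x] + pixels[y][x + 1]
                     + pixels[y + 1][x] + pixels[y + 1][x + 1])
            row.append((total + 2) // 4)  # integer mean, rounded half up
        out.append(row)
    return out

# A 4x4 toy "scan" becomes 2x2, i.e. 600 dpi becomes 300 dpi.
scan = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]
half = downsample_2x(scan)
```

Averaging keeps partial coverage as intermediate gray, which is the kind of information a 1-bit scan throws away.
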
  4. Ramon F Herrera

    Tacit Guest

    The grayscale leaves a lot of "gray dust"
    Don't use a filter for this. Use the Levels command.

    Once you've created a good, crisp image, leave it at 600 pixels per inch and
    turn it into a bitmap; this is usually what OCR software will perform best with.
    Tacit, Jun 5, 2004
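
The two steps Tacit describes can be sketched in plain Python to show what they do to the pixel values. This is an illustration only, not Photoshop's implementation: it assumes a linear levels mapping with no gamma adjustment, the black and white points (40 and 230) are made-up values, and `apply_levels` and `to_bitmap` are hypothetical names.

```python
def apply_levels(pixels, black_point, white_point):
    """Stretch grayscale values linearly so that black_point maps to 0 and
    white_point maps to 255; anything outside that range is clipped."""
    span = white_point - black_point
    if span <= 0:
        raise ValueError("white_point must be greater than black_point")
    out = []
    for row in pixels:
        new_row = []
        for v in row:
            scaled = round((v - black_point) * 255 / span)
            new_row.append(min(255, max(0, scaled)))
        out.append(new_row)
    return out

def to_bitmap(pixels, threshold=128):
    """Convert to 1-bit: 1 (white) at or above the threshold, else 0 (black)."""
    return [[1 if v >= threshold else 0 for v in row] for row in pixels]

# Light "gray dust" (235-250) on the paper and dark ink (20-30).
page = [
    [250, 240, 30],
    [235, 20, 245],
]
clean = apply_levels(page, black_point=40, white_point=230)
bitmap = to_bitmap(clean)
```

Pulling the white point below the dust values is what turns the background pure white before the image is reduced to two levels.
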
  5. Ramon F Herrera

    Brian Guest

    When I have to scan something that will be run through OCR software I
    scan at 1200ppi, linework (1-bit) mode.
    Brian, Jun 7, 2004