"Creating Accessible PDFs" Session Summary

On Thursday, December 8th, representatives from the Library, IT, and the LEADS offices presented a session on creating accessible PDF documents. Topics included scanning tips and working with Adobe Acrobat Pro, followed by a question-and-answer period.

Making PDFs accessible is important not only for persons who may need accommodation, but for everyone. A PDF with rendered text is searchable, can be read aloud, and can be marked up or annotated. This makes the document more inclusive and a better tool for teaching and learning, enabling students to engage with the text in different modes.

OCR - What Is It?
OCR stands for Optical Character Recognition. In this process, a software program identifies where it thinks words are by drawing rectangles around them, compares the light and dark pixels to what it knows are particular letters, and finally outputs its best guess as to what the text is. In the image below, the boxes are drawn in green and red.

Example of optical character recognition.
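
To make the "compare light and dark pixels to known letters" step more concrete, here is a toy sketch in Python. It is not how Acrobat or any real OCR engine works internally; the 5x5 letter grids, the TEMPLATES table, and the function names are all made up for illustration. It simply scores how many pixels of an unknown shape agree with each known letter and reports the best guess.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each "glyph" is a 5x5 grid where '#' is a dark pixel and '.' is a light one.
TEMPLATES = {
    "I": ["..#..",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}


def match_score(glyph, template):
    """Return the fraction of pixels that agree between glyph and template."""
    total = 0
    same = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            same += (g == t)
    return same / total


def guess_letter(glyph):
    """Return the best-matching letter and its score (the program's 'guess')."""
    scores = {letter: match_score(glyph, tpl) for letter, tpl in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# A slightly noisy "T": one pixel in the top bar did not scan as dark.
noisy_t = ["##.##",
           "..#..",
           "..#..",
           "..#..",
           "..#.."]

print(guess_letter(noisy_t))  # prints ('T', 0.96)
```

The score below 1.0 is the point: the program is still guessing "T", but with less certainty. On a poor scan many letters look like this, which is why the results need checking.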

DPI Explained, and Why it Matters

DPI, or dots per inch, is the resolution of the document. The more dots in a given area, the higher the resolution. Resolution is an important factor in how well software can recognize text: if the resolution is not high enough, the software will not be able to render the text accurately. The recommended minimum resolution when scanning for OCR is 300 dpi.
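
As a rough illustration of why resolution matters, the Python sketch below works out how many pixels a scanner captures at a given DPI. The 8.5 x 11 inch page and 10-point text are example values chosen here, not figures from the session, and the text height is only approximate because a point size describes the font's overall line height rather than any single letter.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a scanned page of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def text_height_pixels(point_size, dpi):
    """Approximate pixel height available to text of a given point size."""
    return point_size / POINTS_PER_INCH * dpi


for dpi in (150, 300):
    width, height = page_pixels(8.5, 11, dpi)
    print(dpi, width, height, round(text_height_pixels(10, dpi), 1))
# prints:
# 150 1275 1650 20.8
# 300 2550 3300 41.7
```

Doubling the DPI doubles the number of pixels available across each letter, which gives the software more detail to compare against.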

Document Quality

The most important consideration for making a document accessible is the overall quality of the document. By quality we mostly mean resolution or DPI. This greatly affects how easily the document can be made accessible. The presence of additional drawings, underlining, highlighting, or handwriting within the document can make attempting OCR a difficult and frustrating experience. The age of the digital document also plays a role in how well the document may be OCRed.

If the document you are working with is an old scan, re-scanning may be advisable, depending on the results of the OCR. The quality of a document degrades each time it is copied or scanned, so a document may be clear enough to read but not clear enough to OCR. If the OCR results are not accurate, consider re-scanning the original. Below is an example of how the quality of a document may affect the quality of the OCR.

In a session about universal design for learning (UDL), Kristen Dabney shared the following images* as examples of the effect that resolution and document age can have on the ability of the software to render the text. The first image shows an older, low-DPI document. The second image is newer and has a higher DPI.

An image showing a side-by-side comparison of an older, low-resolution PDF and the resulting rendered text.
OCR Example 1

This image shows a side-by-side comparison of a newer, higher resolution scan and the resulting rendered text.
OCR Example 2

Scanning Setting Recommendations for OCR

To ensure that the resulting scan is of high enough quality to support OCR, there are three settings to check and, if necessary, change:
  1. Color - This should be set to Grayscale. The grayscale setting helps to gather as much detail as possible from the original without making the file size too large.
  2. Resolution (DPI) - This should be set to 300 dpi to best support OCR. 
  3. Format - The file format should be PDF.
The device-specific links below explain how to change each of these settings on a particular device.
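
To give a feel for the file-size trade-off behind the Grayscale recommendation, this Python sketch estimates the raw, uncompressed size of one scanned page in each color mode. The bit depths (1-bit black and white, 8-bit grayscale, 24-bit color) and the 8.5 x 11 inch page are assumptions made for the example. Real PDFs are compressed, so actual files will be smaller, but the proportions are a useful guide.

```python
# Assumed bytes per pixel for each mode: 1-bit, 8-bit, and 24-bit images.
BYTES_PER_PIXEL = {"black_and_white": 1 / 8, "grayscale": 1, "color": 3}


def uncompressed_megabytes(width_in, height_in, dpi, mode):
    """Raw size in megabytes of one page before any compression is applied."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BYTES_PER_PIXEL[mode] / 1_000_000


for mode in ("black_and_white", "grayscale", "color"):
    print(mode, round(uncompressed_megabytes(8.5, 11, 300, mode), 1))
# prints:
# black_and_white 1.1
# grayscale 8.4
# color 25.2
```

Under these assumptions, a color page holds three times as much raw data as a grayscale page at the same resolution.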

Editing a PDF with Adobe Acrobat

Making a PDF accessible requires editing it. Adobe Acrobat Pro is the recommended tool on Beloit College's campus, and the college has a site license for this software. If you do not have it and would like it installed, please submit a request through SchoolDude, and IT will be happy to install it for you.

The instructions below are for Adobe Acrobat Pro DC. The steps are listed in this order to minimize the processing time for OCR. OCRing a document is a fairly resource-intensive task, so it is recommended that you plug in to power rather than run on battery. It is also advisable to close other programs that use a large amount of RAM (e.g., Google Chrome) and to save your document each time you make a change.

Crop Pages

Cropping pages allows you to remove handwriting, extra pages that you don't need, or shadows from the edges of pages. This is helpful to do before running OCR to minimize the amount of area the software has to process.

To crop a page:
1. Select Tools > Edit PDF.
2. Click the Crop Pages button.
3. Click and drag a selection box around the portion you want to keep.
4. Double-click inside the selection box.
5. In the Set Page Boxes dialog box, confirm the crop. You may also choose which pages the crop applies to.
6. Click OK.

Remove Blank Pages

Removing blank pages helps reduce file size and increases the readability of a document. It is especially important for anyone accessing the PDF with a screen reader, as a blank page may be understood to be the end of the document.

To remove page(s):
1. Select Tools > Organize Pages.
2. Click to select the pages you want to remove. There are options to select varying groups of pages if necessary.
3. Click the Delete icon (trash can).
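
In Acrobat you spot blank pages by eye in the Organize Pages view. For the curious, the Python sketch below shows one simple way a program could guess that a scanned page is blank: count how many pixels are dark. The function names, the tiny made-up pages, and the 0.5% cutoff are all assumptions for illustration, not a tested standard or anything Acrobat is known to do.

```python
def dark_fraction(page, threshold=128):
    """Fraction of pixels darker than `threshold` (0 = black, 255 = white)."""
    total = sum(len(row) for row in page)
    if total == 0:
        return 0.0
    dark = sum(1 for row in page for value in row if value < threshold)
    return dark / total


def is_probably_blank(page, max_dark_fraction=0.005):
    """Guess that a page is blank if almost none of its pixels are dark.

    The 0.5% cutoff is an assumed starting point, not a tested standard.
    """
    return dark_fraction(page) <= max_dark_fraction


# Tiny stand-ins for scanned pages: 20 rows of 50 grayscale pixels each.
blank_page = [[250] * 50 for _ in range(20)]
text_page = [[250] * 50 for _ in range(20)]
for row in text_page[5:15:2]:  # darken part of every other row, like lines of text
    row[5:45] = [30] * 40

print(is_probably_blank(blank_page), is_probably_blank(text_page))  # prints True False
```

A real scan of a blank page often includes shadows or specks, so any such cutoff would need adjusting, and a quick visual check remains the safer approach.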

OCR (Optical Character Recognition)

OCR looks at the document image and attempts to render what it identifies as text.

To run OCR on a document:
1. Select Tools > Enhance Scans > Recognize Text > In This File.
2. Click the Recognize Text button to scan the document. This may take some time, depending on how many pages the document has.

Once OCR has completed, check how well the process identified the text. A simple way to do this is to copy text from the PDF into a blank Microsoft Word document and compare it with the original.
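
If you would like a number rather than an eyeball comparison, the spot check above can be scripted. The Python sketch below uses the standard library's difflib to compare a short passage you have retyped by hand with the text copied out of the PDF. The sample sentences are invented for the example, and the score is only a rough similarity measure, not an official OCR accuracy metric.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def similarity(expected, ocr_output):
    """Return a 0-1 similarity ratio between a retyped passage and OCR text."""
    return SequenceMatcher(None, normalize(expected), normalize(ocr_output)).ratio()


# Invented sample text: what the page says, and two possible OCR results.
expected = "The quick brown fox jumps over the lazy dog."
good_ocr = "The quick brown fox jumps over the lazy dog."
poor_ocr = "Tbe qu1ck brovvn f0x jumps ovcr the 1azy d0g."

print(round(similarity(expected, good_ocr), 2))  # a perfect match scores 1.0
print(round(similarity(expected, poor_ocr), 2))  # errors pull the score below 1.0
```

A score close to 1.0 on a few sample passages suggests the OCR is usable. A noticeably lower score is a sign that correcting or re-scanning may be needed.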

When you run OCR, anything that Acrobat is not sure about is called a "Suspect." Suspects are outlined in red, and Acrobat asks you to verify particularly questionable words. If there aren’t many in the document, it may be worth correcting them. If the entire page is filled with them, re-scanning will be a better use of your time. For more information, see: https://helpx.adobe.com/acrobat/using/scan-documents-pdf.html

Optimize PDF

Optimizing a PDF reduces file size. This is the last thing you should do before using your file. You may read more about it here: https://helpx.adobe.com/acrobat/using/optimizing-pdfs-acrobat-pro.html

To optimize a PDF:
1. Select File > Save as Other... > Optimized PDF...
2. The defaults in the PDF Optimizer dialog box are generally good for our purposes. Click OK.
3. Choose a save location and click Save.


Working with PDFs scanned from books (These are documents in which there are two book pages on a single page in the PDF.)

Scanning Locations and Devices

There are a number of locations to scan on campus. The Library has a number of scanners located in the North Riser area. Instructions for using the two models you will find there are linked below. All of the computers in the Library have Adobe Acrobat Pro installed.

Multi-function Devices (MFDs)

There are a number of other locations at which you may scan documents. Below is a list of the multi-function devices at which documents may be scanned to PDF and sent by email. The links navigate to instructions for each specific device.

*Images have been used with permission.

If you would like a downloadable, printable PDF version of this post, click here.

