One of the techniques I use heavily is the Photoshop "Levels" command. Even when a scanned document is on white paper, background noise will show up in the scan. Levels lets you take much of that out. This came up when I was talking with a friend and commented that I wished the Historic Naval Ships Association would do it on some of their documents. While it can add a few seconds to each document, I do believe the results are worth it if you're aiming for a document that prints well or is to be reproduced in a book or magazine.
The images below show the technique I came up with farting around on my own. I don't profess to be a master, and I'm using a ten-year-old version of Adobe Photoshop, so your screen may look a little different if you're trying this for the first time on a newer version, but the principles are the same.
In Photoshop, go to "Image" --> "Adjust" and choose "Levels." You'll wind up with something that looks like the image below:
Now, take the black point slider on the left and move it toward the center, to make the darks darker, and then take the white point slider on the right and move it toward the center, to make the whites whiter. You are essentially adjusting the contrast of the image. Each image will have a different histogram, so there is no set value for the input level boxes up top that you can memorize and reuse. What I've found works best for me is to move the white point slider either to the center of the hump on the right of the histogram, or a bit beyond it toward the center. This takes out a lot of the background noise, but it also fades the black text and lines a bit. So we compensate by bringing the black point slider in to where there is a little bit of histogram showing, which darkens our lines back up, as you see below:
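For anyone who wants to try the same idea without Photoshop, here is a rough sketch of what the two sliders do, written in Python with NumPy. This is my approximation, not Photoshop's actual code: it assumes an 8-bit grayscale scan already loaded as an array, it ignores the middle (gamma) slider, and the function names `apply_levels` and `paper_peak` are made up for illustration. The "hump on the right" is approximated as the most common value in the bright half of the histogram.

```python
import numpy as np

def apply_levels(gray, black_point, white_point):
    """Linear levels adjustment on an 8-bit grayscale array.

    Values at or below black_point become 0, values at or above
    white_point become 255, and everything in between is stretched.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    scaled = (gray.astype(np.float64) - black_point) / (white_point - black_point)
    return (np.clip(scaled, 0.0, 1.0) * 255.0).round().astype(np.uint8)

def paper_peak(gray):
    """Most common value in the bright half (128..255) of the histogram.

    A crude stand-in for eyeballing the center of the right-hand hump.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    return 128 + int(np.argmax(hist[128:]))

# Tiny made-up "scan": one dark text pixel and slightly noisy paper.
page = np.array([[40, 225, 230, 230, 230, 235]], dtype=np.uint8)
white = paper_peak(page) - 5   # a bit beyond the hump, toward center
cleaned = apply_levels(page, black_point=40, white_point=white)
```

As in the dialog, the right numbers differ for every scan, so treat the black point of 40 and the offset of 5 as placeholders to adjust by eye.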
So now I invite you to compare the results below with the original, both on screen and in a printout.
I've also found that this technique can increase OCR accuracy when converting document scans to HTML. It essentially filters out most of the noise that can confuse OCR programs, but I only use it on selected sheets, as it does add a minute or two to each page.
Thursday, May 27, 2010