16 August 2010

How to Scan a Lot of Books

Don't bother reading this post if you are opposed to violence towards books.

Books are hard to move for mobile families.  I hadn't really bought many books for the last 5-10 years before scanning everything because we moved so often and nearly always had an excellent public library, but we still had over 1000 books.  They'd been moved and stored, and, every once in a while, all of them were out on shelves.  It was impossible to take them overseas, impractical to store them again, and painful to part with them.

So we scanned them.  It was a huge project, but we ended up with 600 scanned ebooks (we gave the rest away, or sold a few).  I don't regret one second spent on that project.  It was the best preparation I have ever done for an overseas move.

Most people are rather horrified when they learn that I cut up so many books and then recycled them all.  But there have been a few people, usually those who've moved overseas without help, who get it.  Ebooks are a lifesaver.

Anyway. It took some time to figure out how to get the scanning done quickly.  We started with a flatbed scanner and an old computer and it took forever to get through one book, so we bought this scanner.

Fujitsu ScanSnap S1500 Instant PDF Sheet-Fed Scanner for PC
I never thought a scanner could make me so happy. It folds up into a nice little package and all that, but it works amazingly quickly.  It's not a flatbed scanner so (here's the catch for a lot of people) I have to cut the binding off all my books.  I was curious how long it took so I timed it and took pictures.  Here's the process:

10:23 AM Begin cutting. There are lots of ways to do this. I don't recommend using a saw because it creates a lot of dust that can jam up your scanner. You can take your books to an office supply store and they'll cut off the bindings for about $1 each, but that adds up quickly when you need hundreds of bindings cut off. You can also buy a guillotine cutter like the office stores use and do it yourself.

I decided to use my fabric supplies because they were available, cheap, and the mat wasn't going to get stored anyway (it's gotten pretty beat up in the process). First, I recommend cutting the book into sections with a box cutter or X-Acto knife. It's much easier to work with smaller sections, the cutting is neater, and it ends up not taking any longer. Sections of 50-75 pages work well.

Then use a rotary cutter to neatly slice off the binding.  Make sure you don't leave any glue bits on the page, but try to take off as little as possible.  As I said before, it's much easier to do this if you're not working with a huge chunk of the book.

Finally, you need to check the pages to make sure none are still stuck together.  Flipping carefully through the pages helps you notice this.  It's worth taking the time to check the pages well. The entire cutting process took 6 minutes for a 270-page paperback. 

10:29 AM Begin scanning. It took me 10 minutes to scan the 270-page book, but it would have taken our faster computer about 7 minutes. 

You can stop here because the scanner creates a PDF document. However, it doesn't create a searchable document that can have its font adjusted on an ereader. If that's what you want, then you'll need to open up Adobe Acrobat and run the book through OCR. That took my (slow) computer 19 minutes.

Obviously, all these steps can be shortened a lot by combining them, and the longest part is the OCR which doesn't require any effort at all on my part (I'd usually tell it to OCR a lot of books overnight). The process can take a little longer when you're cutting up a book with pages wider than 8.5 inches. You'll need to figure out where to trim more paper off.
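For anyone planning a similar project, here is a rough back-of-the-envelope sketch in Python using the timings above. It assumes every book behaves like the 270-page paperback I timed, and that "overnight" means about 8 hours of unattended OCR; both are assumptions, as is the function name `estimate`, so treat the numbers as a ballpark rather than a promise.

```python
import math

# Timings measured above for one 270-page paperback, in minutes.
CUT_MIN = 6    # cutting off the binding and checking pages
SCAN_MIN = 10  # feeding the pages through the scanner (slow computer)
OCR_MIN = 19   # unattended OCR pass (slow computer)


def estimate(books, overnight_hours=8):
    """Return (hands-on hours, books OCR'd per night, nights of OCR).

    Assumes every book takes as long as the timed paperback and that
    OCR runs unattended for `overnight_hours` each night.
    """
    hands_on_hours = books * (CUT_MIN + SCAN_MIN) / 60
    ocr_per_night = (overnight_hours * 60) // OCR_MIN
    nights = math.ceil(books / ocr_per_night)
    return hands_on_hours, ocr_per_night, nights


hands_on, per_night, nights = estimate(600)
print(f"Hands-on cutting and scanning: about {hands_on:.0f} hours")
print(f"OCR: about {per_night} books per night, {nights} nights total")
```

Only the cutting and scanning need a person standing there. The OCR dominates the clock time per book but costs no effort, which is why queueing it overnight works so well.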

One more thing: we got a refurbished scanner.  It includes Adobe Acrobat Standard, which we needed to buy anyway, so we were very happy with the price for the scanner.  If you already have Acrobat and buy the scanner new, it'll be more than $400.

Totally happy.


  1. You are very convincing! (I've seen that Silk Road book before.)

  2. How big are the PDF files at the end?
    I'll admit I had never heard of OCR before but I am guessing it is the format used in ereaders. Do you have an ereader? Do you like it?

  3. The size of the files depends mostly on how many pictures the book has. The Silk Road book had some pictures, so it ended up at about 17 MB. A book that size without any pictures or images is about 4-5 MB, so it's larger than what you'd buy (if the books I'm scanning were actually available as ebooks). The file size is much smaller after you OCR it.

    OCR is optical character recognition. If you just scan each page as a single image or PDF file, an ereader with a small screen can't increase the font size of the words because the letters aren't recognized as words. If you have a reader like a Kindle DX or even an iPad, it's not such a problem.

    All the books I've done are PDF files. I prefer EPUB, but I can't do that right now.

    And yes, I have a Sony Reader that I love. If we didn't move much, none of this would be worth it, but digitizing our library makes sense.

  4. I think about two minutes after we built our new bookshelves they started to look totally anachronistic to me. I'm reading everything as etext these days. I prefer to buy etext.

    I think your scanner method is ingenious. Really, the point is to have something to read, to have access to one's library. So often the people I meet who are attached to tangible books (like my dh) are people who don't read that much.

  5. Oh my gosh, I still can't believe you did this. But I just recommended your post again to someone.