The Union Catalog of Pre-1600 Books Online: How We Do It
This page is part of the pre-1600 Books Online collection.
The main purpose of these pages is to show books relevant to my
re-creation hobby, the SCA (Society for Creative Anachronism), which
mainly studies Western Europe between the fall of the Roman Empire and
the year 1600. This includes both books actually printed before 1600
and later books about the earlier era.
Data sources
The first step is to collect as many online books as possible. I do
that by mooching off the efforts of others.
- The Online Books Page is a large index of English books on the
web. It has good
call numbers added to each book.
- The archive.org text collections contain a large number of
scanned books, most of
which have MARC records (with call numbers) associated with them.
- Project Gutenberg has a catalog with some basic call numbers.
While many of these are also
indexed (with better call numbers) by the Online Books Page, there are
(currently) around a thousand 'selected' books which aren't.
I'd like to use Google Books as a data source, but I can't figure out
how to get good call numbers, or how to get easy access to the list
of books in their collection.
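As a rough illustration of combining the three sources above, here is a minimal Python sketch. It is not the actual script behind this site. It assumes each record is a dict with `source`, `url`, `title` and `callno` fields (all hypothetical names), and that two records describe the same book when they share a `url`. The only thing taken from this page is the preference for Online Books Page call numbers over Project Gutenberg ones; where archive.org falls in the ranking is a guess.

```python
from typing import Dict, Iterable, List

# Lower number = preferred call numbers. Online Books Page outranks
# Project Gutenberg as described above; the archive.org position is assumed.
SOURCE_PRIORITY = {"onlinebooks": 0, "archive": 1, "gutenberg": 2}


def merge_catalogs(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep one record per URL, taking it from the best-ranked source."""
    best: Dict[str, Dict[str, str]] = {}
    for rec in records:
        key = rec["url"]  # assumed to identify the book
        current = best.get(key)
        if current is None or (
            SOURCE_PRIORITY[rec["source"]] < SOURCE_PRIORITY[current["source"]]
        ):
            best[key] = rec
    return list(best.values())


# Made-up sample records; the URLs and call numbers are placeholders.
sample = [
    {"source": "gutenberg", "url": "https://example.org/book/1",
     "title": "Book One", "callno": "PR"},
    {"source": "onlinebooks", "url": "https://example.org/book/1",
     "title": "Book One", "callno": "PR1850"},
    {"source": "gutenberg", "url": "https://example.org/book/2",
     "title": "Book Two", "callno": "D"},
]
merged = merge_catalogs(sample)
```

With the sample data, the first book keeps its Online Books Page record, and the second book survives because only Project Gutenberg lists it.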
Selection of pre-1600 books
Once I have a large list of books and their call numbers, I use some
scripts to try to pick out the ones relevant to the years before
1600. This is done fairly naively: any four-digit number is considered
to be a date, and a list of keywords ("medieval"), names of people
("tomas becket"), and complete book titles is consulted. An
'exceptions list' then gets rid of some of the false positives. This
is a fairly fragile process.
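The selection pass described above could look roughly like the following Python sketch. It is an illustration under assumptions, not the real scripts: the function name `is_pre_1600` is invented, the title and exception lists hold made-up entries, the check looks only at a book's title string, and a year equal to the cutoff is treated as selected.

```python
import re

# Hypothetical sample lists; the real ones are not shown on this page.
KEYWORDS = {"medieval"}
NAMES = {"tomas becket"}
TITLES = {"the canterbury tales"}
EXCEPTIONS = {"medieval times dinner theater guide"}

# A run of exactly four digits, not part of a longer number.
FOUR_DIGITS = re.compile(r"(?<!\d)\d{4}(?!\d)")


def is_pre_1600(title: str, cutoff: int = 1600) -> bool:
    """Naive check: does this title look relevant to the years before 1600?"""
    text = title.lower()
    selected = (
        text in TITLES
        # Any four-digit number is treated as a year.
        or any(int(y) <= cutoff for y in FOUR_DIGITS.findall(text))
        or any(term in text for term in KEYWORDS | NAMES)
    )
    # The exceptions list is applied last, to drop known false positives.
    return selected and text not in EXCEPTIONS
```

The fragility shows up quickly: a title such as "A Catalogue of 1200 Garden Plants" is selected because 1200 looks like a year, and it stays selected until someone adds it to the exceptions list.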
Limitations of this approach
This approach goes wrong in several ways. First, it misses some books,
which is why you can also review the list of unselected books, and the
search function shows results for both selected and unselected books.
Second, it picks a lot of books which are either from just after 1600,
or are material from before the fall of the Roman Empire that was not
available during the Middle Ages and Renaissance.
Another flaw is that almost all of the books in this collection were
scanned because they were out of copyright. This means that they may
not be very up-to-date in their scholarship.
Future improvements
At the moment only I can tag books, but eventually I'd like to let
users of the site tag books, both with the "pre-1600" tag, and with
user-defined tags, such as "primary source", "published in the 15th
century", etc.
I would also like to build a tree of subjects, possibly replacing the
current tree of call numbers.
Webbed by Gregory Blount / Greg Lindahl
(lindahl@pbm.com)