Dating Spanish chapbooks: the wonders of artificial intelligence

Cambridge University Library was recently awarded a Cambridge Humanities Research Grant to continue work on the Spanish chapbooks catalogued and digitised under the “Wrongdoing in Spain, 1800-1936” project, as featured in the Cambridge Digital Library. This new year-long project aims to reliably date about 67% of chapbooks bearing estimated dates, which are often drawn from the printer’s period of activity. To establish more accurate dates of printing for these items, we aim to run visual searches on the woodcut illustrations within the chapbooks and compare prints made from the same woodblocks.

Printing houses used woodblocks (as well as metal stereotype plates in the nineteenth century) to illustrate the chapbooks. Woodblocks were expensive to produce, so printers often had a limited stock that they reused, sometimes through several generations of printers. Earlier woodblocks were crudely made on softwood, but the technique developed to produce much more detailed woodblocks etched with metal-engraving tools on harder wood. More intricate images are typical of the later period, although many older woodcuts continued to be used in later years to cut costs. With so much reuse, it comes as no surprise that woodblocks deteriorated over time, becoming less sharp and developing cracks. After many printings, the finest lines began to fade, and it is this wear and tear that we are hoping to use to our advantage to date the Cambridge Digital Library Spanish chapbooks more accurately.
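
The dating logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's method: the `wear` score, the chapbook identifiers and the years are all invented, and the sketch assumes that wear on a woodblock only ever increases with time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Print:
    """One impression of a woodblock found in a chapbook."""
    chapbook_id: str            # hypothetical identifier
    wear: float                 # hypothetical score: 0.0 = crisp, 1.0 = badly worn
    year: Optional[int] = None  # None when the chapbook is undated


def bracket_year(undated: Print, same_block: List[Print]) -> Tuple[Optional[int], Optional[int]]:
    """Return (earliest, latest) plausible years for an undated print.

    Assumes wear only increases with time: a dated print showing less wear
    was made earlier, and a dated print showing more wear was made later.
    """
    dated = [p for p in same_block if p.year is not None]
    earlier = [p.year for p in dated if p.wear <= undated.wear]
    later = [p.year for p in dated if p.wear >= undated.wear]
    lower = max(earlier) if earlier else None
    upper = min(later) if later else None
    return lower, upper


# Invented example: three dated prints and one undated print of the same woodblock.
prints = [
    Print("A", wear=0.10, year=1845),
    Print("B", wear=0.35, year=1858),
    Print("C", wear=0.80, year=1872),
]
mystery = Print("D", wear=0.50)
print(bracket_year(mystery, prints))  # (1858, 1872)
```

The undated print is more worn than the 1858 impression and less worn than the 1872 one, so it is bracketed between the two. A side of the bracket is left open (`None`) when no dated print constrains it.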

During the first phase of the project (October 2021 to date), images of the chapbooks were run through a machine learning model created by Oxford University’s Visual Geometry Group. The model was pre-trained on similar Scottish chapbooks from the National Library of Scotland. This process recognised the woodcut images and created annotations marking them with bounding boxes, but the result was not perfect. Manual input was needed to ensure that the images gathered suited the parameters of the project. Our aim was to isolate individual woodblock prints (i.e., woodcuts made from a single woodblock). The software did not detect that some images had been made by combining two or three separate woodblocks. It also missed borders and garlands and made “false detections”, so manual input was essential not just to serve our purposes for the project, but also to train the machine learning model to make more accurate predictions in the future.
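
One common way to compare a model's bounding boxes with manually corrected ones is intersection over union (IoU). The sketch below is a general illustration and not a description of the project's tooling: the `Box` type, the `missed` helper, the pixel coordinates and the 0.6 threshold are all assumptions made for the example.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 0.0 for disjoint boxes, 1.0 for identical ones."""
    overlap_w = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    overlap_h = min(a.y + a.height, b.y + b.height) - max(a.y, b.y)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    intersection = overlap_w * overlap_h
    union = a.width * a.height + b.width * b.height - intersection
    return intersection / union


def missed(detected: List[Box], corrected: List[Box], threshold: float = 0.6) -> List[Box]:
    """Corrected boxes that no detected box matches with an IoU of at least `threshold`."""
    return [c for c in corrected if all(iou(d, c) < threshold for d in detected)]


# Invented example: the model drew one box around an image that was actually
# printed from two woodblocks side by side, which an annotator then split.
composite = Box(0, 0, 300, 200)
left = Box(0, 0, 150, 200)
right = Box(150, 0, 150, 200)

print(iou(composite, left))                # 0.5
print(missed([composite], [left, right]))  # both halves count as missed
```

Each half covers only half of the detected box, so neither reaches the threshold and both are reported as woodblocks the model failed to isolate.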

In the next phase of the project, all the images and annotations, alongside metadata from Cambridge Digital Library, will be imported into an instance of VISE (Visual Geometry Group Image Search Engine). VISE will allow us to visually search many images (we annotated a total of 18,757 images out of 26,527 scanned images of chapbooks). By using an image or a metadata field as a search query, we hope to use machine learning and computer vision to explore relationships between the illustrations and not only narrow down the publication dates of the chapbooks, but also open up fields for research in printing and social history.
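
To give a feel for what searching by image means, here is a deliberately simple toy in Python. It is not how VISE works: it swaps in a basic "average hash" comparison as a stand-in, assumes every image has already been scaled to the same tiny grayscale grid, and uses invented pixel values and names throughout.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale pixels, 0 (black ink) to 255 (white paper)


def average_hash(image: Image) -> List[bool]:
    """One bit per pixel: True where the pixel is brighter than the image mean."""
    pixels = [value for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    return [value > mean for value in pixels]


def distance(a: List[bool], b: List[bool]) -> int:
    """Hamming distance: the number of positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def search(query: Image, collection: Dict[str, Image]) -> List[str]:
    """Return the names in `collection`, most similar to `query` first."""
    query_hash = average_hash(query)
    return sorted(
        collection,
        key=lambda name: distance(query_hash, average_hash(collection[name])),
    )


# Invented 3x3 "woodcuts": a crisp print, a worn print of the same block
# (one inked pixel has faded to light grey), and an unrelated image.
crisp = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]
worn = [[0, 255, 0], [255, 0, 255], [0, 255, 200]]
other = [[255, 0, 255], [0, 255, 0], [255, 0, 255]]

print(search(crisp, {"other": other, "worn": worn}))  # ['worn', 'other']
```

The worn print differs from the query in a single bit and ranks first, while the unrelated image differs in every bit. A real visual search engine has to cope with cropping, rotation and uneven inking, which this toy ignores.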

Sonia Morcillo García
