Greetings From! Extracting Address Information From 100,000 Historical Picture Postcards
This paper details the development and validation of computational methods aimed at creating a comprehensive dataset from a vast collection of historical picture postcards. By connecting three distinct locations – the sender’s, the recipient’s, and the depicted – the medium of the picture postcard has contributed to the formation of extensive spatial networks of information exchange. So far, the analysis of these spatial networks has been hampered by the fact that picture postcards are – literally and figuratively – hard to read. Using traditional methods, transcribing and analyzing a sizeable number of postcards would take a lifetime. To address this challenge, this paper presents a pipeline that leverages Computer Vision (CV), Handwritten Text Recognition (HTR), and Large Language Models (LLMs) to extract and disambiguate address information from a collection of 102K historical postcards sent from Belgium, France, Germany, Luxembourg, the Netherlands, and the UK. We report a mAP of 0.94 for the CV model, a character error rate of 7.62% for the HTR model, and, for the LLM, successful extraction of coordinates for 419 postcards from an initial sample set of 500. Overall, our pipeline demonstrates a reliable address information extraction rate for a significant proportion of the postcards in our data (with an average distance difference between the HTR-determined addresses and the Ground Truth text of 36.95 km). Deploying our pipeline on a larger scale, we will be able to reconstruct the spatial networks that the medium of the postcard enabled.
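The abstract reports a character error rate and an average distance in kilometres between extracted and ground-truth locations. The following is a minimal sketch of how such metrics are commonly computed, not the authors' implementation: it assumes the character error rate is the Levenshtein edit distance divided by the reference length, and that distances are great-circle (haversine) distances on a spherical Earth. All function names and the sample strings are hypothetical.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a mean Earth radius of 6371.0088 km."""
    radius_km = 6371.0088
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Hypothetical transcription pair: one substituted character out of nine.
    print(character_error_rate("Antwerpen", "Antwerpem"))
    # Distance spanned by one degree of longitude at the equator.
    print(haversine_km(0.0, 0.0, 0.0, 1.0))
```

Averaging `haversine_km` over pairs of predicted and ground-truth coordinates would give a figure comparable in kind to the reported 36.95 km, under the assumptions stated above.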
Source (book)
Proceedings of the Computational Humanities Research Conference 2023 Paris, France, December 6-8, 2023 / Šeļa, Artjoms [edit.]; Jannidis, Fotis [edit.]; Romanowska, Iza [edit.]
Paris : Computational Humanities Research Conference, 2023
p. 512-529
Creation 21.12.2023
Last edited 23.12.2023