METS/ALTO Conversion
 

The Metadata Encoding and Transmission Standard (METS) is a data encoding and transmission specification, expressed in XML, that conveys the metadata needed both to manage digital objects within a repository and to exchange such objects among repositories or between repositories and their users. This common object format was designed so that institutions, including vendors, can share the effort of developing information management tools and services and can exchange digital materials interoperably. The METS XML schema was created in 2001 under the sponsorship of the Digital Library Federation (DLF), is supported by the Library of Congress as its maintenance agency, and is governed by the METS Editorial Board.

Purpose of METS


  • Maintaining the metadata of the digital objects for the long term
  • Recording the names and locations of the files that comprise those objects
  • Creating XML documents that express the hierarchical structure of digital objects

When a repository of digital objects intends to share metadata about a digital object, or the object itself, with another repository or with a tool meant to render the object, a common data transfer syntax among repositories and tools makes those transactions considerably easier and more efficient. METS was designed to provide a relatively simple format for these activities throughout the lifecycle of the digital object.
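As a rough illustration of how a METS document records file names and locations alongside structure, the following Python sketch reads a small, hand-written METS sample. The sample document, the file paths in it, and the helper names `file_locations` and `page_files` are assumptions made for this example; real METS files are larger and usually carry descriptive and administrative metadata as well.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical, heavily simplified METS sample (not validated against the schema).
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="image">
      <mets:file ID="IMG0001" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="images/0001.tif"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="ocr">
      <mets:file ID="ALTO0001" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="alto/0001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="page" ORDER="1">
      <mets:fptr FILEID="IMG0001"/>
      <mets:fptr FILEID="ALTO0001"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def file_locations(root):
    """Map each mets:file ID to the location given in its FLocat element."""
    locations = {}
    for file_el in root.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is not None:
            locations[file_el.get("ID")] = flocat.get("{%s}href" % XLINK_NS)
    return locations


def page_files(mets_xml):
    """Return (ORDER, [file locations]) for every page div in the structMap."""
    root = ET.fromstring(mets_xml)
    locations = file_locations(root)
    pages = []
    for div in root.iterfind(".//mets:div", NS):
        if div.get("TYPE") != "page":
            continue
        hrefs = [locations[ptr.get("FILEID")]
                 for ptr in div.iterfind("mets:fptr", NS)]
        pages.append((div.get("ORDER"), hrefs))
    return pages


if __name__ == "__main__":
    for order, hrefs in page_files(SAMPLE_METS):
        print(order, hrefs)
```

In this sketch the file section answers "which files exist and where", while the structural map answers "which files belong to which page"; a receiving repository or rendering tool would resolve both in the same way.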


ALTO – An Introduction


ALTO (Analyzed Layout and Text Object) is an XML schema that defines technical metadata for describing the layout and the OCR-recognized text of resources such as the pages of a book or a newspaper. It is used as an extension schema to METS: METS provides the metadata and structural information, while ALTO holds the content and physical layout information.

ALTO Features


  • An ALTO file contains a styles section, which lists the different styles used, and a layout section, which describes what is on the page.
  • A page is divided into several regions (print space, left margin, right margin, top margin, and bottom margin). For each region, all objects detected inside it are listed.
  • Measurements in ALTO XML files can be given in units of 1/10 mm or 1/1200 inch. To use the coordinates from an ALTO file with an image at a given resolution, they must be converted to pixels.
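The unit conversion mentioned in the last point is simple arithmetic: one inch is 254 tenths of a millimetre, or 1200 units of 1/1200 inch, so a measurement is first expressed in inches and then multiplied by the image resolution in dots per inch. A minimal Python sketch follows; the function name `to_pixels` is an assumption for this example, and the unit labels `mm10`, `inch1200` and `pixel` are assumed to match the values found in the file's MeasurementUnit element.

```python
def to_pixels(value, unit, dpi):
    """Convert an ALTO measurement to pixels at the given resolution (dpi)."""
    if unit == "mm10":
        # 1 inch = 25.4 mm = 254 tenths of a millimetre
        return value / 254.0 * dpi
    if unit == "inch1200":
        # 1 inch = 1200 units of 1/1200 inch
        return value / 1200.0 * dpi
    if unit == "pixel":
        # Already in pixels; nothing to convert
        return float(value)
    raise ValueError("unknown measurement unit: %r" % unit)


if __name__ == "__main__":
    # One inch expressed in either unit maps to `dpi` pixels.
    print(to_pixels(254, "mm10", 300))
    print(to_pixels(1200, "inch1200", 300))
```

The same function can be applied to every HPOS, VPOS, WIDTH and HEIGHT value before drawing word boxes on top of a page image.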

Why METS/ALTO Conversion?


METS and ALTO have now been in use for a number of years. Libraries, universities, newspaper publishers and newspaper aggregators are familiar with these standards.

METS is an XML standard for encoding descriptive, administrative, and structural metadata about objects within a digital library. Although METS is excellent at describing the structure of a digital object, it cannot describe the content and layout of each piece of that object. An extension to METS called ALTO (Analyzed Layout and Text Object) is therefore used for this purpose. The combination of METS and ALTO was originally developed by the METAe project and was later adopted by the Library of Congress for its large-scale National Digital Newspaper Program (NDNP). Since then, METS/ALTO has been used in many newspaper digitization projects, both large and small, as well as in a number of projects digitizing books and journals.

A typical METS/ALTO object encodes the complete logical and physical structure of a document (chapters, sections, articles, pages and so on, together with their associated metadata), as well as the full-text content of each section of the document and even the physical coordinates of every word in the document.
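To make the word-level coordinates concrete, here is a minimal Python sketch that pulls each recognized word and its bounding box out of an ALTO page. The sample XML is hand-written and simplified for illustration (it is not validated against the ALTO schema), and the helper name `words` is an assumption; the sketch matches elements by local name so that it does not depend on a particular ALTO namespace version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ALTO page with two recognized words.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Description><MeasurementUnit>pixel</MeasurementUnit></Description>
  <Layout>
    <Page ID="P1" WIDTH="2480" HEIGHT="3508">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Hello" HPOS="100" VPOS="200" WIDTH="150" HEIGHT="40"/>
            <SP/>
            <String CONTENT="world" HPOS="270" VPOS="200" WIDTH="160" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def words(alto_xml):
    """Return (text, hpos, vpos, width, height) for every String element."""
    root = ET.fromstring(alto_xml)
    result = []
    for el in root.iter():
        if local_name(el.tag) != "String":
            continue
        result.append((
            el.get("CONTENT"),
            float(el.get("HPOS")),
            float(el.get("VPOS")),
            float(el.get("WIDTH")),
            float(el.get("HEIGHT")),
        ))
    return result


if __name__ == "__main__":
    for text, hpos, vpos, width, height in words(SAMPLE_ALTO):
        print(text, hpos, vpos, width, height)
```

Coordinates extracted this way are what allow a viewer to highlight search hits directly on the scanned page image.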
Copyright © 2013. All rights reserved Contentra Technologies.