LFCS Seminar


Programming Language Ideas Escape the Lab: A Declarative Data Description Language for Managing Ad Hoc Data

Kathleen Fisher

AT&T Labs Research, USA

4pm Thursday, 14th January, 2009
Room 4.31/33, Informatics Forum


Abstract

XML. HTML. CSV. JPEG. MPEG. These data formats represent vast quantities of scientific, governmental, industrial, and private data. Because the formats have been standardized and are widely used, many reliable, efficient, and convenient tools exist for processing such data. In an ideal world, all data would be in such formats. In reality, however, we are not so fortunate. Instead, vast amounts of data exist in ad hoc formats, which forces domain experts to waste valuable time on low-level parsing tasks.

In this talk, I will describe the PADS data description language, which addresses this problem. PADS allows users to describe the physical layout of ad hoc data sources and the semantic properties of that data. The descriptions are concise enough to serve as "living" documentation yet flexible enough to describe most of the formats that we have seen in practice. In addition, we have developed a multi-phase machine-learning algorithm that can automatically infer a PADS description from sample data. Given a PADS description, the PADS compiler generates libraries and tools for manipulating the associated data, including parsing routines, statistical profiling tools, translation programs to produce well-behaved formats such as XML, and tools for running queries over raw PADS data sources. As I describe PADS and its associated tools, I will highlight how various ideas from the programming language research community have informed the design and implementation of the PADS system.

Information about PADS and a list of all the people who have contributed to the project are available from the project web site: www.padsproj.org.

