ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Extracting Schema from Semistructured Data.

Svetlozar Nestorov, Serge Abiteboul, Rajeev Motwani: Extracting Schema from Semistructured Data. SIGMOD Conference 1998: 295-306
@inproceedings{DBLP:conf/sigmod/NestorovAM98,
  author    = {Svetlozar Nestorov and
               Serge Abiteboul and
               Rajeev Motwani},
  editor    = {Laura M. Haas and
               Ashutosh Tiwary},
  title     = {Extracting Schema from Semistructured Data},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {295-306},
  ee        = {http://doi.acm.org/10.1145/276304.276331, db/conf/sigmod/NestorovAM98.html},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Semistructured data is characterized by the lack of any fixed and rigid schema, although typically the data has some implicit structure. While the lack of fixed schema makes extracting semistructured data fairly easy and an attractive goal, presenting and querying such data is greatly impaired. Thus, a critical problem is the discovery of the structure implicit in semistructured data and, subsequently, the recasting of the raw data in terms of this structure. In this paper, we consider a very general form of semistructured data based on labeled, directed graphs. We show that such data can be typed using the greatest fixpoint semantics of monadic datalog programs. We present an algorithm for approximate typing of semistructured data. We establish that the general problem of finding an optimal such typing is NP-hard, but present some heuristics and techniques based on clustering that allow efficient and near-optimal treatment of the problem. We also present some preliminary experimental results.

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ... Online Version (ACM WWW Account required): Full Text in PDF Format

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Laura M. Haas, Ashutosh Tiwary (Eds.): SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA. ACM Press 1998, ISBN 0-89791-995-5 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML, SIGMOD Record 27(2), June 1998
Contents

Online Edition: ACM SIGMOD

[Abstract]
[Full Text (Postscript)]

Long Version

http://www-db.stanford.edu/pub/papers/extract-schema.ps

References

[1]
Serge Abiteboul: Querying Semi-Structured Data. ICDT 1997: 1-18 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[2]
Serge Abiteboul, Richard Hull, Victor Vianu: Foundations of Databases. Addison-Wesley 1995, ISBN 0-201-53771-0
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[3]
Antonio Albano, Roberto Bergamini, Giorgio Ghelli, Renzo Orsini: An Object Data Model with Roles. VLDB 1993: 39-51 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[4]
...
[5]
...
[6]
Peter Buneman: Semistructured Data. PODS 1997: 117-121 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[7]
Peter Buneman, Susan B. Davidson, Mary F. Fernandez, Dan Suciu: Adding Structure to Unstructured Data. ICDT 1997: 336-350 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[8]
Peter Buneman, Susan B. Davidson, Gerd G. Hillebrand, Dan Suciu: A Query Language and Optimization Techniques for Unstructured Data. SIGMOD Conference 1996: 505-516 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[9]
R. G. G. Cattell: The Object Database Standard: ODMG-93 (Release 1.1). Morgan Kaufmann 1994
CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[10]
Roy Goldman, Jennifer Widom: DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. VLDB 1997: 436-445 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[11]
...
[12]
...
[13]
...
[14]
...
[15]
Svetlozar Nestorov, Jeffrey D. Ullman, Janet L. Wiener, Sudarshan S. Chawathe: Representative Objects: Concise Representations of Semistructured, Hierarchial Data. ICDE 1997: 79-90 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[16]
Dallan Quass, Anand Rajaraman, Yehoshua Sagiv, Jeffrey D. Ullman, Jennifer Widom: Querying Semistructured Heterogeneous Information. DOOD 1995: 319-344 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[17]
Dan Suciu: Query Decomposition and View Maintenance for Query Languages for Unstructured Data. VLDB 1996: 227-238 CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[18]
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press 1988, ISBN 0-7167-8158-1
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[18-2]
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press 1989, ISBN 0-7167-8162-X
Contents CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML
[19]
Moshé M. Zloof: Query-by-Example: A Data Base Language. IBM Systems Journal 16(4): 324-343(1977) CiteSeerX Google scholar pubzone.org BibTeX bibliographical record in XML

Copyright © Mon Dec 21 21:56:53 2009 by Michael Ley (ley@uni-trier.de)