ACM SIGMOD Anthology TODS dblp.uni-trier.de

Duplicate Record Elimination in Large Data Files.

Dina Bitton, David J. DeWitt: Duplicate Record Elimination in Large Data Files. ACM Trans. Database Syst. 8(2): 255-265(1983)
@article{DBLP:journals/tods/BittonD83,
  author    = {Dina Bitton and
               David J. DeWitt},
  title     = {Duplicate Record Elimination in Large Data Files},
  journal   = {ACM Trans. Database Syst.},
  volume    = {8},
  number    = {2},
  year      = {1983},
  pages     = {255-265},
  ee        = {http://doi.acm.org/10.1145/319983.319987, db/journals/tods/BittonD83.html},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

The issue of duplicate elimination for large data files in which many occurrences of the same record may appear is addressed. A comprehensive cost analysis of the duplicate elimination operation is presented. This analysis is based on a combinatorial model developed for estimating the size of intermediate runs produced by a modified merge-sort procedure. The performance of this modified merge-sort procedure is demonstrated to be significantly superior to the standard duplicate elimination technique of sorting followed by a sequential pass to locate duplicate records. The results can also be used to provide critical input to a query optimizer in a relational database system.

Copyright © 1983 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Joint ACM SIGMOD / IEEE Computer Society Anthology

CDROM Version: Load the CDROM "Volume 3 Issue 1, TODS 1976-1990" and ... DVD Version: Load ACM SIGMOD Anthology DVD 2" and ... BibTeX

References

[1]
Morton M. Astrahan, Mike W. Blasgen, Donald D. Chamberlin, Kapali P. Eswaran, Jim Gray, Patricia P. Griffiths, W. Frank King III, Raymond A. Lorie, Paul R. McJones, James W. Mehl, Gianfranco R. Putzolu, Irving L. Traiger, Bradford W. Wade, Vera Watson: System R: Relational Approach to Database Management. ACM Trans. Database Syst. 1(2): 97-137(1976) BibTeX
[2]
Edward Babb: Implementing a Relational Database by Means of Specialized Hardware. ACM Trans. Database Syst. 4(1): 1-29(1979) BibTeX
[3]
Donald E. Knuth: The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley 1973, ISBN 0-201-03803-X
BibTeX
[4]
J. Ian Munro, Philip M. Spira: Sorting and Searching in Multisets. SIAM J. Comput. 5(1): 1-8(1976) BibTeX
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
TODS, ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Tue Jun 24 20:11:46 2008