E-Book, Englisch, Band 35, 494 Seiten
Reihe: Advances in Database Systems
Aggarwal Managing and Mining Uncertain Data
1. Auflage 2010
ISBN: 978-0-387-09690-2
Verlag: Springer
Format: PDF
Kopierschutz: 1 - PDF Watermark
E-Book, Englisch, Band 35, 494 Seiten
Reihe: Advances in Database Systems
ISBN: 978-0-387-09690-2
Verlag: Springer
Format: PDF
Kopierschutz: 1 - PDF Watermark
Managing and Mining Uncertain Data, a survey with chapters by a variety of well known researchers in the data mining field, presents the most recent models, algorithms, and applications in the uncertain data mining field in a structured and concise way. This book is organized to make it more accessible to applications-driven practitioners for solving real problems. Also, given the lack of structurally organized information on this topic, Managing and Mining Uncertain Data provides insights which are not easily accessible elsewhere. Managing and Mining Uncertain Data is designed for a professional audience composed of researchers and practitioners in industry. This book is also suitable as a reference book for advanced-level students in computer science and engineering, as well as the ACM, IEEE, SIAM, INFORMS and AAAI Society groups.
Autoren/Hrsg.
Weitere Infos & Material
1;Preface;6
2;Contents;8
3;List of Figures;16
4;List of Tables;21
5;Chapter 1 AN INTRODUCTION TO UNCERTAIN DATA ALGORITHMS AND APPLICATIONS;22
5.1;1. Introduction;22
5.2;2. Algorithms for Uncertain Data;24
5.3;3. Conclusions;27
5.4;Acknowledgements;28
5.5;References;28
6;Chapter 2 MODELS FOR INCOMPLETE AND PROBABILISTIC INFORMATION;30
6.1;1. Introduction;30
6.2;2. Incomplete Information and Representation Systems;34
6.3;3. RA-Completeness and Finite Completeness;35
6.4;4. Closure Under Relational Operations;39
6.5;5. Algebraic Completion;40
6.6;6. Probabilistic Databases and Representation Systems;42
6.7;7. Probabilistic ?-Tables and Probabilistic Or-Set Tables;43
6.8;8. Probabilistic c-tables;45
6.9;9. Queries on Annotated Relations;46
6.10;10. K-Relations;48
6.11;11. Polynomials for Provenance;51
6.12;12. Query Containment;54
6.13;13. RelatedWork;55
6.14;14. Conclusion and Further Work;55
6.15;Acknowledgments;56
6.16;References;57
6.17;15. Appendix;61
7;Chapter 3 RELATIONAL MODELS AND ALGEBRA FOR UNCERTAIN DATA;64
7.1;1. Introduction;64
7.2;2. Different Probabilistic Data Models;67
7.2.1;2.1 Point-Valued Probability Measures Assigned to Each Tuple;67
7.2.2;2.2 Point-Valued Probability Measures Assigned to Attributes and Attribute Sets;71
7.2.3;2.3 Interval-Valued Probability Measures Assigned to Attribute-Values;73
7.2.4;2.4 Interval-valued Probability Measures Assigned to Tuples;75
7.3;3. Probabilistic Relational Algebra;75
7.3.1;3.1 Basic Definitions;75
7.3.2;3.2 Primary and Foreign Keys;77
7.3.3;3.3 Relational Operations;78
7.3.4;3.4 Relational Algebra;81
7.3.5;3.5 Incomplete Distribution and Null Values;82
7.4;4. Algebraic Implications of the Different Representations and Associated Assumptions;86
7.4.1;4.1 Models Assigning Point-Valued Probability Measures at the Tuple Level;87
7.4.2;4.2 Models Assigning Point-Valued Probability Measures at the Attribute Level;88
7.4.3;4.3 Models Assigning Interval-Valued Probability Measures at the Attribute Level;89
7.4.4;4.4 Models Assigning Interval-Valued Probability Measures to Tuples;90
7.4.5;4.5 Some Observations on the Independence Assumption Across Tuples;91
7.5;5. Concluding Remarks;91
7.6;References;93
8;Chapter 4 GRAPHICAL MODELS FOR UNCERTAIN DATA;95
8.1;1. Introduction;96
8.2;2. Graphical Models: Overview;98
8.2.1;2.1 Directed Graphical Models: Bayesian Networks;100
8.2.2;2.2 Undirected Graphical Models: Markov Networks;102
8.2.3;2.3 Inference Queries;103
8.3;3. Representing Uncertainty using Graphical Models;105
8.3.1;3.1 PossibleWorld Semantics;106
8.3.2;3.2 Shared Factors;109
8.3.3;3.3 Representing Probabilistic Relations;109
8.4;4. Query Evaluation over Uncertain Data;110
8.4.1;4.1 Example;112
8.4.2;4.2 Generating Factors during Query Evaluation;114
8.4.3;4.3 Query Evaluation as Inference;117
8.4.4;4.4 Optimizations;118
8.5;5. RelatedWork and Discussion;119
8.5.1;5.1 Safe Plans;119
8.5.2;5.2 Representing Uncertainty using Lineage;120
8.5.3;5.3 Probabilistic Relational Models;120
8.5.4;5.4 Lifted Inference;122
8.5.5;5.5 Scalable Inference using a Relational Database;122
8.6;6. Conclusions;123
8.7;Acknowledgments;124
8.8;References;125
9;Chapter 5 TRIO:ASYSTEMFORDATA,UNCERTAINTY,AND LINEAGE;131
9.1;Introduction;131
9.2;1. ULDBs: Uncertainty-Lineage Databases;134
9.2.1;1.1 Alternatives;135
9.2.2;1.2 ‘?’ (Maybe) Annotations;135
9.2.3;1.3 Confidences;136
9.2.4;1.4 Lineage;137
9.2.5;1.5 Relational Queries;139
9.3;2. TriQL: The Trio Query Language;140
9.3.1;2.1 Operational Semantics;140
9.3.2;2.2 Querying Confidences;142
9.3.3;2.3 Querying Lineage;142
9.3.4;2.4 Duplicate Elimination;143
9.3.5;2.5 Aggregation;144
9.3.6;2.6 Reorganizing Alternatives;145
9.3.7;2.7 Horizontal Subqueries;146
9.3.8;2.8 Query-Defined Result Confidences;146
9.3.9;2.9 Other TriQL Query Constructs;147
9.4;3. Data Modifications in Trio;148
9.4.1;3.1 Inserts;148
9.4.2;3.2 Deletes;149
9.4.3;3.3 Updates;149
9.4.4;3.4 Data Modifications and Versioning;151
9.5;4. Confidence Computation;151
9.6;5. Additional Trio Features;153
9.7;6. The Trio System;154
9.7.1;6.1 Encoding ULDB Data;156
9.7.2;6.2 Basic Query Translation Scheme;158
9.7.3;6.3 Duplicate Elimination;159
9.7.4;6.4 Aggregation;159
9.7.5;6.5 Reorganizing Alternatives;160
9.7.6;6.6 Horizontal Subqueries;160
9.7.7;6.7 Built-In Predicates and Functions;161
9.7.8;6.8 Query-Defined Result Confidences;162
9.7.9;6.9 Remaining Constructs;162
9.8;References;164
10;Chapter 6 MAYBMS: A SYSTEM FOR MANAGING LARGE UNCERTAIN AND PROBABILISTIC DATABASES;166
10.1;1. Introduction;166
10.2;2. Probabilistic Databases;168
10.3;3. Query Language Desiderata;169
10.4;4. The Algebra;170
10.5;5. Representing Probabilistic Data;176
10.6;6. Conceptual Query Evaluation, Rewritings, and Asymptotic Efficiency;181
10.7;7. The MayBMS Query and Update Language;187
10.8;8. The MayBMS System;192
10.9;9. Conclusions and Outlook;195
10.10;References;197
11;Chapter 7 UNCERTAINTY IN DATA INTEGRATION;200
11.1;1. Introduction;200
11.2;2. Overview of the System;202
11.2.1;2.1 Uncertainty in data integration;202
11.2.2;2.2 System architecture;203
11.2.3;2.3 Source of probabilities;204
11.2.4;2.4 Outline of the chapter;205
11.3;3. Uncertainty in Mappings;205
11.3.1;3.1 Motivating probabilistic mappings;205
11.3.2;3.2 Definition and Semantics;207
11.3.3;3.3 Query Answering;212
11.3.4;3.4 Creating P-mappings;216
11.3.5;3.5 Broader Classes of Mappings;219
11.3.6;3.6 Other Types of Approximate Schema Mappings;221
11.4;4. Uncertainty in Mediated Schema;222
11.4.1;4.1 P-Med-Schema Motivating Example;222
11.4.2;4.2 Probabilistic Mediated Schema;225
11.4.3;4.3 P-med-schema Creation;227
11.4.4;4.4 Consolidation;229
11.4.5;4.5 Other approaches;231
11.5;5. Future Directions;232
11.6;References;233
12;Chapter 8 SKETCHING AGGREGATES OVER PROBABILISTIC STREAMS;236
12.1;1. Introduction;236
12.1.1;1.1 Aggregates over probabilistic streams;238
12.1.2;1.2 Organization;239
12.2;2. The Probabilistic Stream Model;239
12.2.1;2.1 Problem definitions;241
12.2.2;2.2 Frequency Moments and Quantiles;242
12.3;3. Overview of techniques and summary of results;244
12.4;4. Universal Sampling;248
12.5;5. Frequency moments: DISTINCT and REPEAT-RATE;249
12.5.1;5.1 DISTINCT;249
12.5.2;5.2 REPEAT-RATE;251
12.6;6. Heavy-Hitters, Quantiles, and MEDIAN;253
12.7;7. A Binning Technique for MIN and MAX;254
12.8;8. Estimating AVG using generating functions;257
12.8.1;8.1 Generating functions;257
12.8.2;8.2 Estimating AVG;258
12.8.3;8.3 Approximating AVG by SUM/COUNT;261
12.9;9. Discussion;265
12.10;References;266
13;Chapter 9 PROBABILISTIC JOIN QUERIES IN UNCERTAIN DATABASES;269
13.1;1. Introduction;269
13.2;2. Traditional Join Approaches;270
13.2.1;2.1 Simple Nested-Loop Join;271
13.2.2;2.2 Nested-Block-Loop Join;272
13.2.3;2.3 Sort-Merge-Loop Join;272
13.2.4;2.4 Other Join Methods;273
13.2.5;2.5 Spatial Join Algorithms;273
13.2.6;2.6 Spatial Join using a spatial index structure for both relations;274
13.2.7;2.7 Spatial Join using a spatial index structure on one relation;274
13.2.8;2.8 Spatial Join using no Spatial-Index Structure;275
13.3;3. Uncertainty Models and Join Predicates;275
13.3.1;3.1 The Continuous Uncertainty Model;276
13.3.2;3.2 The Discrete Uncertainty Model;278
13.3.3;3.3 Join Predicates and Score;283
13.3.4;3.4 Probabilistic Join Query Types;284
13.3.5;3.5 Example;285
13.3.6;3.6 Overview of UncertaintyModels and Probabilistic Join Queries;286
13.4;4. Approaches for Efficient Join Processing on Uncertain Data;289
13.4.1;4.1 Confidence-Based Join Methods;289
13.4.2;4.2 Probabilistic Similarity Joins;292
13.4.3;4.3 Probabilistic Spatial Join;300
13.5;5. Summary;301
13.6;References;304
14;Chapter 10 INDEXING UNCERTAIN DATA;310
14.1;1. Introduction;311
14.2;2. Data Models and Query Semantics;313
14.2.1;2.1 Uncertain Attribute types;314
14.3;3. Uncertainty Index for Continuous Domains;314
14.3.1;3.1 Probability Threshold Indexing;315
14.3.2;3.2 Special Case: Uniform PDFS;318
14.3.3;3.3 2D mapping of intervals;318
14.3.4;3.4 Join Queries;320
14.3.5;3.5 Multi-dimensional Indexing;322
14.4;4. Uncertainty Index for discrete domains;322
14.4.1;4.1 Data Model and Problem Definition;323
14.4.2;4.2 Probabilistic Inverted Index;324
14.4.3;4.3 Probabilistic Distribution R-tree;327
14.5;5. Indexing for Nearest Neighbor Queries;329
14.6;References;333
15;Chapter 11 RANGEANDNEARESTNEIGHBORQUERIESON UNCERTAIN SPATIOTEMPORAL DATA;336
15.1;1. Introduction;336
15.2;2. Range Search;340
15.2.1;2.1 Query Definitions;340
15.2.2;2.2 Filter and Refinement;342
15.2.3;2.3 Nonfuzzy Range Search;343
15.2.4;2.4 Fuzzy Range Search;345
15.2.5;2.5 Indexing;348
15.3;3. Nearest Neighbor Retrieval;349
15.3.1;3.1 Query Definition;349
15.3.2;3.2 Query Processing;350
15.3.3;3.3 Variations of Nearest Neighbor Retrieval;352
15.3.4;4. Summary;356
15.4;References;357
16;Chapter 12 PROBABILISTIC XML;360
16.1;1. Introduction;360
16.2;2. Sources of Uncertainty in XML Data;361
16.3;3. Modeling Uncertainty using Tags;362
16.4;4. Modeling Uncertainty using Semi-structured Data;365
16.5;5. Modeling Uncertainty in XML with Independent or Mutually Exclusive Distribution in Point Probability;367
16.6;6. Formal Model of Probabilistic Semi-structured Data with Arbitrary Probabilistic Distributions;370
16.6.1;6.1 Motivating Examples;372
16.6.2;6.2 Probabilistic Semi-structured Data Model;374
16.6.3;6.3 Semantics;379
16.6.4;6.4 PXML Algebra and Comparison with PreviousWork;382
16.6.5;6.5 Probabilistic Aggregate Operations;384
16.6.6;6.6 Modeling Interval Uncertainty in Semi-structured Data;386
16.7;7. Summary;389
16.8;Acknowledgements;390
16.9;References;391
17;Chapter 13 ON CLUSTERING ALGORITHMS FOR UNCERTAIN DATA;394
17.1;1. Introduction;394
17.2;2. Density Based Clustering Algorithms;396
17.3;3. The UK-means and CK-means Algorithms;399
17.4;4. UMicro: Streaming Algorithms for Clustering Uncertain Data;400
17.4.1;4.1 The UMicro Algorithm: Overview;402
17.4.2;4.2 Computing Expected Similarity;403
17.4.3;4.3 Computing the Uncertain Boundary;407
17.4.4;4.4 Further Enhancements;407
17.5;5. Approximation Algorithms for Clustering UncertainData;408
17.6;6. Conclusions and Summary;408
17.7;Acknowledgements;409
17.8;References;409
18;Chapter 14 ON APPLICATIONS OF DENSITY TRANS FORMSFOR UNCERTAIN DATA MINING;412
18.1;1. Introduction;412
18.2;2. Kernel Density Estimation with Errors;414
18.2.1;2.1 Scalability for Large Data Sets;417
18.3;3. Leveraging Density Estimation for Classification;421
18.4;4. Application of Density Based Approach to Outlier Detection;424
18.4.1;4.1 Outlier Detection Approach;425
18.4.2;4.2 Subspace Exploration for Outlier Detection;426
18.5;5. Conclusions;428
18.6;Acknowledgements;428
18.7;References;429
19;Chapter 15 FREQUENT PATTERN MINING ALGORITHMS WITH UNCERTAIN DATA;431
19.1;1. Introduction;432
19.2;2. Frequent Pattern Mining of Uncertain Data Sets;433
19.3;3. Apriori-style Algorithms;434
19.3.1;3.1 Pruning Methods for Apriori-Style Algorithms;435
19.4;4. Set-Enumeration Methods;438
19.5;5. Pattern Growth based Mining Algorithms;438
19.5.1;5.1 Extending the H-mine algorithm;439
19.5.2;5.2 Extending the FP-growth Algorithm;440
19.5.3;5.3 Another Variation of the FP-growth Algorithm;450
19.6;6. A Comparative Study on Challenging Cases;450
19.6.1;6.1 Performance Comparison;454
19.6.2;6.2 Scalability Comparison;456
19.7;7. Generalization to the PossibleWorlds Model;458
19.8;8. Discussion and Conclusions;459
19.9;Acknowledgements;460
19.10;References;461
20;Chapter 16 PROBABILISTIC QUERYING AND MINING OF BIOLOGICAL IMAGES;464
20.1;1. Introduction;464
20.1.1;1.1 An Illustrative Example;465
20.2;2. RelatedWork;468
20.3;3. Probabilistic Image Analyses;469
20.3.1;3.1 Probabilistic Segmentation;469
20.3.2;3.2 Measuring Neurite Thickness;472
20.3.3;3.3 Ganglion Cell Features;473
20.4;4. Querying Probabilistic Image Data;474
20.4.1;4.1 Range Queries on Uncertain Data;475
20.4.2;4.2 k-NN Queries on Uncertain Data;475
20.4.3;4.3 Adaptive, Piecewise-Linear Approximations;477
20.4.4;4.4 Indexing the APLA;477
20.4.5;4.5 Experimental Results;478
20.5;5. Mining Probabilistic Image Data;479
20.5.1;5.1 Defining Probabilistic Spatial Join (PSJ);480
20.5.2;5.2 Threshold PSJ Query;481
20.5.3;5.3 Top-k PSJ Query;482
20.5.4;5.4 Experimental Results;483
20.6;6. Conclusion;484
20.7;Acknowledgements;485
20.8;References;486
21;Index;492




