E-Book, Englisch, Band 40, 600 Seiten
Reihe: Advances in Database Systems
Aggarwal / Wang Managing and Mining Graph Data
1. Auflage 2010
ISBN: 978-1-4419-6045-0
Verlag: Springer
Format: PDF
Kopierschutz: 1 - PDF Watermark
E-Book, Englisch, Band 40, 600 Seiten
Reihe: Advances in Database Systems
ISBN: 978-1-4419-6045-0
Verlag: Springer
Format: PDF
Kopierschutz: 1 - PDF Watermark
Managing and Mining Graph Data is a comprehensive survey book in graph management and mining. It contains extensive surveys on a variety of important graph topics such as graph languages, indexing, clustering, data generation, pattern mining, classification, keyword search, pattern matching, and privacy. It also studies a number of domain-specific scenarios such as stream mining, web graphs, social networks, chemical and biological data. The chapters are written by well known researchers in the field, and provide a broad perspective of the area. This is the first comprehensive survey book in the emerging topic of graph data processing.
Managing and Mining Graph Data is designed for a varied audience composed of professors, researchers and practitioners in industry. This volume is also suitable as a reference book for advanced-level database students in computer science and engineering.
Autoren/Hrsg.
Weitere Infos & Material
1;Contents;6
2;List of Figures;15
3;List of Tables;21
4;Preface;22
5;AN INTRODUCTION TO GRAPH DATA;24
5.1;1. Introduction;24
5.2;2. Graph Management and Mining Applications;26
5.3;3. Summary;31
5.4;References;32
6;GRAPH DATA MANAGEMENT AND MINING: A SURVEYOF ALGORITHMS AND APPLICATIONS;35
6.1;1. Introduction;35
6.2;2. Graph Data Management Algorithms;38
6.2.1;2.1 Indexing and Query Processing Techniques;38
6.2.2;2.2 Reachability Queries;41
6.2.3;2.3 Graph Matching;43
6.2.4;2.4 Keyword Search;46
6.2.5;2.5 Synopsis Construction of Massive Graphs;49
6.3;3. Graph Mining Algorithms;51
6.3.1;3.1 Pattern Mining in Graphs;51
6.3.2;3.2 Clustering Algorithms for Graph Data;54
6.3.3;3.3 Classification Algorithms for Graph Data;59
6.3.4;3.4 The Dynamics of Time-Evolving Graphs;62
6.4;4. Graph Applications;65
6.4.1;4.1 Chemical and Biological Applications;65
6.4.2;4.2 Web Applications;67
6.4.3;4.3 Software Bug Localization;73
6.5;5. Conclusions and Future Research;77
6.6;Notes;77
6.7;References;77
7;GRAPH MINING: LAWS AND GENERATORS;91
7.1;1. Introduction;92
7.2;2. Graph Patterns;93
7.2.1;2.1 Power Laws and Heavy-Tailed Distributions;94
7.2.2;2.2 Small Diameters;99
7.2.3;2.3 Other Static Graph Patterns;101
7.2.4;2.4 Patterns in Evolving Graphs;104
7.2.5;2.5 The Structure of Specific Graphs;106
7.3;3. Graph Generators;108
7.3.1;3.1 Random Graph Models;110
7.3.2;3.2 Preferential Attachment and Variants;114
7.3.3;3.3 Optimization-based generators;123
7.3.4;3.4 Tensor-based;130
7.3.5;3.5 Generators for specific graphs;135
7.3.6;3.6 Graph Generators: A summary;137
7.4;4. Conclusions;137
7.5;Notes;138
7.6;References;139
8;QUERY LANGUAGE AND ACCESS METHODS FOR GRAPH DATABASES;146
8.1;1. Introduction;147
8.1.1;1.1 Graphs-at-a-time Queries;147
8.1.2;1.2 Graph Specific Optimizations;148
8.1.3;1.3 GraphQL;149
8.2;2. Operations on Graph Structures;150
8.2.1;2.1 Concatenation;151
8.2.2;2.2 Disjunction;152
8.2.3;2.3 Repetition;152
8.3;3. Graph Query Language;153
8.3.1;3.1 Data Model;153
8.3.2;3.2 Graph Patterns;154
8.3.3;3.3 Graph Algebra;155
8.3.4;3.4 FLWR Expressions;158
8.3.5;3.5 Expressive Power;159
8.4;4. Implementation of the Selection Operator;161
8.4.1;4.1 Graph Pattern Matching;161
8.4.2;4.2 Local Pruning and Retrieval of Feasible Mates;163
8.4.3;4.3 Joint Reduction of Search Space;165
8.4.4;4.4 Optimization of Search Order;167
8.5;5. Experimental Study;169
8.5.1;5.1 Biological Network;169
8.5.2;5.2 Synthetic Graphs;171
8.6;6. RelatedWork;173
8.6.1;6.1 Graph Query Languages;173
8.6.2;6.2 Graph Indexing;176
8.7;7. Future Research Directions;176
8.8;8. Conclusion;177
8.9;Acknowledgments;177
8.10;Appendix: Query Syntax of GraphQL;177
8.11;References;178
9;GRAPH INDEXING;182
9.1;1. Introduction;182
9.2;2. Feature-Based Graph Index;183
9.2.1;2.1 Paths;184
9.2.2;2.2 Frequent Structures;185
9.2.3;2.3 Discriminative Structures;187
9.2.4;2.4 Closed Frequent Structures;188
9.2.5;2.5 Trees;188
9.2.6;2.6 Hierarchical Indexing;189
9.3;3. Structure Similarity Search;190
9.3.1;3.1 Feature-Based Structural Filtering;191
9.3.2;3.2 Feature Miss Estimation;192
9.3.3;3.3 Frequency Difference;193
9.3.4;3.4 Feature Set Selection;194
9.3.5;3.5 Structures with Gaps;195
9.4;4. Reverse Substructure Search;196
9.5;5. Conclusions;198
9.6;References;199
10;GRAPH REACHABILITY QUERIES: A SURVEY;202
10.1;1. Introduction;202
10.2;2. Traversal Approaches;207
10.2.1;2.1 Tree+SSPI;208
10.2.2;2.2 GRIPP;208
10.3;3. Dual-Labeling;209
10.4;4. Tree Cover;211
10.5;5. Chain Cover;212
10.5.1;5.1 Computing the Optimal Chain Cover;214
10.6;6. Path-Tree Cover;215
10.7;7. 2-HOP Cover;217
10.7.1;7.1 A Heuristic Ranking;218
10.7.2;7.2 A Geometrical-Based Approach;219
10.7.3;7.3 Graph Partitioning Approaches;220
10.7.4;7.4 2-Hop Cover Maintenance;223
10.8;8. 3-Hop Cover;225
10.9;9. Distance-Aware 2-Hop Cover;226
10.10;10. Graph Pattern Matching;228
10.10.1;10.1 A Special Case:;229
10.10.2;10.2 The General Cases;232
10.11;11. Conclusions and Summary;233
10.12;References;233
11;EXACT AND INEXACT GRAPH MATCHING: METHODOLOGY AND APPLICATIONS;237
11.1;1. Introduction;238
11.2;2. Basic Notations;239
11.3;3. Exact Graph Matching;241
11.4;4. Inexact Graph Matching;246
11.4.1;4.1 Graph Edit Distance;247
11.4.2;4.2 Other Inexact Graph Matching Techniques;249
11.5;5. Graph Matching for Data Mining and Information Retrieval;251
11.6;6. Vector Space Embeddings of Graphs via Graph Matching;255
11.7;7. Conclusions;259
11.8;References;260
12;A SURVEY OF ALGORITHMS FOR KEYWORD SEARCH ON GRAPH DATA;268
12.1;1. Introduction;269
12.2;2. Keyword Search on XML Data;271
12.2.1;2.1 Query Semantics;272
12.2.2;2.2 Answer Ranking;273
12.2.3;2.3 Algorithms for LCA-based Keyword Search;277
12.3;3. Keyword Search on Relational Data;279
12.3.1;3.1 Query Semantics;279
12.3.2;3.2 DBXplorer and DISCOVER;280
12.4;4. Keyword Search on Schema-Free Graphs;282
12.4.1;4.1 Query Semantics and Answer Ranking;282
12.4.2;4.2 Graph Exploration by Backward Search;284
12.4.3;4.3 Graph Exploration by Bidirectional Search;285
12.4.4;4.4 Index-based Graph Exploration – the BLINKS Algorithm;286
12.4.5;4.5 The ObjectRank Algorithm;288
12.5;5. Conclusions and Future Research;290
12.6;References;290
13;A SURVEY OF CLUSTERING ALGORITHMS FOR GRAPH DATA;293
13.1;1. Introduction;293
13.2;2. Node Clustering Algorithms;295
13.2.1;2.1 The Minimum Cut Problem;295
13.2.2;2.2 Multi-way Graph Partitioning;299
13.2.3;2.3 Conventional Generalizations and Network Structure Indices;300
13.2.4;2.4 The Girvan-Newman Algorithm;302
13.2.5;2.5 The Spectral Clustering Method;303
13.2.6;2.6 Determining Quasi-Cliques;306
13.2.7;2.7 The Case of Massive Graphs;307
13.3;3. Clustering Graphs as Objects;309
13.3.1;3.1 Extending Classical Algorithms to Structural Data;309
13.3.2;3.2 The XProj Approach;311
13.4;4. Applications of Graph Clustering Algorithms;313
13.4.1;4.1 Community Detection in Web Applications and Social Networks;314
13.4.2;4.2 Telecommunication Networks;315
13.4.3;4.3 Email Analysis;315
13.5;5. Conclusions and Future Research;315
13.6;References;317
14;A SURVEY OF ALGORITHMS FOR DENSE SUBGRAPH DISCOVERY;320
14.1;1. Introduction;321
14.2;2. Types of Dense Components;322
14.2.1;2.1 Absolute vs. Relative Density;322
14.2.2;2.2 Graph Terminology;323
14.2.3;2.3 Definitions of Dense Components;324
14.2.4;2.4 Dense Component Selection;325
14.2.5;2.5 Relationship between Clusters and Dense Components;326
14.3;3. Algorithms for Detecting Dense Components in a Single Graph;328
14.3.1;3.1 Exact Enumeration Approach;328
14.3.2;3.2 Heuristic Approach;331
14.3.3;3.3 Exact and Approximation Algorithms for Discovering Densest Components;339
14.4;4. Frequent Dense Components;344
14.4.1;4.1 Frequent Patterns with Density Constraints;344
14.4.2;4.2 Dense Components with Frequency Constraint;345
14.4.3;4.3 Enumerating Cross-Graph Quasi-Cliques;345
14.5;5. Applications of Dense Component Analysis;346
14.6;6. Conclusions and Future Research;348
14.7;References;350
15;GRAPH CLASSIFICATION;354
15.1;1. Introduction;354
15.2;2. Graph Kernels;357
15.2.1;2.1 RandomWalks on Graphs;358
15.2.2;2.2 Label Sequence Kernel;359
15.2.3;2.3 Efficient Computation of Label Sequence Kernels;360
15.2.4;2.4 Extensions;366
15.3;3. Graph Boosting;366
15.3.1;3.1 Formulation of Graph Boosting;368
15.3.2;3.2 Optimal Pattern Search;370
15.3.3;3.3 Computational Experiments;371
15.3.4;3.4 RelatedWork;372
15.4;4. Applications of Graph Classification;375
15.5;5. Label Propagation;375
15.6;6. Concluding Remarks;376
15.7;References;376
16;MINING GRAPH PATTERNS;381
16.1;1. Introduction;382
16.2;2. Frequent Subgraph Mining ;382
16.2.1;2.1 Problem Definition;382
16.2.2;2.2 Apriori-based Approach;383
16.2.3;2.3 Pattern-Growth Approach;384
16.2.4;2.4 Closed and Maximal Subgraphs;385
16.2.5;2.5 Mining Subgraphs in a Single Graph;386
16.2.6;2.6 The Computational Bottleneck;387
16.3;3. Mining Significant Graph Patterns ;388
16.3.1;3.1 Problem Definition;388
16.3.2;3.2 gboost: A Branch-and-Bound Approach;389
16.3.3;3.3 gPLS: A Partial Least Squares Regression Approach;391
16.3.4;3.4 LEAP: A Structural Leap Search Approach;394
16.3.5;3.5 GraphSig: A Feature Representation Approach;398
16.4;4. Mining Representative Orthogonal Graphs;401
16.4.1;4.1 Problem Definition;402
16.4.2;4.2 Randomized Maximal Subgraph Mining;403
16.4.3;4.3 Orthogonal Representative Set Generation;404
16.5;5. Conclusions;405
16.6;References;405
17;A SURVEY ON STREAMING ALGORITHMS FOR MASSIVE GRAPHS;409
17.1;1. Introduction;409
17.2;2. Streaming Model for Massive Graphs;411
17.3;3. Statistics and Counting Triangles;413
17.4;4. Graph Matching;416
17.4.1;4.1 Unweighted Matching;416
17.4.2;4.2 Weighted Matching;419
17.5;5. Graph Distance;421
17.5.1;5.1 Distance Approximation using Multiple Passes;422
17.5.2;5.2 Distance Approximation in One Pass;427
17.6;6. Random Walks on Graphs;428
17.7;7. Conclusions;432
17.8;References;433
18;A SURVEY OF PRIVACY-PRESERVATION OF GRAPHS AND SOCIAL NETWORKS;437
18.1;1. Introduction;438
18.1.1;1.1 Privacy in Publishing Social Networks;438
18.1.2;1.2 Background Knowledge;439
18.1.3;1.3 Utility Preservation;440
18.1.4;1.4 Anonymization Approaches;440
18.1.5;1.5 Notations;441
18.2;2. Privacy Attacks on Naive Anonymized Networks;442
18.2.1;2.1 Active Attacks and Passive Attacks;442
18.2.2;2.2 Structural Queries;443
18.2.3;2.3 Other Attacks;444
18.3;3. K-Anonymity Privacy Preservation via Edge Modification;444
18.3.1;3.1 K-Degree Generalization;445
18.3.2;3.2 K-Neighborhood Anonymity;446
18.3.3;3.3 K-Automorphism Anonymity;447
18.4;4. Privacy Preservation via Randomization;449
18.4.1;4.1 Resilience to Structural Attacks;450
18.4.2;4.2 Link Disclosure Analysis;451
18.4.3;4.3 Reconstruction;453
18.4.4;4.4 Feature Preserving Randomization;454
18.5;5. Privacy Preservation via Generalization;456
18.6;6. Anonymizing Rich Graphs;457
18.6.1;6.1 Link Protection in Rich Graphs;458
18.6.2;6.2 Anonymizing Bipartite Graphs;459
18.6.3;6.3 Anonymizing Rich Interaction Graphs;460
18.6.4;6.4 Anonymizing Edge-Weighted Graphs;461
18.7;7. Other Privacy Issues in Online Social Networks;462
18.7.1;7.1 Deriving Link Structure of the Entire Network;462
18.7.2;7.2 Deriving Personal Identifying Information from Social Networking Sites;464
18.8;8. Conclusion and Future Work;464
18.9;Acknowledgments;465
18.10;References;465
19;A SURVEY OF GRAPH MINING FOR WEB APPLICATIONS;470
19.1;1. Introduction;471
19.2;2. Preliminaries;472
19.2.1;2.1 Link Analysis Ranking Algorithms;474
19.3;3. Mining High-Quality Items;476
19.3.1;3.1 Prediction of Successful Items in a Co-citation Network;478
19.3.2;3.2 Finding High-Quality Content in Question-Answering Portals;480
19.4;4. Mining Query Logs;484
19.4.1;4.1 Description of Query Logs;485
19.4.2;4.2 Query Log Graphs;485
19.4.3;4.3 Query Recommendations;492
19.5;5. Conclusions;495
19.6;References;496
20;GRAPH MINING APPLICATIONS TO SOCIAL NETWORK ANALYSIS;501
20.1;1. Introduction;501
20.2;2. Graph Patterns in Large-Scale Networks;503
20.2.1;2.1 Scale-Free Networks;503
20.2.2;2.2 Small-World Effect;505
20.2.3;2.3 Community Structures;506
20.2.4;2.4 Graph Generators;508
20.3;3. Community Detection;508
20.3.1;3.1 Node-Centric Community Detection;509
20.3.2;3.2 Group-Centric Community Detection;512
20.3.3;3.3 Network-Centric Community Detection;513
20.3.4;3.4 Hierarchy-Centric Community Detection;518
20.4;4. Community Structure Evaluation;519
20.5;5. Research Issues;521
20.6;References;522
21;SOFTWARE-BUG LOCALIZATION WITH GRAPH MINING;528
21.1;1. Introduction;529
21.2;2. Basics of Call Graph Based Bug Localization;530
21.2.1;2.1 Dynamic Call Graphs;530
21.2.2;2.2 Bugs in Software;531
21.2.3;2.3 Bug Localization with Call Graphs;532
21.2.4;2.4 Graph and Tree Mining;533
21.3;3. RelatedWork;534
21.4;4. Call-Graph Reduction;538
21.4.1;4.1 Total Reduction;538
21.4.2;4.2 Iterations;539
21.4.3;4.3 Temporal Order;541
21.4.4;4.4 Recursion;542
21.4.5;4.5 Comparison;544
21.5;5. Call Graph Based Bug Localization;545
21.5.1;5.1 Structural Approaches;545
21.5.2;5.2 Frequency-based Approach;548
21.5.3;5.3 Combined Approaches;551
21.5.4;5.4 Comparison;552
21.6;6. Conclusions and Future Directions;555
21.7;Acknowledgments;556
21.8;References;556
22;A SURVEY OF GRAPH MINING TECHNIQUES FOR BIOLOGICAL DATASETS;560
22.1;1. Introduction;561
22.2;2. Mining Trees;562
22.2.1;2.1 Frequent Subtree Mining;563
22.2.2;2.2 Tree Alignment and Comparison;565
22.2.3;2.3 Statistical Models;567
22.3;3. Mining Graphs for the Discovery of Frequent Substructures;568
22.3.1;3.1 Frequent Subgraph Mining;568
22.3.2;3.2 Motif Discovery in Biological Networks;573
22.4;4. Mining Graphs for the Discovery of Modules;575
22.4.1;4.1 Extracting Communities;577
22.4.2;4.2 Clustering;579
22.5;5. Discussion;582
22.6;References;584
23;TRENDS IN CHEMICAL GRAPH DATA MINING;594
23.1;1. Introduction;595
23.2;2. Topological Descriptors for Chemical Compounds;596
23.2.1;2.1 Hashed Fingerprints (FP);597
23.2.2;2.2 Maccs Keys (MK);597
23.2.3;2.3 Extended Connectivity Fingerprints (ECFP);597
23.2.4;2.4 Frequent Subgraphs (FS);598
23.2.5;2.5 Bounded-Size Graph Fragments (GF);598
23.2.6;2.6 Comparison of Descriptors;598
23.3;3. Classification Algorithms for Chemical Compounds;601
23.3.1;3.1 Approaches based on Descriptors;601
23.3.2;3.2 Approaches based on Graph Kernels;602
23.4;4. Searching Compound Libraries;603
23.4.1;4.1 Methods Based on Direct Similarity;604
23.4.2;4.2 Methods Based on Indirect Similarity;605
23.4.3;4.3 Performance of Indirect Similarity Methods;607
23.5;5. Identifying Potential Targets for Compounds;608
23.5.1;5.1 Model-based Methods For Target Fishing;609
23.5.2;5.2 Performance of Target Fishing Strategies;613
23.6;6. Future Research Directions;613
23.7;References;615
24;Index;620




