E-book, English, Volume 33, 150 pages
Series: Advances in Database Systems
Dong / Pei: Sequence Data Mining
1st edition, 2007
ISBN: 978-0-387-69937-0
Publisher: Springer
Format: PDF
Copy protection: PDF watermark
Understanding sequence data, and being able to exploit the knowledge hidden in it, will have a significant impact on many aspects of society. Examples of sequence data include DNA and protein sequences, customer purchase histories, web browsing histories, and more. This book provides thorough coverage of the existing results on sequence data mining, as well as of pattern types and the associated pattern mining methods. It offers balanced coverage of data mining and sequence data analysis, giving readers access to state-of-the-art results in one place.
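To make the idea of a sequence pattern concrete for purchase-history data, here is a minimal Python sketch. It assumes the common definition used in the sequential pattern literature (e.g. GSP and PrefixSpan): a sequence is an ordered list of itemsets, a pattern is contained in a sequence if each of its itemsets is a subset of a distinct itemset of the sequence in the same order, and the support of a pattern is the number of database sequences containing it. The function names `contains` and `support` and the toy database are hypothetical illustrations, not code from the book.

```python
from typing import FrozenSet, List, Sequence

# Assumed representation (illustrative, not from the book): a sequence is an
# ordered list of itemsets, e.g. one itemset per shopping visit.
Itemset = FrozenSet[str]


def contains(sequence: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`: each pattern
    itemset is a subset of a distinct itemset of `sequence`, in order."""
    i = 0  # index of the next pattern itemset still to be matched
    for element in sequence:
        # Greedy leftmost matching: take the first element that fits.
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


def support(db: Sequence[Sequence[Itemset]], pattern: Sequence[Itemset]) -> int:
    """Count the sequences in `db` that contain `pattern`."""
    return sum(1 for s in db if contains(s, pattern))


# Hypothetical toy database of three customers' purchase histories.
db: List[List[Itemset]] = [
    [frozenset({"bread"}), frozenset({"milk", "eggs"}), frozenset({"butter"})],
    [frozenset({"bread", "milk"}), frozenset({"butter"})],
    [frozenset({"milk"}), frozenset({"bread"})],
]

pattern = [frozenset({"bread"}), frozenset({"butter"})]
print(support(db, pattern))  # 2: the first two customers buy bread, later butter
```

Greedy leftmost matching is enough here: matching each pattern itemset as early as possible never rules out a later match, so one linear scan per sequence decides containment. Mining algorithms such as GSP and PrefixSpan differ in how they search for patterns whose support meets a minimum threshold, not in this notion of support.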
Further Information & Material
1 Foreword
2 Biography
3 Preface
4 Contents
5 Introduction
5.1 Examples and Applications of Sequence Data
5.1.1 Examples of Sequence Data
5.1.2 Examples of Sequence Mining Applications
5.2 Basic Definitions
5.2.1 Sequences and Sequence Types
5.2.2 Characteristics of Sequence Data
5.2.3 Sequence Patterns and Sequence Models
5.3 General Data Mining Processes and Research Issues
5.4 Overview of the Book
6 Frequent and Closed Sequence Patterns
6.1 Sequential Patterns
6.2 GSP: An Apriori-like Method
6.3 PrefixSpan: A Pattern-growth, Depth-first Search Method
6.3.1 Apriori-like, Breadth-first Search versus Pattern-growth, Depth-first Search
6.3.2 PrefixSpan
6.3.3 Pseudo-Projection
6.4 Mining Sequential Patterns with Constraints
6.4.1 Categories of Constraints
6.4.2 Mining Sequential Patterns with Prefix-Monotone Constraints
6.4.3 Prefix-Monotone Property
6.4.4 Pushing Prefix-Monotone Constraints into Sequential Pattern Mining
6.4.5 Handling Tough Aggregate Constraints by Prefix-growth
6.5 Mining Closed Sequential Patterns
6.5.1 Closed Sequential Patterns
6.5.2 Efficiently Mining Closed Sequential Patterns
6.6 Summary
7 Classification, Clustering, Features and Distances of Sequence Data
7.1 Three Tasks on Sequence Classification/Clustering
7.2 Sequence Features
7.2.1 Sequence Feature Types
7.2.2 Sequence Feature Selection
7.3 Distance Functions over Sequences
7.3.1 Overview on Sequence Distance Functions
7.3.2 Edit, Hamming, and Alignment based Distances
7.3.3 Conditional Probability Distribution based Distance
7.3.4 An Example of Feature based Distance: d2
7.3.5 Web Session Similarity
7.4 Classification of Sequence Data
7.4.1 Support Vector Machines
7.4.2 Artificial Neural Networks
7.4.3 Other Methods
7.4.4 Evaluation of Classifiers and Classification Algorithms
7.5 Clustering Sequence Data
7.5.1 Popular Sequence Clustering Approaches
7.5.2 Quality Evaluation of Clustering Results
8 Sequence Motifs: Identifying and Characterizing Sequence Families
8.1 Motivations and Problems
8.1.1 Motivations
8.1.2 Four Motif Analysis Problems
8.2 Motif Representations
8.2.1 Consensus Sequence
8.2.2 Position Weight Matrix (PWM)
8.2.3 Markov Chain Model
8.2.4 Hidden Markov Model (HMM)
8.3 Representative Algorithms for Motif Problems
8.3.1 Dynamic Programming for Sequence Scoring and Explanation with HMM
8.3.2 Gibbs Sampling for Constructing PWM-based Motif
8.3.3 Expectation Maximization for Building HMM
8.4 Discussion
9 Mining Partial Orders from Sequences
9.1 Mining Frequent Closed Partial Orders
9.1.1 Problem Definition
9.1.2 How Is Frequent Closed Partial Order Mining Different from Other Data Mining Tasks?
9.1.3 TranClose: A Rudimentary Method
9.1.4 Algorithm Frecpo
9.1.5 Applications
9.2 Mining Global Partial Orders
9.2.1 Motivation and Preliminaries
9.2.2 Mining Algorithms
9.2.3 Mixture Models
9.3 Summary
10 Distinguishing Sequence Patterns
10.1 Categories of Distinguishing Sequence Patterns
10.2 Class-Characteristics Distinguishing Sequence Patterns
10.2.1 Definitions and Terminology
10.2.2 The ConSGapMiner Algorithm
10.2.3 Extending ConSGapMiner: Minimum Gap Constraints
10.2.4 Extending ConSGapMiner: Coverage and Prefix-Based Pattern Minimization
10.3 Surprising Sequence Patterns
11 Related Topics
11.1 Structured-Data Mining
11.2 Partial Periodic Pattern Mining
11.3 Bioinformatics
11.4 Sequence Alignment
11.5 Biological Sequence Databases and Biological Data Analysis Resources
12 References
13 Index