E-book, English, 328 pages
Series: Chapman & Hall/CRC Data Mining and Knowledge Discovery Series
Srivastava / Sahami Text Mining
Publication year: 2010
ISBN: 978-1-4200-5945-8
Publisher: Taylor & Francis
Format: PDF
Copy protection: Adobe DRM
Classification, Clustering, and Applications
The Definitive Resource on Text Mining Theory and Applications from Foremost Researchers in the Field
Giving a broad perspective of the field from numerous vantage points, Text Mining: Classification, Clustering, and Applications focuses on statistical methods for text mining and analysis. It examines methods to automatically cluster and classify text documents and applies these methods in a variety of areas, including adaptive information filtering, information distillation, and text search.
The book begins with chapters on the classification of documents into predefined categories. It presents state-of-the-art algorithms and their use in practice. The next chapters describe novel methods for clustering documents into groups that are not predefined. These methods seek to automatically determine topical structures that may exist in a document corpus. The book concludes by discussing various text mining applications that have significant implications for future research and industrial use.
There is no doubt that text mining will continue to play a critical role in the development of future information systems, and advances in research will be instrumental to their success. This book captures the technical depth and immense practical potential of text mining, guiding readers to a sound appreciation of this burgeoning field.
Further Information & Material
Analysis of Text Patterns Using Kernel Methods
Marco Turchi, Alessia Mammone, and Nello Cristianini
Introduction
General Overview on Kernel Methods
Kernels for Text
Example
Conclusion and Further Reading

Detection of Bias in Media Outlets with Statistical Learning Methods
Blaz Fortuna, Carolina Galleguillos, and Nello Cristianini
Introduction
Overview of the Experiments
Data Collection and Preparation
News Outlet Identification
Topic-Wise Comparison of Term Bias
News Outlets Map
Related Work
Conclusion
Appendix A: Support Vector Machines
Appendix B: Bag of Words and Vector Space Models
Appendix C: Kernel Canonical Correlation Analysis
Appendix D: Multidimensional Scaling

Collective Classification for Text Classification
Galileo Namata, Prithviraj Sen, Mustafa Bilgic, and Lise Getoor
Introduction
Collective Classification: Notation and Problem Definition
Approximate Inference Algorithms for Approaches Based on Local Conditional Classifiers
Approximate Inference Algorithms for Approaches Based on Global Formulations
Learning the Classifiers
Experimental Comparison
Related Work
Conclusion

Topic Models
David M. Blei and John D. Lafferty
Introduction
Latent Dirichlet Allocation (LDA)
Posterior Inference for LDA
Dynamic Topic Models and Correlated Topic Models
Discussion

Nonnegative Matrix and Tensor Factorization for Discussion Tracking
Brett W. Bader, Michael W. Berry, and Amy N. Langville
Introduction
Notation
Tensor Decompositions and Algorithms
Enron Subset
Observations and Results
Visualizing Results of the NMF Clustering
Future Work

Text Clustering with Mixture of von Mises–Fisher Distributions
Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, and Suvrit Sra
Introduction
Related Work
Preliminaries
EM on a Mixture of vMFs (moVMF)
Handling High-Dimensional Text Datasets
Algorithms
Experimental Results
Discussion
Conclusions and Future Work

Constrained Partitional Clustering of Text Data: An Overview
Sugato Basu and Ian Davidson
Introduction
Uses of Constraints
Text Clustering
Partitional Clustering with Constraints
Learning Distance Function with Constraints
Satisfying Constraints and Learning Distance Functions
Experiments
Conclusions

Adaptive Information Filtering
Yi Zhang
Introduction
Standard Evaluation Measures
Standard Retrieval Models and Filtering Approaches
Collaborative Adaptive Filtering
Novelty and Redundancy Detection
Other Adaptive Filtering Topics

Utility-Based Information Distillation
Yiming Yang and Abhimanyu Lad
Introduction
A Sample Task
Technical Cores
Evaluation Methodology
Data
Experiments and Results
Concluding Remarks

Text Search Enhanced with Types and Entities
Soumen Chakrabarti, Sujatha Das, Vijay Krishnan, and Kriti Puniyani
Entity-Aware Search Architecture
Understanding the Question
Scoring Potential Answer Snippets
Indexing and Query Processing
Conclusion

Index