E-book, English, Volume 42, 362 pages
Computational Models and Empirical Studies
Series: Text, Speech and Language Technology
ISBN: 978-90-481-9178-9
Publisher: Springer Netherlands
Format: PDF
Copy protection: PDF watermark
The book covers a wide range of web-genre-focused subjects, such as:
• The identification of the sources of web genres
• Automatic web genre identification
• The presentation of structure-oriented models
• Empirical case studies
One of the driving forces behind genre research is the idea of a genre-sensitive information system, which incorporates genre cues to complement current keyword-based search and retrieval applications.
Target Audience
Research
Further Information & Material
1;Foreword;6
2;Personal Note;9
3;Contents;10
4;Contributors;12
5;Part I Introduction;14
5.1;1 Riding the Rough Waves of Genre on the Web;15
5.1.1;1.1 Why Is Genre Important?;15
5.1.1.1;1.1.1 Zooming In: Information on the Web;16
5.1.2;1.2 Trying to Grasp the Ungraspable?;18
5.1.2.1;1.2.1 In Quest of a Definition of Web Genre for Empirical Studies and Computational Applications;20
5.1.3;1.3 Empirical and Computational Approaches to Genre: Open Issues;21
5.1.3.1;1.3.1 Web Documents;21
5.1.3.2;1.3.2 Corpora, Genres and the Web;26
5.1.3.3;1.3.3 Empirical and Computational Models of Web Genres;30
5.1.4;1.4 Conclusions;34
5.1.5;1.5 Outline of the Volume;35
5.1.6;References;37
6;Part II Identifying the Sources of Web Genres;43
6.1;2 Conventions and Mutual Expectations;44
6.1.1;2.1 Genres Are Not Rule-Bound;44
6.1.2;2.2 So, Let's Ask the Readers;46
6.1.3;2.3 An Editorial, Third Party, View of Genres on the Web;51
6.1.4;2.4 Data Source: Observation of User Actions;53
6.1.5;2.5 Conclusions;56
6.1.6;References;56
6.2;3 Identification of Web Genres by User Warrant;58
6.2.1;3.1 Introduction;58
6.2.2;3.2 Criteria for the Identification of Web Genre;60
6.2.3;3.3 Operationalizing Traditional Genre Theory for the World Wide Web;61
6.2.3.1;3.3.1 A Genre's User Group;61
6.2.3.2;3.3.2 Genre: Function, Form and Substance;63
6.2.3.3;3.3.3 Genres on the Web: Further Implications for Research;66
6.2.4;3.4 Developing a Web Genre Palette;66
6.2.4.1;3.4.1 Collecting Genre Terminology in the Users' Own Words;67
6.2.4.2;3.4.2 Users Choose the Best of the Collected Genre Terminology;69
6.2.4.3;3.4.3 User Validation of the Genre Palette;72
6.2.4.4;3.4.4 A Fourth Study: Determining the Genres' Usefulness for Web Search;75
6.2.5;3.5 Conclusion;76
6.2.6;References;77
6.3;4 Problems in the Use-Centered Development of a Taxonomy of Web Genres;79
6.3.1;4.1 Introduction;79
6.3.1.1;4.1.1 What Is the Purpose of a Genre Taxonomy?;80
6.3.2;4.2 Why Is It Hard to Develop a Web Genre Taxonomy?;81
6.3.2.1;4.2.1 Difficulties in Defining Genres;81
6.3.2.2;4.2.2 Difficulties in Developing the Scope and Expressiveness of the Taxonomy;83
6.3.3;4.3 A Use-Centered Development of a Taxonomy of Web Genres;85
6.3.3.1;4.3.1 Research Design: Naturalistic Field Study;85
6.3.3.2;4.3.2 Research Informants;85
6.3.3.3;4.3.3 Data Elicitation;86
6.3.3.4;4.3.4 Data Analysis;87
6.3.4;4.4 Results;88
6.3.5;4.5 Discussion;89
6.3.6;4.6 Conclusions;92
6.3.7;References;93
7;Part III Automatic Web Genre Identification;95
7.1;5 Cross-Testing a Genre Classification Model for the Web;96
7.1.1;5.1 Introduction;96
7.1.2;5.2 Approximating Genre Population on the Web;99
7.1.2.1;5.2.1 Noise;100
7.1.2.2;5.2.2 Description of the Corpora Used for Cross-Testing;101
7.1.3;5.3 The Web as Communication;105
7.1.3.1;5.3.1 Genre Palette;105
7.1.3.2;5.3.2 Linguistically- and Functionally-Motivated Features;107
7.1.4;5.4 The Genre Model;107
7.1.4.1;5.4.1 Methodology;110
7.1.4.2;5.4.2 Flow and Hypotheses;111
7.1.5;5.5 Results;113
7.1.5.1;5.5.1 Cross-Testing Performance on Single Labels: BBC and 7-Webgenre Collections;114
7.1.5.2;5.5.2 Performances of Other Single-Label Models on the 7-Webgenre Collection;117
7.1.5.3;5.5.3 Cross-Testing Performance on Single Labels: Mapped Web Genres;120
7.1.5.4;5.5.4 Cross-Testing Performance on Single Labels: HCG and MCG in Isolation;122
7.1.5.5;5.5.5 The SPIRIT Sample: An Attempt to Assess Multilabelling;122
7.1.6;5.6 Discussion;126
7.1.7;5.7 Conclusion and Future Work;127
7.1.8;References;135
7.2;6 Formulating Representative Features with Respect to Genre Classification;138
7.2.1;6.1 Introduction;138
7.2.2;6.2 Defining Genre Classification;141
7.2.2.1;6.2.1 Document Representation in Conventional Text Classification;141
7.2.2.2;6.2.2 Harmonic Descriptor Representation (HDR) of Documents;141
7.2.2.3;6.2.3 Defining Genre;145
7.2.3;6.3 Classifiers;146
7.2.4;6.4 Dataset;147
7.2.5;6.5 Features;149
7.2.6;6.6 Results;151
7.2.6.1;6.6.1 Overall Accuracy;151
7.2.6.2;6.6.2 Precision and Recall;152
7.2.7;6.7 Conclusions;154
7.2.8;References;155
7.3;7 In the Garden and in the Jungle;157
7.3.1;7.1 Introduction;157
7.3.2;7.2 Text Typology for the Web;159
7.3.3;7.3 An Experiment in Automatic Classification of the Web;163
7.3.4;7.4 Analysis of Results;167
7.3.4.1;7.4.1 Qualitative Assessment of Texts in Each Category;167
7.3.4.2;7.4.2 Assessing the Composition of ukWac;169
7.3.5;7.5 Conclusions and Future Research;170
7.3.6;References;173
7.4;8 Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues;175
7.4.1;8.1 Introduction;175
7.4.1.1;8.1.1 Contributions;176
7.4.2;8.2 Use Cases: Genre Analysis in the Retrieval Practice;176
7.4.2.1;8.2.1 Genre-Enabled Web Search;177
7.4.2.2;8.2.2 Information Extraction Based on Genre Information;177
7.4.2.3;8.2.3 Organizing Collections in Both Topic and Genre Dimensions;179
7.4.2.4;8.2.4 Empower Web Page Abstraction with Genre Information;180
7.4.3;8.3 Construction of Genre Retrieval Models;181
7.4.3.1;8.3.1 Problems of Genre Retrieval Models and Lessons Learned;182
7.4.3.2;8.3.2 New Elements for Genre Retrieval Models;184
7.4.4;8.4 Evaluation;186
7.4.4.1;8.4.1 Improving Generalization Capability;187
7.4.4.2;8.4.2 Measuring Generalization Capability;187
7.4.4.3;8.4.3 Experiments;188
7.4.5;8.5 Implementing Genre-Enabled Web Search;191
7.4.6;8.6 Conclusion;194
7.4.7;References;195
7.5;9 Marrying Relevance and Genre Rankings: An Exploratory Study;198
7.5.1;9.1 Introduction;198
7.5.2;9.2 Related Work;200
7.5.2.1;9.2.1 Genre Classification;200
7.5.2.2;9.2.2 Readability Scores;201
7.5.2.3;9.2.3 Genres in Relevance Ranking;202
7.5.3;9.3 Data;203
7.5.3.1;9.3.1 Functional Styles Sample;203
7.5.3.2;9.3.2 ROMIP Collection;204
7.5.4;9.4 Formality Score;205
7.5.5;9.5 Results;208
7.5.5.1;9.5.1 Genre-Related Rankings;208
7.5.5.2;9.5.2 Merged Rankings;210
7.5.6;9.6 Conclusion;212
7.5.7;References;213
8;Part IV Structure-Oriented Models of Web Genres;216
8.1;10 Classification of Web Sites at Super-Genre Level;217
8.1.1;10.1 Introduction;217
8.1.2;10.2 Related Work;220
8.1.3;10.3 Dataset;221
8.1.4;10.4 Features for Classification;224
8.1.4.1;10.4.1 Features Derived from Structure;224
8.1.4.2;10.4.2 Features Derived from Content;231
8.1.5;10.5 Classification of Web Sites;232
8.1.5.1;10.5.1 Classification by Structure;233
8.1.5.2;10.5.2 Classification by Content;235
8.1.5.3;10.5.3 Classification by Structure and Content;236
8.1.6;10.6 Conclusion;239
8.1.7;References;239
8.2;11 Mining Graph Patterns in Web-Based Systems: A Conceptual View;242
8.2.1;11.1 Introduction;242
8.2.2;11.2 Mathematical Preliminaries;244
8.2.3;11.3 Structural Graph Measures;246
8.2.4;11.4 Graph Similarity Measures for Web Mining;247
8.2.4.1;11.4.1 Classical Similarity and Distance Measures for Graphs;247
8.2.4.2;11.4.2 Graph Similarity Measures Based on Trees;249
8.2.4.3;11.4.3 Structural Similarity of Generalized Trees;249
8.2.5;11.5 Applications;253
8.2.6;11.6 Conclusion;254
8.2.7;References;255
8.3;12 Genre Connectivity and Genre Drift in a Web of Genres;259
8.3.1;12.1 Introduction;259
8.3.2;12.2 Methodology;260
8.3.2.1;12.2.1 Source Pages and Target Pages;262
8.3.2.2;12.2.2 Genre Categorization;263
8.3.3;12.3 Results and Discussion;266
8.3.3.1;12.3.1 Source Genres, Target Genres and Genre Pairs;266
8.3.3.2;12.3.2 Web of Genres;273
8.3.3.3;12.3.3 "Hook" Genres and "Lug" Genres;274
8.3.3.4;12.3.4 Genre Drift, Topic Drift and Small-World Implications;274
8.3.4;12.4 Conclusion;276
8.3.5;References;277
9;Part V Case Studies of Web Genres;279
9.1;13 Genre Emergence in Amateur Flash;280
9.1.1;13.1 Genres, Multimedia and the Web;280
9.1.2;13.2 Flash and Newgrounds in Amateur Multimedia;283
9.1.3;13.3 Method;285
9.1.3.1;13.3.1 Sampling;285
9.1.3.2;13.3.2 Identifying Potential Emergent Genres;286
9.1.3.3;13.3.3 Cultural References and Message Content;288
9.1.4;13.4 Results;291
9.1.4.1;13.4.1 Network Analysis;291
9.1.4.2;13.4.2 Genre Features;293
9.1.4.3;13.4.3 Cultural References;297
9.1.4.4;13.4.4 Genre, Emergence and Social Network;300
9.1.5;13.5 Discussion and Conclusions;302
9.1.6;References;304
9.2;14 Variation Among Blogs: A Multi-Dimensional Analysis;306
9.2.1;14.1 Introduction;306
9.2.2;14.2 Corpus Compilation and Analysis;308
9.2.3;14.3 Factor Analysis;309
9.2.3.1;14.3.1 Method;310
9.2.3.2;14.3.2 Results;310
9.2.3.3;14.3.3 Interpretation of Factors;311
9.2.4;14.4 Text Type Analysis;318
9.2.4.1;14.4.1 Method;318
9.2.4.2;14.4.2 Results;319
9.2.4.3;14.4.3 Interpretation of Clusters;320
9.2.5;14.5 Summary of Findings;323
9.2.6;References;324
9.3;15 Evolving Genres in Online Domains: The Hybrid Genre of the Participatory News Article;326
9.3.1;15.1 Introduction;326
9.3.1.1;15.1.1 The Systemic Functional Approach to Genre;328
9.3.1.2;15.1.2 The English for Specific Purposes Approach to Genre;329
9.3.1.3;15.1.3 Problems with these Existing Approaches to Genre;331
9.3.1.4;15.1.4 A Solution: Social Genre and Cognitive Genre;332
9.3.1.5;15.1.5 A Web Genre: The Participatory News Article;336
9.3.2;15.2 Methodology;337
9.3.3;15.3 Results;340
9.3.3.1;15.3.1 The News Article;340
9.3.3.2;15.3.2 Reader Comments;343
9.3.4;15.4 Discussion;345
9.3.5;15.5 Conclusion;347
9.3.6;References;348
10;Part VI Prospect;352
10.1;16 Any Land in Sight?;353
10.1.1;16.1 Web Genre Benchmarks;353
10.1.1.1;16.1.1 Genre Labels;354
10.1.1.2;16.1.2 Annotation;354
10.1.1.3;16.1.3 Representativeness;355
10.1.2;16.2 Work Plan;355
10.1.2.1;16.2.1 Benefits;355
11;Index;357