E-book, English, Volume 42, 362 pages
Mehler / Sharoff / Santini: Genres on the Web
1st edition 2010
ISBN: 978-90-481-9178-9
Publisher: Springer Netherlands
Format: PDF
Copy protection: 1 - PDF Watermark
Computational Models and Empirical Studies
Series: Text, Speech and Language Technology
Target audience
Research
Further information & material
1;Foreword;6
2;Personal Note;9
3;Contents;10
4;Contributors;12
5;Part I Introduction;14
5.1;1  Riding the Rough Waves of Genre on the Web ;15
5.1.1;1.1  Why Is Genre Important?;15
5.1.1.1;1.1.1  Zooming In: Information on the Web;16
5.1.2;1.2  Trying to Grasp the Ungraspable?;18
5.1.2.1;1.2.1  In Quest of a Definition of Web Genre for Empirical Studies and Computational Applications;20
5.1.3;1.3  Empirical and Computational Approaches to Genre: Open Issues;21
5.1.3.1;1.3.1  Web Documents;21
5.1.3.2;1.3.2  Corpora, Genres and the Web;26
5.1.3.3;1.3.3  Empirical and Computational Models of Web Genres;30
5.1.4;1.4  Conclusions;34
5.1.5;1.5  Outline of the Volume;35
5.1.6;References;37
6;Part II Identifying the Sources of Web Genres;43
6.1;2  Conventions and Mutual Expectations ;44
6.1.1;2.1  Genres Are Not Rule-Bound;44
6.1.2;2.2  So, Let's Ask the Readers;46
6.1.3;2.3  An Editorial, Third Party, View of Genres on the Web;51
6.1.4;2.4  Data Source: Observation of User Actions;53
6.1.5;2.5  Conclusions;56
6.1.6;References;56
6.2;3  Identification of Web Genres by User Warrant ;58
6.2.1;3.1  Introduction;58
6.2.2;3.2  Criteria for the Identification of Web Genre;60
6.2.3;3.3  Operationalizing Traditional Genre Theory for the World Wide Web;61
6.2.3.1;3.3.1  A Genre's User Group;61
6.2.3.2;3.3.2  Genre: Function, Form and Substance;63
6.2.3.3;3.3.3  Genres on the Web: Further Implications for Research;66
6.2.4;3.4  Developing a Web Genre Palette;66
6.2.4.1;3.4.1  Collecting Genre Terminology in the Users' Own Words;67
6.2.4.2;3.4.2  Users Choose the Best of the Collected Genre Terminology;69
6.2.4.3;3.4.3  User Validation of the Genre Palette;72
6.2.4.4;3.4.4  A Fourth Study: Determining the Genres' Usefulness for Web Search;75
6.2.5;3.5  Conclusion;76
6.2.6;References;77
6.3;4  Problems in the Use-Centered Development of a Taxonomy of Web Genres ;79
6.3.1;4.1  Introduction;79
6.3.1.1;4.1.1  What Is the Purpose of a Genre Taxonomy?;80
6.3.2;4.2  Why Is It Hard to Develop a Web Genre Taxonomy?;81
6.3.2.1;4.2.1  Difficulties in Defining Genres;81
6.3.2.2;4.2.2  Difficulties in Developing the Scope and Expressiveness of the Taxonomy;83
6.3.3;4.3  A Use-Centered Development of a Taxonomy of Web Genres;85
6.3.3.1;4.3.1  Research Design: Naturalistic Field Study;85
6.3.3.2;4.3.2  Research Informants;85
6.3.3.3;4.3.3  Data Elicitation;86
6.3.3.4;4.3.4  Data Analysis;87
6.3.4;4.4  Results;88
6.3.5;4.5  Discussion;89
6.3.6;4.6  Conclusions;92
6.3.7;References;93
7;Part III Automatic Web Genre Identification;95
7.1;5  Cross-Testing a Genre Classification Model for the Web ;96
7.1.1;5.1  Introduction;96
7.1.2;5.2  Approximating Genre Population on the Web;99
7.1.2.1;5.2.1  Noise;100
7.1.2.2;5.2.2  Description of the Corpora Used for Cross-Testing;101
7.1.3;5.3  The Web as Communication;105
7.1.3.1;5.3.1  Genre Palette;105
7.1.3.2;5.3.2  Linguistically- and Functionally-Motivated Features;107
7.1.4;5.4  The Genre Model;107
7.1.4.1;5.4.1  Methodology;110
7.1.4.2;5.4.2  Flow and Hypotheses;111
7.1.5;5.5  Results;113
7.1.5.1;5.5.1  Cross-Testing Performance on Single Labels: BBC and 7-Webgenre Collections;114
7.1.5.2;5.5.2  Performances of Other Single-Label Models on the 7-Webgenre Collection;117
7.1.5.3;5.5.3  Cross-Testing Performance on Single Labels: Mapped Web Genres;120
7.1.5.4;5.5.4  Cross-Testing Performance on Single Labels: HCG and MCG in Isolation;122
7.1.5.5;5.5.5  The SPIRIT Sample: An Attempt to Assess Multilabelling;122
7.1.6;5.6  Discussion;126
7.1.7;5.7  Conclusion and Future Work;127
7.1.8;References;135
7.2;6  Formulating Representative Features with Respect to Genre Classification;138
7.2.1;6.1  Introduction;138
7.2.2;6.2  Defining Genre Classification;141
7.2.2.1;6.2.1  Document Representation in Conventional Text Classification;141
7.2.2.2;6.2.2  Harmonic Descriptor Representation (HDR) of Documents;141
7.2.2.3;6.2.3  Defining Genre;145
7.2.3;6.3  Classifiers;146
7.2.4;6.4  Dataset;147
7.2.5;6.5  Features;149
7.2.6;6.6  Results;151
7.2.6.1;6.6.1  Overall Accuracy;151
7.2.6.2;6.6.2  Precision and Recall;152
7.2.7;6.7  Conclusions;154
7.2.8;References;155
7.3;7  In the Garden and in the Jungle ;157
7.3.1;7.1  Introduction;157
7.3.2;7.2  Text Typology for the Web;159
7.3.3;7.3  An Experiment in Automatic Classification of the Web;163
7.3.4;7.4  Analysis of Results;167
7.3.4.1;7.4.1  Qualitative Assessment of Texts in Each Category;167
7.3.4.2;7.4.2  Assessing the Composition of ukWac;169
7.3.5;7.5  Conclusions and Future Research;170
7.3.6;References;173
7.4;8  Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues ;175
7.4.1;8.1  Introduction;175
7.4.1.1;8.1.1  Contributions;176
7.4.2;8.2  Use Cases: Genre Analysis in the Retrieval Practice;176
7.4.2.1;8.2.1  Genre-Enabled Web Search;177
7.4.2.2;8.2.2  Information Extraction Based on Genre Information;177
7.4.2.3;8.2.3  Organizing Collections in Both Topic and Genre Dimensions;179
7.4.2.4;8.2.4  Empower Web Page Abstraction with Genre Information;180
7.4.3;8.3  Construction of Genre Retrieval Models;181
7.4.3.1;8.3.1  Problems of Genre Retrieval Models and Lessons Learned;182
7.4.3.2;8.3.2  New Elements for Genre Retrieval Models;184
7.4.4;8.4  Evaluation;186
7.4.4.1;8.4.1  Improving Generalization Capability;187
7.4.4.2;8.4.2  Measuring Generalization Capability;187
7.4.4.3;8.4.3  Experiments;188
7.4.5;8.5  Implementing Genre-Enabled Web Search;191
7.4.6;8.6  Conclusion;194
7.4.7;References;195
7.5;9  Marrying Relevance and Genre Rankings: An Exploratory Study ;198
7.5.1;9.1  Introduction;198
7.5.2;9.2  Related Work;200
7.5.2.1;9.2.1  Genre Classification;200
7.5.2.2;9.2.2  Readability Scores;201
7.5.2.3;9.2.3  Genres in Relevance Ranking;202
7.5.3;9.3  Data;203
7.5.3.1;9.3.1  Functional Styles Sample;203
7.5.3.2;9.3.2  ROMIP Collection;204
7.5.4;9.4  Formality Score;205
7.5.5;9.5  Results;208
7.5.5.1;9.5.1  Genre-Related Rankings;208
7.5.5.2;9.5.2  Merged Rankings;210
7.5.6;9.6  Conclusion;212
7.5.7;References;213
8;Part IV Structure-Oriented Models of Web Genres;216
8.1;10  Classification of Web Sites at Super-Genre Level ;217
8.1.1;10.1  Introduction;217
8.1.2;10.2  Related Work;220
8.1.3;10.3  Dataset;221
8.1.4;10.4  Features for Classification;224
8.1.4.1;10.4.1  Features Derived from Structure;224
8.1.4.2;10.4.2  Features Derived from Content;231
8.1.5;10.5  Classification of Web Sites;232
8.1.5.1;10.5.1  Classification by Structure;233
8.1.5.2;10.5.2  Classification by Content;235
8.1.5.3;10.5.3  Classification by Structure and Content;236
8.1.6;10.6  Conclusion;239
8.1.7;References;239
8.2;11  Mining Graph Patterns in Web-Based Systems: A Conceptual View ;242
8.2.1;11.1  Introduction;242
8.2.2;11.2  Mathematical Preliminaries;244
8.2.3;11.3  Structural Graph Measures;246
8.2.4;11.4  Graph Similarity Measures for Web Mining;247
8.2.4.1;11.4.1  Classical Similarity and Distance Measures for Graphs;247
8.2.4.2;11.4.2  Graph Similarity Measures Based on Trees;249
8.2.4.3;11.4.3  Structural Similarity of Generalized Trees;249
8.2.5;11.5  Applications;253
8.2.6;11.6  Conclusion;254
8.2.7;References;255
8.3;12  Genre Connectivity and Genre Drift in a Web of Genres ;259
8.3.1;12.1  Introduction;259
8.3.2;12.2  Methodology;260
8.3.2.1;12.2.1  Source Pages and Target Pages;262
8.3.2.2;12.2.2  Genre Categorization;263
8.3.3;12.3  Results and Discussion;266
8.3.3.1;12.3.1  Source Genres, Target Genres and Genre Pairs;266
8.3.3.2;12.3.2  Web of Genres;273
8.3.3.3;12.3.3  "Hook" Genres and "Lug" Genres;274
8.3.3.4;12.3.4  Genre Drift, Topic Drift and Small-World Implications;274
8.3.4;12.4  Conclusion;276
8.3.5;References;277
9;Part V Case Studies of Web Genres;279
9.1;13  Genre Emergence in Amateur Flash ;280
9.1.1;13.1  Genres, Multimedia and the Web;280
9.1.2;13.2  Flash and Newgrounds in Amateur Multimedia;283
9.1.3;13.3  Method;285
9.1.3.1;13.3.1  Sampling;285
9.1.3.2;13.3.2  Identifying Potential Emergent Genres;286
9.1.3.3;13.3.3  Cultural References and Message Content;288
9.1.4;13.4  Results;291
9.1.4.1;13.4.1  Network Analysis;291
9.1.4.2;13.4.2  Genre Features;293
9.1.4.3;13.4.3  Cultural References;297
9.1.4.4;13.4.4  Genre, Emergence and Social Network;300
9.1.5;13.5  Discussion and Conclusions;302
9.1.6;References;304
9.2;14  Variation Among Blogs: A Multi-Dimensional Analysis ;306
9.2.1;14.1  Introduction;306
9.2.2;14.2  Corpus Compilation and Analysis;308
9.2.3;14.3  Factor Analysis;309
9.2.3.1;14.3.1  Method;310
9.2.3.2;14.3.2  Results;310
9.2.3.3;14.3.3  Interpretation of Factors;311
9.2.4;14.4  Text Type Analysis;318
9.2.4.1;14.4.1  Method;318
9.2.4.2;14.4.2  Results;319
9.2.4.3;14.4.3  Interpretation of Clusters;320
9.2.5;14.5  Summary of Findings;323
9.2.6;References;324
9.3;15  Evolving Genres in Online Domains: The Hybrid Genre of the Participatory News Article ;326
9.3.1;15.1  Introduction;326
9.3.1.1;15.1.1  The Systemic Functional Approach to Genre;328
9.3.1.2;15.1.2  The English for Specific Purposes Approach to Genre;329
9.3.1.3;15.1.3  Problems with these Existing Approaches to Genre;331
9.3.1.4;15.1.4  A Solution: Social Genre and Cognitive Genre;332
9.3.1.5;15.1.5  A Web Genre: The Participatory News Article;336
9.3.2;15.2  Methodology;337
9.3.3;15.3  Results;340
9.3.3.1;15.3.1  The News Article;340
9.3.3.2;15.3.2  Reader Comments;343
9.3.4;15.4  Discussion;345
9.3.5;15.5  Conclusion;347
9.3.6;References;348
10;Part VI Prospect;352
10.1;16  Any Land in Sight? ;353
10.1.1;16.1  Web Genre Benchmarks;353
10.1.1.1;16.1.1  Genre Labels;354
10.1.1.2;16.1.2  Annotation;354
10.1.1.3;16.1.3  Representativeness;355
10.1.2;16.2  Work Plan;355
10.1.2.1;16.2.1  Benefits;355
11;Index;357





