E-book, English, 256 pages
Talburt Entity Resolution and Information Quality
1st edition 2011
ISBN: 978-0-12-381973-4
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: Adobe DRM
Entity Resolution and Information Quality presents the topics and definitions of entity resolution (ER) and information quality (IQ) and clarifies the confusing terminology surrounding both. It takes a very wide view of IQ, including its six-domain framework and the skills defined by the International Association for Information and Data Quality (IAIDQ). The book includes chapters on the principles of entity resolution and the principles of information quality, along with their concepts and terminology. It also discusses the major theoretical models that support entity resolution: the Fellegi-Sunter theory of record linkage, the Stanford Entity Resolution Framework, and the Algebraic Model for Entity Resolution. In relation to these, the book briefly discusses entity-based data integration (EBDI) and its model, which extends the Algebraic Model for Entity Resolution. It also explains how three commercial ER systems operate and describes the non-commercial open-source system known as OYSTER. The book concludes by discussing trends in entity resolution research and practice. Students taking IT courses and IT professionals will find this book invaluable.
- First authoritative reference explaining entity resolution and how to use it effectively
- Provides practical system design advice to help you get a competitive advantage
- Includes a companion site with synthetic customer data for hands-on exercises and access to a Java-based entity resolution program
Dr. John R. Talburt is Professor of Information Science at the University of Arkansas at Little Rock (UALR), where he is the Coordinator for the Information Quality Graduate Program and the Executive Director of the UALR Center for Advanced Research in Entity Resolution and Information Quality (ERIQ). He is also the Chief Scientist for Black Oak Partners, LLC, an information quality solutions company. Prior to his appointment at UALR, he led research and development and product innovation at Acxiom Corporation, a global leader in information management and customer data integration. Professor Talburt holds several patents related to customer data integration, has written numerous articles on information quality and entity resolution, and is the author of Entity Resolution and Information Quality (Morgan Kaufmann, 2011). He also holds the IAIDQ Information Quality Certified Professional (IQCP) credential.
Further information & material
1 Front Cover
2 Entity Resolution and Information Quality
3 Copyright
4 Dedication
5 Contents
6 Foreword
7 Preface
7.1 Motivation for the Book
7.2 Audience
7.3 Organization of the Material
8 Acknowledgements
9 Chapter 1: Principles of Entity Resolution
9.1 Entity Resolution
9.2 Entity Resolution Activities
9.3 Summary
9.4 Review Questions
10 Chapter 2: Principles of Information Quality
10.1 Information Quality
10.2 IQ and the Quality of Information
10.3 Two IP Examples
10.4 IQ Management
10.5 Information versus Process
10.6 IQ and HPC
10.7 The Evolution of Information Quality
10.8 IQ as an Academic Discipline
10.9 IQ and ER
10.10 Summary
10.11 Review Questions
11 Chapter 3: Entity Resolution Models
11.1 Overview
11.2 The Fellegi-Sunter Model
11.3 SERF Model
11.4 Algebraic Model
11.5 ENRES Meta-Model
11.6 Summary
11.7 Review Questions
12 Chapter 4: Entity-Based Data Integration
12.1 Introduction
12.2 Formal Framework for Describing EBDI
12.3 Optimizing Selection Operator Accuracy
12.4 More Complex Selection Rules
12.5 Summary
12.6 Review Questions
13 Chapter 5: Entity Resolution Systems
13.1 Introduction
13.2 DataFlux dfPowerStudio
13.3 Infoglide Identity Resolution Engine
13.4 Acxiom AbiliTec
13.5 Summary
13.6 Review Questions
14 Chapter 6: The Oyster Project
14.1 Background
14.2 OYSTER Logic
14.3 Transitive Equivalence Example
14.4 Asserted Equivalence Example
14.5 Febrl: Open-Source Project
14.6 Summary
14.7 Review Questions
15 Chapter 7: Trends in Entity Resolution Research and Applications
15.1 Introduction
15.2 ER and Information Hubs
15.3 Association Analysis and Social Networks
15.4 HPC in ER
15.5 Integration of ER and IQ
15.6 Entity-Based Data Integration
15.7 Fundamental ER Research
15.8 Summary
15.9 Review Questions
16 Appendix A
16.1 OYSTER Configurations
16.2 OYSTER Setup for Students
16.3 OYSTER Merge-Purge Configuration
16.4 OYSTER Identity Capture Configuration
16.5 OYSTER Identity Build Configuration with Assertions
16.6 OYSTER Identity Resolution Configuration
17 Glossary
18 Bibliography
19 Index