Mukhopadhyay | Web Searching and Mining | E-Book | www2.sack.de
Mukhopadhyay
Web Searching and Mining

E-book, English, 176 pages

Series: Cognitive Intelligence and Robotics

1st edition 2018
ISBN: 978-981-13-3053-7
Publisher: Springer Nature Singapore
Format: PDF
Copy protection: 1 - PDF watermark




This book presents the basics of search engines and their components. It introduces, for the first time, the concept of Cellular Automata in Web technology and discusses the prerequisites of Cellular Automata. Today, virtually everyone searches the World Wide Web for data. Searching the tremendous amount of data on the Internet is a mammoth task, and handling the data after retrieval is even more challenging. In this context, it is important to understand the need for space efficiency in data storage. Though Cellular Automata has been used in many fields before, in this book the authors experiment with employing its strong mathematical model to address some critical issues in the field of Web Mining.


Dr. Debajyoti Mukhopadhyay is currently the Dean (R&D) and Professor & Head of Computer Engineering at NHITM, affiliated to Mumbai University (India). He previously worked in the IT industry for nineteen years, including at the well-known Bell Communications Research, USA, and in academia for sixteen years, including as the Dean (R&D) of Maharashtra Institute of Technology, Pune, India. He has published over 190 research papers and holds three patents. In the corporate sector, Dr. Mukhopadhyay held top-level positions, such as President & CEO, Director, and General Manager, and oversaw a large number of professionals managing multiple off-shore projects from India. He has been elected a Distinguished Speaker of the Computer Society of India. He has held visiting positions at Chonbuk National University (South Korea), George Mason University (USA), and Thapar University (India). He holds a PhD (Engineering) from Jadavpur University (India), an MS in Computer Science from Stevens Institute of Technology (USA), a Post Graduate Diploma in Computer Science from The Queen's University of Belfast (UK), and a BE (Electronics & Telecommunications Engineering) from Bengal Engineering College under the University of Calcutta. Dr. Mukhopadhyay is an FIE, FIETE, SMIEEE (USA), SMACM (USA), CEngg., MIMA (India), and Elected Member of Eta-Kappa-Nu (the EE Honor Society of the USA).




Further information & material


Preface
Contents
About the Editor
List of Figures
List of Tables
1 Introduction
  1 Why Web Search Engine?
  2 Web Search Engine: Some Basic Facts
  3 Domain-Specific Web Search Engine Concepts
  4 Survey of Existing Methodologies
    4.1 Web Crawling
      4.1.1 Domain-Specific Web Crawling
      4.1.2 Ontology Basics
      4.1.3 WordNet
      4.1.4 Resource Structuring
    4.2 Predicting Web-Pages at Runtime
    4.3 Lucky Searching
      4.3.1 Domain-Specific Lucky Searching
    4.4 Indexing Web-Pages at Runtime
      4.4.1 Back-of-the-Book-Style
      4.4.2 Human-Produced Web-Page Index
      4.4.3 Meta Search Web-Page Indexing
      4.4.4 Cache-Based Web-Page Indexing
    4.5 Product Searching
    4.6 Image Searching
      4.6.1 Existing Text to Image Search
      4.6.2 Existing Image to Image Search
  References
2 Preliminaries on Cellular Automata
  1 What is Cellular Automata
  2 Conceptualization of Cellular Automata
  3 Applications of Cellular Automata
  4 Conclusion
  References
3 Design of SMACA
  1 Introduction
  2 Generation of SMACA
  3 Synthesis of SMACA
  4 Analysis of SMACA Through RVG
  5 SLA Detection in RVG
  6 Conclusion
  References
4 SMACA Usage in Indexing Storage of a Search Engine
  1 Introduction
  2 Background of Search Engine
  3 Existing Mechanism to Store Web-Data
  4 Formation of Indexing Storage Using SMACA
  5 Generation of SMACA for Each Website
  6 Generation of Inverted Indexed File
  7 Replacing Inverted Indexed File by SMACA
  8 Searching Mechanism
  9 Experimental Results
  10 Conclusion
  References
5 Cellular Automata in Web-Page Ranking
  1 Introduction
  2 Page Ranking Concept
  3 Concept of Galois Field: GF(2) & GF(2^p) Using CA
  4 Mapping Link Structure of Web-Pages with Cellular Automata
  5 Indexing in Ranking
  6 Conclusion
  References
6 Web-Page Indexing Based on the Prioritize Ontology Terms
  1 Introduction
  2 Rules and Definitions
  3 Proposed Approach
    3.1 Extraction of Dominating and Sub-dominating Ontology Terms
    3.2 Proposed Algorithm of Web-Page Indexing
    3.3 Complexity of Indexing Web-Pages
    3.4 User Interface
    3.5 Web-Page Retrieval Mechanism Based on the User Input
  4 Experimental Analysis
    4.1 Experiment Procedure
    4.2 Time Complexity to Produce Resultant Web-Page List
    4.3 Experimental Result
  5 Conclusions
  References
7 Domain-Specific Crawler Design
  1 Introduction
  2 Proposed Approach
    2.1 Single Domain-Specific Web Search Crawler
      2.1.1 Proposed Web-Page Content Relevance Calculation Algorithm for Single Domain
      2.1.2 Domain-Specific Web-Page Repository Building
      2.1.3 Challenges Faced While Crawling
      2.1.4 Relevance Page Tree
      2.1.5 Searching a Web-Page from RPaT Model
      2.1.6 Generation of RPaT
    2.2 Multiple Domains Specific Web Search Crawler
      2.2.1 Proposed Web-Page Content Relevance Calculation Algorithm for Multiple Domains
      2.2.2 Multiple Domains Specific Web-Page Repository Building
      2.2.3 Relevance Page Graph
      2.2.4 Searching a Web-Page from RPaG Model
    2.3 Multilevel Domains Specific Web Search Crawler
      2.3.1 Classifier 1: Web-Page Content Classifier
      2.3.2 Classifier 2: Web-Page URL Classifier
      2.3.3 User Interface
      2.3.4 Proposed Multilevel Domain Specific Web Search Crawler Design Algorithm
      2.3.5 Web-Page Retrieval Mechanism Based on the User Input
  3 Experimental Analyzes
    3.1 Single Domain-Specific Web Search Crawler
      3.1.1 Test Settings
        Seed URLs
        Weight Table
      3.1.2 Test Results
        Harvest Rate for Unfocused Crawling
        Harvest Rate for Single Domain-Specific Web-Page Crawling
    3.2 Multiple Domains Specific Web Search Crawler
      3.2.1 Test Settings
        Seed URLs
        Syntable
        Weight Table
      3.2.2 Test Results
        Page Distribution in Different Domains
        Multiple Domains Crawler Performance Over Single Domain Crawler
    3.3 Multilevel Domains Specific Web Search Crawler
      3.3.1 Experiment Procedure
      3.3.2 Complexity Analysis
      3.3.3 Experimental Result
        Accuracy Testing of Our Prototype
        Parallel Crawling Performance Report
  4 Conclusions
  References
8 Structural Change of Domain-Specific Web-Page Repository for Efficient Searching
  1 Introduction
  2 Proposed Approach
    2.1 HERT Model
      2.1.1 Searching a Web-Page from HERT Model
      2.1.2 Challenges Faced While Constructing HERT
      2.1.3 Algorithm for Construction of HERT from RPaT
    2.2 IBAG Model
      2.2.1 Searching a Web-Page from IBAG Model
      2.2.2 Construction of IBAG from RPaG
      2.2.3 User Interface
      2.2.4 Procedure for Web-Page Selection and Its Related Dynamic Ranking
      2.2.5 Reason of Introducing Multilevel Indexing Concept
    2.3 M-IBAG Model
      2.3.1 Construction of M-IBAG Model from IBAG Model
  3 Experimental Analysis
    3.1 Sample HERT Construction
      3.1.1 RPaT Web-Pages
      3.1.2 HERT Web-Pages
    3.2 Performance of HERT Searching Over RPaT Searching
    3.3 Comparative Study of Time Complexity for Different Models
      3.3.1 RPaG Model Complexity
        Best-Case Time Complexity
        Worst-Case Time Complexity
        Average-Case Time Complexity
        IBAG Model Complexity: Ideal Case
        Best-Case Time Complexity
        Worst-Case Time Complexity
        Average-Case Time Complexity
      3.3.2 IBAG Model Complexity: While All the Web-pages Belong to Same Level
        Best-Case Time Complexity
        Worst-Case Time Complexity
        Average-Case Time Complexity
      3.3.3 M-IBAG Model Complexity
        Best-Case Time Complexity
        Worst-Case Time Complexity
        Average-Case Time Complexity
    3.4 Comparative Study of Time Complexity for the Above Given Models
  4 Conclusions
  References
9 Domain-Specific Web-Page Prediction
  1 Introduction
  2 Web-Page Prediction
  3 Proposed Approach
    3.1 Bit Pattern Generation Algorithm
    3.2 Find Predicted Web-Page List
  4 Performance Analysis
    4.1 Testing Procedure
    4.2 Test Results
      4.2.1 Average Number of Predicted Web-Page List for a Set of Search String
      4.2.2 Accuracy Measure
      4.2.3 Discussion of Average-Case Time Complexity for Generating Search Results from Both IBAG Model
      4.2.4 Average Time Taken for a Set of Search String
  5 Conclusions
  References
10 Domain-Specific Lucky Searching
  1 Introduction
  2 Proposed Approach
    2.1 DSLSDB Construction
      2.1.1 Ontology Terms
      2.1.2 DSLSDB Construction Algorithm
    2.2 Lucky URL Search from DSLSDB
    2.3 User Interface
  3 Experimental Results
    3.1 Test Settings
      3.1.1 Seed URLs
      3.1.2 Ontology Terms
      3.1.3 Weight Value
      3.1.4 Syntable
      3.1.5 Web-Page Content
    3.2 Test Results
      3.2.1 DSLSDB Records
      3.2.2 Testing Procedure
      3.2.3 Lucky Searching for Invalid Search String
      3.2.4 Lucky Search for Valid Search String
      3.2.5 Comparative Study Between Regular Search Engine and Domain-Specific Search Engine
  4 Conclusion
  References


