Buff | Using Additional Information in Streaming Algorithms | E-Book | www2.sack.de
E-Book

E-Book, Englisch, 127 Seiten

Buff Using Additional Information in Streaming Algorithms


1. Auflage 2017
ISBN: 978-3-96067-594-5
Verlag: Diplomica Verlag
Format: PDF
Kopierschutz: 0 - No protection

E-Book, Englisch, 127 Seiten

ISBN: 978-3-96067-594-5
Verlag: Diplomica Verlag
Format: PDF
Kopierschutz: 0 - No protection



Streaming problems are algorithmic problems that are mainly characterized by their massive input streams. Because of these data streams, the algorithms for these problems are forced to be space-efficient, as the input stream length generally exceeds the available storage.
The goal of this study is to analyze the impact of additional information (more specifically, a hypothesis of the solution) on the algorithmic space complexities of several streaming problems. To this end, different streaming problems are analyzed and compared. The two problems “most frequent item” and “number of distinct items”, with many configurations of different result accuracies and probabilities, are deeply studied. Both lower and upper bounds for the space and time complexity for deterministic and probabilistic environments are analyzed with respect to possible improvements due to additional information. The general solution search problem is compared to the decision problem where a solution hypothesis has to be satisfied.

Buff Using Additional Information in Streaming Algorithms jetzt bestellen!

Autoren/Hrsg.


Weitere Infos & Material


1;List of Figures;7
2;List of Tables;9
3;Introduction;11
3.1;Overview and Structure of the Thesis;13
3.1.1;Structure of the Thesis;14
4;Related Work;15
5;Definitions;17
5.1;Streaming Problems and Algorithms;17
5.2;Determinism and Randomization;21
5.3;Space and Time Complexity;23
5.4;Approximation Ratio and Success Probability;24
5.4.1;Approximation Ratio of a Streaming Counting Problem;24
5.4.2;Success Probability;26
5.5;Solvability;27
5.6;Basic Calculations;29
6;Lower Bounds on Space Complexity;33
6.1;Introduction to Communication Complexity;33
6.2;Lower Bound for General Streaming Problems;35
6.2.1;Lower Bound for an Exact Deterministic Streaming Problem;35
6.2.2;Lower Bound for an Exact Probabilistic Streaming Problem;36
6.2.3;Lower Bound for an Approximative Deterministic Streaming Problem;38
6.2.4;Lower Bound for an Approximative Probabilistic Streaming Problem;39
6.3;Lower Bounds for Hypotheses Verifications;41
7;Most Frequent Item Problem;45
7.1;General Streaming Problem Analysis;45
7.1.1;Exact Deterministic Most Frequent Item Problem;45
7.1.2;Exact Probabilistic Most Frequent Item Problem;55
7.1.3;Approximative Deterministic Most Frequent Item Problem;62
7.1.4;Approximative Probabilistic Most Frequent Item Problem;74
7.1.5;Summary of the General Most Frequent Item Problem;80
7.2;Hypothesis Verification Analysis;82
7.3;Analysis Conclusion;88
8;Number of Distinct Items Problem;89
8.1;General Streaming Problem Analysis;89
8.1.1;Exact Deterministic Number of Distinct Items Problem;89
8.1.2;Exact Probabilistic Number of Distinct Items Problem;93
8.1.3;Approximative Deterministic Number of Distinct Items Problem;95
8.1.4;Approximative Probabilistic Number of Distinct Items Problem;100
8.1.5;Summary of the General Number of Distinct Items Problem;110
8.2;Hypothesis Verification Analysis;112
8.2.1;Verification of All Possible Hypotheses;112
8.2.2;Justification of One Correct Hypothesis;116
8.3;Analysis Conclusion;117
9;Conclusion;119
9.1;Summary;119
9.2;Conclusion;121
9.3;Future Work;121


Text sample:

Chapter 4 Lower Bounds on Space Complexity:

For every streaming problem S, we want to analyze both upper and lower bounds for the achievable space complexity. For the upper bound, one has to find a certain streaming algorithm that solves S. Such an algorithm can afterwards be analyzed with respect to its required space and time complexities. To prove a lower bound on the space complexity, the technique of communication complexity theory is used, which is described by Hromkovic [Hro97] or in the initial paper by Yao [Yao79].
4.1 Introduction to Communication Complexity:

Communication complexity is a study area that tries to identify the required communication for a distributed algorithmic problem. The most common model is the two-computer model, in which two distributed computers have to calculate a certain function value. Both computers have a part of the input value and have unbounded computation power and storage resources, but a limited communication channel to each other. The main question is: How many communication bits do the two computers have to exchange to calculate the function value for a given input value separation? With this model, it is possible to mathematically analyze lower bounds on the communication complexity.
One special type of the communication complexity is the one-way communication. In this model, we have two computers, named as the left one CL and the right one CR, both having a part of the input value. In the one-way communication model, only the left computer is allowed to communicate to the other computer. To evaluate a certain function on the distributed input value, the left computer processes its input value part and transmits some communication. The right computer uses this communication and the second input value part to compute the function.
Obviously, if the left computer transmits its whole input value part, the right computer can easily evaluate the correct output value, as it has unbounded storage and computation resources. For some functions, the output value can be computed with a significantly lower communication complexity. For example, if one wants to identify the highest integer of a certain set of integers as input values, that is distributed among the two computers, the left computer can just communicate its maximal integer and the right computer outputs the correct output value easily. With this approach, we have a space complexity of one integer instead of communicating the whole input value part.
For this model of one-way communication, we can define a fooling set that allows us to prove a certain communication complexity lower bound. The idea of a fooling set is to list some input value candidates, that force any algorithm to use different communications from CL to CR in order to distinguish these input value candidates. This concept of one-way communication between two computers can be transformed to a streaming problem setting where we can prove a certain space complexity lower bound. In the following, we will define two fooling set concepts and prove in the following section why these fooling sets imply a space complexity lower bound for streaming problems.
But first, we have to define the deterministic one-way communication more formally: […].
One can define a fooling set for the one-way communication complexity as the combination of a set of representatives and a test set. The representatives are located at the left computer, the tests at the right one. Such a combination is called a fooling set if, for every two representatives, there is at least one test such that the communication from CL to CR has to be different. This is the case if the correct algorithm has to compute two different output values for these two representatives combined with the same test entry. This communication complexity concept can be used to prove space complexity lower bounds for streaming problems. Formally, a fooling set for a streaming problem is defined as follows [.].
4.2 Lower Bound for General Streaming Problems:

In the following, we show how the techniques of communication complexity can be used for lower bounds on the space complexity of deterministic and probabilistic streaming problems that require an exact result or may allow approximative output values.
Lower Bound for an Exact Deterministic Streaming Problem:

With the following theorem, one can prove space complexity lower bounds for an exact deterministic streaming problem, i.e., the approximation ratio is a = 1 and the success probability is p = 1. Such problems are abbreviated by S1,1.


Raffael Buff, born in 1990, holds a BSc and a MSc in Computer Science from the Swiss Federal Institute of Technology/ETH Zürich. His studies focused on topics such as Machine Learning, Big Data, Approximation Algorithms, Software Engineering and Distributed Systems.
While studying, the author served as a project manager at ETH juniors – a student run consulting agency at ETH Zürich – from 2010 to 2012 and was head of the department IT at ETH juniors from 2011 to 2012. He holds the certificates ITIL Foundation and HERMES Advanced. On a sidenote, he also was Swiss Champion in Online Sudoku in 2006.
Currently, the author works as a Business Consultant for an ICT company in Zürich, Switzerland.



Ihre Fragen, Wünsche oder Anmerkungen
Vorname*
Nachname*
Ihre E-Mail-Adresse*
Kundennr.
Ihre Nachricht*
Lediglich mit * gekennzeichnete Felder sind Pflichtfelder.
Wenn Sie die im Kontaktformular eingegebenen Daten durch Klick auf den nachfolgenden Button übersenden, erklären Sie sich damit einverstanden, dass wir Ihr Angaben für die Beantwortung Ihrer Anfrage verwenden. Selbstverständlich werden Ihre Daten vertraulich behandelt und nicht an Dritte weitergegeben. Sie können der Verwendung Ihrer Daten jederzeit widersprechen. Das Datenhandling bei Sack Fachmedien erklären wir Ihnen in unserer Datenschutzerklärung.