
E-book, English, 210 pages

Series: Advanced Information and Knowledge Processing

Maloof: Machine Learning and Data Mining for Computer Security

Methods and Applications
1st edition, 2006
ISBN: 978-1-84628-253-9
Publisher: Springer
Format: PDF
Copy protection: PDF watermark




'Machine Learning and Data Mining for Computer Security' provides an overview of the current state of research in machine learning and data mining as they apply to problems in computer security. The book has a strong focus on information processing, combining and extending results from machine learning and data mining with work in computer security. The first part of the book surveys the data sources, the learning and mining methods, evaluation methodologies, and past work relevant to computer security. The second part consists of articles written by leading researchers in the area. These articles deal with host-based intrusion detection through the analysis of audit trails, command sequences, and system calls; network intrusion detection through the analysis of TCP packets; and the detection of malicious executables. The book fills a clear need for a volume that collects and frames work on developing and applying methods from machine learning and data mining to problems in computer security.




Further Information & Material


1;Foreword;7
2;Preface;9
3;List of Contributors;13
4;Contents;15
5;1 Introduction;17
6;Part I Survey Contributions;21
6.1;2 An Introduction to Information Assurance;23
6.1.1;2.1 Introduction;23
6.1.2;2.2 The Security Process;24
6.1.2.1;2.2.1 Protection;24
6.1.2.2;2.2.2 Detection;24
6.1.2.3;2.2.3 Response;25
6.1.3;2.3 Information Assurance;26
6.1.3.1;2.3.1 Security Properties;26
6.1.3.2;2.3.2 Information Location;30
6.1.3.3;2.3.3 System Processes;31
6.1.4;2.4 Attackers and the Threats Posed;32
6.1.4.1;2.4.1 Worker with a Backhoe;33
6.1.4.2;2.4.2 Ignorant Users;33
6.1.4.3;2.4.3 Criminals;33
6.1.4.4;2.4.4 Script Kiddies;34
6.1.4.5;2.4.5 Automated Agents;34
6.1.4.6;2.4.6 Professional System Crackers;35
6.1.4.7;2.4.7 Insiders;35
6.1.5;2.5 Opportunities for Machine Learning Approaches;36
6.1.6;2.6 Conclusion;37
6.2;3 Some Basic Concepts of Machine Learning and Data Mining;39
6.2.1;3.1 Introduction;39
6.2.2;3.2 From Data to Examples;40
6.2.3;3.3 Representations, Models, and Algorithms;43
6.2.3.1;3.3.1 Instance-Based Learning;45
6.2.3.2;3.3.2 Naive Bayes;45
6.2.3.3;3.3.3 Kernel Density Estimation;45
6.2.3.4;3.3.4 Learning Coefficients of a Linear Function;46
6.2.3.5;3.3.5 Learning Decision Rules;46
6.2.3.6;3.3.6 Learning Decision Trees;47
6.2.3.7;3.3.7 Mining Association Rules;47
6.2.4;3.4 Evaluating Models;48
6.2.4.1;3.4.1 Problems with Simple Performance Measures;51
6.2.4.2;3.4.2 ROC Analysis;52
6.2.4.3;3.4.3 Principled Evaluations and Their Importance;54
6.2.5;3.5 Ensemble Methods and Sequence Learning;55
6.2.5.1;3.5.1 Ensemble Methods;56
6.2.5.2;3.5.2 Sequence Learning;56
6.2.6;3.6 Implementations and Data Sets;58
6.2.7;3.7 Further Reading;58
6.2.8;3.8 Concluding Remarks;59
7;Part II Research Contributions;61
7.1;4 Learning to Detect Malicious Executables;63
7.1.1;4.1 Introduction;63
7.1.2;4.2 Related Work;65
7.1.3;4.3 Data Collection;68
7.1.4;4.4 Classification Methodology;68
7.1.4.1;4.4.1 Instance-Based Learner;69
7.1.4.2;4.4.2 The TFIDF Classifier;69
7.1.4.3;4.4.3 Naive Bayes;70
7.1.4.4;4.4.4 Support Vector Machines;70
7.1.4.5;4.4.5 Decision Trees;71
7.1.4.6;4.4.6 Boosted Classifiers;71
7.1.5;4.5 Experimental Design;72
7.1.6;4.6 Experimental Results;72
7.1.6.1;4.6.1 Pilot Studies;72
7.1.6.2;4.6.2 Experiment with a Small Collection;73
7.1.6.3;4.6.3 Experiment with a Larger Collection;73
7.1.7;4.7 Discussion;76
7.1.8;4.8 Concluding Remarks;79
7.2;5 Data Mining Applied to Intrusion Detection: MITRE Experiences;81
7.2.1;5.1 Introduction;81
7.2.1.1;5.1.1 Related Work;82
7.2.1.2;5.1.2 MITRE Intrusion Detection;83
7.2.2;5.2 Initial Feature Selection, Aggregation, Classification, and Ranking;84
7.2.2.1;5.2.1 Feature Selection and Aggregation;85
7.2.2.2;5.2.2 HOMER;86
7.2.2.3;5.2.3 BART Algorithm and Implementation;86
7.2.2.4;5.2.4 Other Anomaly Detection Efforts;89
7.2.3;5.3 Classifier to Reduce False Alarms;90
7.2.3.1;5.3.1 Incremental Classifier Algorithm;90
7.2.3.2;5.3.2 Classifier Experiments;92
7.2.4;5.4 Clustering to Detect Anomalies;94
7.2.4.1;5.4.1 Clustering with a Reference Model on KDD Cup Data;95
7.2.4.2;5.4.2 Clustering without a Reference Model on MITRE Data;97
7.2.5;5.5 Conclusion;97
7.3;6 Intrusion Detection Alarm Clustering;105
7.3.1;6.1 Introduction;105
7.3.2;6.2 Root Causes and Root Cause Analysis;106
7.3.3;6.3 The CLARAty Alarm Clustering Method;108
7.3.3.1;6.3.1 Motivation;108
7.3.3.2;6.3.2 The CLARAty Algorithm;109
7.3.3.3;6.3.3 CLARAty Use Case;111
7.3.4;6.4 Cluster Validation;112
7.3.4.1;6.4.1 The Validation Dilemma;112
7.3.4.2;6.4.2 Cluster Validation in Brief;113
7.3.4.3;6.4.3 Validation of Alarm Clusters;115
7.3.5;6.5 Cluster Tendency;116
7.3.5.1;6.5.1 Test of Cluster Tendency;116
7.3.5.2;6.5.2 Experimental Setup and Results;119
7.3.5.3;6.5.3 Derivation of Probabilities;120
7.3.6;6.6 Conclusion;122
7.4;7 Behavioral Features for Network Anomaly Detection;123
7.4.1;7.1 Introduction;123
7.4.2;7.2 Inter-Flow versus Intra-Flow Analysis;125
7.4.3;7.3 Operationally Variable Attributes;127
7.4.3.1;7.3.1 Size of Normal Value Space;127
7.4.3.2;7.3.2 Data Mining on Operationally Variable Attributes;128
7.4.4;7.4 Deriving Behavioral Features;130
7.4.5;7.5 Authentication Using Behavioral Features;131
7.4.5.1;7.5.1 The Need for Authentication of Server Flows;131
7.4.5.2;7.5.2 Classification of Server Flows;132
7.4.5.3;7.5.3 An Empirical Evaluation;133
7.4.5.4;7.5.4 Aggregate Server Flow Model;133
7.4.5.5;7.5.5 Host-Specific Models;135
7.4.5.6;7.5.6 Models from Real Network Traffic;136
7.4.5.7;7.5.7 Classification for Intrusion and Misuse Detection;137
7.4.6;7.6 Related Work;138
7.4.7;7.7 Conclusion;140
7.5;8 Cost-Sensitive Modeling for Intrusion Detection;141
7.5.1;8.1 Introduction;141
7.5.2;8.2 Cost Factors, Models, and Metrics in IDSs;142
7.5.2.1;8.2.1 Cost Factors;142
7.5.2.2;8.2.2 Cost Models;142
7.5.2.3;8.2.3 Cost Metrics;143
7.5.3;8.3 Cost-Sensitive Modeling;144
7.5.3.1;8.3.1 Reducing Operational Cost;144
7.5.3.2;8.3.2 Reducing Consequential Cost;146
7.5.4;8.4 Experiments;146
7.5.4.1;8.4.1 Design;146
7.5.4.2;8.4.2 Measurements;147
7.5.4.3;8.4.3 Results;147
7.5.4.4;8.4.4 Comparison with fcs-RIPPER;151
7.5.5;8.5 Related Work;151
7.5.6;8.6 Conclusion and Future Work;151
7.6;9 Data Cleaning and Enriched Representations for Anomaly Detection in System Calls;153
7.6.1;9.1 Introduction;153
7.6.2;9.2 Related Work;155
7.6.3;9.3 Data Cleaning;156
7.6.3.1;9.3.1 Representation with Motifs and Their Locations;156
7.6.3.2;9.3.2 Unsupervised Training with Local Outlier Factor (LOF);160
7.6.3.3;9.3.3 Automating the Parameters;161
7.6.4;9.4 Anomaly Detection;163
7.6.4.1;9.4.1 Representation with Arguments;163
7.6.4.2;9.4.2 Supervised Training with LERAD;164
7.6.5;9.5 Experimental Evaluations;167
7.6.5.1;9.5.1 Data Cleaning Evaluation Procedures and Criteria;168
7.6.5.2;9.5.2 Anomaly Detection with Arguments Evaluation Procedures and Criteria;169
7.6.5.3;9.5.3 Anomaly Detection with Cleaned Data vs. Raw Data Evaluation Procedures and Criteria;171
7.6.6;9.6 Concluding Remarks;171
7.7;10 A Decision-Theoretic, Semi-Supervised Model for Intrusion Detection;173
7.7.1;10.1 Introduction;173
7.7.2;10.2 Related Work;176
7.7.3;10.3 A New Model of Intrusion Detection;176
7.7.3.1;10.3.1 Generative Data Model;177
7.7.3.2;10.3.2 Inference and Learning;178
7.7.3.3;10.3.3 Action Selection;182
7.7.3.4;10.3.4 Relaxing the Cost Function;183
7.7.4;10.4 Experiments;187
7.7.4.1;10.4.1 Data Set;188
7.7.4.2;10.4.2 Results;189
7.7.5;10.5 Conclusions and Future Work;192
8;References;195
9;Index;215


2 An Introduction to Information Assurance (p. 7)

Clay Shields

2.1 Introduction

The intuitive function of computer security is to limit access to a computer system. With a perfect security system, information would never be compromised because unauthorized users would never gain access to the system. Unfortunately, it seems beyond our current abilities to build a system that is both perfectly secure and useful.

Instead, the security of information is often compromised through technical flaws and through user actions. The realization that we cannot build a perfect system is important, because it shows that we need more than just protection mechanisms. We should expect the system to fail, and be prepared for failures.

As described in Sect. 2.2, system designers not only use mechanisms that protect against policy violations, but also detect when violations occur, and respond to the violation. This response often includes analyzing why the protection mechanisms failed and improving them to prevent future failures.

It is also important to realize that security systems do not exist just to limit access to a system. The true goal of implementing security is to protect the information on the system, which can be far more valuable than the system itself or access to its computing resources.

Because systems involve human users, protecting information requires more than just technical measures. It also requires that the users be aware of and follow security policies that support protection of information as needed.

This chapter provides a wider view of information security, with the goal of giving machine learning researchers and practitioners an overview of the area and suggesting new areas that might benefit from machine learning approaches. This wider view of security is called information assurance.

It includes the technical aspects of protecting information, as well as defining policies thoroughly and correctly, and ensuring proper behavior of human users and operators. I will first describe the security process.

I will then explain the standard model of information assurance and its components, and, finally, will describe common attackers and the threats they pose. I will conclude with some examples of problems that fall outside much of the normal technical considerations of computer security that may be amenable to solution by machine learning methods.

2.2 The Security Process

Human beings are inherently fallible. Because we will make mistakes, our security process must reflect that fact and attempt to account for it. This recognition leads to the cycle of security shown in Fig. 2.1. The cycle is familiar and intuitive, common in everyday life, and is illustrated here with a running example of securing an automobile.

2.2.1 Protection

Protection mechanisms are used to enforce a particular policy. The goal is to prevent things that are undesirable from occurring. A familiar example is securing an automobile and its contents. A car comes with locks to prevent anyone without a key from gaining access to it, or from starting it without the key. These locks constitute the car’s protection mechanisms.
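
In software terms, a protection mechanism of this kind reduces to a policy check performed before any access is granted. The following minimal sketch illustrates the idea with a default-deny access-control table; the policy entries and function names are hypothetical, chosen for illustration rather than taken from the book.

# A minimal protection mechanism: enforce an access-control policy
# before any action is allowed. Entries absent from the policy table
# are denied by default (hypothetical example, not from the book).

POLICY = {
    ("alice", "report.txt"): {"read", "write"},
    ("bob", "report.txt"): {"read"},
}

def is_permitted(user, resource, action):
    """Return True only if the policy explicitly grants the action."""
    return action in POLICY.get((user, resource), set())

print(is_permitted("bob", "report.txt", "write"))  # False: never granted

The default-deny stance mirrors the car's locks: anything not explicitly allowed to the key holder is refused.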

2.2.2 Detection

Since we anticipate that our protection mechanisms will be imperfect, we attempt to determine when that occurs by adding detection mechanisms.
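
In the same software terms, a detection mechanism watches the record of attempts that the protection layer produces and flags activity that suggests protection has failed or is being probed. The sketch below raises an alert when failed accesses exceed a fixed threshold within a sliding time window; the threshold, window, and names are illustrative assumptions, not from the book.

# A minimal detection mechanism: flag a user whose failed accesses
# exceed MAX_FAILURES within WINDOW_SECONDS. A hypothetical
# threshold-based example; the fixed rule stands in for what a
# learned model would provide.

from collections import deque
import time

WINDOW_SECONDS = 60
MAX_FAILURES = 5

failures = {}

def record_failure(user, now=None):
    """Record a failed access; return True if the user looks suspicious."""
    now = time.time() if now is None else now
    q = failures.setdefault(user, deque())
    q.append(now)
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()  # drop failures that fell outside the sliding window
    return len(q) > MAX_FAILURES

A hand-tuned threshold like this is exactly the kind of rule that machine learning methods can replace with a model learned from data, the opportunity taken up in Sect. 2.5 of this chapter.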


