Akanbi / Amiri / Fazeldehkordi | A Machine-Learning Approach to Phishing Detection and Defense | E-Book | www2.sack.de

E-book, English, 100 pages

Akanbi / Amiri / Fazeldehkordi A Machine-Learning Approach to Phishing Detection and Defense


1st edition, 2014
ISBN: 978-0-12-802946-6
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: 6 - ePub Watermark




Phishing is one of the most widely perpetrated forms of cyber attack, used to gather sensitive information such as credit card numbers, bank account numbers, and user logins and passwords, as well as other information entered via a website. The authors of A Machine-Learning Approach to Phishing Detection and Defense have conducted research to demonstrate how a machine-learning algorithm can be used as an effective and efficient tool for detecting phishing websites and designating them as information security threats. This methodology can prove useful to a wide variety of businesses and organizations that are seeking solutions to this long-standing threat. A Machine-Learning Approach to Phishing Detection and Defense also provides information security researchers with a starting point for leveraging the machine-learning algorithm approach as a solution to other information security threats.
- Discover novel research into the uses of machine-learning principles and algorithms to detect and prevent phishing attacks
- Help your business or organization avoid costly damage from phishing sources
- Gain insight into machine-learning strategies for facing a variety of information security threats

O.A. Akanbi received his B.Sc. (Hons, Information Technology - Software Engineering) from Kuala Lumpur Metropolitan University, Malaysia, and his M.Sc. in Information Security from Universiti Teknologi Malaysia (UTM). He is presently a graduate student in Computer Science at Texas Tech University. His area of research is cybersecurity.

Further Information & Material


Chapter 2

Literature Review


Abstract


This chapter reviews studies on phishing detection and related work. It is organized as follows. First, phishing is defined in enough detail to show the reader why it is an important area of research. Second, existing anti-phishing approaches are examined in terms of accuracy and limitations. Third, existing techniques are briefly acknowledged, together with how they serve as a baseline for our research; their advantages and the setbacks experienced in implementing them are also discussed. Fourth, we discuss technical details close to our own work as implemented by other researchers in the same domain, which informed the choice of algorithms and approaches used. In addition, the main data preprocessing method used is introduced. The chapter concludes with a tabulated summary of the most relevant and recent work that informed our study.

Keywords


phishing
anti-phishing
ensemble
classifier
blacklist
website
email

2.1. Introduction


This chapter primarily reviews the available literature in the field under study. Accordingly, it will account for the definitions of concepts and issues that affect website phishing detection using different techniques and approaches. The first part of this chapter will describe phishing and its various classifications. The second part of this chapter will deal with existing techniques and approaches that are related to detecting phishing websites. The third part discusses three types of classifier designs and their impact on website phishing detection. The fourth and the final part of this chapter reviews earlier works related to phishing detection in websites.

2.2. Phishing


The definition of phishing in this context is not fixed; it changes with the way in which phishing is carried out. In particular, email and websites are the two methods of phishing. Although there are some differences between these two methods, they share common goals.
In addition, phishing can be described as an online attack in which perpetrators commit fraud through social engineering schemes, delivered via instant messages, emails, or online advertisements, that lure users to phishing websites resembling legitimate ones in order to obtain confidential information about the victim, such as passwords, personal identification, and financial account numbers, which can then be used for illegal profit (Liu et al., 2010). As explained by Abbasi and Chen (2009b), phishing websites can be divided into two common types: spoof and concocted websites. Spoof sites are sham replicas of existing commercial websites (Dhamija et al., 2006; Dinev, 2006). Commonly spoofed websites include eBay, PayPal, various banking and escrow service providers (Abbasi and Chen, 2009a), and e-tailers. Spoof websites attempt to steal unsuspecting users’ identities: account logins, personal information, credit card numbers, and so forth (Dinev, 2006). Online phish repositories such as PhishTank maintain URLs for millions of verified spoof websites used in phishing attacks intended to mimic thousands of legitimate entities. Concocted (fictitious) websites mislead users by attempting to give the impression of unique, legitimate commercial entities such as investment banks, escrow services, shipping companies, and online pharmacies (Abbasi and Chen, 2009b; Abbasi et al., 2012; Abbasi et al., 2010). The aim of fictitious websites is a failure-to-ship scam: swindling customers out of their money without keeping their own end of the bargain (Chua and Wareham, 2004). Both spoof and concocted websites are also commonly used to propagate malware and viruses (Willis, 2009).
A personal fraud survey analyzed by Jamieson et al. (2012) indicates the percentage of phishing in an identity crime reclassification, using publicly available data from the Australian Bureau of Statistics (ABS) as a case study. The outcome showed that phishing constitutes a fraction of 0.4%, which corresponded to 57,800 victims. Figures 2.1 and 2.2 present the survey information.
Fig. 2.1 Identity crime reclassification of ABS (personal crime survey 2008). (Jamieson et al., 2012)
Fig. 2.2 Experience of selected personal frauds. (Jamieson et al., 2012)

2.3. Existing anti-phishing approaches


According to a review published by the Anti-Phishing Working Group (APWG), there were at least 67,677 phishing attacks in the last 6 months of 2010 (A.P.W.G, 2010). A great deal of research has gone into designing anti-phishing approaches. Afroz and Greenstadt (2009) categorized current phishing detection into three main types: (1) non-content-based approaches that do not make use of site content to classify a site as authentic or phishing, (2) content-based approaches that make use of site contents to catch phishing, and (3) visual similarity-based approaches that use visual similarity with known sites to recognize phishing. These approaches are discussed in subsequent sections.
Other anti-phishing approaches include detecting phishing emails rather than sites (Fette et al., 2007) and educating users about phishing attacks and human detection methods (Kumaraguru et al., 2007).

2.3.1. Non-Content-Based Approaches


Afroz and Greenstadt (2009) state that non-content-based approaches include classification of phishing sites based on URL and host information, blacklisting, and whitelisting. In URL-based schemes, URLs are classified on the basis of both lexical and host features. Lexical features describe lexical patterns of malicious URLs, such as the length of the URL, the number of dots, and the special characters it contains. Host features of the URL include properties of the IP address, the owner of the site, DNS properties such as TTL, and geographical location (Ma et al., 2009). Using these features, a matrix is built and run through multiple classification algorithms. In real-time processing trials, this approach has success rates between 95% and 99%. Afroz and Greenstadt (2009) used lexical features of the URL along with site contents and image analysis to improve performance and reduce false positives.
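As a rough illustration of the lexical features named above (URL length, number of dots, special characters), the following minimal Python sketch turns URLs into rows of a feature matrix. The function names, the feature names, and the particular set of characters counted as "special" are assumptions made for this example; they are not taken from Ma et al. (2009) or Afroz and Greenstadt (2009). Host features are omitted because they would require DNS and WHOIS lookups.

```python
# Hypothetical set of characters treated as "special"; the cited studies
# may use a different definition.
SPECIAL_CHARS = set("@-_?=&%~")


def lexical_features(url: str) -> dict:
    """Compute three simple lexical features of a URL string."""
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_special": sum(ch in SPECIAL_CHARS for ch in url),
    }


def feature_matrix(urls):
    """Return (column names, rows), one row of feature values per URL."""
    rows = [lexical_features(u) for u in urls]
    columns = sorted(rows[0]) if rows else []
    return columns, [[row[c] for c in columns] for row in rows]


if __name__ == "__main__":
    cols, matrix = feature_matrix([
        "http://example.com/a?b=1",
        "http://login.example.com.account-update.test/verify",
    ])
    print(cols)
    for row in matrix:
        print(row)
```

The resulting rows could then be passed, together with labels, to any standard classifier; which algorithms are appropriate is a separate question that this sketch does not address.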
In blacklisting approaches, reports made by users or companies are used to detect phishing websites, which are then stored in a database. The use of this approach by commercial toolbars such as Netcraft, Internet Explorer 7, CallingID Toolbar, EarthLink Toolbar, Cloudmark Anti-Fraud Toolbar, GeoTrust TrustWatch Toolbar, and Netscape Browser 8.1 has perhaps made it very popular among anti-phishing approaches (Afroz and Greenstadt, 2009). Nonetheless, because most phishing sites are temporary and often exist for less than 20 hours (Moore and Clayton, 2007), or change URLs frequently (fast-flux), the URL blacklisting approach fails to identify the majority of phishing incidents. Furthermore, a blacklisting approach will fail to detect an attack that is aimed at a specific user (spear-phishing), especially attacks that target profitable but not extensively used sites such as small brokerages, company intranets, and so forth (Afroz and Greenstadt, 2009).
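The limitation described above can be seen in a minimal sketch of a URL blacklist lookup. The class name, the normalization rule (lowercased host plus path without a trailing slash), and the example URLs are all hypothetical; real toolbars use their own formats and matching rules. The point is only that a lookup recognizes URLs that have already been reported, so a phishing page moved to a fresh host passes unnoticed.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to lowercased host plus path (hypothetical rule)."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    return host + parts.path.rstrip("/")


class UrlBlacklist:
    """In-memory stand-in for a database of reported phishing URLs."""

    def __init__(self, reported_urls):
        self._entries = {normalize(u) for u in reported_urls}

    def is_listed(self, url: str) -> bool:
        return normalize(url) in self._entries


if __name__ == "__main__":
    blacklist = UrlBlacklist(["http://Phish.example/login/"])
    # A previously reported URL is caught despite case and slash differences.
    print(blacklist.is_listed("http://phish.example/login"))
    # The same page on a new host is not caught until someone reports it.
    print(blacklist.is_listed("http://phish2.example/login"))
```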
Whitelisting approaches seek to identify known good sites (Chua and Wareham, 2004; Close, 2009; Herzberg and Jbara, 2008), but a user must remember to inspect the interface whenever he or she visits any site. Some whitelisting approaches use server-side validation to add authentication metrics (beyond SSL) to client browsers as proof of a site's benign nature, for example, dynamic security skins (Kumaraguru et al., 2007), TrustBar (Herzberg and Gbara, 2004), and SRD (Synchronized Random Dynamic Boundaries) (Ye et al., 2005).

2.3.2. Content-Based Approaches


In content-based approaches, phishing attacks are detected by investigating site contents. Features used in this approach comprise password fields, spelling errors, the source of images, links, embedded links, and so forth, alongside URL and host-based features. SpoofGuard (Chou et al., 2004) and CANTINA (Zhang et al., 2007) are two examples of content-based approaches. In addition, Google’s anti-phishing filter detects phishing and malware by examining the page URL, page rank, WHOIS information, and the contents of a page, including HTML, JavaScript, images, iframes, and so forth (Whittaker et al., 2010). The classifier is frequently retrained with new phishing sites to learn new trends in phishing. This classifier has high accuracy but is presently implemented offline, as it takes 76 seconds on average to detect phishing. Some researchers have studied fingerprinting and fuzzy logic-based approaches that use a series of hashes of websites to identify phishing sites (Aburrous et al., 2008; Zdziarski et al., 2006). Furthermore, experimentation of Afroz and...


