Tuesday, August 20, 2019
Security Issues in Peer-to-peer Networking
Security Issues in Peer-to-peer Networking ACKNOWLEDGEMENTS: The interest in the field of networking, driven me to take the computer networking as my course in M.Sc. there are many different types of networks. Out of them the more popularized and upcoming trend of networks are peer-to-peer networks. This report of my final dissertation for the partial fulfilment of my M.Sc, computer networking, would not have been possible without the support of my supervisor, Mr. Harry Benetatos. He helped me a lot by guiding me and pin-pointing the key mistakes which I have done during my research. My course leader Mr. Nicholas Ioannides also helped me a lot to complete this dissertation. His advises and suggestions gave me a lot of encouragement and support which made me do this research and finish it in time. I am very thankful to my university, LONDON METROPOLITAN UNIVERSITY which provided me the free access to the IEEE library which helped me to find the key papers which are very useful for my research. I also thank my parents for their support given to me in all walks of my life. DEDICATION: I dedicate this report to my parents and my well wisher Sakshi for their constant support and encouragement throughout my education and life. CHAPTER 1 PROJECT INTRODUCTION 1.1 INTRODUCTION TO THE PROJECT: This dissertation is all about the security issues in the peer-to-peer networks. There are many security issues in peer-to-peer networks. I have chosen to do research on worm intrusions in peer-to-peer networks. In this document I have mentioned how the worm propagates in the network from one peer to another peer, how the worm can be detected and how the detected worm can be attacked and save the network from getting infected. 1.2 AIM: Security issue in Peer-to-peer networks: Securing the peer-to-peer network from worms.à 1.3 OBJECTIVES: ÃË To understand how the peers communicate with each other in the peer-to-peer network ÃË To analyse the propagation of worms in the network. ÃË To detect the worms near the nodes of the network ÃË To defence the worms in the network. 1.4 RESEARCH QUESTION: This document briefly discusses about how the worms propagates in the network and how can it be detected and attacked in order to save the peer-to-peer network 1.5 APPROACH: My approach for this dissertation is as follows: ÃË Understanding peer-to-peer networks ÃË Defining the problem ÃË Data collection and analysis ÃË Study and understanding the existing solutions for the problem ÃË Comparing different solutions ÃË conclusion 1.6 METHODOLOGY: This section of my document contains what important steps to be followed in order to achieve the mentioned objectives. It also helps to schedule how to develop and complete different parts of the dissertation. In this dissertation firstly I will study and understand about the peer-to-peer networks and how the peers in the networks communicate and share information with the remaining peer in the network. Then I do research on how the worm propagates in the network, how can the worm be detected and how the detected worm can be attacked and restore the network.à In the pictorial form the different stages of my dissertation are 1.7 PREVIEW ABOUT THE COMING CHAPTERS IN THE REPORT: The rest of the report is organised as follows: in the chapter 2, there is brief discussion about the peer-to-peer networks, different types of peer-to-peer networks, advantages and disadvantages of the peer-to-peer networks. There is also some information about the worms, its nature and different types of worms. In chapter 3, there is a discussion about the methods given by the different person to detect the worm in the network by the method of matching the characteristic string of the worm. In section 4, there is a solution for this issue. That is mathematical method of detecting the worm in the network and defending it. Chapter 5 consists of a critical appraisal and suggestions for the further work. Finally, I concluded in chapter 6. CHAPTER 2 OVERVIEW OF THE GENERIC AREA AND IDENTIFICATION OF PROBLEM: 2.1 NETWORK: Network is a group of electronic devices which are connected to each other in order to communicate which each other.à The devices can be computers, laptops, printers etc. networks can be wired or wireless. Wired networks are networks in which the devices are connected with the help of wires. Wireless networks are the networks in which the devices are connected without the wires. There are many different types of networks and peer-to-peer is one of the important and special types of networks. 2.2 PEER-TO-PEER NETWORKS: Peer-to-peer networks are emerged in 1990 because of the development of the peer-to-peer file sharing like Napster [1].à Peer-to-peer networks abbreviated as p2p networks are the networks in which all the nodes or peers in the network acts as servers as well as clients on demand. This is unlike typical client server model, in which the clients requests the services and server supplies the resources. But in case of peer-to-peer networks every node in the networks requests services like a client and every node will supply the resources like server on demand. Peer-to-peer network doesnââ¬â¢t need any centralized server coordination.à Peer-to-peer network is scalable. Addition of new nodes to the network or removal of already existing nodes on the network doesnââ¬â¢t affect the network. That means addition or removal of nodes can be done dynamically. All the nodes connected in a peer-to-peer network run on the same network protocol and software. Resources available on a node in the network are available to the remaining nodes of the network and they can access this information easily. Peer-to-peer networks provide robustness and scalability. All the wired and wireless networks can be configured as peer-to-peer networks. Home networks and small enterprise networks are preferable to configure in a peer-to-peer networks. Most the networks are not pure peer-to-peer networks because of they use some network interface devices. In the beginning, the information is stored at all the nodes by making a copy of it. But this increases the flow of traffic in the network. But now, a centralised system is maintained by the network and the requests are directed to the nodes which contains the relevant information. This will save the time and the traffic flow in the network. 2.3 WIRELESS NETWORKS: Devices connected to each other without any wires can also be configured like peer-to-peer networks. In a case of small of number of devices it is preferable to configure the network in wireless peer-to-peer networks because it will be easy to share the data in both the directions. It is even cheaper to connect the networks in wireless peer-to-peer because we do not need to spend on the wires. Peer-to-peer networks are divided into three types. They are: Instant messaging networks Collaborative networks Affinity community networks[2] Instant messaging networks: In this type of peer-to-peer networks, the users can chat with each other in real time by installing some software such as MSN messenger, AOL instant messenger etc. Collaborative networks: This type of peer-to-peer networks are also called as distributed computing.à This is widely used in the field of science and biotechnology where the intense computer processing is needed. Affinity community peer-to-peer networks: It is a type of p2p network, where the group of devices are connected only for the purpose of sharing the data among them. Peer to peer networks are basically classified into two types. They are: ÃË Structured peer-to-peer networks ÃË Unstructured peer-to-peer networks 2.4 STRUCTURED PEER-TO-PEER NETWORKS: In the structured peer-to-peer nodes connected in the network are fixed. They use distributed hashing table (DHT) for indexing [4]. In DHT data is stored in the form of hash table like (key, value). Any node willing to retrieve the data can easily do that using the keys. The mapping of values to the keys are maintained by all the nodes present in the network such that there will be very less disruption in case of change in the set of participants DHT-based networks are very efficient in retrieving the resources. 2.5 UNSTRUCTURED PEER-TO-PEER NETWORKS: In unstructured p2p network nodes are established arbitrarily. There are three types of unstructured p2p networks. They are Pure peer-to-peer Hybrid peer-to-peer Centralized peer-to-peer In Pure p2p networks all the nodes in the network are equal. There wonââ¬â¢t be any preferred node with special infrastructure function. In hybrid p2p networks there will be a special node called ââ¬Å"supernodesâ⬠[3] . This supernode can be any node in the network depending on the momentary need of the network. Centralized p2p network is a type of hybrid network in which there will be one central system which manages the network. The network cannot be able to work without this centralized system Basically, all the nodes in the peer-to-peer networks contain the information of the neighbour in its routing table. The rate of propagation of worms in the peer-to-peer networks is larger than compared to the other networks. This is because the information of the neighbour peers can easily achieved from the routing table of the infected node. Different types of files are shared between the nodes in the peer-to-peer networks. These files can be the audio files, video files, music files, text documents, books; articles etc. there are a lot of peer-to-peer software available these days in the market for sharing the files. Some of them are bittorrent, limeware, shareaza, kazaa, Imesh, bearshare Lite, eMule, KCeasy, Ares Galaxy, Soulseek, WinMX, Piolet, Gnutella, Overnet, Azureus (vuze), FrostWire, uTorrent, Morpheus, Ants, Acquisition[5]. There are lot more file sharing softwares in the market but these are the top 20 file sharing softwares for peer-to-peer networks. Basically, all the nodes connected together in the network should configure with the same network protocol and the same software should be installed in all the nodes in order to communicate with each other. Else the nodes in the network cannot communicate if they are configured with the different software or protocol. 2.6 ADVANTAGES OF PEER-TO-PEER NETWORKS [6]: It is more useful for the small business network comprising of very small number of computer systems or devices. Computers in this network can be configured easily. Full time network administrator is not required for the p2p networks. Easy maintenance of the network. Only a single operating system and less number of cables needed to get connected Can be installed easily Users can control the shared resources Distributed nature of the network increases the robustness of the network. 2.7 DISADVANTAGES OF THE PEER-TO-PEER NETWORKS [12]: No centralised administration Back-up should be performed on the each computer individually. Peer-to-peer networks are not secure Every computer in the network behaves as server and client which can slow down the performance of the system Legal controversy with the copyrights. 2.8 WORM: Worm is a computer malware program or it can be called as a mischievous code which can multiple itselfà into several replicas or it duplicate itself into several copies. Worm in simple can be called as ââ¬Å"autonomous intrusion agentâ⬠[19] .It doesnââ¬â¢t actually alters the function of the system but it pass through i.e., worm is unlike virus.à It intrudes the network without the mediation of the user. This is first detected by Robert T Morris in 1988[18]. Today we have some billions of systems connected to internet. Bu during 1988 there were only 60,000 systems connected to the internet. During that period 10% of the internet systems i.e., 6000 of the systems are infected and almost clogged because of the worms [8]. Worms when enters the system it hides in the operating system where it cannot be noticeable [18] . It drastically slows down the system the effect the other programs in the system. In worst cases it could even effect the entire network and slow down the internet across whole world. As it is said earlier that it replicates itself into multiple copies and attach itself to the emails and corrupt them and sometimes deleting the file without the user interaction. If it enters our email, it can able to send itself to all the contacts in our email book and then to all the contacts of the emails of our email book and likewise it propagates, grow and spread at the higher rate. Worms will even create the ââ¬Å"backdoorâ⬠into the computer [11]. This will make the attackers to send spam easily. Some famous worms discovered in 2003 and 2004 are ââ¬Å"Mydoomâ⬠, ââ¬Å" Sobigâ⬠and ââ¬Å"Sasserâ⬠[7].à ââ¬Å"Sasserâ⬠worm has recently affected the computers which are using Windows 2000 or Windows XP operating system. It restarts the system automatically and crashes it. It is spread to all the nodes in the network. There are some worms which are unlike the normal worms. These worms are very useful to the user some times. Hence, these are called the ââ¬Å"helpful wormsâ⬠[9]. Sometimes they help users without the interaction with the user. But most of the known worms are harmful and will always tries to infect the nodes in the network and affect the performance of the network. When the peer-to-peer networks are attacked by the worms, it slows down the efficiency of the network. So there is a need to save the networks from entering into the network and spreading itself all over the network. The worms should be detected and defended. If we delay in defending these worms, they replicate itself and makes many copies of itself and spread all through the network. This is very dangerous to the network as it affects the performance and efficiency of the network [10]. CHAPTER 3 RELEVANT WORK DONE BY OTHERS IN ORDER TO SOLVE THE PROBLEM: Many people proposed solutions to this problem. First Zhou L gave solution to p2p worm and he observed that propagation of worm in p2p network is very speed when compared to other networks[13] . Jayanthkumar performed some simulations on worm propagation from infected node to other node[10]. Wei yu researched on the behaviour of worms in p2p networks[14]. In my research I found one more interesting method of detecting the worms in the peer-to-peer network. This is indeed a special method of detecting the worms in network because the authors Yu Yao, Yong Li, Fu-xiang Gao, Ge Yu in their paper titled ââ¬Å"A Signature-behaviour-based P2P worm detection approachâ⬠they proposed a mechanism of detecting the known worms in the peer-to-peer networks based on characteristic string matching. Worm make use of vulnerabilities in the network and +Spreads[15]. They also proposed the detection mechanism for the unknown worms based on their behaviour. They technique mainly consists of the te chnology of characteristic string matching, identifying the application and the unknown worm detection technology. They have given the algorithm for the matching the characteristics string of the worm called suffix-tree algorithm- suffix array algorithm. This is efficient and simple with very less time complexity. As peer-to-peer network follows fragment transfer technique there is chance of assigning the characteristics string of the worm to the other blocks of data. And again during the reorganisation process this characteristic string can identify the worm. These authors even validated their results by simulation. They proved that their method is also one of the efficient methods of p2p worm detection. As mentioned above this method detects the known worm and also the unknown worms based on characteristic string matching and their behaviour respectively. In this method they initially capture the network packets using the library function called ââ¬Å"LibPcapâ⬠. ââ¬Å"LibPcapâ⬠is the library function that captures the network packets in UNIX and Linux platforms. This function contains many functions that will be useful for capturing the network packets. After capturing the data packets with help of these functions the non-P2P packets are filtered out. So now the P2P packets are filtered. In these P2P packets the known worms are detected by using the characteristic string matching. This is implemented by the couple of algorithms. They are the ââ¬Å"suffix array algorithmâ⬠and the ââ¬Å"dichotomy algorithmâ⬠. These algorithms are very accurate and are capable of detecting the worms in very less time. As I mentioned above peer-to-peer networks follow fragment transfer mechanism. Hence the characteristic string of the worm can be assigned to the other blocks of data. So, in this situation it is difficult to detect the worm if the characteristic string of the worm is based on the single packet. But if the characteristic string is present in the block then there is a chance of detecting the worm because it will assign it to the two packets. At this time the worm characteristic string present in the two different data packets need to restructure. After restructuring, the worm can be detected by using the matching mechanism. In this way the known worm in the network is detected by using the characteristic string matching. The unknown worms in the p2p network can be detected with the help of the act characteristics of the worm at the initial stage of its propagation. This can be called as the behaviour based detection of the unknown p2p worms. Like this all the known and unknown worms in the network are detected. 3.1 P2P KNOWN WORM DETECTION: There are four steps in detecting the p2p known worms. They are: Deal flow Technology of identifying the application Characteristic string matching Reorganising the characteristic string 3.1.1 DEAL FLOW: In this step of deal flow the flow of data is divided into four steps[16]. Step 1: Extracting the p2p data stream from the original data stream. Step 2: check the extracted p2p data stream for worms using characteristic string matching with the worms already existing in the library function. Step 3: data is flow is reorganised. It now contains worm characteristic string as well. Go to step 2. Step 4: check the data flow for unknown worms using unknown worm detection techniques. After performing the four steps update the library function. All the four steps is representedà pictorially as in the next page. Figure 4: flow chart representing four steps to detect worms à yes à normalà à Normal noà à à à Abnormal abnormal 3.1.2 TECHNOLOGY OF IDENTIFYING THE APPLICATION: As said earlier, this paper uses the method of capturing the data packets and sca it for the worms which are known with the help of a function library called ââ¬Å"LibPcapâ⬠[17] . For this there should be already some assigned rules in the network interface devices. So assigning these rules to those devices is done in stepwise procedure as: Identify the available network interface devices Open the network interface device Compile the rules that we are willing to attach to the devices Setup the rules of filtering to the device Now operate the equipment Start the process of capturing the packets There are some rules for identifying the p2p application. They are: Characteristic information of the known p2p is used Sometimes, if source-destination IP pairs donââ¬â¢t use the known P2P and they may use TCP and UDP at same time, then they are p2p. At a particular time source pairs {srcIP, srcport}[27] and the destination pairs {dstIP, dstport}[27] are checked Here we can identify whether itââ¬â¢s a p2p or not. If the number of connection port is equal to the number of connection IP, then we can say that it is a p2p. There are the situations where these rules have been used unruly. So the there were some amendments made to these rules. The amendments are rule (2) can identify even the mazes which are present and rule (3) is modified in such a way that in the detect cycle {srcIP, srcport}[27] pairs at the source and the {dstIP,à dstport }[27] pairs at the destination are checked. From this they derived that if the number of connection port is equal to the number of connection IP, the protocols which are used are same. If they are different then the protocols are different. 3.1.3 CHARACTERISTIC STRING MATCHING: This is the most important section of the paper. Here authors have given some definitions to the terms which we are going to use, the algorithms which we are going to use to detect the worm. Couple of algorithms are mentioned. They are suffix-array algorithm and the dichotomy algorithm. So the entire process of detecting the worm depends on the efficiency and the accuracy of these algorithms. First of all before using and understanding suffix-array algorithm we will try to understand some keywords and rules. Suffix: suffix is the part of a string or a substring which starts at a particular location to the end of the string. If a suffix in the string S starts at the location ââ¬Ëiââ¬â¢ to the end of the string S, then the suffix can be represented as Suffix(i)=S[i,Len(S) ][27] . Let us understand how the strings can be compared. The comparison in this paper followed ââ¬Å"dictionary comparisonâ⬠If u and v are the two different strings. Comparing the strings u and v is same like comparing u[i] and v[i], where ââ¬Ëiââ¬â¢ starts with the value 1. ÃË Here string u is equal to string v i.e., u=v when u[i]=v[i] ÃË String u is greater then string v i.e., u>v when u[i]>v[i] ÃË String u is less than string v i.e., u But the results were still not obtained for i>len(u) or i>len(v) Also if len(u)>len(v) then u >v, if len(u) Suffix-array: suffix-array is denoted by SA. It is a one-dimensional array. It is an array of SA[1], S[2], SA[3],â⬠¦. And so on. Here s[i] Rank-array: rank-array is nothing but SA-1. If SA[i]=j, then Rank[j]=i. we can say that the rank[i] saves the rank of Suffix(i) in an ascending order for all the suffixes. In this paper the author has taken the example of string ââ¬Å"scienceâ⬠and explained everything clearly. The string ââ¬Å"scienceâ⬠can generate seven suffixes. They are: Suffix(1): science Suffix(2): cience Suffix(3): ience Suffix(4): ence Suffix(5): nce Suffix(6): ce Suffix(7): e When we sort out everything in a dictionary order it will be in the order as follow Suffix(6)= ce Suffix(2)= cience Suffix(7)= e Suffix(4)= ence Suffix(3)= ience Suffix(5)= nce Suffix(1)= science Suffix-array algorithm follows multiplier ideas. Firstly get SA1 and Rank1 by comparing every character in the string. Comparing string is similar to comparing the every character sequentially. So by comparing every character, SA1 and Rank1 can derive SA2 and Rank2. And this SA2 and Rank2 will derive SA4 and Rank4. And this will again derive SA8 and Rank8. So finally suffix-array and rank-array are derived from this process. The main process of the suffix-array algorithm is ÃË Calculating SA1 and Rank1. Firstly all the suffixes are arranged in the first letter order and then suffix-array (SA1) is generated by using quick sorry algorithm and then Rank1 is also generated. ÃË Comparing 2k-prefix Suffix(i) and Suffix(j) using SAk and Rankk. 2k-Suffix(i) = 2k-Suffixes(j), this is equivalent to Rankk[SAk[i]] = Rankk[SAk[j]] and Rankk[SAk[i+k]] = Rankk[SAk[j+k]] 2k-Suffix(i) Suffix-array algorithm is a sorting algorithm which sorts out the characteristic string. So, this uses binary search algorithm. The algorithm follows Step 1: in the first step values are assigned like left=1, right=n and max_match=0 Step 2: the middle value i.e., mid= (left +right)/2. Step 3: comparing the characters corresponding to Suffix (SA[mid]) and P. the longest public prefix r can be helpful in implantation and comparison. If r > max_match, then max_match=r. Step 4: if Suffix(SA[mid]) à If Suffix(SA[mid])>P, then right=mid-1 à If Suffix(SA[mid])=P, then go to step 6 Step 5: if left Step 6: if max_match= m, then print ââ¬Å"match is successfulâ⬠. 3.1.4 REORGANISING THE CHARACTERISTIC STRING: In this step the characteristic string is reorganised. If the character string is divided into two different data blocks, then the data block with the partial characteristic string is stored. Basically, all the information about the data block like index, beginning offset, length of the block and so on are contained at the head of the each block. Here a structure piece is defined which consists of index of the block, beginning offset of the block offset, length of the character array head and the length of the character array end[18]. Initially each and every data packet is compared with the characteristic string for matching. If it is matched then the warning or an alert is sent to all the users about the worm. Here if the tail of the characteristic string of the worm matches with the head of the data block, then it will be stored in the character array end. And if the head of the characteristic string of the worm matches with the tail of the data block then it is stored in the corr esponding character array head. Suppose if the neighbouring data block contains a partial characteristic string of the worm then the neighbour string in the array head as well as in the end will be reorganised. Now this reorganised string will again perform the characteristic string matching and if any worm is detected then again the warning is sent to all users saying that the worm have found. If it is not matched then it wonââ¬â¢t perform any operation. If in a case that the characteristic string is present in the block but is divided into two different data packets, then a special term called ââ¬Å"character arrayâ⬠is introduced. First the matching mechanism is performed in both the data packet. If the matching characteristic string is found then the warning is sent to the users that there is a worm present. But if only part of the characteristic string is found then it will be enough if it meets some of the requirements like the head of the data packet should match wit h the tail of the characteristic string or the tail of the data packet should match with the head of the characteristic string. But if these conditions are not satisfied then no operation is performed. Now, if the tail of the data packet contains the partial characteristic string then the data packet is stored in the array. If the length of the characteristic string is m, then the Array[m] is set as ââ¬â¢Ã¢â¬â¢. And if the head of the data packet contains a part of the characteristic string then that data packet is stored in the n consecutive units of array. Finally, this array will be the characteristic string matching and if the worm is detected then the warning is sent to all the users. If it is not matched then nothing is done. 3.2 DETECTING UNKNOWN P2P WORM: In the above section we have seen how the known worm is detected. But that algorithm or mechanism are meant to detect the unknown p2p worms. So here in this section we will understand how the unknown worms can be detected and restrain the network. As we know in p2p networks a node can able to send the information to multiple hosts at a same time. Anyhow same protocol is used by all the nodes in the network[27]. These characteristics of the network helps worm to propagate easily. As we discussed above, only the known worms can be detected by using the characteristic string matching method. Here we will see how the unknown worms can be detected. The unknown worms are detected based on the behaviour of the node. Some of the detection rules are: same content files are transferred to multiple hosts in a very short time. Same protocol is used and the destination port is same. If these rules are satisfies by the source port then it allows the p2p worm to propagate. Now, it is necessary to e xtract the characteristics of worm near the worm propagation nodes. When these characteristics are extracted, they are added to the feature library. This data similarity comparison and extracting the characteristics are done using the LCSeq algorithm. But the LCSeq algorithm based on generalized suffix tree (GST) is the more efficient. The overall idea is that all the suffixes are represented as a tree. And this tree will have some characteristics like: ÃË Every node in a tree is a string and root is the empty string ÃË Every suffix can be represented as a path from the root. ÃË Every substring can be considered as a prefix of a suffix. ÃË To achieve the searching public sub sequence, every node should be set the information of its subordinate source string. 3.3 EXPERIMENT: We know that the worm body tries to infect the other nodes in the network by sending the worm to the specific ports of p2p node. So here the author tried to prove the efficiency of his method by performing an experiment. In this experiment he prepared a multiple group worm body and sent it repeatedly at regular intervals of time. Then he captured these packets and extracted their characteristics and compared it with the one that already exist in the feature library. P2p worm is detected separately using different algorithms like BF algorithm, KMP algorithm and suffix-array algorithm and compared their results doing three experiments. In the experiment 1, worm characteristics are in the same packet.. in the experiment Security Issues in Peer-to-peer Networking Security Issues in Peer-to-peer Networking ACKNOWLEDGEMENTS: The interest in the field of networking, driven me to take the computer networking as my course in M.Sc. there are many different types of networks. Out of them the more popularized and upcoming trend of networks are peer-to-peer networks. This report of my final dissertation for the partial fulfilment of my M.Sc, computer networking, would not have been possible without the support of my supervisor, Mr. Harry Benetatos. He helped me a lot by guiding me and pin-pointing the key mistakes which I have done during my research. My course leader Mr. Nicholas Ioannides also helped me a lot to complete this dissertation. His advises and suggestions gave me a lot of encouragement and support which made me do this research and finish it in time. I am very thankful to my university, LONDON METROPOLITAN UNIVERSITY which provided me the free access to the IEEE library which helped me to find the key papers which are very useful for my research. I also thank my parents for their support given to me in all walks of my life. DEDICATION: I dedicate this report to my parents and my well wisher Sakshi for their constant support and encouragement throughout my education and life. CHAPTER 1 PROJECT INTRODUCTION 1.1 INTRODUCTION TO THE PROJECT: This dissertation is all about the security issues in the peer-to-peer networks. There are many security issues in peer-to-peer networks. I have chosen to do research on worm intrusions in peer-to-peer networks. In this document I have mentioned how the worm propagates in the network from one peer to another peer, how the worm can be detected and how the detected worm can be attacked and save the network from getting infected. 1.2 AIM: Security issue in Peer-to-peer networks: Securing the peer-to-peer network from worms.à 1.3 OBJECTIVES: ÃË To understand how the peers communicate with each other in the peer-to-peer network ÃË To analyse the propagation of worms in the network. ÃË To detect the worms near the nodes of the network ÃË To defence the worms in the network. 1.4 RESEARCH QUESTION: This document briefly discusses about how the worms propagates in the network and how can it be detected and attacked in order to save the peer-to-peer network 1.5 APPROACH: My approach for this dissertation is as follows: ÃË Understanding peer-to-peer networks ÃË Defining the problem ÃË Data collection and analysis ÃË Study and understanding the existing solutions for the problem ÃË Comparing different solutions ÃË conclusion 1.6 METHODOLOGY: This section of my document contains what important steps to be followed in order to achieve the mentioned objectives. It also helps to schedule how to develop and complete different parts of the dissertation. In this dissertation firstly I will study and understand about the peer-to-peer networks and how the peers in the networks communicate and share information with the remaining peer in the network. Then I do research on how the worm propagates in the network, how can the worm be detected and how the detected worm can be attacked and restore the network.à In the pictorial form the different stages of my dissertation are 1.7 PREVIEW ABOUT THE COMING CHAPTERS IN THE REPORT: The rest of the report is organised as follows: in the chapter 2, there is brief discussion about the peer-to-peer networks, different types of peer-to-peer networks, advantages and disadvantages of the peer-to-peer networks. There is also some information about the worms, its nature and different types of worms. In chapter 3, there is a discussion about the methods given by the different person to detect the worm in the network by the method of matching the characteristic string of the worm. In section 4, there is a solution for this issue. That is mathematical method of detecting the worm in the network and defending it. Chapter 5 consists of a critical appraisal and suggestions for the further work. Finally, I concluded in chapter 6. CHAPTER 2 OVERVIEW OF THE GENERIC AREA AND IDENTIFICATION OF PROBLEM: 2.1 NETWORK: Network is a group of electronic devices which are connected to each other in order to communicate which each other.à The devices can be computers, laptops, printers etc. networks can be wired or wireless. Wired networks are networks in which the devices are connected with the help of wires. Wireless networks are the networks in which the devices are connected without the wires. There are many different types of networks and peer-to-peer is one of the important and special types of networks. 2.2 PEER-TO-PEER NETWORKS: Peer-to-peer networks are emerged in 1990 because of the development of the peer-to-peer file sharing like Napster [1].à Peer-to-peer networks abbreviated as p2p networks are the networks in which all the nodes or peers in the network acts as servers as well as clients on demand. This is unlike typical client server model, in which the clients requests the services and server supplies the resources. But in case of peer-to-peer networks every node in the networks requests services like a client and every node will supply the resources like server on demand. Peer-to-peer network doesnââ¬â¢t need any centralized server coordination.à Peer-to-peer network is scalable. Addition of new nodes to the network or removal of already existing nodes on the network doesnââ¬â¢t affect the network. That means addition or removal of nodes can be done dynamically. All the nodes connected in a peer-to-peer network run on the same network protocol and software. Resources available on a node in the network are available to the remaining nodes of the network and they can access this information easily. Peer-to-peer networks provide robustness and scalability. All the wired and wireless networks can be configured as peer-to-peer networks. Home networks and small enterprise networks are preferable to configure in a peer-to-peer networks. Most the networks are not pure peer-to-peer networks because of they use some network interface devices. In the beginning, the information is stored at all the nodes by making a copy of it. But this increases the flow of traffic in the network. But now, a centralised system is maintained by the network and the requests are directed to the nodes which contains the relevant information. This will save the time and the traffic flow in the network. 2.3 WIRELESS NETWORKS: Devices connected to each other without any wires can also be configured like peer-to-peer networks. In a case of small of number of devices it is preferable to configure the network in wireless peer-to-peer networks because it will be easy to share the data in both the directions. It is even cheaper to connect the networks in wireless peer-to-peer because we do not need to spend on the wires. Peer-to-peer networks are divided into three types. They are: Instant messaging networks Collaborative networks Affinity community networks[2] Instant messaging networks: In this type of peer-to-peer networks, the users can chat with each other in real time by installing some software such as MSN messenger, AOL instant messenger etc. Collaborative networks: This type of peer-to-peer networks are also called as distributed computing.à This is widely used in the field of science and biotechnology where the intense computer processing is needed. Affinity community peer-to-peer networks: It is a type of p2p network, where the group of devices are connected only for the purpose of sharing the data among them. Peer to peer networks are basically classified into two types. They are: ÃË Structured peer-to-peer networks ÃË Unstructured peer-to-peer networks 2.4 STRUCTURED PEER-TO-PEER NETWORKS: In the structured peer-to-peer nodes connected in the network are fixed. They use distributed hashing table (DHT) for indexing [4]. In DHT data is stored in the form of hash table like (key, value). Any node willing to retrieve the data can easily do that using the keys. The mapping of values to the keys are maintained by all the nodes present in the network such that there will be very less disruption in case of change in the set of participants DHT-based networks are very efficient in retrieving the resources. 2.5 UNSTRUCTURED PEER-TO-PEER NETWORKS: In unstructured p2p network nodes are established arbitrarily. There are three types of unstructured p2p networks. They are Pure peer-to-peer Hybrid peer-to-peer Centralized peer-to-peer In Pure p2p networks all the nodes in the network are equal. There wonââ¬â¢t be any preferred node with special infrastructure function. In hybrid p2p networks there will be a special node called ââ¬Å"supernodesâ⬠[3] . This supernode can be any node in the network depending on the momentary need of the network. Centralized p2p network is a type of hybrid network in which there will be one central system which manages the network. The network cannot be able to work without this centralized system Basically, all the nodes in the peer-to-peer networks contain the information of the neighbour in its routing table. The rate of propagation of worms in the peer-to-peer networks is larger than compared to the other networks. This is because the information of the neighbour peers can easily achieved from the routing table of the infected node. Different types of files are shared between the nodes in the peer-to-peer networks. These files can be the audio files, video files, music files, text documents, books; articles etc. there are a lot of peer-to-peer software available these days in the market for sharing the files. Some of them are bittorrent, limeware, shareaza, kazaa, Imesh, bearshare Lite, eMule, KCeasy, Ares Galaxy, Soulseek, WinMX, Piolet, Gnutella, Overnet, Azureus (vuze), FrostWire, uTorrent, Morpheus, Ants, Acquisition[5]. There are lot more file sharing softwares in the market but these are the top 20 file sharing softwares for peer-to-peer networks. Basically, all the nodes connected together in the network should configure with the same network protocol and the same software should be installed in all the nodes in order to communicate with each other. Else the nodes in the network cannot communicate if they are configured with the different software or protocol. 2.6 ADVANTAGES OF PEER-TO-PEER NETWORKS [6]: It is more useful for the small business network comprising of very small number of computer systems or devices. Computers in this network can be configured easily. Full time network administrator is not required for the p2p networks. Easy maintenance of the network. Only a single operating system and less number of cables needed to get connected Can be installed easily Users can control the shared resources Distributed nature of the network increases the robustness of the network. 2.7 DISADVANTAGES OF THE PEER-TO-PEER NETWORKS [12]: No centralised administration Back-up should be performed on the each computer individually. Peer-to-peer networks are not secure Every computer in the network behaves as server and client which can slow down the performance of the system Legal controversy with the copyrights. 2.8 WORM: Worm is a computer malware program or it can be called as a mischievous code which can multiple itselfà into several replicas or it duplicate itself into several copies. Worm in simple can be called as ââ¬Å"autonomous intrusion agentâ⬠[19] .It doesnââ¬â¢t actually alters the function of the system but it pass through i.e., worm is unlike virus.à It intrudes the network without the mediation of the user. This is first detected by Robert T Morris in 1988[18]. Today we have some billions of systems connected to internet. Bu during 1988 there were only 60,000 systems connected to the internet. During that period 10% of the internet systems i.e., 6000 of the systems are infected and almost clogged because of the worms [8]. Worms when enters the system it hides in the operating system where it cannot be noticeable [18] . It drastically slows down the system the effect the other programs in the system. In worst cases it could even effect the entire network and slow down the internet across whole world. As it is said earlier that it replicates itself into multiple copies and attach itself to the emails and corrupt them and sometimes deleting the file without the user interaction. If it enters our email, it can able to send itself to all the contacts in our email book and then to all the contacts of the emails of our email book and likewise it propagates, grow and spread at the higher rate. Worms will even create the ââ¬Å"backdoorâ⬠into the computer [11]. This will make the attackers to send spam easily. Some famous worms discovered in 2003 and 2004 are ââ¬Å"Mydoomâ⬠, ââ¬Å" Sobigâ⬠and ââ¬Å"Sasserâ⬠[7].à ââ¬Å"Sasserâ⬠worm has recently affected the computers which are using Windows 2000 or Windows XP operating system. It restarts the system automatically and crashes it. It is spread to all the nodes in the network. There are some worms which are unlike the normal worms. These worms are very useful to the user some times. Hence, these are called the ââ¬Å"helpful wormsâ⬠[9]. Sometimes they help users without the interaction with the user. But most of the known worms are harmful and will always tries to infect the nodes in the network and affect the performance of the network. When the peer-to-peer networks are attacked by the worms, it slows down the efficiency of the network. So there is a need to save the networks from entering into the network and spreading itself all over the network. The worms should be detected and defended. If we delay in defending these worms, they replicate itself and makes many copies of itself and spread all through the network. This is very dangerous to the network as it affects the performance and efficiency of the network [10]. CHAPTER 3 RELEVANT WORK DONE BY OTHERS IN ORDER TO SOLVE THE PROBLEM: Many people proposed solutions to this problem. First Zhou L gave solution to p2p worm and he observed that propagation of worm in p2p network is very speed when compared to other networks[13] . Jayanthkumar performed some simulations on worm propagation from infected node to other node[10]. Wei yu researched on the behaviour of worms in p2p networks[14]. In my research I found one more interesting method of detecting the worms in the peer-to-peer network. This is indeed a special method of detecting the worms in network because the authors Yu Yao, Yong Li, Fu-xiang Gao, Ge Yu in their paper titled ââ¬Å"A Signature-behaviour-based P2P worm detection approachâ⬠they proposed a mechanism of detecting the known worms in the peer-to-peer networks based on characteristic string matching. Worm make use of vulnerabilities in the network and +Spreads[15]. They also proposed the detection mechanism for the unknown worms based on their behaviour. They technique mainly consists of the te chnology of characteristic string matching, identifying the application and the unknown worm detection technology. They have given the algorithm for the matching the characteristics string of the worm called suffix-tree algorithm- suffix array algorithm. This is efficient and simple with very less time complexity. As peer-to-peer network follows fragment transfer technique there is chance of assigning the characteristics string of the worm to the other blocks of data. And again during the reorganisation process this characteristic string can identify the worm. These authors even validated their results by simulation. They proved that their method is also one of the efficient methods of p2p worm detection. As mentioned above this method detects the known worm and also the unknown worms based on characteristic string matching and their behaviour respectively. In this method they initially capture the network packets using the library function called ââ¬Å"LibPcapâ⬠. ââ¬Å"LibPcapâ⬠is the library function that captures the network packets in UNIX and Linux platforms. This function contains many functions that will be useful for capturing the network packets. After capturing the data packets with help of these functions the non-P2P packets are filtered out. So now the P2P packets are filtered. In these P2P packets the known worms are detected by using the characteristic string matching. This is implemented by the couple of algorithms. They are the ââ¬Å"suffix array algorithmâ⬠and the ââ¬Å"dichotomy algorithmâ⬠. These algorithms are very accurate and are capable of detecting the worms in very less time. As I mentioned above peer-to-peer networks follow fragment transfer mechanism. Hence the characteristic string of the worm can be assigned to the other blocks of data. So, in this situation it is difficult to detect the worm if the characteristic string of the worm is based on the single packet. But if the characteristic string is present in the block then there is a chance of detecting the worm because it will assign it to the two packets. At this time the worm characteristic string present in the two different data packets need to restructure. After restructuring, the worm can be detected by using the matching mechanism. In this way the known worm in the network is detected by using the characteristic string matching. The unknown worms in the p2p network can be detected with the help of the act characteristics of the worm at the initial stage of its propagation. This can be called as the behaviour based detection of the unknown p2p worms. Like this all the known and unknown worms in the network are detected. 3.1 P2P KNOWN WORM DETECTION: There are four steps in detecting the p2p known worms. They are: Deal flow Technology of identifying the application Characteristic string matching Reorganising the characteristic string 3.1.1 DEAL FLOW: In this step of deal flow the flow of data is divided into four steps[16]. Step 1: Extracting the p2p data stream from the original data stream. Step 2: check the extracted p2p data stream for worms using characteristic string matching with the worms already existing in the library function. Step 3: data is flow is reorganised. It now contains worm characteristic string as well. Go to step 2. Step 4: check the data flow for unknown worms using unknown worm detection techniques. After performing the four steps update the library function. All the four steps is representedà pictorially as in the next page. Figure 4: flow chart representing four steps to detect worms à yes à normalà à Normal noà à à à Abnormal abnormal 3.1.2 TECHNOLOGY OF IDENTIFYING THE APPLICATION: As said earlier, this paper uses the method of capturing the data packets and sca it for the worms which are known with the help of a function library called ââ¬Å"LibPcapâ⬠[17] . For this there should be already some assigned rules in the network interface devices. So assigning these rules to those devices is done in stepwise procedure as: Identify the available network interface devices Open the network interface device Compile the rules that we are willing to attach to the devices Setup the rules of filtering to the device Now operate the equipment Start the process of capturing the packets There are some rules for identifying the p2p application. They are: Characteristic information of the known p2p is used Sometimes, if source-destination IP pairs donââ¬â¢t use the known P2P and they may use TCP and UDP at same time, then they are p2p. At a particular time source pairs {srcIP, srcport}[27] and the destination pairs {dstIP, dstport}[27] are checked Here we can identify whether itââ¬â¢s a p2p or not. If the number of connection port is equal to the number of connection IP, then we can say that it is a p2p. There are the situations where these rules have been used unruly. So the there were some amendments made to these rules. The amendments are rule (2) can identify even the mazes which are present and rule (3) is modified in such a way that in the detect cycle {srcIP, srcport}[27] pairs at the source and the {dstIP,à dstport }[27] pairs at the destination are checked. From this they derived that if the number of connection port is equal to the number of connection IP, the protocols which are used are same. If they are different then the protocols are different. 3.1.3 CHARACTERISTIC STRING MATCHING: This is the most important section of the paper. Here authors have given some definitions to the terms which we are going to use, the algorithms which we are going to use to detect the worm. Couple of algorithms are mentioned. They are suffix-array algorithm and the dichotomy algorithm. So the entire process of detecting the worm depends on the efficiency and the accuracy of these algorithms. First of all before using and understanding suffix-array algorithm we will try to understand some keywords and rules. Suffix: suffix is the part of a string or a substring which starts at a particular location to the end of the string. If a suffix in the string S starts at the location ââ¬Ëiââ¬â¢ to the end of the string S, then the suffix can be represented as Suffix(i)=S[i,Len(S) ][27] . Let us understand how the strings can be compared. The comparison in this paper followed ââ¬Å"dictionary comparisonâ⬠If u and v are the two different strings. Comparing the strings u and v is same like comparing u[i] and v[i], where ââ¬Ëiââ¬â¢ starts with the value 1. ÃË Here string u is equal to string v i.e., u=v when u[i]=v[i] ÃË String u is greater then string v i.e., u>v when u[i]>v[i] ÃË String u is less than string v i.e., u But the results were still not obtained for i>len(u) or i>len(v) Also if len(u)>len(v) then u >v, if len(u) Suffix-array: suffix-array is denoted by SA. It is a one-dimensional array. It is an array of SA[1], S[2], SA[3],â⬠¦. And so on. Here s[i] Rank-array: rank-array is nothing but SA-1. If SA[i]=j, then Rank[j]=i. we can say that the rank[i] saves the rank of Suffix(i) in an ascending order for all the suffixes. In this paper the author has taken the example of string ââ¬Å"scienceâ⬠and explained everything clearly. The string ââ¬Å"scienceâ⬠can generate seven suffixes. They are: Suffix(1): science Suffix(2): cience Suffix(3): ience Suffix(4): ence Suffix(5): nce Suffix(6): ce Suffix(7): e When we sort out everything in a dictionary order it will be in the order as follow Suffix(6)= ce Suffix(2)= cience Suffix(7)= e Suffix(4)= ence Suffix(3)= ience Suffix(5)= nce Suffix(1)= science Suffix-array algorithm follows multiplier ideas. Firstly get SA1 and Rank1 by comparing every character in the string. Comparing string is similar to comparing the every character sequentially. So by comparing every character, SA1 and Rank1 can derive SA2 and Rank2. And this SA2 and Rank2 will derive SA4 and Rank4. And this will again derive SA8 and Rank8. So finally suffix-array and rank-array are derived from this process. The main process of the suffix-array algorithm is ÃË Calculating SA1 and Rank1. Firstly all the suffixes are arranged in the first letter order and then suffix-array (SA1) is generated by using quick sorry algorithm and then Rank1 is also generated. ÃË Comparing 2k-prefix Suffix(i) and Suffix(j) using SAk and Rankk. 2k-Suffix(i) = 2k-Suffixes(j), this is equivalent to Rankk[SAk[i]] = Rankk[SAk[j]] and Rankk[SAk[i+k]] = Rankk[SAk[j+k]] 2k-Suffix(i) Suffix-array algorithm is a sorting algorithm which sorts out the characteristic string. So, this uses binary search algorithm. The algorithm follows Step 1: in the first step values are assigned like left=1, right=n and max_match=0 Step 2: the middle value i.e., mid= (left +right)/2. Step 3: comparing the characters corresponding to Suffix (SA[mid]) and P. the longest public prefix r can be helpful in implantation and comparison. If r > max_match, then max_match=r. Step 4: if Suffix(SA[mid]) à If Suffix(SA[mid])>P, then right=mid-1 à If Suffix(SA[mid])=P, then go to step 6 Step 5: if left Step 6: if max_match= m, then print ââ¬Å"match is successfulâ⬠. 3.1.4 REORGANISING THE CHARACTERISTIC STRING: In this step the characteristic string is reorganised. If the character string is divided into two different data blocks, then the data block with the partial characteristic string is stored. Basically, all the information about the data block like index, beginning offset, length of the block and so on are contained at the head of the each block. Here a structure piece is defined which consists of index of the block, beginning offset of the block offset, length of the character array head and the length of the character array end[18]. Initially each and every data packet is compared with the characteristic string for matching. If it is matched then the warning or an alert is sent to all the users about the worm. Here if the tail of the characteristic string of the worm matches with the head of the data block, then it will be stored in the character array end. And if the head of the characteristic string of the worm matches with the tail of the data block then it is stored in the corr esponding character array head. Suppose if the neighbouring data block contains a partial characteristic string of the worm then the neighbour string in the array head as well as in the end will be reorganised. Now this reorganised string will again perform the characteristic string matching and if any worm is detected then again the warning is sent to all users saying that the worm have found. If it is not matched then it wonââ¬â¢t perform any operation. If in a case that the characteristic string is present in the block but is divided into two different data packets, then a special term called ââ¬Å"character arrayâ⬠is introduced. First the matching mechanism is performed in both the data packet. If the matching characteristic string is found then the warning is sent to the users that there is a worm present. But if only part of the characteristic string is found then it will be enough if it meets some of the requirements like the head of the data packet should match wit h the tail of the characteristic string or the tail of the data packet should match with the head of the characteristic string. But if these conditions are not satisfied then no operation is performed. Now, if the tail of the data packet contains the partial characteristic string then the data packet is stored in the array. If the length of the characteristic string is m, then the Array[m] is set as ââ¬â¢Ã¢â¬â¢. And if the head of the data packet contains a part of the characteristic string then that data packet is stored in the n consecutive units of array. Finally, this array will be the characteristic string matching and if the worm is detected then the warning is sent to all the users. If it is not matched then nothing is done. 3.2 DETECTING UNKNOWN P2P WORM: In the above section we have seen how the known worm is detected. But that algorithm or mechanism are meant to detect the unknown p2p worms. So here in this section we will understand how the unknown worms can be detected and restrain the network. As we know in p2p networks a node can able to send the information to multiple hosts at a same time. Anyhow same protocol is used by all the nodes in the network[27]. These characteristics of the network helps worm to propagate easily. As we discussed above, only the known worms can be detected by using the characteristic string matching method. Here we will see how the unknown worms can be detected. The unknown worms are detected based on the behaviour of the node. Some of the detection rules are: same content files are transferred to multiple hosts in a very short time. Same protocol is used and the destination port is same. If these rules are satisfies by the source port then it allows the p2p worm to propagate. Now, it is necessary to e xtract the characteristics of worm near the worm propagation nodes. When these characteristics are extracted, they are added to the feature library. This data similarity comparison and extracting the characteristics are done using the LCSeq algorithm. But the LCSeq algorithm based on generalized suffix tree (GST) is the more efficient. The overall idea is that all the suffixes are represented as a tree. And this tree will have some characteristics like: ÃË Every node in a tree is a string and root is the empty string ÃË Every suffix can be represented as a path from the root. ÃË Every substring can be considered as a prefix of a suffix. ÃË To achieve the searching public sub sequence, every node should be set the information of its subordinate source string. 3.3 EXPERIMENT: We know that the worm body tries to infect the other nodes in the network by sending the worm to the specific ports of p2p node. So here the author tried to prove the efficiency of his method by performing an experiment. In this experiment he prepared a multiple group worm body and sent it repeatedly at regular intervals of time. Then he captured these packets and extracted their characteristics and compared it with the one that already exist in the feature library. P2p worm is detected separately using different algorithms like BF algorithm, KMP algorithm and suffix-array algorithm and compared their results doing three experiments. In the experiment 1, worm characteristics are in the same packet.. in the experiment
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.