Network Traffic Identification and Classification with Machine Learning

A computer network should meet four primary objectives: fault tolerance, scalability, security, and quality of service (QoS). These objectives make a network efficient, reliable, and able to satisfy its users' requirements, and network administrators continuously work to improve them to give their customers a better experience. Many of the practices administrators use to achieve these objectives start with the same basic activity: getting to know the network. Network monitoring is the process used to do that. Network traffic monitoring plays a significant part in computer networking, with uses ranging from network resource management to network security. Network monitoring is also an umbrella term that covers several other processes and techniques.

Network traffic identification and classification (TIC) is a sub-area of network traffic monitoring. TIC aims to identify network traffic based on characteristics embedded in the traffic's packets. There are three main categories of TIC techniques: packet-based, connection pattern-based, and statistical. This article focuses on statistical techniques. However, it helps to have some idea of the packet-based and connection pattern-based techniques before moving on.

Packet-based TIC techniques use information extracted from a packet's header or payload and match it against predefined values or rules to identify the application. There are two methods in this approach: port-based application identification and payload signature-based identification, also called deep packet inspection (DPI). Both methods were very successful in the past, but with the widespread use of dynamic ports in newer applications and of encryption driven by security concerns, they are less effective today. Another concern is resource consumption: although DPI is considered very effective for TIC, inspecting payloads requires high computational power.
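To make the two packet-based methods concrete, here is a minimal Python sketch. The port table and the payload signatures are small illustrative assumptions, not a real rule set, and the function names are hypothetical.

```python
import re
from typing import Optional

# Illustrative assumption: a tiny table of well-known ports.
WELL_KNOWN_PORTS = {
    22: "ssh",
    53: "dns",
    80: "http",
    443: "https",
}

# Illustrative assumption: a few byte patterns matched at the start of a payload.
PAYLOAD_SIGNATURES = [
    ("http", re.compile(rb"^(GET|POST|HEAD|PUT|DELETE) \S+ HTTP/1\.[01]")),
    ("ssh", re.compile(rb"^SSH-\d\.\d+-")),
    ("bittorrent", re.compile(rb"^\x13BitTorrent protocol")),
]


def identify_by_port(src_port: int, dst_port: int) -> Optional[str]:
    """Port-based identification: look up either port in the table."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return None


def identify_by_payload(payload: bytes) -> Optional[str]:
    """Payload-signature identification: return the first matching signature."""
    for name, pattern in PAYLOAD_SIGNATURES:
        if pattern.match(payload):
            return name
    return None


if __name__ == "__main__":
    # HTTP on a non-standard port: the port lookup misses, the payload matches.
    request = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
    print(identify_by_port(51000, 8080), identify_by_payload(request))
    # An opaque (for example encrypted) payload on a dynamic port: both miss.
    print(identify_by_port(51000, 49152), identify_by_payload(b"\x17\x03\x03\x00\x20"))
```

The two calls at the bottom illustrate the weaknesses mentioned above: a dynamic port defeats the port lookup, and a payload with no readable signature defeats the payload match.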

The idea behind the connection pattern-based approach is that the way an application's endpoints communicate forms a connection pattern, and that pattern can be used to identify the application. This method therefore does not need the information carried in individual packets to identify network traffic. Take client-server communication as an example: a client and a server communicate using two ports, one on each side. In a P2P application, however, a peer connects to other peers using multiple ports. The main issue with this approach is its lack of fine-grained application identification. Even so, the technique is still used in network monitoring in various situations.
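The client-server versus P2P contrast can be sketched as a simple heuristic in Python. This is only an illustration under stated assumptions: the connection record layout, the function names, and the thresholds are hypothetical, and real connection-pattern classifiers use richer patterns than these two counts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed record layout: (source IP, destination IP, destination port).
Connection = Tuple[str, str, int]


def summarize_hosts(connections: Iterable[Connection]) -> Dict[str, Tuple[int, int]]:
    """For each source host, count distinct peers and distinct destination ports."""
    peers = defaultdict(set)
    ports = defaultdict(set)
    for src, dst, dst_port in connections:
        peers[src].add(dst)
        ports[src].add(dst_port)
    return {host: (len(peers[host]), len(ports[host])) for host in peers}


def label_host(n_peers: int, n_ports: int, min_peers: int = 3, min_ports: int = 3) -> str:
    """Label a host from its connection pattern (thresholds are arbitrary assumptions)."""
    if n_peers >= min_peers and n_ports >= min_ports:
        return "p2p-like"
    return "client-server-like"


if __name__ == "__main__":
    observed = [
        # One client talking repeatedly to one server port.
        ("10.0.0.5", "10.0.0.1", 443),
        ("10.0.0.5", "10.0.0.1", 443),
        # One peer talking to several peers on several ports.
        ("10.0.0.9", "10.0.1.1", 6881),
        ("10.0.0.9", "10.0.1.2", 51413),
        ("10.0.0.9", "10.0.1.3", 40001),
        ("10.0.0.9", "10.0.1.4", 40002),
    ]
    for host, (n_peers, n_ports) in summarize_hosts(observed).items():
        print(host, label_host(n_peers, n_ports))
```

Note that the labels are coarse. The sketch can separate a P2P-like host from a client-server-like one, but it cannot say which P2P application is running, which is the lack of fine-grained identification mentioned above.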

Figure: Machine learning-based traffic classification flow.

Statistical methods were developed to overcome the issues with packet-based traffic identification techniques. In statistical methods, traffic is classified by recognizing statistical properties that can be observed externally from the data packets, for example packet sizes and inter-arrival times. These methods use machine learning, mainly supervised and unsupervised techniques, to identify network traffic. The statistical approach is based on the concept of flows. A flow is an aggregation of data packets that share the same source IP, destination IP, source port, destination port, and protocol. Statistical properties are extracted from each flow and used to train machine learning models. A trained model can then be deployed to identify traffic passing through the network.

These techniques apply to a wide range of networking problems. Use cases can be found in intrusion detection, malware identification, DDoS protection, network capacity planning, and many more. By combining software-defined networking (SDN) with machine learning, we could build networks that are configured dynamically based on users' requirements. Intent-based networking (IBN) can be considered one such intelligent network solution. In the near future, therefore, we are likely to see many machine learning-based intelligent networks in our day-to-day lives.
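The flow-based pipeline described above (group packets into flows, extract statistical properties, train a model, classify new flows) can be sketched in plain Python. Everything here is an assumption made for illustration: the `Packet` fields, the three features, the labels, and the use of a nearest-centroid classifier as a stand-in for whatever supervised model is actually chosen. The features are also left unscaled, which a real pipeline would normally address.

```python
import math
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int         # bytes
    timestamp: float  # seconds

FlowKey = Tuple[str, str, int, int, str]


def group_into_flows(packets: Iterable[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Aggregate packets that share the same 5-tuple (one direction only)."""
    flows = defaultdict(list)
    for p in packets:
        flows[(p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)].append(p)
    return dict(flows)


def flow_features(flow: List[Packet]) -> List[float]:
    """Features: packet count, mean packet size, mean inter-arrival time."""
    sizes = [p.size for p in flow]
    times = sorted(p.timestamp for p in flow)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return [float(len(flow)), fmean(sizes), fmean(gaps) if gaps else 0.0]


def train_centroids(samples: Iterable[Tuple[List[float], str]]) -> Dict[str, List[float]]:
    """Supervised training: average the feature vectors of each labeled class."""
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)
    return {label: [fmean(col) for col in zip(*rows)] for label, rows in by_label.items()}


def classify(features: List[float], centroids: Dict[str, List[float]]) -> str:
    """Predict the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


if __name__ == "__main__":
    # Hand-made labeled feature vectors; the labels are hypothetical.
    training = [
        ([3.0, 1400.0, 0.01], "bulk"),
        ([5.0, 1300.0, 0.02], "bulk"),
        ([2.0, 80.0, 1.0], "interactive"),
        ([4.0, 100.0, 0.5], "interactive"),
    ]
    centroids = train_centroids(training)
    packets = [
        Packet("10.0.0.1", "10.0.0.2", 50000, 443, "TCP", 1400, 0.00),
        Packet("10.0.0.1", "10.0.0.2", 50000, 443, "TCP", 1400, 0.01),
        Packet("10.0.0.3", "10.0.0.2", 50001, 22, "TCP", 80, 0.0),
        Packet("10.0.0.3", "10.0.0.2", 50001, 22, "TCP", 80, 1.0),
    ]
    for key, flow in group_into_flows(packets).items():
        print(key, classify(flow_features(flow), centroids))
```

The classifier never looks at payload bytes or port meanings, only at externally observable statistics, which is why this style of approach can still work when ports are dynamic or payloads are encrypted.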

Comments

  1. Good article. It is interesting to see how these machine learning algorithms can be used in the field of networking.

    1. Thanks for reading. Definitely, we could see cutting-edge technologies powered by AI in the very near future.

  2. Machine learning makes tasks easier. Will the next level be no people at organizations? Nicely written, Osura.