Abstract. A cellular operator needs to determine which applications a particular network user has used in order to compile statistics on the most frequently used applications. Such statistics help not only to monitor network status and detect failures but also, where necessary, to restrict access to network resources that may harm the user from an information security standpoint. Data mining and machine learning methods make it possible to automatically classify, analyze and filter the traffic of malicious and unwanted mobile applications. Malicious mobile applications can threaten the integrity or availability of data, while unwanted ones threaten its confidentiality. The paper considers the classification of encrypted network traffic by application type (Mail.ru email, Sberbank, Skype, Pikabu, Instagram, Hearthstone and others) by machine learning methods, using the Naive Bayes, C4.5, SVM, AdaBoost and Random Forest algorithms. For the analysis, more than two million network packets were collected from four applications that transmitted encrypted traffic, after which training and test samples were generated. The quality of the classifier was assessed using Accuracy, Precision, Recall, F-Measure and Area Under Curve.
Applying the InfoGain algorithm showed that thirteen attributes are sufficient to ensure high-quality classification of the traffic of applications that use encryption. The Random Forest classifier is the slowest but has the best classification quality indicators. For the Random Forest algorithm, a training sample of no more than 300 flows is sufficient to achieve high-quality classification of mobile applications. To ensure high-quality flow classification, it is enough to analyze from 16 to 58 packets per flow, depending on the application. Further increasing the number of packets per flow does not noticeably improve classification quality.
Keywords: classification, machine learning, algorithms, network traffic, application, packet, flow, protocol, network, mobile applications efficiency.
DOI: https://doi.org/10.17747/TEDS-2018-98-101