#### On a Bayesian Approach to Malware Detection and Classification through $n$-gram Profiles

##### José A. Perusquía, Jim E. Griffin, Cristiano Villa

Detecting and correctly classifying malicious executables has become one of the major concerns in cyber security, especially because traditional detection systems have become less effective as the number and severity of threats have grown. One way to differentiate benign from malicious executables is to leverage their hexadecimal representation by creating a set of binary features that completely characterise each executable. In this paper we present a novel supervised Bayesian nonparametric approach for binary matrices that provides an effective probabilistic method for malware detection. Moreover, owing to the model's flexible assumptions, we are able to use it in a multi-class framework where the interest lies in classifying malware into known families. Finally, we also develop a generalisation of the model that provides a deeper understanding of the behaviour of each feature across groups.
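To make the idea of binary features built from the hexadecimal representation concrete, the following is a minimal sketch of how an $n$-gram profile could be turned into a binary matrix. It is an illustration only: the function names `hex_ngrams` and `binary_profile_matrix`, the choice of $n = 2$ bytes, and the use of every observed $n$-gram as a feature are assumptions made for this example, not the feature construction used in the paper.

```python
def hex_ngrams(data: bytes, n: int = 2) -> set[str]:
    """Return the distinct byte n-grams of `data`, each written as a hex string."""
    return {data[i:i + n].hex() for i in range(len(data) - n + 1)}


def binary_profile_matrix(executables: list[bytes], n: int = 2):
    """Build a binary matrix: rows are executables, columns are n-grams.

    Entry (i, j) is 1 if n-gram j occurs in executable i and 0 otherwise.
    Assumption: the vocabulary is simply every n-gram seen in the input;
    a real pipeline would likely select a much smaller subset.
    """
    profiles = [hex_ngrams(e, n) for e in executables]
    vocabulary = sorted(set().union(*profiles))
    matrix = [[1 if g in p else 0 for g in vocabulary] for p in profiles]
    return vocabulary, matrix


# Two toy byte strings standing in for executables (hypothetical data).
samples = [bytes.fromhex("4d5a9000"), bytes.fromhex("4d5a4d5a")]
vocabulary, matrix = binary_profile_matrix(samples, n=2)
```

Each row of `matrix` is the kind of binary feature vector a supervised model for binary matrices would take as input, with the class label (benign, malicious, or a malware family) supplied separately.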
