Nowadays, cybersecurity companies implement a variety of methods to discover new, previously unknown malware files. Machine learning (ML) is a powerful and widely used approach for this task. At Kaspersky we have a number of complex ML models based on different file features, including models for static and dynamic detection, for processing sandbox logs and system events, etc. We use different machine learning techniques, including deep neural networks, one of the most promising technologies that make it possible to work with large amounts of data, incorporate different types of features, and achieve a high accuracy rate. But can we rely entirely on machine learning approaches in the battle with the bad guys? Or could powerful AI itself be vulnerable? Let's do some research.

In this article we attempt to attack our production anti-malware neural network models and check existing defense methods.


An adversarial attack is a method of making small modifications to objects in such a way that the machine learning model begins to misclassify them. Neural networks (NN) are known to be vulnerable to such attacks. Research into adversarial methods historically started in the sphere of image recognition. It has been shown that minor changes in images, such as the addition of insignificant noise, can cause remarkable changes in the predictions of classifiers and even completely confuse ML models [i].

The addition of inconspicuous noise causes the NN to classify the panda as a gibbon

Furthermore, the insertion of small patterns into an image can also force models to change their predictions in the wrong direction [ii].

Adding a small patch to the image makes the NN classify the banana as a toaster

After this susceptibility to small data modifications was highlighted in image recognition neural networks, similar techniques were demonstrated in other data domains. In particular, a variety of attacks against malware detectors have been proposed, and many of them were successful.

In the paper "Functionality-preserving black-box optimization of adversarial windows malware" [iii], the authors extracted data sequences from benign portable executable (PE) files and added them to malware files, either at the end of the file (padding) or within newly created sections (section injection). These changes affected the scores of the targeted classifier while preserving file functionality by design. A collection of malware files with inserted random benign file parts was formed. Using genetic algorithms (including mutation, cross-over and other types of transformation) and the malware classifier for predicting scores, the authors iteratively modified the collection of malware files, making them more and more difficult for the model to classify correctly. This was done via objective function optimization, which contains two conflicting terms: the classification output on the manipulated PE file, and a penalty function that evaluates the number of bytes injected into the input data. Although the proposed attack was effective, it did not use state-of-the-art ML adversarial techniques and relied on public pre-trained models. Likewise, the authors measured the average effectiveness of the attack against VirusTotal anti-malware engines, so we don't know for sure how effective it is against the cybersecurity industry's leading solutions. Furthermore, since most security products still use traditional methods of detection, it's unclear how effective the attack was against the ML component of anti-malware solutions, or against other types of detectors.

Another study, "Optimization-guided binary diversification to mislead neural networks for malware detection" [iv], proposed a method for functionality-preserving changes to assembly instructions within functions, and adversarial attacks based on it. The algorithm randomly selects a function and a transformation type and tries to apply the selected changes. The attempted transformation is applied only if it makes the targeted NN classifier more likely to misclassify the binary file. Again, this attack lacks ML methods for adversarial modification, and it has not been tested on specific anti-malware products.

Some papers proposed gradient-driven adversarial methods that use knowledge about model structure and features for malicious file modification [v]. This approach provides more opportunities for file modification and results in better effectiveness. Although the authors conducted experiments to measure the impact of such attacks against specific malware detectors (including public models), they did not work with production anti-malware classifiers.

For a more detailed overview of the various adversarial attacks on malware classifiers, see our whitepaper and "A survey on practical adversarial examples for malware classifiers".

Our approach

Since Kaspersky anti-malware solutions, among other techniques, rely on machine learning models, we're extremely interested in investigating how vulnerable our ML models are to adversarial attacks. Three attack scenarios can be distinguished:

– White-box attack. In this scenario, all information about the model is available. Armed with this information, attackers try to convert malware files (detected by the model) into adversarial samples with identical functionality that are misclassified as benign. In real life this attack is possible when the ML detector is part of the client application and can be retrieved by code reversing. In particular, researchers at Skylight reported such a scenario for the Cylance antivirus product.

– Gray-box attack. Complex ML models typically require a significant amount of both computational and memory resources. Therefore, ML classifiers may be cloud-based and deployed on the security company's servers. In such cases, the client applications simply compute and send file features to these servers. The cloud-based malware classifier responds with the predictions for the given features. The attackers have no access to the model, but they still have knowledge about feature construction, and can get a label for any file by scanning it with the security product.

– Black-box attack. In this case, feature computation and model prediction are performed on the cybersecurity company's side. The client applications just send raw files, or the security company collects files in another way. Therefore, no information about feature processing is available. There are strict legal restrictions on sending information from the user machine, and this approach also involves traffic limitations. This means the malware detection process usually can't be performed for all user files on the fly. An attack on a black-box system is therefore the most difficult.

Consequently, we will focus on the first two attack scenarios and investigate their effectiveness against our production model.

Features and malware classification neural network

We built a simple but well-functioning neural network, similar to our production model, for the task of malware detection. The model is based on static analysis of executable (PE) files.

Malware classification neural network

The neural network model works with the following types of features:

– PE header features: extracted from the PE header, including physical and virtual file size, overlay size, executable characteristics, system type, number of imported and exported functions, etc.
– Section features: the number of sections, physical and virtual sizes of sections, section characteristics, etc.
– Section statistics: various statistics describing raw section data, such as entropy and byte histograms of different section parts.
– File strings: strings parsed from the raw file using a special utility and packed into a bloom filter.

Let’s take a brief look at the bloom filter structure.

Scheme of packing strings into the bloom filter structure. Bits related to strings are set to 1

The bloom filter is a bit vector. For each of the k strings, n predefined hash functions are calculated. The values of the hash functions determine the positions of the bits to be set to 1 in the bloom filter vector. Note that different strings may be mapped to the same bit. In such cases the bit simply remains set (equal to 1). This way we can pack all file strings into a vector of a fixed size.
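The packing scheme described above can be sketched as follows. The filter size, number of hash functions, and the SHA-256-based hashing are illustrative choices for the sketch, not the product's actual parameters:

```python
import hashlib

FILTER_SIZE = 1024  # illustrative; the real filter size is a product detail
NUM_HASHES = 3      # n predefined hash functions per string

def string_bits(s: str) -> list[int]:
    """Bit positions for a single string: one per hash function."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{s}".encode()).digest()[:4], "big")
        % FILTER_SIZE
        for i in range(NUM_HASHES)
    ]

def pack_strings(strings: list[str]) -> list[int]:
    """Pack all file strings into a fixed-size bit vector.
    Collisions simply leave the bit set, so insertion is irreversible."""
    bloom = [0] * FILTER_SIZE
    for s in strings:
        for pos in string_bits(s):
            bloom[pos] = 1
    return bloom

bloom = pack_strings(["MessageBoxA", "GetProcAddress", "comctl32.dll"])
print(sum(bloom))  # at most NUM_HASHES * 3 bits set (fewer on collisions)
```

The irreversibility of insertion is exactly what constrains the attack later: new strings can only add bits, never clear them.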

We trained the aforementioned neural network on approximately 300 million files: half of them benign, the other half malware. The classification quality of this network is displayed in the ROC curve. The X-axis shows the false positive rate (FPR) in logarithmic scale, while the Y-axis corresponds to the true positive rate (TPR), the detection rate over all malware files.

ROC curve for the trained malware detector

In our company, we focus on techniques and models with very low false positive rates. So, we set a threshold for a 10^-5 false positive rate (roughly one false positive per 100,000 benign files). Using this threshold, we detected approximately 60% of the malware samples from our test collection.
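The threshold choice can be illustrated with synthetic scores. The score distributions below are made up for the sketch; only the quantile logic mirrors the procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical classifier scores: higher means "more likely malware".
benign_scores = rng.normal(0.1, 0.1, size=1_000_000)
malware_scores = rng.normal(0.7, 0.2, size=1_000_000)

# Set the threshold at the (1 - 1e-5) quantile of the benign scores,
# i.e. roughly one false positive per 100,000 benign files.
threshold = np.quantile(benign_scores, 1 - 1e-5)

fpr = (benign_scores > threshold).mean()
tpr = (malware_scores > threshold).mean()
print(f"threshold={threshold:.3f}  FPR~{fpr:.1e}  TPR~{tpr:.2%}")
```

In practice, the TPR achieved at this operating point is what the rest of the article refers to as the detection rate.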

Adversarial attack algorithm

To attack the neural network, we use the gradient method described in "Practical black-box attacks against machine learning". For a malware file, we want to change the score of the classifier to avoid detection. To do so, we calculate the gradient of the final NN score and back-propagate it through all the NN layers to the file features. The main difficulty of creating an adversarial PE is preserving the functionality of the original file. To achieve this, we adopt a simple strategy: during the adversarial attack we only add new sections, while existing sections remain intact. In most cases these modifications don't affect the file execution process.

We also have some restrictions on features in the new sections:

– Various size-defining features (related to file/section size, etc.) should be in the range from 0 to some moderately large value.
– Byte entropy and byte histograms should be consistent. For example, the values in a histogram for a buffer of size S should sum to S.
– We can add bits to the bloom filter but can't remove them (it is simple to add new strings to a file, but difficult to remove them).

To satisfy these restrictions we use an algorithm similar to the one described in "Deceiving end-to-end deep learning malware detectors using adversarial examples", but with some modifications (described below). Specifically, we moved the "fix_restrictions" step into the "while" loop and expanded the restrictions.

Here dF(x, y)/dx is the gradient of the model output with respect to the features, fix_restrictions projects the features onto the aforementioned permitted value domain, and ε is the step size.

The adversarial-generating loop contains two steps:

– Calculate the gradient of the model score with respect to the features, and move the feature vector x in the direction of the gradient for all non-bloom features.
– Update the feature vector x to meet the file restrictions: for example, put integer file features into the required interval and round them.

For bloom filter features we just set the one bit corresponding to the largest gradient. Strictly speaking, we should also find the string for this bit and set the other bits corresponding to it. However, in practice, this level of accuracy is not necessary and has almost no effect on the process of generating adversarial samples. For simplicity, we skip the addition of the other corresponding string bits in further experiments.
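The loop above can be sketched with a toy differentiable model standing in for the NN. The feature layout, the stand-in sigmoid model, the weights, and the step size are all illustrative; only the structure of the loop (gradient step on continuous features, single-bit bloom update, fix_restrictions inside the loop) follows the algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
N_CONT, N_BLOOM = 8, 32   # toy feature layout, not the product's real one

# Stand-in for the trained NN: F(x) = sigmoid(w.x); only its gradient matters here.
w = np.concatenate([rng.normal(size=N_CONT) * 0.01, rng.normal(size=N_BLOOM)])

def score(x):
    return 1.0 / (1.0 + np.exp(-w @ x))

def grad(x):              # dF/dx, back-propagated to the features
    s = score(x)
    return s * (1.0 - s) * w

x0 = np.concatenate([rng.integers(0, 256, N_CONT).astype(float),
                     (rng.random(N_BLOOM) < 0.3).astype(float)])

def fix_restrictions(x):
    """Project features onto the permitted domain inside the loop."""
    x[:N_CONT] = np.clip(np.round(x[:N_CONT]), 0, 255)  # bounded integer features
    x[N_CONT:] = np.maximum(x[N_CONT:], x0[N_CONT:])    # bloom bits: add-only
    return x

x, eps = x0.copy(), 500.0
for _ in range(50):
    g = grad(x)
    x[:N_CONT] -= eps * g[:N_CONT]       # step continuous features against the score
    # bloom features: set the single unset bit with the most negative gradient
    cand = np.where((x[N_CONT:] == 0) & (g[N_CONT:] < 0))[0]
    if cand.size:
        x[N_CONT + cand[np.argmin(g[N_CONT:][cand])]] = 1.0
    x = fix_restrictions(x)

print(score(x0), "->", score(x))  # the score should only decrease
```

With a real network, grad would be computed by back-propagation through all layers rather than from a closed-form expression.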

White-box attack

In this section we analyze the efficacy of the algorithm in the white-box setting. As mentioned above, this scenario assumes the availability of all information about the model structure, as is the case when the detector is deployed on the client side.

By following the algorithm of adversarial PE generation, we managed to confuse our classification model for about 89% of the malicious files.

Removed detection rate. X-axis shows the number of steps in Algorithm 1; Y-axis shows the percentage of adversarial malicious files that went undetected by the NN classifier (while their original versions were detected).

Thus, it is easy to change files in order to avoid detection by our model. Now, let us take a closer look at the details of the attack.

To understand the vulnerabilities of our NN, we applied the adversarial algorithm to different feature types separately. First, we tried to change only the string features (the bloom filter). Doing so confuses the NN for 80% of the malware files.

Removed detection rate for string modification only

We also explored which bits of the bloom filter are most often set to 1 by the adversarial algorithm.

The histogram of bits added by the adversarial algorithm to the bloom filter. Y-axis corresponds to the ratio of files that the current bit is added to. A higher rate means the bit is important for decreasing the model score

The histogram shows that some bits of the bloom filter are more important to our classifier, and setting them to 1 often leads to a decrease in the score.

To investigate the nature of these important bits, we reversed the popular bits back to strings and obtained a list of strings likely to change the NN score from malware to benign:

Pooled mscoree.dll CWnd MessageBoxA SSLv3_method assembly manifestVersion="1.0" xmlns="urn… SearchPathA AVbad_array_new_length@std Invalid colouring format in %s file SHGetMalloc Setup is preparing to install [name] on your computer e:TScrollBarStyle{ssRegular,ssFlat,ssHotTrack SetRTL VarFileInfo cEVariantOutOfMemoryError vbaLateIdSt VERSION.dll GetExitCodeProcess mUnRegisterChanges ebcdic-Latin9–euro GetPrivateProfileStringA XPTPSW cEObserverException LoadStringA fFMargins SetBkMode comctl32.dll fPopupMenu1 cTEnumerator<Data.DB.TField cEHierarchy_Request_Err fgets FlushInstructionCache GetProcAddress NativeSystemInfo sysuserinfoorg uninstallexe RT_RCDATA textlabel wwwwz

We also tried to attack the model to force it to misclassify benign files as malware (the inverse problem). In this case, we obtained the following list:

mStartTls Toolhelp32ReadProcessMemory mUnRegisterChanges ServiceMain arLowerW fFTimerMode TDWebBrowserEvents2DownloadCompleteEvent CryptStringToBinaryA VS_VERSION_INFO fFUpdateCount VirtualAllocEx Free WSACreateEvent File I/O error %d VirtualProtect cTContainedAction latex VirtualAlloc fFMargins set_CancelButton FreeConsole ntdll.dll mHashStringAsHex mGetMaskBitmap mCheckForGracefulDisconnect fFClientHeight mAddMulticastMembership remove_Tick ShellExecuteA GetCurrentDirectory get_Language fFAutoFocus AttributeUsageAttribute ImageList_SetIconSize URLDownloadToFileA CopyFileA UPX1 Loader

These sets of "good" and "bad" strings look consistent and plausible. For instance, the strings 'MessageBoxA' and 'fPopupMenu1' are indeed often used in benign files. And vice versa, strings like 'Toolhelp32ReadProcessMemory', 'CryptStringToBinaryA', 'URLDownloadToFileA' and 'ShellExecuteA' look suspicious.

We also tried to confuse our model using only binary section statistics.

Removed detection rate for section addition, without bloom features. X-axis corresponds to the number of added sections, Y-axis to the percentage of malware files that become "clean" during adversarial attacks

The graph shows that it is possible to remove detection for about 73% of malware files. The best result is achieved by adding 7 sections.

At this point, the question of a "universal section" arises, that is, a section that, when added to many different files, leads to their incorrect classification and detection removal. Taking a naive approach, we simply calculated the mean statistics of all sections obtained during the adversarial algorithm and generated one "mean" section. Unfortunately, adding this section to the malware files removes only 17% of detections.

Byte histogram of the "mean" section, for its beginning and ending. X-axis corresponds to the byte value; Y-axis to the number of bytes with this value in the section part

So, the idea of one universal section failed. We therefore tried to divide the constructed adversarial sections into compact groups (using the l2 metric).

Adversarial sections dendrogram. Y-axis shows the Euclidean distance between section statistics

Separating the adversarial sections into clusters, we calculated a "mean" section for each of them. Nonetheless, the detection prevention rate did not increase significantly: in practice, only 25-30% of detection cases can be removed by adding such "universal mean sections".
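The clustering step can be sketched as follows. For brevity, this uses a plain k-means under the l2 metric as a stand-in for the hierarchical clustering shown in the dendrogram, and the section statistics are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-ins for adversarial section statistics (entropy,
# byte-histogram features, etc.), drawn from three artificial groups.
sections = np.vstack([rng.normal(c, 0.5, size=(100, 16)) for c in (0.0, 3.0, 6.0)])

def kmeans(X, k, iters=50):
    """Plain k-means (l2 metric); a stand-in for hierarchical clustering."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers

labels, mean_sections = kmeans(sections, k=3)
# One "mean" adversarial section per cluster, appended to files in the experiment.
print(mean_sections.shape)
```

Each row of mean_sections plays the role of one candidate "universal mean section" whose effect on detection is then measured.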

The dependence of the removed detection share on the number of clusters used for "mean" section computation

The experiments showed that there is no "universal" section that makes a file look benign to the current version of our NN classifier.

Gray-box attack

All previous attacks were performed under the assumption that we already have access to the neural network and its weights. In real life, this is not always the case.

In this section we consider a scenario where the ML model is deployed in the cloud (on the security company's servers), but features are calculated and then sent to the cloud from the user's machine. This is a typical scenario for models in the cybersecurity industry, because sending user files to the company side is difficult (due to legal restrictions and traffic limitations), while specifically extracted features are small enough for forwarding. This means that attackers have access to the mechanisms of feature extraction. They can also scan any file using the anti-malware product.

We built a number of new models with different architectures. To be precise, we altered the number of fully connected layers and their sizes in comparison with the original model. We also gathered a large collection of malware and benign files that were not in the original training set. Then we extracted features from the new collection; this can be done by reversing the code of the anti-malware application. Then we labeled the collection in two different ways: by the full anti-malware scan, and using just the original model's verdicts. To clarify the difference: with the selected threshold, the original model detects about 60% of the malware files detected by the full anti-malware stack. These proxy models were trained on the new dataset. After that, the adversarial attack described in the previous sections was applied to the proxy models. The resulting adversarial samples, built for a proxy model, were evaluated on the original one. Despite the fact that the architectures and training datasets of the original and proxy models may differ, it turned out that attacks on a proxy model can produce adversarial samples for the original model. Surprisingly, attacking the proxy model could sometimes lead to even better attack results.
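The proxy-model pipeline can be sketched end to end. Everything here is a toy stand-in: a hidden linear "cloud" classifier that only returns verdicts, a logistic-regression proxy in place of the proxy neural networks, and a simple gradient attack on the proxy whose result is then checked against the original model:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 20  # toy feature dimension

def target_verdict(X):
    """Stand-in for the cloud model: attackers see only its verdicts."""
    w_secret = np.sin(np.arange(D))        # hidden weights, unknown to the attacker
    return (X @ w_secret > 0).astype(float)

# 1. Collect files outside the original training set and extract features
#    (feature extraction is reproducible by reversing the client application).
X = rng.normal(size=(5000, D))
# 2. Label the collection with the product's verdicts.
y = target_verdict(X)

# 3. Train a proxy model on the labeled collection (logistic regression here).
w = np.zeros(D)
for _ in range(300):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / len(X)

# 4. Run the white-box gradient attack against the proxy, then check
#    whether the adversarial sample transfers to the original model.
x_adv = rng.normal(size=D)
for _ in range(100):
    x_adv -= 0.2 * w / np.linalg.norm(w)   # step against the proxy score gradient
print("target verdict after transfer:", target_verdict(x_adv[None])[0])
```

Transfer works here because the proxy's decision boundary approximates the hidden one; the article's experiments show the same effect with neural networks and real files.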

Gray-box attack results compared to the white-box attack. Y-axis corresponds to the percentage of malware files with removed detections by the original model. The effectiveness of the gray-box attack in this case is better than that of the white-box attack.

The experiment shows that a gray-box attack can achieve similar results to the white-box approach. The only difference is that more gradient steps are needed.

Attack transferability

We don't have access to the machine learning models of other security companies, but we do have reports [vi] of gray-box and white-box adversarial attacks being successful against publicly known models. There are also research papers [vii] about the transferability of adversarial attacks in other domains. Therefore, we presume that the production ML detectors of other companies are also vulnerable to the described attack. Note that neural networks are not the only vulnerable type of machine learning model. For example, another popular machine learning algorithm, gradient boosting, is also reported [viii] to be susceptible to effective adversarial attacks.

Adversarial attack protection

As part of our study, we examined several algorithms that have been proposed for protecting models from adversarial attacks. Here we report some of the results of their impact on model protection.

The first approach was described in "Distillation as a defense to adversarial perturbations against deep neural networks". The authors propose to train a new "distilled" model based on the scores of the first model. They show that for some tasks and datasets this method reduces the effectiveness of gradient-based adversarial attacks. Unfortunately, the idea does not guarantee successful model protection, and in our case, when we trained the model according to the proposed approach, it still turned out to be easily confused by the adversarial algorithm.
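The core of the distillation idea can be sketched with linear stand-ins for both models; the temperature value and the models themselves are illustrative, not the paper's setup. The "distilled" model is trained to reproduce the softened scores of the first one:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 10))
w_teacher = rng.normal(size=10)   # stand-in for the first (teacher) model

T = 5.0  # distillation temperature (illustrative value)
def soften(logits):
    return 1 / (1 + np.exp(-logits / T))

y_soft = soften(X @ w_teacher)    # the teacher's softened scores become targets

# Train the "distilled" student model on the soft scores (cross-entropy);
# the gradient of the loss w.r.t. the student logits is (p - y_soft) / T.
w_student = np.zeros(10)
for _ in range(1000):
    p = soften(X @ w_student)
    w_student -= 1.0 * X.T @ ((p - y_soft) / T) / len(X)

# The student closely reproduces the teacher's scores.
print(np.corrcoef(X @ w_student, X @ w_teacher)[0, 1])
```

The hope is that the softened targets smooth the student's gradients; in our experiments this smoothing was not enough to stop the attack.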

We also tried to add noise to the data:

– For continuous features, we calculated mean and standard deviation values. We then added a random number of sections to each file during model training, with the generated section parameters drawn from a Gaussian distribution with the aforementioned mean and standard deviation values.
– For the bloom filter structure, we likewise set each bit to 1 with a 5% probability.

The idea behind this method is to expand the set of potential file sections, making the network more stable and resistant to attacks that add sections to the end of a file. However, this method was not effective either.
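The augmentation above can be sketched as follows. The Gaussian parameters, feature sizes and section count limit are made up for the sketch; only the 5% bit probability and the overall scheme come from the description:

```python
import numpy as np

rng = np.random.default_rng(5)

# Per-feature mean/std of real section statistics, as estimated on the
# training set (the values here are invented for the sketch).
sec_mean = np.full(16, 2.0)
sec_std = np.full(16, 0.5)

def augment(section_features, bloom, max_extra=5, bit_p=0.05):
    """Training-time noise: append a random number of Gaussian 'sections'
    and set each bloom filter bit to 1 with 5% probability."""
    n_extra = rng.integers(0, max_extra + 1)
    extra = rng.normal(sec_mean, sec_std, size=(n_extra, 16))
    noisy_bloom = np.maximum(bloom, (rng.random(bloom.shape) < bit_p).astype(float))
    return np.concatenate([section_features, extra.ravel()]), noisy_bloom

x_aug, bloom_aug = augment(np.ones(32), np.zeros(64))
print(x_aug.size, int(bloom_aug.sum()))
```

Note that the bloom noise can only set bits, matching the add-only restriction of the filter itself.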

We also tried the protection approach described in "Detecting adversarial samples from artifacts". In many cases, neural networks contain dropout layers. A dropout layer randomly replaces features from the previous layer with zeros. The idea of the paper is to enable dropout at the inference stage (while it is usually disabled there, because the dropout mechanism introduces a degree of randomness and noise into predictions for the same objects) and calculate the score multiple times for each file. Enabling dropout layers leads to slightly different scores for each run, so we get a score distribution. According to the authors, these distributions have higher standard deviation values for adversarial samples than for ordinary samples. We tried this approach, computing the standard deviation of scores for adversarial malware and original malware files. Remarkably, the standard deviations of the adversarial sample scores indeed turned out to be significantly higher than those of the initial file scores.
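The detection mechanism can be sketched with a toy two-layer network; the weights are random here, whereas in practice this runs on the trained classifier:

```python
import numpy as np

rng = np.random.default_rng(6)
W1 = rng.normal(size=(32, 16))   # toy hidden layer weights
W2 = rng.normal(size=16)         # toy output weights

def score_with_dropout(x, p_drop=0.5):
    """One stochastic forward pass with dropout left ENABLED at inference."""
    h = np.maximum(W1.T @ x, 0.0)            # hidden ReLU activations
    mask = rng.random(h.shape) >= p_drop     # random dropout mask
    h = h * mask / (1.0 - p_drop)            # inverted dropout scaling
    return 1 / (1 + np.exp(-W2 @ h))

def score_std(x, runs=50):
    """Std of the score distribution over repeated stochastic passes;
    per the paper, adversarial samples tend to show a higher value."""
    return float(np.std([score_with_dropout(x) for _ in range(runs)]))

x = rng.normal(size=32)
print(score_std(x))
```

Thresholding this standard deviation then separates adversarial samples from ordinary ones, with the trade-off shown in the next figure.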

Standard deviation distribution of model scores for adversarial and initial samples

Using this difference, with a suitable threshold we can, for example, detect 50% of adversarial samples while misclassifying only around 10% of the initial samples.

X-axis shows the share of original samples detected as adversarial; Y-axis shows the share of detected adversarial samples

Another approach proposes the use of monotonic networks (see "Monotonic Networks" and "Monotonic models for real-time dynamic malware detection"). The principle behind this method is to create a neural network with positive layer weights and monotonic activation functions. Such models are, by design, resistant to the addition of new sections and strings: any addition can only increase the model's detection score, making the attack described in this article impossible.
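A minimal sketch of the monotonicity idea follows. The softplus reparameterization is one common way to keep effective weights positive; the cited papers' exact constructions may differ:

```python
import numpy as np

rng = np.random.default_rng(7)
A1 = rng.normal(size=(8, 4))     # unconstrained parameters
A2 = rng.normal(size=4)

def positive(a):
    return np.log1p(np.exp(a))   # softplus: effective weights are positive

def monotonic_score(x):
    """Positive weights plus monotonic (sigmoid) activations make the
    output non-decreasing in every input feature."""
    h = 1 / (1 + np.exp(-(positive(A1).T @ x)))
    return float(1 / (1 + np.exp(-positive(A2) @ h)))

x = rng.random(8)
x_more = x.copy()
x_more[3] += 1.0                 # adding a section/string can only raise features
print(monotonic_score(x), "<=", monotonic_score(x_more))
```

Since the attack in this article works by adding content (which can only increase such features), a monotonic model's score can never be pushed down by it.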

Adversarial attack difficulties in the real world

Currently, there is no approach in the field of machine learning that can protect against all the various adversarial attacks, meaning that techniques relying heavily on ML predictions are vulnerable. Kaspersky's anti-malware solutions offer a complex multi-layered approach. They contain not only machine learning techniques but also a number of different components and technologies to detect malicious files. First, detection relies on different types of features: static, dynamic, or even cloud statistics. Complex detection rules and diverse machine learning models are also used to improve the quality of our products. Finally, complex and ambiguous cases go to virus analysts for further investigation. Thus, confusing a machine learning model will not, by itself, lead to misclassification of malware in our products. Nevertheless, we continue to conduct research to protect our ML models from existing and future attacks and vulnerabilities.

[i] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).

[ii] Brown, Tom B., et al. "Adversarial patch." arXiv preprint arXiv:1712.09665 (2017).

[iii] Demetrio, Luca, et al. "Functionality-preserving black-box optimization of adversarial windows malware." IEEE Transactions on Information Forensics and Security (2021).

[iv] Sharif, Mahmood, et al. "Optimization-guided binary diversification to mislead neural networks for malware detection." arXiv preprint arXiv:1912.09064 (2019).

[v] Kolosnjaji, Bojan, et al. "Adversarial malware binaries: Evading deep learning for malware detection in executables." 2018 26th European signal processing conference (EUSIPCO). IEEE, 2018;

Kreuk, Felix, et al. "Deceiving end-to-end deep learning malware detectors using adversarial examples." arXiv preprint arXiv:1802.04528 (2018).

[vi] Park, Daniel, and Bulent Yener. "A survey on practical adversarial examples for malware classifiers." arXiv preprint arXiv:2011.05973 (2020).

[vii] Liu, Yanpei, et al. "Delving into transferable adversarial examples and black-box attacks." arXiv preprint arXiv:1611.02770 (2016).

Tramer, Florian, et al. "The space of transferable adversarial examples." arXiv preprint arXiv:1704.03453 (2017).

[viii] Chen, Hongge, et al. "Robust decision trees against adversarial examples." International Conference on Machine Learning. PMLR, 2019.

Zhang, Chong, Huan Zhang, and Cho-Jui Hsieh. "An efficient adversarial attack for tree ensembles." arXiv preprint arXiv:2010.11598 (2020).
