Today’s cybersecurity threats continue to find ways to fly and stay under the radar. Cybercriminals use polymorphic malware because a slight change in the binary code or script can allow these threats to avoid detection by traditional antivirus software. Threat actors customize their wares for specific target organizations to increase their chances of breaking into and moving laterally through an entire corporate network, exfiltrating data, and leaving with little or no trace. The underground economy is rife with malware builders, Trojanized versions of legitimate applications, and other tools and services that allow malware operators to deploy highly evasive malware.
As the number of threats seen in the wild continues to increase exponentially, the continued evolution and innovation of their evasion tactics create a scenario where most malware is seen only once. Therefore, in today’s threat landscape, security solutions should no longer be just about the number of unique malware samples they can detect. Instead, they should deliver durable protection that can defend against existing as well as future attacks. This requires comprehensive visibility into threats, coupled with the ability to process vast amounts of data. Microsoft 365 Defender delivers this capability through its cross-domain optics and the transformation of data into actionable security information through innovative applications of AI and machine learning.
We have previously discussed how we apply deep learning in detecting malicious PowerShell, exploring new approaches to categorize malware, and detecting threats via the fusion of behavior signals. In this blog post, we discuss a new approach that combines deep learning with fuzzy hashing. This approach uses fuzzy hashes as input to identify similarities among files and to determine if a sample is malicious or not. Then, a deep learning methodology inspired by natural language processing (NLP) better recognizes similarities that actually matter, thus improving detection quality and scale of deployment.
This model aims to improve the overall accuracy of categorizing malware and continue closing the gap between malware release and eventual detection. It can detect and block malware at first sight, a critical capability in defending against a wide range of threats, including sophisticated cyberattacks.
Case study: New GoldMax malware blocked at first sight
In March this year, Microsoft 365 Defender successfully blocked a file that would later be confirmed as a variant of the GoldMax malware. GoldMax, a command-and-control backdoor that persists on networks as a scheduled task impersonating systems management software, is part of the tools, tactics, and procedures (TTPs) of NOBELIUM, the threat actor behind the attacks against SolarWinds in December 2020.
Microsoft was able to proactively protect its customers from this newly discovered GoldMax variant because it leveraged two main technologies: fuzzy hashing, which serves as the input, and deep learning techniques inspired by NLP and computer vision, among others.
The earliest GoldMax sample, which Microsoft detects as Trojan:Win64/GoldMax.A!dha, was first submitted on VirusTotal in September 2020. While the new file was confirmed to be a GoldMax variant in June 2021, or three months after Microsoft first blocked it, we started protecting customers as soon as we saw it. As seen in the screenshots below, the new file’s TLSH and SSDEEP hashes (the fuzzy hashes found on VirusTotal) are observably similar to those of the first GoldMax variant. Both files also have the same ImpHash and file size, further supporting our initial conclusion that the second file is also part of the GoldMax family.
In the next sections, we discuss fuzzy hashes and how we use them in conjunction with deep learning to detect new and unknown threats.
Understanding fuzzy hashes
Hashing has become an essential technique in malware research literature and beyond because its output, hashes, are commonly used as checksums or unique identifiers. For instance, it is common practice to use the SHA-256 cryptographic hash to query a knowledge database like VirusTotal to determine whether a file is malicious or not. The first antivirus products operated this way before antivirus signatures existed.
However, to identify or detect similar malware, traditional cryptographic hashing poses a challenge because of its inherent property called cryptographic diffusion, whose purpose is to hide the relationship between the original entity and the hash so that the hash remains a one-way function. With this property, even a minimal change in the original entity (in this case, a file) yields a radically different, undetected hash.
Below are screenshots that illustrate this principle. The one-word change in the text file and the resulting change in the MD5 hash represent the effect of changes in the binary content of other files:
Figure 2. Example of cryptographic hashing
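The same effect can be reproduced in a few lines of Python. This is a minimal sketch using the standard `hashlib` module and SHA-256 (the screenshots use MD5, but the behavior is the same); the two sample sentences and the helper names are our own illustrative choices, not taken from the figure.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Two illustrative inputs that differ by a single word.
original = "The quick brown fox jumps over the lazy dog"
modified = "The quick brown cat jumps over the lazy dog"

h1 = sha256_hex(original)
h2 = sha256_hex(modified)

print(h1)
print(h2)
print(differing_positions(h1, h2), "of", len(h1), "hex characters differ")
```

Because of diffusion, the two digests share no usable structure: knowing that the inputs are nearly identical tells you nothing about how the outputs relate.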
Fuzzy hashing breaks the aforementioned cryptographic diffusion while still hiding the relationship between entity and hash. In doing so, this method produces similar resulting hashes when given similar inputs. Fuzzy hashing is the key to finding new malware that looks like something we have seen previously.
As with cryptographic hashes, there are several algorithms to calculate a fuzzy hash. Some examples are Nilsimsa, TLSH, SSDEEP, and sdhash. Using the previous text file example, below is a screenshot of their SSDEEP hashes. Note how observably similar these hashes are because there is only a one-word difference in the text:
Figure 3. Example of fuzzy hashing
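To make the idea concrete without depending on an external library, here is a deliberately simplified toy piecewise hash. It is not SSDEEP, TLSH, or any production algorithm: it cuts the input at whitespace and emits one character per chunk, whereas real fuzzy hashes choose chunk boundaries with a rolling hash over arbitrary binary content. The function names and sample sentences are hypothetical.

```python
import hashlib

# 64-character output alphabet (same symbols as base64).
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def toy_fuzzy_hash(data: bytes) -> str:
    """Toy piecewise hash: one output character per whitespace-delimited chunk.

    Each chunk is hashed on its own, so a local edit changes only the
    character that corresponds to the edited chunk.
    """
    signature = []
    for chunk in data.split():
        first_byte = hashlib.md5(chunk).digest()[0]
        signature.append(ALPHABET[first_byte % 64])
    return "".join(signature)


def positions_differing(a: str, b: str) -> int:
    """Count differing positions, treating any length gap as differences."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


original = b"The quick brown fox jumps over the lazy dog"
modified = b"The quick brown cat jumps over the lazy dog"

f1 = toy_fuzzy_hash(original)
f2 = toy_fuzzy_hash(modified)

print(f1)
print(f2)
print(positions_differing(f1, f2), "of", len(f1), "characters differ")
```

Since only one chunk changed, the two signatures can differ in at most one position, which is the locality property that makes fuzzy hashes useful for similarity search.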
The main benefit of fuzzy hashes is similarity. Since these hashes can be calculated on several parts or the entirety of a file, we can focus on hash sequences that are similar to one another. This is important in determining the maliciousness of a previously undetected file and in categorizing malware according to type, family, malicious behavior, or even related threat actor.
Fuzzy hashes as “natural language” for deep learning
Deep learning in its many applications has recently been remarkable at modeling natural human language. For example, convolutional architectures, recurrent architectures like Gated Recurrent Units (GRUs) or Long Short-Term Memory networks (LSTMs), and most recently attention-based networks like all the variants of Transformers have proven to be state-of-the-art in addressing human language tasks like sentiment analysis, question answering, or machine translation. As such, we explored whether similar techniques can be applied to computer languages like binary code, with fuzzy hashing as an intermediate step to reduce sequence complexity and length of the original space. We discovered that segments of fuzzy hashes could be treated as “words,” and some sequences of such words could indicate maliciousness.
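One way to picture treating hash segments as “words” is the sketch below. The post does not specify how segments are cut, so the overlapping three-character windows, the `<unk>` token, the function names, and the sample hash string are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def hash_to_words(fuzzy_hash: str, n: int = 3) -> List[str]:
    """Split a fuzzy hash string into overlapping n-character 'words'."""
    if len(fuzzy_hash) < n:
        return [fuzzy_hash] if fuzzy_hash else []
    return [fuzzy_hash[i:i + n] for i in range(len(fuzzy_hash) - n + 1)]


def build_vocabulary(corpus: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct word; id 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for words in corpus:
        for word in words:
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(words: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map words to ids, falling back to the unknown id for unseen words."""
    return [vocab.get(word, 0) for word in words]


# Made-up hash string, used only to show the shape of the data.
sample_hash = "s4Ud1Lj96tHHlZDrwciQmA"
words = hash_to_words(sample_hash)
vocab = build_vocabulary([words])

print(words[:4])
print(encode(words, vocab)[:4])
```

The resulting id sequences are the kind of input an embedding layer expects, in the same way tokenized sentences are in NLP.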
Architecture overview and deployment at scale
A common deep learning approach in dealing with words is to use word embeddings. However, because fuzzy hashes are not exactly natural language, we could not simply use pre-trained models. Instead, we needed to train our embeddings from scratch to identify malicious indicators.
Once we had these embeddings, we tried most of the things one would do with a language deep neural network. We explored different architectures using standard techniques from the literature, explored convolutions over these embeddings, experimented with multilayer perceptrons, and tried traditional sequential models (like the previously mentioned LSTM and GRU) and attention-based networks (Transformers).
We got fairly good results with most techniques. However, to deploy and enable this model in Microsoft 365 Defender, we looked into other factors like inference times and the number of parameters in the network. Inference time ruled out the sequential models because even though they were the best in terms of precision or recall, they were the slowest to run inference on. Meanwhile, the Transformers we experimented with also yielded excellent results but had several million parameters. Models of that size would be too costly to deploy at scale.
That left us with the convolutional approach and the multilayer perceptron. Of these two, the perceptron yielded slightly better results because the spatial adjacency intrinsically provided by the convolutional filters does not properly capture the relationship among the embeddings.
Once we had landed on a viable architecture, we used the modern tools available to us, which Microsoft continues to extend. We used Azure Machine Learning GPU capabilities to train these models at scale, then exported them to Open Neural Network Exchange (ONNX), which gave us the extra performance we needed to operationalize this at scale on Microsoft Defender Cloud.
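The general shape of an embeddings-plus-perceptron classifier can be sketched in plain Python as follows. This is not the production model: the mean pooling step, layer sizes, random initialization, and the `TinyHashClassifier` name are all assumptions, and the weights are untrained, so the score it prints is meaningless. It only shows how token ids flow through the network and how few parameters such a design needs.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def make_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Create a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def mean_pool(vectors: Matrix) -> Vector:
    """Average a sequence of embedding vectors into a single vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyHashClassifier:
    """Hypothetical embedding -> mean pool -> one hidden layer -> sigmoid."""

    def __init__(self, vocab_size: int, embed_dim: int = 8,
                 hidden_dim: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.embeddings = make_matrix(vocab_size, embed_dim, rng)
        self.w1 = make_matrix(hidden_dim, embed_dim, rng)
        self.b1 = [0.0] * hidden_dim
        self.w2 = make_matrix(1, hidden_dim, rng)
        self.b2 = [0.0]

    def predict(self, token_ids: List[int]) -> float:
        """Return a score in (0, 1) for a non-empty sequence of token ids."""
        if not token_ids:
            raise ValueError("token_ids must not be empty")
        vectors = [self.embeddings[i] for i in token_ids]
        hidden = relu(dense(mean_pool(vectors), self.w1, self.b1))
        return sigmoid(dense(hidden, self.w2, self.b2)[0])

    def parameter_count(self) -> int:
        """Total number of weights and biases in the model."""
        matrices = (self.embeddings, self.w1, self.w2)
        weights = sum(len(m) * len(m[0]) for m in matrices)
        return weights + len(self.b1) + len(self.b2)


model = TinyHashClassifier(vocab_size=10)
print(model.predict([1, 4, 4, 7]))
print(model.parameter_count())
```

In a model of this shape, the perceptron layers add only a few dozen parameters and almost all of the rest sit in the embedding table, which is consistent with the deployment concerns about inference time and parameter count described above.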
Deep learning fuzzy hashes: Looking for the similarities that matter
A question that arises from an approach like this is: why apply deep learning at all?
Adding machine learning allows us to learn which similarities in fuzzy hashes matter and which ones don’t. Additionally, adding deep learning and training on vast amounts of data increases the accuracy of malware classification and allows us to understand the minor nuances that differentiate legitimate software from its malware or Trojanized versions.
A deep learning approach also has its inherent advantages, one of which is the ability to create large pretrained models on massive amounts of data. One can then reuse such a model for different classification, clustering, and other scenarios by utilizing its transfer learning properties. This is similar to how modern NLP approaches language tasks, like how OpenAI’s GPT-3 solves question answering.
Another inherent benefit of deep learning is that one does not have to retrain the model from scratch. Since new data is constantly flowing into the Microsoft Defender Cloud, we can fine-tune the model with this incoming data to adapt and quickly respond to an ever-changing threat landscape.
Deep learning continues to provide opportunities to improve threat detection significantly. The deep learning approach discussed in this blog entry is just one of the ways we at Microsoft apply deep learning in our protection technologies to detect and block evasive threats. Data scientists, threat experts, and product teams work together to build AI-driven solutions and investigation experiences.
By treating fuzzy hashes as “words” and not mere codes, we showed that natural language techniques in deep learning are viable methods to address current challenges in the threat landscape. This change in perspective opens up new possibilities in cybersecurity innovation that we are looking forward to exploring further.
Numerous AI-driven technologies like this allow Microsoft 365 Defender to automatically analyze massive amounts of data and quickly identify malware and other threats. As the GoldMax case study demonstrated, the ability to identify new and unknown malware is a critical aspect of the coordinated defense that Microsoft 365 Defender delivers to protect customers against the most sophisticated threats.
Edir Garcia Lazo