Understanding Machine Learning Attacks, Techniques, and Defenses

Machine learning (ML) is a subset of artificial intelligence (AI) that enables machines and software to learn automatically from historical data and generate accurate output without being explicitly programmed to do so. Many leading organizations today have incorporated machine learning into their daily processes for business intelligence. However, threat actors can manipulate machine learning models for malicious purposes, causing systems to malfunction or using them to execute an attack. This is known as adversarial machine learning: feeding a model deceptive input so that it makes mistakes in its predictions.

There are two main categories of machine learning attacks:

White box attack – The threat actor has full access to the machine learning model's parameters, architecture, and gradients.
Black box attack – The threat actor has no access to the machine learning model's parameters, architecture, or gradients. Instead, the threat actor uses a different model or an ad hoc method to craft adversarial examples in the hope that they will also fool the target model.

Types of adversarial attacks

Three of the most common attack methods are poisoning, evasion, and model extraction attacks.

Poisoning attack – Also referred to as a contaminating attack, this technique takes place during the training phase of a machine learning model. The threat actor tampers with the training data so that the model learns to make inaccurate decisions, which reduces the accuracy and performance of the system. Models that are retrained during deployment are especially exposed, because threat actors can inject malicious samples that disrupt or influence the retraining process.
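To make the effect of injected training samples concrete, here is a minimal sketch of a poisoning attack against a toy nearest-centroid classifier. The classifier, the one-dimensional data, and all function names are hypothetical and chosen only for illustration; real poisoning attacks target far larger models and datasets.

```python
def train_centroids(samples):
    """Toy 'training': average the feature value of each label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Predict the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, samples):
    correct = sum(predict(centroids, v) == y for v, y in samples)
    return correct / len(samples)


# Clean training data: class 0 clusters near 2, class 1 clusters near 8.
clean_train = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
test_set = [(0.0, 0), (4.0, 0), (6.0, 1), (10.0, 1)]

# Assumption: the attacker can slip three mislabeled samples into retraining.
poison = [(20.0, 0)] * 3

clean_model = train_centroids(clean_train)
poisoned_model = train_centroids(clean_train + poison)

clean_accuracy = accuracy(clean_model, test_set)
poisoned_accuracy = accuracy(poisoned_model, test_set)
print("accuracy on clean training data:   ", clean_accuracy)
print("accuracy on poisoned training data:", poisoned_accuracy)
```

In this toy setup, three injected samples are enough to drag the class 0 centroid past the class 1 centroid, so most test points are misclassified after retraining.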

Evasion attack – This is the most common type of attack on machine learning systems. In contrast to poisoning attacks, evasion attacks target machine learning models that have already been trained. The threat actor manipulates input data during the deployment phase so that the model misclassifies it. Because the malicious inputs are not known in advance, they are hard to identify when the model malfunctions, which undermines the accuracy and integrity of the model's output. Spoofing attacks conducted against biometric verification systems are one popular example of an evasion attack.

Model extraction – In this attack, the threat actor probes a black box machine learning system to reconstruct the model or to extract the data on which it was trained. This attack is significant when either the training data or the model itself is sensitive or confidential.
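As a rough illustration of model extraction, the sketch below recovers the decision boundary of a hypothetical black box classifier using nothing but its predictions. The victim model, its secret threshold, and the query budget are assumptions made for this example; real extraction attacks typically train a surrogate model on many query results.

```python
def make_threshold_model(threshold):
    """Return a classifier that exposes only its predictions."""
    def query(x):
        return 1 if x >= threshold else 0
    return query


def extract_threshold(query, low=0.0, high=1.0, n_queries=20):
    """Binary-search the decision boundary using prediction queries alone."""
    for _ in range(n_queries):
        mid = (low + high) / 2
        if query(mid) == 1:
            high = mid   # boundary is at or below mid
        else:
            low = mid    # boundary is above mid
    return (low + high) / 2


# Assumption: the victim's threshold (0.373) is unknown to the attacker.
victim = make_threshold_model(0.373)

stolen_threshold = extract_threshold(victim)
surrogate = make_threshold_model(stolen_threshold)

# Compare the surrogate with the victim on a grid of probe inputs.
probes = [i / 100 for i in range(101)]
agreement = sum(victim(x) == surrogate(x) for x in probes) / len(probes)
print("recovered threshold:", round(stolen_threshold, 4))
print("agreement with victim:", agreement)
```

Twenty queries narrow the boundary to an interval about one millionth wide, which is why rate limiting and query monitoring are often suggested as countermeasures to this kind of attack.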

Adversarial attack methods

Adversarial attack methods seek to trick ML models with deceptive inputs, undermining the model’s ability to classify those inputs correctly.

Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) – This numerical optimization algorithm is used to minimize the amount of perturbation added to images. It is effective at generating adversarial examples but very computationally intensive and time-consuming.
Fast Gradient Sign Method (FGSM) – This simple and efficient method generates adversarial examples by nudging every pixel in the direction that increases the model's loss, while capping the maximum perturbation added to any single pixel.
Jacobian-based Saliency Map Attack (JSMA) – Unlike FGSM, which adds perturbation to every feature, this method uses feature selection to minimize the number of features that are modified. It is more computationally intensive than FGSM.
DeepFool attack – Produces adversarial examples effectively with smaller perturbations and higher misclassification rates. This method is more computationally intensive than FGSM and JSMA.
Carlini & Wagner attack (C&W) – Widely regarded as one of the most effective methods for generating adversarial examples, it can also defeat some adversarial defenses.
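Of the methods above, FGSM is the simplest to sketch. The example below applies it to a tiny logistic regression model, assuming a white box setting in which the attacker knows the weights. The weights, the input, and the epsilon value are made up for illustration; for this model the gradient of the cross-entropy loss with respect to the input can be written in closed form, so no ML library is needed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that x belongs to class 1 under logistic regression."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(value):
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, y_true, epsilon):
    """Return x + epsilon * sign(dLoss/dx) for binary cross-entropy loss."""
    p = predict_proba(weights, bias, x)
    # For logistic regression, dLoss/dx_i = (p - y_true) * w_i.
    grad = [(p - y_true) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model parameters and input (assumed known to the attacker).
weights = [2.0, -1.0, 0.5]
bias = 0.0
x = [0.3, 0.2, 0.1]
y_true = 1

x_adv = fgsm(weights, bias, x, y_true, epsilon=0.2)
clean_p = predict_proba(weights, bias, x)
adv_p = predict_proba(weights, bias, x_adv)
print("P(class 1) on clean input:      ", round(clean_p, 3))
print("P(class 1) on adversarial input:", round(adv_p, 3))
```

Every feature moves by exactly epsilon, which is the cap on per-feature perturbation, yet the predicted probability for the true class falls below 0.5 and the prediction flips.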

Defenses against adversarial attacks on machine learning systems

Several protective techniques can help deter these attacks:

Adversarial training – Training the machine learning model on adversarial examples so that it learns to identify them. Various tools available today enable the automatic discovery of adversarial attacks.
Switching models – Using multiple models in the system that change randomly when making predictions. The threat actor is unaware of which model is currently in use, and may have to compromise all models for an attack to be successful. Attacking multiple models is more difficult than attacking just one.
Generalized models – In this approach, multiple models are combined to create one generalized model, and all the individual models contribute to the final prediction. The threat actor may be able to deceive one model but not all of them.
Using responsible AI – Existing security frameworks are likely inadequate to address the vulnerabilities of machine learning. The responsible AI framework addresses security issues unique to AI and ML.
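The switching and generalized model defenses can be sketched with a few toy classifiers. In the example below, each hypothetical model looks at a different feature, the generalized model takes a majority vote, and the switching variant picks one model at random per prediction. The models, inputs, and function names are illustrative assumptions, not a production design.

```python
import random
from collections import Counter


def make_threshold_model(feature_index, threshold):
    """Toy classifier that inspects a single feature of the input."""
    def predict(x):
        return 1 if x[feature_index] >= threshold else 0
    return predict


# Three hypothetical models, each relying on a different feature.
models = [
    make_threshold_model(0, 0.5),
    make_threshold_model(1, 0.5),
    make_threshold_model(2, 0.5),
]


def ensemble_predict(models, x):
    """Generalized model: every model votes and the majority wins."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def switching_predict(models, x, rng):
    """Switching models: pick one model at random for each prediction."""
    return rng.choice(models)(x)


clean = [0.9, 0.8, 0.7]
# Assumption: the attacker perturbed only feature 0, fooling the first model.
adversarial = [0.1, 0.8, 0.7]

print("first model alone:", models[0](adversarial))
print("majority vote:    ", ensemble_predict(models, adversarial))
print("random switching: ", switching_predict(models, adversarial, random.Random()))
```

The majority vote still returns the correct class because only one of the three models was deceived. With random switching, the attacker cannot tell in advance whether the compromised model will be the one that answers.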

Machine learning is a powerful tool used for a multitude of business activities today. Still, machine learning models are susceptible to attacks that alter their behavior, disrupting systems and reducing their accuracy and performance. Adversarial machine learning is a technique used by threat actors to misguide machine learning models through malicious input. Organizations need to be aware of the various attack types and methods used against machine learning systems so that they can implement necessary and appropriate defenses against those attacks.

Editor’s Note: The opinions expressed in this guest author article are solely those of the contributor, and do not necessarily reflect those of Tripwire.

Dilki Rathnayake