Trojan Attacks and Machine Learning


Trojan attacks allow malicious actors to place backdoors into machine learning models, with potentially devastating repercussions for safety-critical cyber and physical systems.

Logic testing and side-channel analysis are popular techniques used to detect Hardware Trojan (HT) attacks in integrated circuits; however, these methods typically require expensive golden models for testing and analyzing signals.


Trojan attacks compromise systems by inserting hidden triggers into hardware or by altering AI algorithms. Their effects depend on the trojan type and attack vector: some trojans make an AI misclassify specific inputs, while others could cause all input data to be classified as a single target class.

One study reports the first application of unsupervised machine learning to a broad spectrum of hardware trojans. It feeds unlabeled test measurements from quantum diamond microscope (QDM) magnetic field imaging into an unsupervised trojan detector, giving an unbiased analysis that requires no labeled training data.

PCA can reduce the dimensionality of the measurement data, after which clustering automatically identifies whether a test chip contains a trojan. This offers an advantage over more traditional methods that rely on classifying test chips with a supervised model, because adversaries can compromise such classifiers by inserting trojans into either the training set or the test set.
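The PCA-then-clustering idea can be sketched in Python with NumPy. This is a minimal sketch, not the study's pipeline: the synthetic measurement vectors, the choice of two principal components, plain two-cluster k-means, and the rule that the smaller cluster is the suspect one (i.e., the assumption that most chips are clean) are all illustrative, and the function names are hypothetical.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project each row (one chip's measurements) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center every feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # PCA scores

def two_means(Z, n_iter=100):
    """Plain k-means with k=2, seeded with the two extreme points along PC1."""
    centers = Z[[Z[:, 0].argmin(), Z[:, 0].argmax()]].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster ever becomes empty.
        new = np.array([Z[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(2)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def flag_suspect_chips(X):
    """Assumption: most chips are clean, so the smaller cluster is flagged as suspect."""
    labels = two_means(pca_project(X))
    minority = np.bincount(labels, minlength=2).argmin()
    return labels == minority

# Synthetic stand-in for per-chip measurements: 20 clean chips, 4 with an extra signature.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.1, size=(24, 50))
X[20:, 10:20] += 1.0
flags = flag_suspect_chips(X)
```

No labels are used anywhere: the split comes only from the structure of the measurements, which is what makes the approach robust to a poisoned labeled training set.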

Three scalable trojans are used to assess the technique's efficacy: the comparator trojan represents combinational logic, while the shift register and counter trojans represent sequential logic. Together they were chosen to cover the common logic types found in practical trojan designs.
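To make the combinational/sequential distinction concrete, here is a behavioral sketch of three hypothetical trigger circuits. The class names, the magic value, the count threshold, and the bit pattern are illustrative assumptions, not the designs used in the study. A comparator trigger depends only on its current input, whereas counter and shift-register triggers carry state from one clock cycle to the next.

```python
class ComparatorTrigger:
    """Combinational: the output depends only on the current input word."""
    def __init__(self, magic):
        self.magic = magic

    def step(self, word):
        return word == self.magic


class CounterTrigger:
    """Sequential: fires once a rare event has been seen `threshold` times, then stays asserted."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0

    def step(self, event):
        if event:
            self.count += 1
        return self.count >= self.threshold


class ShiftRegisterTrigger:
    """Sequential: fires when the most recent input bits match `pattern`."""
    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.reg = [0] * len(self.pattern)   # register starts cleared

    def step(self, bit):
        self.reg = self.reg[1:] + [bit]      # shift the new bit in
        return self.reg == self.pattern
```

The sequential triggers are harder to expose in testing: a test sequence that sees the rare event only once or twice never fires the counter, even though every individual input is legal.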


A trojan in an AI is a hidden trigger that causes the model to respond differently when presented with specific inputs. Adversaries undertake trojan attacks to hijack an AI, control its behavior, and gain access to sensitive information or systems. Attackers may, for example, alter the training data or manipulate the model's structure (e.g., by modifying the weights of a deep neural network).
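Altering the training data is often done by stamping a small trigger pattern onto a fraction of the training samples and relabeling them to the attacker's target class, in the style of the BadNets attack. The sketch below assumes grayscale images stored as a NumPy array with values in [0, 1]; the corner-patch trigger, the 10% poisoning rate, and the function name `poison` are illustrative assumptions.

```python
import numpy as np

def poison(images, labels, target, rate=0.1, patch=3, seed=0):
    """Return poisoned copies of (images, labels) plus the indices that were changed.

    images: float array of shape (N, H, W) with values in [0, 1].
    The trigger is a white patch x patch square in the bottom-right corner.
    """
    rng = np.random.default_rng(seed)
    X, y = images.copy(), labels.copy()          # never modify the caller's data
    n_poison = int(round(rate * len(X)))
    idx = rng.choice(len(X), size=n_poison, replace=False)
    X[idx, -patch:, -patch:] = 1.0               # stamp the trigger
    y[idx] = target                              # relabel to the attacker's class
    return X, y, idx

images = np.zeros((100, 28, 28))
labels = np.arange(100) % 10
X_p, y_p, idx = poison(images, labels, target=7)
```

A model trained on `(X_p, y_p)` can learn to associate the corner patch with class 7 while behaving normally on clean inputs, which is why such trojans are hard to spot from accuracy alone.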

Traditional methods for hardware Trojan detection rely on side-channel signals such as power consumption. However, simple comparison techniques can produce false positives, and measurement noise can mask the differences between Trojan-free and Trojan-infected circuits.
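Both failure modes of a simple comparison are easy to reproduce. The sketch below flags a chip when any sample of its power trace differs from a single golden trace by more than a fixed threshold; the traces, the threshold, and the function name are made-up values chosen only for illustration.

```python
import numpy as np

def naive_power_check(trace, golden, threshold):
    """Flag a chip if any sample deviates from one golden trace by more than `threshold`."""
    diff = np.abs(np.asarray(trace, dtype=float) - np.asarray(golden, dtype=float))
    return bool(diff.max() > threshold)

golden = np.zeros(5)
clean_but_noisy = [0.0, 0.3, -0.2, 0.1, 0.0]   # Trojan-free chip with measurement noise
small_trojan = [0.0, 0.1, 0.1, 0.15, 0.1]      # Trojan adds a small, sustained current

fp = naive_power_check(clean_but_noisy, golden, threshold=0.2)   # noise spike exceeds threshold
miss = naive_power_check(small_trojan, golden, threshold=0.2)    # Trojan stays under threshold
```

With a threshold of 0.2 the noisy clean chip is flagged (a false positive) while the Trojan chip passes, because its extra power draw is smaller than the noise.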

Researchers have turned to machine learning to address these challenges, using both static and dynamic detection approaches. Static detection extracts features from the netlist and needs no golden chip as a reference; popular classifiers include shallow neural networks, random forests, and support vector machines (SVMs).
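Static features can be computed directly from a netlist without any golden chip. The sketch below uses a hypothetical toy netlist format (a dict mapping each gate's output net to its input nets, assumed combinational and acyclic) and computes two simple structural features per gate, fan-in and logic level. Real feature sets are richer; the resulting vectors would then be passed to a classifier such as a random forest or an SVM.

```python
def netlist_features(gates, primary_inputs):
    """Return {net: (fan_in, logic_level)} for every gate output.

    gates: dict mapping a gate's output net to the list of nets driving it
    (hypothetical toy format; the netlist is assumed to be acyclic).
    """
    level = {pi: 0 for pi in primary_inputs}   # primary inputs sit at level 0

    def depth(net):
        # Memoized longest path from any primary input to `net`.
        if net not in level:
            level[net] = 1 + max(depth(src) for src in gates[net])
        return level[net]

    return {net: (len(srcs), depth(net)) for net, srcs in gates.items()}

gates = {
    "n1": ["a", "b"],
    "n2": ["n1", "c"],
    "trig": ["a", "b", "c", "d", "n2"],   # unusually wide gate fed from deeper logic
}
feats = netlist_features(gates, ["a", "b", "c", "d"])
```

Here `trig` stands out on both features; whether such a pattern actually indicates a Trojan is what the trained classifier has to learn.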

Dynamic detection approaches use machine learning to identify hardware Trojans during the design phase, based on the observation that Trojan-infected circuits behave differently from clean ones when exposed to certain inputs. Research in this area typically focuses on finding practical feature sets (for instance, path-sentence expressions generated for circuit components, or sensitivity to power changes) that enable accurate detectors while reducing circuit testing time.
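One behavioral signal a dynamic approach can use is how rarely an internal net takes a given value under normal inputs, since trigger logic is usually designed to fire rarely. The sketch below estimates per-net activation probability by simulating random input vectors through a hypothetical toy gate list. The format, the gate types, and the 5% rarity cutoff are assumptions, and the probabilities are only one candidate feature for a detector.

```python
import random

def activation_probability(gates, primary_inputs, n_vectors=2000, seed=0):
    """Estimate how often each gate output evaluates to 1 under random inputs.

    gates: list of (out_net, op, [in_nets]) in topological order, with op in
    {"AND", "OR", "NOT"} (hypothetical toy format).
    """
    rng = random.Random(seed)
    ops = {"AND": all, "OR": any, "NOT": lambda v: not v[0]}
    ones = {out: 0 for out, _, _ in gates}
    for _ in range(n_vectors):
        val = {pi: rng.random() < 0.5 for pi in primary_inputs}   # random input vector
        for out, op, ins in gates:
            val[out] = bool(ops[op]([val[i] for i in ins]))
            ones[out] += val[out]
    return {net: count / n_vectors for net, count in ones.items()}

pis = [f"in{i}" for i in range(8)]
gates = [
    ("y", "OR", ["in0", "in1"]),
    ("trig", "AND", pis),          # fires only when all eight inputs are 1
]
probs = activation_probability(gates, pis)
rare = [net for net, p in probs.items() if min(p, 1 - p) < 0.05]
```

The 8-input AND is expected to be 1 for only about 1 in 256 random vectors, so it is the only net flagged as rare.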


Advances in technology make it increasingly easy for Trojans to infiltrate AI algorithms, where dangerous Trojans can remain well concealed. Traditional defenses, such as protecting and sanitizing training data, are effective in principle, but they are often impractical or too time-consuming for real use cases. Moreover, many bespoke AIs are created via transfer learning (taking an existing public, open-source AI published online and adapting it to a specific use case). This poses a particular risk, because Trojans may persist even after the original model has been modified.

Researchers have attempted to detect hardware Trojans using side-channel signals. Because Trojans can alter a chip's transient power usage, comparing power consumption can reveal Trojan-affected circuits; however, process variation and measurement noise may limit the effectiveness of these methods.
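One common way to account for process variation is to compare a chip against the spread of a population of golden chips rather than against a single reference. The sketch below applies a z-score test to one scalar feature per chip (mean transient power); the feature, the readings, the 3-sigma cutoff, and the function name are illustrative assumptions.

```python
import numpy as np

def zscore_flags(test_powers, golden_powers, k=3.0):
    """Flag chips whose mean transient power lies more than k standard deviations
    from the golden population, so process variation sets the threshold."""
    g = np.asarray(golden_powers, dtype=float)
    mu, sigma = g.mean(), g.std(ddof=1)          # sample mean and sample std of golden chips
    z = (np.asarray(test_powers, dtype=float) - mu) / sigma
    return np.abs(z) > k

golden = [10.0, 10.2, 9.8, 10.1, 9.9]            # hypothetical golden-chip readings (mW)
flags = zscore_flags([10.1, 10.9], golden)
```

The first test chip lies within the natural spread of the golden chips and passes; the second lies far outside it and is flagged. The test can still fail when a Trojan's effect is smaller than the process-variation spread.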

Researchers have developed machine learning methods to overcome these limitations; they fall into two broad categories, netlist-level detection and gate-level detection. Netlist-level detection is widespread because it does not require physical access and is therefore more practical; however, compared with gate-level detection, it suffers from difficulty in extracting features and higher false-positive rates.


Trojan attacks are particularly hazardous because they need not rely on changing training data (the focus of common defenses against AI manipulation) and often evade traditional detection techniques such as side-channel analysis and gate-level netlist analysis. As a result, they are very difficult to recognize and protect against.

One defense is to try to make a Trojan activate during testing, using test data sets or abnormal operating conditions, since a Trojan typically leaves normal behavior unchanged for inputs that do not contain its trigger. Unfortunately, this is challenging because the adversary can design triggers around operating conditions or environments that differ from those used in testing.

Another approach to Hardware Trojan detection uses RTL analysis-assisted machine learning models. One such approach works on RTL designs of AES-256 circuits: it extracts critical features such as delay, power consumption, and resource-utilization profile into feature vectors, then applies resampling variants and feature selection methods to correct data problems such as class imbalance or skewness while keeping enough features to train a machine learning classifier. Finally, a robust heterogeneous ensemble model performs two-class classification, labeling each circuit as Trojan-free or Trojan-infected.
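The overall shape of such a pipeline can be sketched in Python with NumPy. This is a minimal sketch of the general idea, not a reproduction of that work. It assumes three hypothetical RTL-derived features (standing in for delay, power, and utilization) plus three uninformative ones. It uses random oversampling as the resampling step, a mean-difference score for feature selection, and a majority vote over three different classifier families (nearest centroid, 3-nearest neighbours, Gaussian naive Bayes) as a simple heterogeneous ensemble.

```python
import numpy as np

def random_oversample(X, y, rng):
    """Resampling step: duplicate random minority-class rows until the classes are balanced."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    extra = rng.choice(np.flatnonzero(y == minority), size=counts.max() - counts.min())
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

def select_features(X, y, k):
    """Feature selection: keep the k features whose class means differ most, in std units."""
    score = np.abs(X[y == 1].mean(0) - X[y == 0].mean(0)) / (X.std(0) + 1e-12)
    return np.argsort(score)[::-1][:k]

def nearest_centroid(Xtr, ytr, Xte):
    centroids = np.array([Xtr[ytr == c].mean(0) for c in (0, 1)])
    return np.linalg.norm(Xte[:, None] - centroids[None], axis=2).argmin(1)

def knn3(Xtr, ytr, Xte):
    dist = np.linalg.norm(Xte[:, None] - Xtr[None], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :3]
    return (ytr[nearest].mean(1) > 0.5).astype(int)

def gaussian_nb(Xtr, ytr, Xte):
    log_post = []
    for c in (0, 1):
        Xc = Xtr[ytr == c]
        mu, var = Xc.mean(0), Xc.var(0) + 1e-9
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (Xte - mu) ** 2 / var).sum(1)
        log_post.append(log_lik + np.log(len(Xc) / len(Xtr)))
    return np.argmax(log_post, axis=0)

def ensemble_predict(Xtr, ytr, Xte, k_features=3, seed=0):
    """0 = Trojan-free, 1 = Trojan-infected: majority vote of three different classifiers."""
    rng = np.random.default_rng(seed)
    Xb, yb = random_oversample(Xtr, ytr, rng)
    cols = select_features(Xb, yb, k_features)
    votes = [clf(Xb[:, cols], yb, Xte[:, cols]) for clf in (nearest_centroid, knn3, gaussian_nb)]
    return (np.sum(votes, axis=0) >= 2).astype(int)

# Synthetic, imbalanced data: columns 0-2 stand in for delay, power and utilization.
data_rng = np.random.default_rng(1)
def make_chips(n, trojan):
    X = data_rng.normal(0.0, 0.1, size=(n, 6))
    X[:, :3] += 2.0 if trojan else 1.0
    return X

X_train = np.vstack([make_chips(40, False), make_chips(6, True)])
y_train = np.array([0] * 40 + [1] * 6)
X_test = np.vstack([make_chips(5, False), make_chips(5, True)])
y_test = np.array([0] * 5 + [1] * 5)
pred = ensemble_predict(X_train, y_train, X_test)
```

Oversampling happens before feature selection so that the 40:6 imbalance does not dominate the class statistics. Mixing classifier families means a single model's blind spot is less likely to decide the vote.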

This approach reportedly outperformed prior work, such as logic testing and side-channel statistical power analysis for Trojan detection in integrated circuits (ICs). It also showed better scalability and better performance under process variation than existing methods.