Anomaly Detection System for IoT Using the MAX78000

This submission contains three sections, all focused on anomaly detection. Together they seek to illustrate how versatile the MAX78000 is in the field of AI and anomaly detection. The three sections are as follows: - Network-based anomaly detection. - Anomaly detection in the power data of a CNN application. - Lightweight Intrusion Detection System for a generic program. Instead of only looking at data such as network packets, this project also looks closely at the time series power data of the applic

Please make sure to check the attached PDF for full details on the project, including figures and results

Internet of Things (IoT) security is growing in importance in many applications, ranging from automotive to biomedical to environmental to industrial. Access to (sensitive) data is the primary target in many of these applications. Often IoT devices are an essential part of critical control systems, where a compromise could affect well-being or safety, or inflict severe financial damage. There is no single solution that addresses all security aspects. Many proposed software and hardware solutions tackle different aspects of security without taking a unified view. Hardware solutions are widely recognised as more secure, while software allows more flexibility of use and is hence more pervasive. Both hardware and software are ultimately characterised by their power consumption, and various side-channel attacks based on power consumption, as well as associated countermeasures, have been deployed in the past.
This project covers three main fields of IoT attack: network attacks, model attacks, and attacks on the board and the code running on it (intrusion detection).

Section 1: Network based anomalies.

This section covers anomaly detection at the network level, in which we adapt the framework that will be published at the FPL (Field Programmable Logic and Applications) 2021 conference. We propose a versatile framework for real-time Internet of Things (IoT) network intrusion detection using an Artificial Neural Network (ANN).
An efficient hierarchical decision-making approach for the IDS is proposed and evaluated on the new IoT-23 dataset, with improved accuracy.
In this contest, we bring the pre-trained ANN models to the MAX78000.

The framework includes four main steps: Data preparation (feature extraction) -> Training -> Offline inference (with the hierarchical decision-making approach) -> Online inference.

For training, the Tensorpack library is used. The hidden layer is configured with 40 neurons, each followed by a hardware-efficient ReLU (Rectified Linear Unit) activation function. The processed data are split 50% for training, 25% for validation, and 25% for testing/inference.
The NN models were trained using the cross-entropy loss function and the Adam optimizer, with a batch size of 256 and a learning rate of 0.01. The average training and validation times for one epoch are 1,183 s and 516 s, respectively, on a Quadro M2000 GPU.
Training result: After running through several training schemes, the selected model achieves 99.68% accuracy and an F1-score of 99.82%.
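
For reference, accuracy and F1-score both follow from confusion-matrix counts. The sketch below is a minimal illustration that assumes a binary benign/malicious decision; the function name and the counts in the usage lines are made up for the example and are not the project's results.

```python
def accuracy_and_f1(tp, fp, tn, fn):
    """Return (accuracy, F1) from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0.0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Illustrative counts only (not taken from the IoT-23 experiments).
acc, f1 = accuracy_and_f1(tp=90, fp=10, tn=95, fn=5)
print(f"accuracy={acc:.4f}, F1={f1:.4f}")
```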

We implement the last step, online inference with the ANN, on the MAX78000 SoC platform.

We have successfully deployed the ANN model on the Arm processor of the MAX78000 (in software). The design is implemented in C using the Eclipse-based MaximSDK. (Source code on GitHub: https://github.com/ngominh911/max78000 ).
IoT-23 Dataset: https://www.stratosphereips.org/datasets-IoT23
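
The online inference step for a network with one 40-neuron ReLU hidden layer amounts to a small dense forward pass. The sketch below is written in Python for illustration only (the deployed design is in C); the feature count, the number of output classes, and the random weights are placeholders assumed for the example, not values from the trained model.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def dense(inputs, weights, biases):
    """Fully connected layer; weights[j][i] links input i to neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def predict(features, w_hidden, b_hidden, w_out, b_out):
    """Return the index of the highest-scoring output class."""
    hidden = [relu(v) for v in dense(features, w_hidden, b_hidden)]
    logits = dense(hidden, w_out, b_out)
    return max(range(len(logits)), key=lambda k: logits[k])


# N_HIDDEN follows the text; N_FEATURES and N_CLASSES are assumptions.
N_FEATURES, N_HIDDEN, N_CLASSES = 10, 40, 2
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(N_FEATURES)]
            for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
         for _ in range(N_CLASSES)]
b_out = [0.0] * N_CLASSES

features = [rng.uniform(0, 1) for _ in range(N_FEATURES)]
label = predict(features, w_hidden, b_hidden, w_out, b_out)
print("predicted class index:", label)
```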

Section 2: Intrusion Detection System for anomalies in the code and model power.

Many safety-critical applications in IoT rely increasingly on machine learning inference. For example, in ADAS systems, real-time traffic sign recognition is based on such models. Implementation constraints such as memory, latency, and power consumption are accompanied by other constraints such as high accuracy and security. What if one were to attack the inference model itself? How easy is it to detect such an attack?

This project consists of two separate CNN classifiers. The first is a road sign classifier trained on 8 of the 43 classes from the German Traffic Sign Recognition Benchmark dataset (https://benchmark.ini.rub.de/gtsrb_news.html).

This first CNN application simulates an AI camera, like you would find on many vehicles today. In total, this CNN achieved 99.6% accuracy across 8 classes: Stop signs and Speed limit 30, 50, 60, 70, 80, 100, 120 signs. It is the “test” application for this project, in other words, the application being monitored and protected.

The second CNN is an Intrusion Detection System (IDS), trained on the power data of the Road Sign CNN, seeking to spot anomalies in this data.

To achieve this, power time series data from the Road Sign CNN was captured as it ran inference, then synthetic anomalies were inserted into this power data to simulate an attacker gaining access to the system and tampering with the CNN. There are many possible ways to attack an AI model or neural network, such as changing weights or biases, or even switching out the entire model. If so inclined, the attacker could force an error, or even supply a new model so that important inferences, such as warning signs and stop signs, would be ignored by the vehicle. For example, causing the CNN to view stop signs as speed limit 50 signs could prove fatal for the occupants of the compromised vehicle.

Over 150 million data points were taken from the Road Sign CNN as it ran inference. This equates to over 44,000 inferences (each with a window size of 4096 data points). Anomalies were then inserted randomly into these windows to create a total of 44,000 anomalous samples, for a grand total of 88,000 samples to train the CNN.

The IDS CNN achieved 99.95% validation accuracy and 99.92% test set accuracy on the synthetic anomalies. Two other models were also trained on “harder” and “mixed” anomalies; however, they are not included in this project submission.

The results show great promise for the use of CNNs and the MAX78000 in low-power anomaly detection, especially as the IDS CNN drew on average less than 15 mW during run time. In a real-world scenario, the application being monitored would draw much more power than this, making this solution very undemanding from a power perspective.

Some more detail behind each CNN:

The Road Sign CNN:

This CNN was trained on 8 of the 43 classes from the German Traffic Sign Recognition Benchmark. These images are pre-processed and already focussed on the sign, saving much effort. The selected classes were Stop signs and Speed limit 30, 50, 60, 70, 80, 100, 120 signs.

To train the CNN for the Maxim board, the “cats_vs_dogs” network was trained on the road sign samples. This network proved the best choice: not only could it achieve over 99% accuracy on the data, but there were also no issues with regard to quantization or synthesis with this network.

The NAS CIFAR Network, with batch normalization, was also trained on the data. This network managed to achieve over 99% accuracy over all 43 classes. However, it was unable to accept the 8-bit max weights that the Maxim board requires. As such, the cats_vs_dogs network proved to be the better starting point.
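
To illustrate what an 8-bit weight limit means in practice, the sketch below scales floating-point weights to signed 8-bit integers and clamps anything out of range. It is a generic symmetric-quantization example under the assumption that weights nominally lie in [-1, 1); it is not the procedure used by the Maxim tools, and the function name is hypothetical.

```python
def quantize_weights(weights, bits=8):
    """Scale float weights to signed integers and clamp to the bit width."""
    scale = 2 ** (bits - 1)        # 128 for 8-bit weights
    lo, hi = -scale, scale - 1     # representable range: -128..127
    return [max(lo, min(hi, round(w * scale))) for w in weights]


# Weights outside the nominal range saturate at the limits.
print(quantize_weights([0.5, -1.0, 1.0, 2.5]))
```

Weights that saturate in this way lose information, which is one reason a network can train well in floating point yet fail to fit a fixed 8-bit format.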

After training the CNN, the power data of each of the eight classes was recorded: over 150 million data points, for a total of 44,000 inference samples (each with a window size of 4096). Anomalies were then inserted into this data to train a supervised anomaly detection CNN.

Training breakdown:

· PyTorch and the “cats_vs_dogs” ai85 network were used.
· The network does not use transfer learning; instead, it was trained from the ground up.
· Trained this network on the German Traffic Sign Recognition Benchmark dataset.
· Trained on 8 of the 43 classes (Stop Signs & Speed limit 30, 50, 60, 70, 80, 100, 120 signs).
· Data split 80% training, 20% testing.
· Data resized to 64x64, with random rotations applied.
· Achieved 99.626% Validation and 99.625% Testing accuracy.
· Network then quantized and synthesized for the MAX78000 board.
· Network makes full use of the MAX78000 neural network accelerator.
· Samples were loaded onto the board for testing and to obtain the power data.
· GitHub for the trained network: https://github.com/popovici/Dominic_Lightbody/tree/main/RoadSignCNN
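
The 80%/20% split listed above can be sketched as a shuffled partition. This is a minimal illustration; the function name and the fixed seed are assumptions made for reproducibility, not details from the project.

```python
import random


def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_dataset(range(100))
print(len(train), len(test))
```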

Intrusion Detection CNN:

This CNN was trained on the power data of the Road Sign CNN. These 150 million data points were sampled at 250 ksps. This sampling rate maximized resolution while still making it practical to take many samples in a short window of time. Sampling rate is one of the main areas of focus in this project: at what minimal resolution can we obtain a clear reading of what is going on in the code without losing important data? 250 ksps was chosen for this project as it offered good resolution while not taking up too many data points per inference. Further work will look into how accurate we can make our CNNs / ML models while reducing the sampling rate as much as possible.
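
As a worked number, a 4096-sample window at 250 ksps spans 4096 / 250,000 s, or about 16.4 ms. The sketch below computes this and shows one simple way a lower sampling rate could be emulated from existing captures, by averaging groups of samples. The decimation approach is an assumption for illustration, not the method used in this project.

```python
SAMPLE_RATE_HZ = 250_000   # 250 ksps, as stated in the text
WINDOW_SAMPLES = 4096      # samples per inference window


def window_duration_ms(n_samples=WINDOW_SAMPLES, rate_hz=SAMPLE_RATE_HZ):
    """Time span covered by one window, in milliseconds."""
    return 1000.0 * n_samples / rate_hz


def decimate(samples, factor):
    """Average consecutive groups of `factor` samples (remainder dropped)."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]


print(f"{window_duration_ms():.3f} ms per window")
print(decimate([1, 3, 5, 7, 9], 2))
```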

Anomalies were then inserted into this power data to simulate tampering with, or hacking of, the system. There were three different anomaly types in this training set: point-type anomalies and two variations of noise-type anomalies.
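
A minimal sketch of synthetic anomaly insertion is shown below, covering one point-type and one noise-type anomaly. The magnitudes, the burst length, the use of Gaussian noise, and the function names are all assumptions made for the example; the source does not specify how the anomalies were parameterised.

```python
import random


def insert_point_anomaly(window, rng, magnitude=0.5):
    """Add a single spike at a random position; returns (copy, index)."""
    out = list(window)
    idx = rng.randrange(len(out))
    out[idx] += magnitude
    return out, idx


def insert_noise_anomaly(window, rng, length=256, sigma=0.1):
    """Add Gaussian noise to a random contiguous burst; returns (copy, start)."""
    out = list(window)
    start = rng.randrange(len(out) - length + 1)
    for i in range(start, start + length):
        out[i] += rng.gauss(0.0, sigma)
    return out, start


rng = random.Random(1)
normal = [1.0] * 4096                      # placeholder power window
spiked, where = insert_point_anomaly(normal, rng)
noisy, begin = insert_noise_anomaly(normal, rng)
print("spike at", where, "noise burst starts at", begin)
```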

The methodology used to introduce anomalies into the power data follows on from the work in the paper “Lightweight Anomaly Detection Framework for IoT”. This project features a simplified version of anomaly insertion, focusing only on a handful of anomaly types.

After the data was taken and anomalies were added, the data was further processed before training. Very little pre-processing was needed in this project. The raw data was first grayscale normalized. Then the peaks over certain thresholds were found to focus the windows on the inference. Finally, each window was saved as a PNG. This process cut the size of the dataset from around 15 GB to around 300 MB, which was quite helpful.
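
The two pre-processing steps can be sketched as follows, assuming that "grayscale normalized" means min-max scaling to the 0..255 range and that a window starts at the first sample exceeding a threshold. Both interpretations, the threshold value, and the function names are assumptions for illustration.

```python
def grayscale_normalize(samples):
    """Min-max scale samples to integer gray levels in 0..255."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0] * len(samples)
    return [round(255 * (s - lo) / (hi - lo)) for s in samples]


def window_at_first_peak(samples, threshold, window=4096):
    """Return the window starting at the first sample above threshold.

    Returns None if no sample crosses the threshold or if there are
    not enough samples left to fill a whole window.
    """
    for i, s in enumerate(samples):
        if s > threshold:
            if i + window <= len(samples):
                return samples[i:i + window]
            return None
    return None


print(grayscale_normalize([0, 1, 5]))
print(window_at_first_peak([0, 0, 5, 1, 2, 3], threshold=4, window=3))
```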

In essence, the 4096-long window of data was folded repeatedly into a 64 x 64 grayscale image. This was then extended over three channels to make use of the RGB functionality of the Maxim networks. This also made it possible to use the RGB script to generate functional sample data for the board, which sped up development considerably for this competition.
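
The folding step can be sketched as below: a 4096-sample window becomes 64 rows of 64 values, and the resulting image is copied into three identical channels. Row-major ordering is an assumption, and the function names are hypothetical.

```python
def fold_window(window, side=64):
    """Fold a flat window of side*side samples into a side x side image."""
    if len(window) != side * side:
        raise ValueError(f"expected {side * side} samples, got {len(window)}")
    return [window[r * side:(r + 1) * side] for r in range(side)]


def to_three_channels(image):
    """Replicate a single-channel image into three independent channels."""
    return [[row[:] for row in image] for _ in range(3)]


gray = fold_window(list(range(4096)))
rgb = to_three_channels(gray)
print(len(rgb), len(rgb[0]), len(rgb[0][0]))   # channels, rows, columns
```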

Training Breakdown:

· PyTorch and the “cats_vs_dogs” ai85 network were used.
· Trained this network on the power data taken from the Road Sign CNN.
· The network does not use transfer learning; instead, it was trained from the ground up.
· Trained on 44,000 normal samples (4096 long each) and 44,000 anomalous samples (4096 long each).
· Data split 80% training, 20% testing.
· Data size at 64x64, with no random rotations or augmentations applied.
· Achieved 99.954% validation accuracy.
· Achieved 99.92% test accuracy.
· Network then quantized and synthesized for the MAX78000 board.
· Network also makes full use of the MAX78000 neural network accelerator and other features.
· GitHub for the trained network: https://github.com/popovici/Dominic_Lightbody/tree/main/EASYAnom

Section 3: Lightweight Intrusion Detection System for a generic program

The IDS proposed in the paper developed in our group, “Lightweight Anomaly Detection Framework for IoT” (IEEE ISSC 2020), consists of various flavours of ARIMA/SARIMA models, which are used to determine whether the measured data is within the normal distribution and has a small difference in the integral for the window of points being analysed.

A power model of the application running on the MAX78000 is generated offline using various flavours of ARIMA (SARIMA and Central ARIMA). This model generates a time series of points, which is compared to real measurements obtained through a power monitor, which in our case was an external device provided by STMicroelectronics. In the future we plan to use the internal power monitor of the MAX78000 or an external custom circuit built around Maxim parts. After generating the statistics of the two sets of points (the predicted and the measured), an IDS probability is generated.
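
The comparison between predicted and measured power can be sketched as below. This is a simplified illustration and not the algorithm from the paper: it takes the predicted series as given (in the project it would come from the ARIMA model), counts residuals outside a few standard deviations, and compares the integrals of the two windows. The thresholds, the parameter `sigma`, and the function names are all assumptions.

```python
def trapezoid_integral(series, dt=1.0):
    """Trapezoidal-rule integral of a uniformly sampled series."""
    return sum((a + b) * dt / 2.0 for a, b in zip(series, series[1:]))


def window_is_anomalous(predicted, measured, sigma, z_limit=3.0,
                        max_outlier_fraction=0.05, max_integral_diff=0.1):
    """Flag a window if too many residuals are outliers or the
    relative integral difference is too large."""
    residuals = [m - p for p, m in zip(predicted, measured)]
    outliers = sum(1 for r in residuals if abs(r) > z_limit * sigma)
    outlier_fraction = outliers / len(residuals)

    ref = abs(trapezoid_integral(predicted)) or 1.0   # avoid divide-by-zero
    integral_diff = abs(trapezoid_integral(measured)
                        - trapezoid_integral(predicted)) / ref
    return (outlier_fraction > max_outlier_fraction
            or integral_diff > max_integral_diff)


predicted = [1.0] * 100                    # placeholder model output
tampered = [1.0] * 50 + [2.0] * 50         # placeholder measurement
print(window_is_anomalous(predicted, predicted, sigma=0.05))
print(window_is_anomalous(predicted, tampered, sigma=0.05))
```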

After optimising the detection method, it was found that the true positive rate (TPR) can be as high as 80%. This IDS shows better or similar performance compared to other non-neural-network algorithms, which have TPRs ranging from 45.74% to 87.65%. It also performed much better than another lightweight algorithm, described in “Anomaly Detection in Sensor Systems Using Lightweight Machine Learning”, which achieved 53.6% TPR. Our framework offers the potential for implementation on a low-power device with a high detection rate.

Our algorithms focus on anomaly detection within the computing unit by tracing intrusions in the code running on it. A nice feature of the algorithm is its light weight (10 additions and 8 multiplications to implement SARIMA, and double that for Central ARIMA). However, the accuracy of the IDS also depends on the power monitor. Further work and analysis are required towards minimising the power consumption of the power monitor and comparing its power consumption with that of the CNN methodology presented above.
