An outlier is a data point whose characteristics differ markedly from those of the rest of the dataset. A model trained to draw conclusions from observed features cannot be expected to give the right answer when the features it observes differ from those it learned from. Spotting outliers in a production set therefore helps explain why a model made a wrong prediction.
To determine the outliers present in a production set, we use an auto-encoder algorithm.
The auto-encoder first uses a convolutional neural network (the encoder) to compress the images into a lower-dimensional space made of their most important common features. A decoder then tries to reconstruct the images from the information the encoder has extracted from them.
The network is trained on a dataset so as to minimize the reconstruction error over the images of that dataset.
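The compress-then-reconstruct idea can be tried without a deep learning framework. The sketch below is a deliberately simplified stand-in, not the convolutional network described above: it uses NumPy and a linear encoder/decoder fitted in closed form with PCA, which is the optimal linear auto-encoder under squared reconstruction error. The class name `LinearAutoencoder`, the code size, the synthetic images, and the use of mean squared error as the outlier score are all assumptions made for illustration.

```python
import numpy as np


class LinearAutoencoder:
    """Hypothetical, simplified stand-in for the convolutional auto-encoder.

    Encoder and decoder are linear maps fitted in closed form with PCA, which
    gives the lowest squared reconstruction error of any linear (affine)
    auto-encoder with the same code size.
    """

    def __init__(self, code_size):
        self.code_size = code_size

    def fit(self, images):
        # images: (n_images, height, width) array from the training dataset
        x = images.reshape(len(images), -1).astype(float)
        self.mean_ = x.mean(axis=0)
        # Top right-singular vectors = directions of largest shared variation
        _, _, vt = np.linalg.svd(x - self.mean_, full_matrices=False)
        self.components_ = vt[: self.code_size]  # (code_size, n_pixels)
        return self

    def encode(self, images):
        # Compress each image to `code_size` numbers
        x = images.reshape(len(images), -1).astype(float)
        return (x - self.mean_) @ self.components_.T

    def decode(self, codes, image_shape):
        # Rebuild images from their compressed codes
        x = codes @ self.components_ + self.mean_
        return x.reshape((len(codes),) + tuple(image_shape))

    def reconstruction_error(self, images):
        # Assumed outlier score: mean squared error between each image and its reconstruction
        recon = self.decode(self.encode(images), images.shape[1:])
        return ((images - recon) ** 2).mean(axis=tuple(range(1, images.ndim)))


# Hypothetical data: 200 8x8 images mixing 3 shared patterns, plus slight noise
rng = np.random.default_rng(0)
patterns = rng.normal(size=(3, 8, 8))
train = np.einsum("nk,khw->nhw", rng.normal(size=(200, 3)), patterns)
train += 0.01 * rng.normal(size=train.shape)

ae = LinearAutoencoder(code_size=3).fit(train)

typical = np.einsum("k,khw->hw", rng.normal(size=3), patterns)[None]  # follows the patterns
unusual = rng.normal(size=(1, 8, 8))  # ignores them entirely
scores = ae.reconstruction_error(np.concatenate([typical, unusual]))
print(scores)  # the unusual image gets a much higher score
```

A real convolutional auto-encoder replaces the linear maps with trained convolutional layers, but the scoring step is the same: encode, decode, and compare the result with the input.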
Images in production are then passed through the same trained network, and each one is assigned an outlier score equal to its reconstruction error, that is, how far the reconstructed image is from the original.
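A common choice for the reconstruction error, assumed here rather than taken from a definition in this text, is the mean squared error between an image $x$ with $N$ pixels and its reconstruction $\hat{x}$:

$$
s(x) = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^2
$$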
The higher the score, the more likely the image is to be an outlier.
A threshold can be set on the outlier score by choosing the percentage of production images to flag as outliers.
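Turning a chosen percentage into a score threshold amounts to ranking the production images by score, highest first, and cutting the ranking at that percentage. The sketch below assumes the scores are already computed; the helper names `outlier_threshold` and `flag_outliers`, and the conventions of rounding up and flagging ties, are hypothetical choices.

```python
import math


def outlier_threshold(scores, outlier_percentage):
    """Score at or above which the top `outlier_percentage` % of images are flagged.

    Hypothetical helper: rounding up and flagging ties are assumed conventions.
    """
    if not 0 <= outlier_percentage <= 100:
        raise ValueError("outlier_percentage must be between 0 and 100")
    ranked = sorted(scores, reverse=True)  # highest score = most likely outlier
    n_outliers = math.ceil(len(ranked) * outlier_percentage / 100)
    if n_outliers == 0:
        return math.inf  # flag nothing
    return ranked[n_outliers - 1]


def flag_outliers(scores, outlier_percentage):
    """Return one boolean per image: True if its score reaches the threshold."""
    threshold = outlier_threshold(scores, outlier_percentage)
    return [score >= threshold for score in scores]


# Hypothetical outlier scores for 10 production images
scores = [0.02, 0.03, 0.90, 0.01, 0.05, 0.04, 0.02, 0.60, 0.03, 0.02]
print(flag_outliers(scores, 20))
# [False, False, True, False, False, False, False, True, False, False]
```

Because every image tied with the threshold score is flagged, the share of flagged images can slightly exceed the chosen percentage when scores repeat.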
Images that are flagged as outliers and wrongly predicted by the model are good candidates to add to the training dataset, which improves the chances of correct predictions on similar images.
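Selecting those candidates requires knowing the correct label of each production image, which this sketch assumes comes from a manual review. The `ProductionImage` record, its fields, and the `candidates_for_dataset` helper are hypothetical; the filter keeps images that are both above the outlier threshold and mispredicted, most outlying first.

```python
from dataclasses import dataclass


@dataclass
class ProductionImage:
    # Hypothetical record: ground_truth is assumed to come from a manual review
    image_id: str
    outlier_score: float
    prediction: str
    ground_truth: str


def candidates_for_dataset(images, threshold):
    """Images that are both outliers and wrongly predicted, most outlying first."""
    selected = [
        img for img in images
        if img.outlier_score >= threshold and img.prediction != img.ground_truth
    ]
    return sorted(selected, key=lambda img: img.outlier_score, reverse=True)


batch = [
    ProductionImage("a", 0.90, "cat", "dog"),   # outlier, wrong -> candidate
    ProductionImage("b", 0.85, "dog", "dog"),   # outlier, but predicted correctly
    ProductionImage("c", 0.02, "cat", "dog"),   # wrong, but not an outlier
    ProductionImage("d", 0.60, "bird", "cat"),  # outlier, wrong -> candidate
]
print([img.image_id for img in candidates_for_dataset(batch, threshold=0.5)])
# ['a', 'd']
```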
Here is the research paper that inspired this metric 👉 [Use of Uncertainty with Autoencoder Neural Networks for Anomaly Detection](https://hal.archives-ouvertes.fr/hal-03233919/document)