The goal of the NIO Power business team is to build a globally innovative smart energy service system: a power-up solution based on the mobile Internet, backed by an extensive network of charging and battery-swapping facilities. Relying on NIO Cloud technology, this "rechargeable, swappable and upgradeable" energy service system provides car owners with power-up services in every scenario.
NIO Power's equipment operation and maintenance service mainly covers NIO power swap stations, NIO supercharging piles, 7 kW home charging piles 2.0, 20 kW home fast-charging piles and other equipment. The service currently faces several challenges, mainly:
① Ensuring that the equipment has no safety hazards.
② User complaints about a poor power-up experience.
③ A lower charging and swapping success rate due to equipment failure.
④ Downtime due to equipment failure.
⑤ High operation and maintenance costs.
The company's four main types of charging and swapping equipment (battery swap stations, supercharging piles, 7 kW home charging piles and 20 kW home fast-charging piles) all contain a large number of sensors. The data these sensors collect in real time is fed into the NIO Energy Cloud for unified storage and management, and predictive maintenance technology based on PHM (Prognostics and Health Management) is introduced. A series of AI algorithms, such as GAN (Generative Adversarial Network) and Conceptor (Conceptor network), perform anomaly detection and fault diagnosis on the equipment. Based on the diagnosis and prediction results, the system provides the optimal predictive maintenance decision for the equipment and issues the relevant operation and maintenance work orders, in order to:
① Eliminate equipment safety hazards.
② Reduce user complaints about a poor power-up experience.
③ Improve the success rate of charging and swapping.
④ Reduce downtime caused by equipment failure.
⑤ Reduce operation and maintenance costs.
The introduction of PHM technology and algorithms has therefore helped the company improve its smart energy service system and close the loop, improving and optimizing NIO Power's service capabilities.
3. Challenges faced by PHM technology
Cutting-edge PHM technologies are all based on data-driven artificial intelligence. "Data-driven" means that models are built from large numbers of samples and labels, and such models are usually developed under ideal conditions; real scenarios, however, are rarely ideal.
As the figure above shows, real scenarios often have the following characteristics:
① There are few fault samples.
② Fault samples are difficult to label.
This leads to two types of problems in this scenario: unsupervised learning problems and small-sample learning problems. To address them, we apply the following cutting-edge PHM technologies in NIO Power scenarios.
4. Cutting-edge PHM technologies
1. Unsupervised anomaly detection based on Generative Adversarial Network (GAN)
The generative adversarial network, proposed in 2014, is an unsupervised learning technique based on deep learning. It consists mainly of two sub-networks: a generator and a discriminator.
(1) GAN structure
In the figure above, the red G network is the generator network and the blue D network is the discriminator network.
The generator network takes a random distribution (such as a Gaussian distribution) as input and outputs a specific distribution specified by the user. From the sample perspective, 100 data points sampled from the random distribution are input to the G network, which maps them into the same space as the real data and outputs them as a distribution G(z). The discriminator network is then used to measure the difference between the distribution G(z) and the real data X, and the G network is optimized until the G(z) distribution is close to the distribution of the real data X.
The core of the discriminator network is to construct an approximation of the Jensen-Shannon divergence between the G(z) distribution and the real data distribution, which measures the difference between the generated distribution and the true distribution. The approximate Jensen-Shannon divergence is implemented as a standard binary classification network trained with binary cross-entropy, and the discriminator output is a continuous value between 0 and 1. An output of 1 means the input sample is considered to come from the real distribution; an output of 0 means the input sample is considered fake.
During GAN training, the generator tries to make its samples match the distribution of the real samples, while the discriminator tries to identify the generated samples as fake. This gives the generator a more accurate gradient of the Jensen-Shannon divergence, allowing it to iterate in a better direction. The two networks thus form an adversarial relationship: the generator does its utmost to generate fake data, and the discriminator does its utmost to tell real input from fake. The GAN eventually reaches an equilibrium in which the generated distribution G(z) exactly covers the distribution of the real samples X.
From a mathematical perspective, the GAN can be understood through its loss function. The loss function uses the value function V(G, D) to optimize the parameters of the G network and the D network simultaneously through the usual minimax optimization; with the D network at its optimum for a given G network, the generator's goal is to minimize the value function, as shown in the following formula:
In the formula, JSD is the core optimization term of the loss function: a measure of the difference between the two distributions. As the formula shows, the essence of this optimization is to minimize the difference between the distributions of X and G(z); the smaller the difference, the more successfully the G network has been trained.
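The value function and its JSD form referred to above can be written out explicitly. The following is the standard result from the original 2014 GAN formulation, given as a reference sketch; the symbols $p_{\text{data}}$ (distribution of the real data X), $p_z$ (input noise distribution) and $p_g$ (distribution of G(z)) are notation assumed here and may differ from the notation in the article's original figure.

```latex
\min_G \max_D V(G, D)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% With the discriminator at its optimum for a fixed G:
\max_D V(G, D) = -\log 4 + 2 \cdot \mathrm{JSD}\big(p_{\text{data}} \,\|\, p_g\big)
```

Since the JSD is non-negative and equals zero only when the two distributions coincide, minimizing this expression over G drives $p_g$ toward $p_{\text{data}}$.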
Building on the GAN, an Auto-Encoder is introduced to perform anomaly detection on equipment operating data.
The implementation is as follows:
In the first step, a GAN model is built and trained to obtain a G network that reconstructs the distribution of the equipment operating data.
In the second step, the D network of the GAN is discarded, the G network parameters are fixed, and an Encoder network is inserted before the G network. The Encoder network and the G network together form a standard Auto-Encoder network, whose loss function is the reconstruction error.
Anomaly detection is then completed by optimizing this Auto-Encoder network. The principle is that, whatever the input sample, the output of the Auto-Encoder always falls within the range of normal samples. If the input is a normal sample, the generated sample and the original sample lie in the same range, so the reconstruction error is very small or even close to 0. If the input is an abnormal sample, the generated sample still lies within the normal range, which leads to a large reconstruction error. The reconstruction error can therefore be used to decide whether a sample is normal.
In the third step, a set of reconstruction error scores is computed on a small batch of normal samples, and their maximum is used as the reconstruction error threshold for anomaly detection.
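The third step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the use of mean squared error as the reconstruction error, the function names, and the toy arrays standing in for (input, Auto-Encoder output) pairs are all assumptions.

```python
import numpy as np

def reconstruction_errors(x, x_rec):
    """Per-sample reconstruction error (assumed here: mean squared error per row)."""
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    return np.mean((x - x_rec) ** 2, axis=1)

def fit_threshold(normal_errors):
    """Threshold = largest error observed on a small batch of normal samples."""
    return float(np.max(normal_errors))

def is_anomalous(errors, threshold):
    """Flag samples whose reconstruction error exceeds the threshold."""
    return np.asarray(errors) > threshold

# Toy data standing in for normal inputs and their Auto-Encoder reconstructions.
normal_x = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
normal_rec = np.array([[1.0, 2.0], [2.0, 3.2], [3.0, 4.0]])
threshold = fit_threshold(reconstruction_errors(normal_x, normal_rec))

# Two test samples: one reconstructed well, one reconstructed poorly.
test_x = np.array([[1.0, 2.0], [9.0, 9.0]])
test_rec = np.array([[1.0, 2.1], [2.0, 3.0]])
flags = is_anomalous(reconstruction_errors(test_x, test_rec), threshold)
```

Taking the maximum makes the threshold sensitive to a single unusual normal sample; a high percentile is a common alternative, but the maximum is what the text describes.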
This principle is discussed in detail in a paper published in IEEE Transactions on Intelligent Transportation Systems in 2022; the citation is as follows:
M. Xu, P. Baraldi, X. Lu and E. Zio, "Generative Adversarial Networks With AdaBoost Ensemble Learning for Anomaly Detection in High-Speed Train Automatic Doors," IEEE Transactions on Intelligent Transportation Systems, 2022.
The second technology we use is a small-sample fault diagnosis technique based on an unsupervised RNN, called the Conceptor network (Conceptor).
We first introduce the background of this technology: the unsupervised RNN. Compared with an ordinary RNN, its most distinctive feature is that the connection weights of the input layer and of the hidden layer are randomly initialized and remain fixed throughout training and inference. This means that the weights of the input layer and hidden layer do not need to be trained. Compared with an ordinary RNN, the number of hidden-layer neurons can therefore be set very large, which gives the network a large memory capacity and a long memory span over the input time series. The hidden-layer neurons of this special unsupervised RNN are often called the Reservoir.
① Reservoir State Update
The state is updated in the same way as in a standard RNN.
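A reservoir with fixed random weights and a standard RNN-style update might look like the sketch below. The tanh activation, the rescaling of the recurrent matrix to a spectral radius of 0.9 (a common echo-state-network convention) and all function and parameter names are assumptions for illustration, not details given in the article.

```python
import numpy as np

def make_reservoir(n_inputs, n_neurons, spectral_radius=0.9, seed=0):
    """Draw the input weights W_in and recurrent weights W once; they are never trained."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_neurons, n_inputs))
    w = rng.normal(size=(n_neurons, n_neurons))
    # Assumed convention: rescale W so its largest |eigenvalue| equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(series, w_in, w):
    """Return the hidden state at every time step for a (T, n_inputs) series."""
    series = np.asarray(series, dtype=float)
    states = np.zeros((series.shape[0], w.shape[0]))
    x = np.zeros(w.shape[0])
    for t, u in enumerate(series):
        # Standard RNN update with fixed weights: x(t) = tanh(W x(t-1) + W_in u(t))
        x = np.tanh(w @ x + w_in @ u)
        states[t] = x
    return states

w_in, w = make_reservoir(n_inputs=3, n_neurons=20)
# Two toy series of different lengths; the same reservoir handles both.
short_series = np.sin(np.linspace(0.0, 6.0, 50))[:, None] * np.ones(3)
long_series = np.sin(np.linspace(0.0, 12.0, 80))[:, None] * np.ones(3)
short_states = run_reservoir(short_series, w_in, w)
long_states = run_reservoir(long_series, w_in, w)
```

Because nothing is trained, the only cost is the forward pass, which is why the number of reservoir neurons can be made large.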
② Long-term temporal dependencies representation by Conceptor
An unsupervised representation learning method is developed on top of this unsupervised RNN. Specifically, a multi-dimensional time series of variable length is input, and the Reservoir yields the hidden-neuron state of the RNN at each time step. The Conceptor method (shown in the light blue box in the figure above) is then used to obtain an N×N concept matrix. In linear-algebra terms, the meaning of this matrix is as follows: when a time series is processed, the signal at each time step is projected into an N-dimensional space (N is the number of hidden neurons).
If there are ti time steps, these ti points in the N-dimensional space form a point cloud. The ellipsoid of this point cloud can be decomposed into N mutually orthogonal directions, giving an eigenvector and an eigenvalue for each direction.
The role of the Conceptor is to capture these eigenvalues and eigenvectors and to normalize the eigenvalues. The N eigenvectors can be understood as N properties captured from the time series (such as periodicity, trend, volatility and other complex time-series characteristics), that is, as extracted implicit features. All the extracted feature information is retained in this N×N matrix (the Conceptor matrix, shown in the dark blue box on the right of the figure above).
Using basic matrix operations, the Conceptor matrices of two time series are subtracted and the Frobenius norm of the difference is taken. This gives the Conceptor distance between the two time series, a scalar that characterizes how different they are.
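The computation described above can be sketched as follows. The sketch uses the commonly cited Conceptor definition C = R (R + α⁻² I)⁻¹, where R is the correlation matrix of the reservoir states and α is the aperture; the article does not state which variant or aperture value is used, so both are assumptions, as are the function names and the random arrays standing in for reservoir states.

```python
import numpy as np

def conceptor(states, aperture=10.0):
    """N x N concept matrix from a (T, N) array of reservoir states."""
    states = np.asarray(states, dtype=float)
    n_steps, n_neurons = states.shape
    # Correlation matrix of the state point cloud; its eigenvectors are the ellipsoid axes.
    r = states.T @ states / n_steps
    # Each eigenvalue s of R is mapped to s / (s + aperture**-2), which lies in [0, 1).
    return r @ np.linalg.inv(r + aperture ** -2 * np.eye(n_neurons))

def conceptor_distance(c_a, c_b):
    """Frobenius norm of the difference between two concept matrices."""
    return float(np.linalg.norm(c_a - c_b, ord="fro"))

# Toy state matrices of different lengths standing in for two time series.
rng = np.random.default_rng(0)
states_a = np.tanh(rng.normal(size=(200, 5)))
states_b = np.tanh(3.0 * rng.normal(size=(120, 5)))
c_a, c_b = conceptor(states_a), conceptor(states_b)
d_ab = conceptor_distance(c_a, c_b)
```

Both series yield a matrix of the same N×N shape regardless of their length, which is what makes the distance well defined for variable-length inputs.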
Thanks to these properties, the Conceptor can be used for small-sample fault diagnosis.
If a small number of actual fault samples is available (for example, fewer than 10), the corresponding time series are all input into the Conceptor network and aggregated into a concept matrix that serves as an abstract representation of that fault mode. Similarly, normal samples are aggregated into a normal concept matrix. At test time, the concept matrix of the input time series is extracted in the same way and compared with the concept matrices of the normal and abnormal samples to compute the corresponding Conceptor distances. If the concept matrix of the input sample is highly similar to that of a specific failure mode, the sample can be considered to belong to that failure mode.
This method is also fully discussed in the following paper:
Mingjing Xu, Piero Baraldi, Zhe Yang, Enrico Zio, A two-stage estimation method based on Conceptors-aided unsupervised clustering and convolutional neural network classification for the estimation of the degradation level of industrial equipment, Expert Systems with Applications, Volume 213, Part B, 2023, 118962.
In the battery compartment of a battery swap station, the chain works with the compartment elevator to lift incoming batteries to the charging bin for charging. A faulty chain may loosen or even break, which can cause a battery to get stuck on its way to the charging bin so that it cannot be placed in the bin. If the chain breaks, the battery may also fall, causing battery damage or even a fire.
It is therefore necessary to build a model that detects chain looseness in advance, so that related safety accidents are prevented and the risk is minimized.
The variables most directly related to chain loosening are vibration signals. However, collecting and storing vibration data is expensive, so most equipment does not record vibration signals.
In the absence of vibration data, chain looseness can be detected from the torque, position, speed and other signals of the chain drive motor.
Comparing the loose-chain data with the normal-chain data in the figure below, it is clear that a loose battery compartment chain causes pronounced periodic fluctuations in the torque signal, and that the amplitude of these fluctuations decays over time.
The actual number of samples for this fault is very small, fewer than 20. However, this type of fault is highly critical, so the prediction model must achieve very high precision and recall.
① First, segment the original data into time series and extract the torque data of the constant-speed phase from the long series.
② Then decompose the time series and retain only its fluctuation component.
③ Finally, perform spectrum analysis on the resulting sequence to obtain its spectral features.
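Steps ② and ③ above can be illustrated with a small sketch. The article does not say which decomposition method is used, so the centred moving average used here to remove the trend is an assumption, as are the function names, the 100 Hz sample rate and the synthetic torque signal (a slow trend plus a decaying 4 Hz oscillation).

```python
import numpy as np

def fluctuation(signal, window=25):
    """Remove the slow trend with a centred moving average (window must be odd)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    signal = np.asarray(signal, dtype=float)
    # Edge-pad so that the moving average has the same length as the input.
    padded = np.pad(signal, window // 2, mode="edge")
    trend = np.convolve(padded, np.ones(window) / window, mode="valid")
    return signal - trend

def amplitude_spectrum(signal, sample_rate):
    """One-sided amplitude spectrum of a real-valued signal."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
    amps = 2.0 * np.abs(np.fft.rfft(signal)) / signal.size
    return freqs, amps

# Synthetic stand-in for constant-speed torque data from a loose chain.
sample_rate = 100.0
t = np.arange(1000) / sample_rate
torque = 5.0 + 0.2 * t + 0.5 * np.exp(-0.2 * t) * np.sin(2 * np.pi * 4.0 * t)

freqs, amps = amplitude_spectrum(fluctuation(torque), sample_rate)
peak_freq = float(freqs[np.argmax(amps)])  # dominant fluctuation frequency in Hz
```

The amplitudes per frequency band would then serve as the feature vector passed on to the anomaly detection model.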
However, more than one frequency band is present at the moment of failure, and the amplitudes in the different frequency bands follow specific distributions. Traditional methods therefore struggle to identify the fault accurately, and their lower accuracy leads to more false alarms and missed alarms. The AE-GAN model is therefore chosen to capture the specific distribution under the fault mode more accurately and to produce the final equipment anomaly score.
In this architecture, the feature layer mainly consists of the algorithm modules for the feature engineering described above. The algorithm layer in this case uses the AE-GAN algorithm. The anomaly scores from the algorithm layer, together with the data recorded in the feature tables of the feature layer, are passed to the model layer for further judgment and decision-making. The resulting work order is finally sent to a specialist for handling.
Based on the above process, traditional expert experience detection is upgraded to AI algorithm detection, and the accuracy rate is increased by more than 30%.
2. Fault diagnosis of supercharger gun tip deterioration
First, a physical model is built from the charging current, voltage, temperature and other physical signals of the charging gun to obtain the temperature rise coefficient of the gun tip, which serves as the feature signal for further fault diagnosis. However, this kind of physics-based feature engineering usually generates features over sliding time windows and produces a new time series as the feature result; such feature sequences are often noisy.
The following figure is an example. This project usually uses one week or one month of data as the time window, giving a feature time series like the one in the figure below. As the figure shows, the sequence is very noisy, and it is difficult to distinguish degraded samples from normal samples directly.
In addition, the number of actual failure samples, that is, degraded tips, is often fewer than 50.
For these two reasons, the Conceptor model is introduced to move away from manual experience and let the model capture the time-series characteristics of degraded samples automatically.
a. If 50 fault samples are input, 50 concept representation matrices are obtained;
b. These 50 matrices are averaged to obtain the centroid of the representation matrices of the fault mode, which serves as the representation matrix of that fault mode;
c. In the model testing stage, the concept matrix of the input test data is calculated and compared with the representation matrix of the fault mode to obtain the anomaly score.
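Steps a to c can be sketched as below, taking the concept matrices as given. The use of the Frobenius distance to each mode center as the score, the function names and the tiny 2×2 toy matrices are assumptions for illustration; real concept matrices would be N×N and come from the Conceptor model.

```python
import numpy as np

def mode_center(matrices):
    """Element-wise mean of a list of N x N concept matrices."""
    return np.mean(np.stack(matrices), axis=0)

def frobenius_distance(a, b):
    """Frobenius norm of the difference between two matrices."""
    return float(np.linalg.norm(a - b, ord="fro"))

def diagnose(test_matrix, centers):
    """Return the closest mode and the distance to every mode center."""
    distances = {name: frobenius_distance(test_matrix, c) for name, c in centers.items()}
    return min(distances, key=distances.get), distances

# Toy concept matrices standing in for the outputs of the Conceptor model.
normal = [np.diag([0.2, 0.2]), np.diag([0.3, 0.1])]
degraded = [np.diag([0.8, 0.9]), np.diag([0.9, 0.8])]
centers = {"normal": mode_center(normal), "degraded": mode_center(degraded)}

label, distances = diagnose(np.diag([0.8, 0.8]), centers)
```

A smaller distance to the degraded center means a higher similarity to that fault mode, so the distances can be turned into an anomaly score or used directly for nearest-center classification.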
With this process, the traditional mechanism-model detection method is upgraded to a mechanism model combined with AI algorithm detection, which reduces the model's false alarm rate to 1/5 of the original.
A1: For the trained AE-GAN model, a sample is input into the Auto-Encoder to obtain its reconstruction error, which is the anomaly score. If the score is below the specified threshold, the sample is considered normal; otherwise it is considered abnormal. This method presupposes that all training data are normal samples.
A2: When training a GAN, either normal data or abnormal data of one specific mode is used. Mixed data is not used for training, so issues such as sample imbalance do not arise. If the two classes differ greatly in sample count in the actual data, GAN network 1 is generally trained on normal samples and GAN network 2 on abnormal samples of one fixed pattern, and the final judgment on a test sample is based on the reconstruction errors of the two networks.
A3: Mode collapse is the core problem encountered in GAN training. The first step is to understand mode collapse; the second is to focus on the core task of GAN training.
Mode collapse means that the data generated by the generator concentrates in a specific region. It happens because the definition of the loss function in the GAN is overlooked. During GAN training, the loss of the G network and the loss of the D network are usually computed separately, and the joint loss of the two networks (the JSD loss in the formula) is often ignored. When mode collapse occurs, the JSD loss usually fails to converge, so visualizing the JSD loss during training is an effective way to avoid mode collapse. This is also why many recent improved GAN variants stand out and produce better results. Introducing specific tricks into a standard GAN can achieve similar effects.
A4: When the numbers of positive and negative samples differ greatly, commonly used models such as LSTM, RNN and GRNN often suffer from a loss function that does not converge. The usual approach to this kind of problem is therefore to start from unsupervised learning: the hidden-layer weights of the main network are randomly initialized and fixed, and a specific method is used to regularize the eigen-components of the resulting concept matrix. Although the weights are random, the resulting representation components reflect the hidden characteristics of the time series and are sufficient to discriminate in small-sample scenarios. These are the advantages of randomly fixing the RNN hidden layer.
A5: The model is shown in the figure below.
The Reservoir part is essentially the same as an ordinary RNN; the only difference is that Win and W are set randomly (note that they are generated randomly only once). The hidden state of the neurons is then computed and updated at each time step, and the corresponding concept matrix is obtained. This is the complete Conceptor.
A6: The following figure shows the training process of the Encoder network.
First, a standard GAN is trained, and the hidden-layer parameters of its G network are then fixed. An Encoder network is inserted before the G network, and the two networks are connected to form an Auto-Encoder network. The input of the Auto-Encoder network is the original data sample and the output is the reconstructed data sample; the AE-GAN network identifies abnormal data through these reconstructed samples.
A7: Please refer to the relevant sections of the article for details. The code is not yet open source.
A8: It can be used. However, compared with ordinary signals, the image field has higher dimensions, more complex data distribution, and a larger amount of data required for training. Therefore, if it is used for image classification and there are few data samples, the model effect will be compromised; if it is used for anomaly detection, the effect is still good.
A9: The most intuitive evaluation indicators are the false positive rate and false negative rate. More scientific indicators include recall rate, precision rate, F-score, etc.
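As an illustration of these indicators, the sketch below computes them from binary labels, where 1 marks a fault and 0 a normal sample. The function name and the toy labels are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall, F1, false positive rate and false negative rate for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # False alarms among normal samples, and missed alarms among fault samples.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }

# Toy example: 3 real faults, 5 normal samples; one missed alarm and one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)
```

With very few fault samples, each missed alarm moves recall by a large step, so these numbers are best read together with the raw counts.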
A10: If there is no more direct and faster way to obtain fault characteristics, a purely data-driven method is generally used to mine the features of fault samples, usually by building a deep learning network that learns the key features of the fault samples and represents them as concept matrices.
A11: With a small number of samples, the unsupervised RNN method is generally used to represent the data characteristics. For anomaly detection problems with a large number of normal samples, the AE-GAN network can be used.
A12: The concept matrix output by the RNN can be understood as the set of all features of the input time series. Since data in the same state have similar features, the concept matrices of all samples in that state are averaged, which yields the concept center matrix of that state. For an input time series of unknown state, its concept matrix is calculated and compared with each concept center matrix; the state whose concept center matrix is most similar is the category of the input data.
A13: After completing network training, use a small batch of normal sample data to calculate the reconstruction error, and take the maximum value as the threshold.
A14: Generally it is not updated. However, if the original data distribution changes (for example, because the operating conditions change), the threshold may need to be recomputed or even redefined. Transfer learning methods can also be introduced into the GAN to fine-tune the threshold.
A15: The GAN is generally not trained on the raw time series, but on features extracted from the raw time series.
A16: Traditional GANs are also often used for anomaly detection. AE-GAN builds on a more in-depth analysis of the principles of GAN, so it can avoid problems such as mode collapse to a large extent; the introduction of the Auto-Encoder ensures that the anomaly detection principle is applied accurately, which reduces the false alarm rate.
A17: The fault diagnosis model is divided into many levels. The results of the model layer are only an input to the decision-making layer, not the final result. They are generally combined with other business logic to assist judgment.
A18: Generally, based on the anomaly detection results, a technical specialist is assigned to confirm the situation on site.
A19: Related attempts are being made.
A20: The Conceptor model described in the article can handle time series of any length, so there is no need for zero-padding; it also avoids the parameter training process, so this type of problem can be circumvented.
A21: If the model is used only for anomaly detection, then in fact the more it "overfits", the better it performs. In addition, because the G network of the GAN involves a lot of randomness during training, overfitting generally does not occur.
A22: This generally depends on the size of the neural network, the dimension of the hidden neurons, and so on. As a rule of thumb, for a 2-layer neural network with 100 neurons per layer, the volume of training data needs to be 1 to 2 orders of magnitude larger than the hidden-layer dimension to achieve good results. Some tricks are also needed to prevent mode collapse.
A23: Many Conceptor models currently in production use the same set of empirical parameters without further tuning. In practice, setting the relevant parameter anywhere from 10 to 100 makes very little difference to the results; the only difference is the computational cost. If the sample size of fault data is small and you want more accurate results, you can set the parameter to 128, 256 or even higher, at a correspondingly higher computational cost. The number of labels for fault analysis is generally between 1 and 10. Business value is generally quantified through false alarms and missed alarms, because these can be converted directly into a quantitative business impact.
A24: With the Conceptor method, a growing time window can be used to form multiple concept matrices; spectral clustering of these concept matrices determines the time at which the fault occurred. See the paper cited in the Conceptor section for details.
A25: In real scenarios, normal data often varies greatly because of the different operating conditions of the equipment.
A26: It is difficult to draw a clear line between the usage scenarios of these two models. Generally speaking, GAN is better at problems where the data distribution is unusual and classification networks are hard to apply, while the unsupervised RNN is better suited to small-sample problems.
A27: For domain-specific scenarios, this is generally feasible if domain knowledge can be introduced to extract high-order features. If only images are used for detection, and the image sample size is large and represents normal behavior, the problem can be treated as a detection task in a CV sub-domain and handled with the models described in this article.
A28: It has no parameters.
A29: It depends on the specific scenario, including model requirements, fault sample size, distribution complexity and so on. If the time-series waveforms of two faults are very similar, there is generally no need to train separate models; a multi-class model that determines the classification boundary is enough. If the data of the two fault modes look very different, the GAN model can be used for more accurate identification.
A30: The training cost of the Conceptor model is very small, and it can be used to extract features. The GAN model takes relatively longer to train, but for common structured tabular data, training does not take too long.
A31: Neither model has requirements on the number of positive and negative samples. Considering training time, a few thousand representative samples are generally selected for training. There is generally no recommended minimum number of time steps in a time-series subset.
A32: The dimension of the feature matrix is directly related to the number of hidden neurons. If there are N hidden neurons, the dimension of the feature matrix is N×N. Considering model complexity and computational efficiency, N is generally not set too large; a commonly used value is 32.
A33: According to the principle of GAN, the D network is used to distinguish normal samples from fake samples. If the fake samples are trained to a fully developed state, they become very close to normal samples, which makes it difficult to distinguish normal samples from abnormal ones. The AE-GAN network assumes that normal and abnormal samples are distinguishable to some degree, which is the theoretical basis for using AE-GAN.
A34: The generalization ability of the model rests on an a priori assumption: all faults of the same type have similar data distributions. If the data distributions of faults of the same type differ considerably, the fault categories generally need to be subdivided further to ensure the generalization ability of the model.
A35: For the two models mentioned in the article, the data only needs to be normalized.
A36: Because of its theoretical completeness, GAN can describe the distribution of normal sample data more fully and thus construct a more complete decision boundary. Methods such as the ordinary AE, Isolation Forest and One-Class SVM lack this theoretical completeness and cannot construct as complete a decision boundary.
A37: If the discriminator is indeed unable to tell normal samples from fake samples, this indirectly shows that the generator has been trained very successfully. In the anomaly detection stage, only the generator is used, not the discriminator. The generator of the GAN plays an essential role, so AE-GAN does not degenerate into an AE; it can be understood as an upgraded AE, namely a regularized AE.
A38: In scenarios with small samples and high interpretability requirements, no such attempt has been made yet; related attempts may be made later.
A39: VAE is also a commonly used method for anomaly detection. VAE places a prior Gaussian distribution on the hidden layer and changes the shape of this prior to fit the real data, so that the two distributions become equivalent. However, the loss function used by VAE is the KL divergence rather than the JS divergence, and the KL divergence is asymmetric, so it may not work well on complex samples.
A40: The charging gun case in the article is a case of severe noise. Decomposition methods for time series can separate the periodic, trend and noise components of the series; missing features can be handled with methods for incomplete data.
A41: Taking GAN as an example, sample enhancement is mainly performed by adding noise, and the APA enhancement strategy is not used.
A42: The references provided in this article contain many extreme examples. The example you cited is a typical two-Gaussian-ball example, and AE-GAN can solve this type of problem.