On the basis of the company's existing business data, this work applies data integration and acquisition techniques and relies on the company's enterprise information model standards to achieve unified and standardized storage and management of defect data. Knowledge graph and big data analysis technologies are applied to analyze the correlations between the data held in different systems, enabling lean management of relay protection device fault data. Knowledge graph technology and information extraction methods are then used to build a knowledge graph of relay protection device faults and to effectively summarize the various types of defect data, so that the data and the business corroborate each other and full knowledge-graph coverage of equipment defects is gradually achieved. The equipment fault ontology relationship architecture is shown in Fig. 2.
Information extraction identifies, analyzes, screens and generalizes the knowledge contained in an information source and stores it to form a knowledge base. Most current research focuses on natural language text. Some knowledge contained in databases or textual data has an explicit data form, yet current machine learning methods are still not sufficient to learn rules that achieve fully automatic extraction.
Currently, information extraction is based on manual or machine-assisted manual acquisition. For example, to acquire the experiential knowledge of experts in a given field, professional knowledge engineers must communicate with the experts repeatedly, and may even need to take part in solving problems on site, until the knowledge engineer is confident that the experts' experience has been acquired and understood. The relevant personnel then collate, analyze and summarize this experience, establish a model, and adopt an appropriate structure to convert the knowledge into the required format so that it can be learned further by computer. In power big data, knowledge extraction is the process of transforming large amounts of quantitative data into abstract mathematical representations. Information extraction is an important part of constructing a large-scale knowledge graph.
Structured data are data that have clear, definable relationships between data points and conform to a predefined model. Structured data are in fact high-quality data, and how to analyze this type of data when constructing the knowledge graph of relay protection device faults is also an important task.
The structured data related to equipment failures in this paper are mainly equipment ledger data. Since these data are stored in a relational database (and are therefore inherently structured), they can be mapped to the fields of the knowledge graph ontology and quickly converted into Resource Description Framework (RDF) data. Structured data extraction is shown in Fig. 3.
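As an illustration of this mapping, the following minimal sketch converts a few hypothetical ledger rows into RDF triples with the rdflib library; the field names, namespace and property names are assumptions for demonstration and would in practice follow the enterprise information model.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/relay-protection/")

# Hypothetical equipment ledger rows pulled from the relational database.
ledger_rows = [
    {"device_id": "PD-001", "model": "RCS-931", "manufacturer": "NR Electric",
     "commission_date": "2014-03-05"},
]

g = Graph()
g.bind("ex", EX)
for row in ledger_rows:
    device = EX[row["device_id"]]                      # ontology field -> graph node
    g.add((device, RDF.type, EX.ProtectionDevice))
    g.add((device, EX.model, Literal(row["model"])))
    g.add((device, EX.manufacturer, Literal(row["manufacturer"])))
    g.add((device, EX.commissionDate, Literal(row["commission_date"])))

print(g.serialize(format="turtle"))
```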
During the operation of the equipment, the system generates a large number of unstructured defect output files. These files do not have a consistent structure, cannot be represented in a unified way, and cannot be stored directly in a database. In practical applications, people need to read, retrieve, modify and update this unstructured information, which consumes a great deal of manpower and makes transaction processing extremely inefficient. There is therefore an urgent need to convert unstructured data into structured form so as to integrate heterogeneous data sources, achieve data consistency and eliminate data heterogeneity. The data dispersed across heterogeneous sources must be processed and integrated to shield the logical and physical differences between the original data, and ultimately to achieve unified representation, storage and management of the data.
The unstructured data related to equipment defects mainly come from the maintenance record texts generated by the daily maintenance of relay protection devices. Take a certain overhaul record as an example: "On 4 December 2019, the Electric Power Research Institute went to Binhai Company's 110 kV High-Tech Park line to carry out partial discharge detection on transformer cable terminations. The test found a suspected high-frequency discharge signal at the phase-A cable termination of the No. 2 main transformer. Based on the available test results, the signal was initially judged to be caused by poor contact between the retaining ring and the conductor, an external factor, and the defect was classified as a general defect. It was subsequently found that the equipment was produced by Jiangsu Huapeng Transformer Co. on 1 August 2013 and put into operation on 5 March 2014."
To extract the required entities and relationships from such unstructured data, natural language processing (NLP) technology can be used. Current information extraction methods fall into three main categories: rule templates, machine learning and deep learning.
In the rule-template approach, the text to be recognized is matched against rule templates constructed from a priori knowledge, thereby completing the information extraction. As the semantic feature templates describing the text are refined, the extraction effect can surpass that of other algorithms.
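The following sketch illustrates the idea on the overhaul record quoted earlier, using simple regular-expression rule templates; the patterns and labels are illustrative assumptions rather than the templates actually used in production systems.

```python
import re

record = ("On 4 December 2019, the Electric Power Research Institute carried out partial "
          "discharge detection on the 110kV High-Tech Park line; a suspected high-frequency "
          "discharge signal was found at the phase-A cable termination of the No. 2 main "
          "transformer, judged to be a general defect.")

# Hand-written rule templates (illustrative only): each pattern encodes prior
# knowledge about how dates, voltage levels and defect grades are phrased.
rules = {
    "date":         r"\d{1,2}\s+\w+\s+\d{4}",
    "voltage":      r"\d+\s*kV",
    "defect_grade": r"(general|serious|critical)\s+defect",
}

for label, pattern in rules.items():
    for match in re.finditer(pattern, record, flags=re.IGNORECASE):
        print(label, "->", match.group())
```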
However, this approach relies heavily on language and domain knowledge, which makes rule templates difficult to write and limits their applicability, so different rule templates have to be built for different domains. In short, the method has a high labour cost, its extraction quality depends on how well the rule templates are written, and its strong domain specificity makes it poorly generalizable.
Machine learning information extraction algorithms include the HMM, MEMM and CRF models. The HMM model is suited to predicting the state process of a system: it can predict the annotation of subsequent text from the annotation of the current text sequence and thereby complete text information extraction. However, it rests on two precondition assumptions: the homogeneous Markov assumption and the observation (output) independence assumption.
The MEMM model, in contrast, uses a discriminative model to build a classifier for the state at each moment and finally multiplies all the classifier probabilities together. Since MEMM only seeks a local optimum, it cannot obtain the globally optimal solution; thus, although the model removes the HMM output independence assumption, it suffers from the label bias problem.
The CRF model completes the data annotation by calculating a joint probability. CRF solves the label bias problem of MEMM and can exploit contextual text features so that the outputs at different moments influence one another, but the introduction of a large number of features also makes the CRF model more complex and reduces training efficiency.
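As a hedged illustration of the machine learning route, the sketch below trains a linear-chain CRF with the sklearn-crfsuite library on a single toy sentence; the hand-crafted features, entity labels and hyperparameters are assumptions chosen only to show the workflow.

```python
import sklearn_crfsuite

def token_features(tokens, i):
    # Simple hand-crafted features; real systems would add gazetteers, POS tags, etc.
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# One toy training sentence with BIO entity labels (purely illustrative).
tokens = ["No.2", "main", "transformer", "phase", "A", "cable", "terminal"]
labels = ["B-DEV", "I-DEV", "I-DEV", "B-PART", "I-PART", "B-PART", "I-PART"]

X_train = [[token_features(tokens, i) for i in range(len(tokens))]]
y_train = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict(X_train))
```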
As deep learning technology has matured, experts and scholars have found that it achieves better results in natural language processing, and it has therefore gradually been applied to various text tasks such as sentiment analysis, machine translation and entity extraction.
Recurrent neural networks were proposed earliest and are the most widely used. However, as research deepened, it was found that recurrent neural networks suffer from vanishing or exploding gradients in practical applications. To solve this problem, the long short-term memory (LSTM) network adds 'gates' that control the memorized information on top of the recurrent neural network. Although LSTM solves the vanishing/exploding gradient problem of recurrent networks, it still cannot overcome their inability to compute in parallel, which makes the model computationally intensive and time-consuming for tasks with large data volumes.
Later, with the proposal of the Attention mechanism, researchers found that, compared with recurrent neural networks and their improved variants, Attention-based neural networks can save time and space by selectively focusing on part of the information, thereby reducing the model's computation. More and more scholars and enterprises therefore began to study and apply Attention-based networks. Taking Google's BERT model (which is based on Attention) as an example, experiments showed that BERT achieved excellent results on 11 natural language processing tasks.
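The sketch below shows, under the assumption that the Hugging Face transformers library and the pre-trained bert-base-chinese checkpoint are available, how BERT produces context-dependent token vectors through self-attention; the checkpoint and example sentence are illustrative choices, not those reported in this paper.

```python
import torch
from transformers import BertTokenizer, BertModel

# The checkpoint is an assumption; any BERT variant matching the language of
# the defect texts could be substituted.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

text = "2号主变A相电缆终端存在高频疑似放电信号"  # sample defect sentence
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Each token now has a context-dependent vector produced by self-attention.
token_embeddings = outputs.last_hidden_state  # shape: (1, seq_len, 768)
print(token_embeddings.shape)
```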
The emergence and rise of deep learning provide better generalization while solving the problem that traditional machine learning cannot handle long text sequences. This is because deep learning models contain embedding layers that convert text into matrices of word vectors, which facilitates subsequent computation, and the models can adjust their network structure and parameters to achieve better experimental results.
Based on the research and analysis of the three information extraction techniques above, this paper uses a deep learning model for text information extraction, with the entities in the text as the main extraction objects.
Unstructured data are difficult to represent in relational databases. In addition, unstructured data are not only large in volume, but their value density is usually higher than that of structured data. Therefore, this research is mainly designed around the processing of unstructured data: after the text data (unstructured data) are obtained, the brat tool is used to annotate the entities, the annotated data are converted into the training format, the information extraction model is constructed after data pre-training, model training is carried out, and finally the results are analyzed and optimized. The overall model training process is shown in Fig. 4.
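The conversion from brat standoff annotations to training labels can be sketched as follows; the entity types, offsets and character-level BIO scheme are assumptions used only to illustrate the step, and discontinuous spans are not handled.

```python
def brat_to_bio(text, ann_lines):
    # Character-level BIO tags for simple (non-discontinuous) brat entity spans.
    tags = ["O"] * len(text)
    for line in ann_lines:
        if not line.startswith("T"):          # keep only entity ("T") annotations
            continue
        _, span, _surface = line.rstrip("\n").split("\t")
        ent_type, start, end = span.split()
        start, end = int(start), int(end)
        tags[start] = f"B-{ent_type}"
        for i in range(start + 1, end):
            tags[i] = f"I-{ent_type}"
    return tags

# Toy example (offsets and label names are illustrative assumptions).
text = "A相电缆终端存在放电信号"
ann = ["T1\tPART 0 6\tA相电缆终端", "T2\tDEFECT 8 12\t放电信号"]
for ch, tag in zip(text, brat_to_bio(text, ann)):
    print(ch, tag)
```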
The overall framework for information extraction model training and optimization is shown in Fig. 5. It is mainly divided into three sub-modules:
The word vector representation layer transforms text utterances into real-valued vectors and uses the vectorized text as the input to the neural network. For a defect text utterance $S = \{c_1, c_2, \dots, c_n\}$, where $c_i$ denotes a single character in the text, word2vec pre-training is used to obtain the text vector representation $X = \{x_1, x_2, \dots, x_n\}$, where $x_i \in \mathbb{R}^d$ denotes a $d$-dimensional vector, and $X$ is fed into the feature extraction layer as the input to the neural network.
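A minimal sketch of the word2vec pre-training step, assuming character-level input and the gensim library; the vector dimension, window size and skip-gram setting are illustrative assumptions rather than the configuration used in this work.

```python
from gensim.models import Word2Vec

# Character-level corpus built from defect/maintenance sentences (toy examples).
sentences = [
    list("2号主变A相电缆终端存在高频疑似放电信号"),
    list("更换保护装置出口继电器后缺陷消除"),
]

# Vector dimension, window and skip-gram choice are illustrative assumptions.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=50)

# Each character c_i is mapped to a 100-dimensional vector x_i fed to the network.
vec = model.wv["变"]
print(vec.shape)  # (100,)
```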
The feature extraction layer jointly extracts utterance features by exploiting the fact that Bi-directional Long Short-Term Memory (BiLSTM) networks and iterated dilated convolutional neural networks (IDCNN) extract textual features at different levels, so as to capture the deep semantic information in the text vectors.
The basic model structure, which fuses the Attention-based BERT model with BiLSTM-IDCNN text feature extraction, is shown in Fig. 6.
A BiLSTM network, formed by superimposing a forward LSTM (Long Short-Term Memory) and a backward LSTM, belongs to the family of recurrent neural networks (RNN). It can analyze the contextual information of the text and effectively overcomes the unidirectionality of LSTM, which can only use the information preceding the current position. When selecting parameters for the BiLSTM gate computations, the number of hidden units must be considered; it typically ranges from 64 to 512 depending on task complexity and dataset size. The learning rate is generally initialized at 0.001 and adjusted according to model performance during training, and batch sizes are usually set between 16 and 256 to balance training stability against resource constraints. These parameters must be fine-tuned to the specific application scenario to optimize model effectiveness. IDCNN effectively alleviates the overfitting problem of traditional CNNs caused by their large number of parameters and large model size.
Contextual features are learned for each input $x_t$ to obtain the forward output $\overrightarrow{h_t}$ and the backward output $\overleftarrow{h_t}$ at the current moment; splicing the forward and backward outputs then gives the representation $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$ of the contextual features contained in the input word at the current moment, which is taken as the output of the BiLSTM network. To reduce the loss of resolution caused by a possible mismatch in the number of hidden neurons between the BiLSTM network and the IDCNN network, the output of the BiLSTM network is convolved and the convolution result is used as the input of the IDCNN network.
The relationship between the input and output of the convolutional layer is given by

$$o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1$$

where $i$ denotes the input size, $k$ denotes the size of the convolution kernel (kernel size), $s$ denotes the stride, $p$ denotes the boundary expansion (padding) and $o$ denotes the output size; for example, with $i = 100$, $k = 3$, $s = 1$ and $p = 1$, the output size remains $o = 100$. Afterwards, at the IDCNN layer, the features learned by the dilated convolutional blocks are spliced to obtain

$$H = \big[b^{(1)}; b^{(2)}; \dots; b^{(m)}\big], \qquad b^{(j)} = D^{(j)}\big(X; W^{(j)}\big)$$

where $b^{(j)}$ denotes the output of the $j$-th dilated convolutional block and $W^{(j)}$ denotes the parameters in that block. Finally, the fully connected layer of the neural network produces the fully connected score matrix, and analyzing this matrix yields a sequence of labels for that text.
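The feature extraction layer described above can be sketched in PyTorch as follows; the hidden size, number of filters, dilation rates (1, 1, 2) and label count are assumptions, not values reported in this paper, and the 1x1 convolution plays the role of the channel-aligning convolution between the BiLSTM and IDCNN networks.

```python
import torch
import torch.nn as nn

class BiLSTMIDCNN(nn.Module):
    """Sketch of the BiLSTM + iterated dilated CNN feature extractor.

    Hidden size, filter number and dilation rates are illustrative assumptions.
    """
    def __init__(self, embed_dim=100, hidden=128, filters=128, num_labels=7):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        # 1x1 convolution aligns BiLSTM output channels with IDCNN input channels.
        self.align = nn.Conv1d(2 * hidden, filters, kernel_size=1)
        self.idcnn = nn.ModuleList([
            nn.Conv1d(filters, filters, kernel_size=3, dilation=d, padding=d)
            for d in (1, 1, 2)
        ])
        self.fc = nn.Linear(filters * 3, num_labels)   # splice the three block outputs

    def forward(self, x):                   # x: (batch, seq_len, embed_dim)
        h, _ = self.bilstm(x)               # (batch, seq_len, 2*hidden)
        h = self.align(h.transpose(1, 2))   # (batch, filters, seq_len)
        blocks, out = [], h
        for conv in self.idcnn:
            out = torch.relu(conv(out))
            blocks.append(out)
        spliced = torch.cat(blocks, dim=1).transpose(1, 2)  # (batch, seq_len, filters*3)
        return self.fc(spliced)             # per-token label scores for the CRF layer

scores = BiLSTMIDCNN()(torch.randn(2, 30, 100))
print(scores.shape)  # torch.Size([2, 30, 7])
```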
The CRF constraint layer adjusts the text label sequence by calculating a joint probability, which imposes constraints between adjacent labels, avoids invalid label sequences and thus yields the optimal result. When selecting parameters for the CRF joint probability calculation, the transition features and their associated weights, which define the relationships between labels, are crucial for keeping the predicted sequences coherent. The learning rate should be chosen carefully, often starting around 0.001 and adjusted according to training performance to optimize convergence, and dropout rates, commonly in the range 0.2 to 0.5, can be applied to further improve generalization. These parameters need to be fine-tuned to the characteristics of the dataset and the task. The output of BiLSTM-IDCNN is used as the input of the CRF, and for a predicted label sequence $y = (y_1, y_2, \dots, y_n)$ the score function is expressed as

$$s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$$
The final model objective function is

$$P(y \mid X) = \frac{\exp\big(s(X, y)\big)}{\sum_{\tilde{y} \in Y_X} \exp\big(s(X, \tilde{y})\big)}$$

Maximizing the log-likelihood of the true label sequence during training prevents the results from overflowing:

$$\log P(y \mid X) = s(X, y) - \log \sum_{\tilde{y} \in Y_X} \exp\big(s(X, \tilde{y})\big)$$

Here $A$ denotes the transfer matrix, $A_{y_i, y_{i+1}}$ denotes the probability that label $y_i$ is transferred to label $y_{i+1}$, $P_{i, y_i}$ denotes the probability that the $i$-th word in the text takes the label $y_i$, $y$ denotes the true tagged sequence, and $Y_X$ denotes the set of all possible label sequences ($y_0$ and $y_{n+1}$ are the added start and end labels).
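A small numerical sketch of the score function and log-likelihood defined above, using random emission and transition matrices and brute-force enumeration of all label sequences (feasible only at toy scale; real implementations use the forward algorithm and Viterbi decoding):

```python
import itertools
import numpy as np

n_labels, seq_len = 3, 4
rng = np.random.default_rng(0)
P = rng.normal(size=(seq_len, n_labels))           # emission scores P[i, y_i] from BiLSTM-IDCNN
A = rng.normal(size=(n_labels + 2, n_labels + 2))  # transition matrix incl. start/end states

START, END = n_labels, n_labels + 1

def score(y):
    """s(X, y) = sum of transition scores A[y_i, y_{i+1}] + emission scores P[i, y_i]."""
    path = [START] + list(y) + [END]
    trans = sum(A[path[i], path[i + 1]] for i in range(len(path) - 1))
    emit = sum(P[i, y_i] for i, y_i in enumerate(y))
    return trans + emit

# Brute-force log partition function: log-sum-exp over all possible label sequences.
all_scores = [score(y) for y in itertools.product(range(n_labels), repeat=seq_len)]
log_Z = np.logaddexp.reduce(all_scores)

y_true = (0, 2, 1, 1)
print("log P(y|X) =", score(y_true) - log_Z)   # log-likelihood maximized in training
```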
Overall, the great strength of IDCNN is its speed. When IDCNN and BiLSTM are each used alone for information extraction, IDCNN is slightly inferior to BiLSTM in effectiveness but faster. However, the joint utterance-feature extraction method proposed in this paper, which combines the BiLSTM and IDCNN models with a CRF constraint layer, performs better on the unstructured fault output text dataset.
Knowledge fusion mainly involves conflict detection and consistency checking of the acquired data to verify the accuracy of the knowledge and to eliminate duplicate and erroneous concepts. On this basis, correspondence, association and merging operations are performed on the verified, error-free knowledge. The process comprises two aspects: knowledge integration and entity mapping. Knowledge integration refers to combining the existing knowledge base with other sources such as third-party knowledge bases or structured information; entity mapping refers to identifying entities in natural language text and associating them with the original entities in the knowledge base, which in practice means resolving entity ambiguity and the co-reference problem. Deep learning models have become the dominant approach for extracting information from textual data owing to their strong ability to capture semantic and syntactic features automatically. Unlike traditional methods that rely heavily on manually designed features and domain-specific rules, deep learning models can learn high-level representations directly from raw text, significantly reducing the need for feature engineering and improving generalization across domains and languages.
Moreover, these models excel at handling complex linguistic structures and contextual dependencies, which are crucial for tasks like named entity recognition, relation extraction, and entity disambiguation -- key components of knowledge extraction and mapping. With architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and especially transformer-based models like BERT, it becomes possible to model long-range dependencies and dynamically adjust word representations based on context, thereby enhancing the precision and robustness of extracted knowledge. After effectively solving the knowledge conflict problem, knowledge storage and visualization can be carried out.
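As a simple illustration of entity mapping, the sketch below links mentions extracted from text to canonical knowledge-base entities by string similarity; the entity names and the similarity threshold are assumptions, and in practice embedding-based disambiguation (for example, comparing BERT vectors) would replace the string measure.

```python
from difflib import SequenceMatcher

# Canonical entities already in the knowledge base (illustrative names).
kb_entities = ["No.2 main transformer", "110kV High-Tech Park line", "phase A cable terminal"]

def link_entity(mention, kb, threshold=0.6):
    """Map a mention extracted from text to its most similar KB entity.

    String similarity is only a stand-in for embedding-based disambiguation.
    """
    best = max(kb, key=lambda e: SequenceMatcher(None, mention.lower(), e.lower()).ratio())
    ratio = SequenceMatcher(None, mention.lower(), best.lower()).ratio()
    return best if ratio >= threshold else None   # None -> candidate new entity to merge later

print(link_entity("No. 2 main transformer", kb_entities))   # resolves the alias
print(link_entity("busbar protection panel", kb_entities))  # no match: new entity
```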
The knowledge reasoning process refers to using, after the initial construction of the knowledge graph, the information carried by the known entities and relationships to infer, through appropriate algorithms, entities or relationships not yet included in the graph; in essence, it is the process of deriving unknown or new knowledge from existing knowledge. Its core focus is the prediction of triples, and reasoning types can be broadly classified into simple structural reasoning and complex structural reasoning.
Simple structural reasoning treats the triples in the knowledge graph as isolated knowledge units, focusing on modeling the structure of individual triples themselves without considering the network they form through their relationships. In this type of reasoning, each triple is treated as an independent fact, and reasoning is based only on the information inside the triple. According to the design principle of the objective function, simple structural reasoning models can be classified into semantic matching models, translation (transformation) models, neural network models and so on, all of which mainly aim at modeling how head entities, relations and tail entities interact to form a plausible statement of fact.
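As an example of the translation-model family of simple structural reasoning, the following sketch scores candidate triples in the TransE style, where a triple (h, r, t) is plausible when the head embedding plus the relation embedding is close to the tail embedding; the entities, relations and random embeddings are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 50

# Toy embedding tables for entities and relations (random placeholders; in
# practice these would be learned from the knowledge graph).
entities = {name: rng.normal(size=dim) for name in
            ["No.2 main transformer", "cable terminal", "partial discharge"]}
relations = {name: rng.normal(size=dim) for name in ["has_component", "has_defect"]}

def transe_score(head, relation, tail):
    """TransE plausibility: the smaller ||h + r - t||, the more likely the triple holds."""
    return -np.linalg.norm(entities[head] + relations[relation] - entities[tail])

# Rank candidate tails for an unseen triple (single-triple link prediction).
candidates = sorted(entities,
                    key=lambda t: transe_score("No.2 main transformer", "has_defect", t),
                    reverse=True)
print(candidates)
```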
Since simple structural reasoning ignores the connections formed between entities through relationships, it fails to make full use of the intrinsic structural information of the knowledge graph at the input level of the model; even if more sophisticated processing is applied later, it may still be difficult to compensate for the information lost by ignoring the overall relevance. In contrast, complex structural reasoning meets this challenge by exploring the graph structure of the knowledge graph at different levels, and it can be further subdivided into two categories: local subgraph structural reasoning models and global path structural reasoning models. The advantage of this type of model is that it can fully consider and exploit the complex connections between entities to achieve more comprehensive and accurate knowledge reasoning.