[Journal article][Full-length article]


ABC-Fusion: Adapter-based BERT-level confusion set fusion approach for Chinese spelling correction

Authors:
Jiaying Xie; Kai Dang; Jie Liu; Enlei Liang

Year of publication: 2023

Pages: 101540 - 101540
Publisher: Elsevier BV


Abstract:

Chinese spelling correction (CSC) aims to automatically detect and correct spelling errors in Chinese sentences. Recently, methods that combine a pre-trained language model with external knowledge have achieved excellent performance. The knowledge is derived either from multi-modal information such as pronunciations and glyphs, or from a confusion set that collects confusing character pairs. However, existing advanced multi-modal knowledge based methods achieve superior performance at the cost of a greatly increased model size; and although context semantics is essential for CSC, current confusion set based methods fail to use the confusion set to model the semantics because they do not fuse the lexical features. To deal with these issues, we propose an Adapter-based BERT-level Confusion Set Fusion method, which fuses BERT with the semantics of confusing characters in the semantic encoding phase. A lightweight adapter is placed between BERT layers; it dynamically extracts the relevant knowledge from the confusing candidates and integrates it with the context. In this way, the contextual information and the semantics of the candidates can fully interact within BERT. Experiments¹ are conducted on three benchmarks. The results demonstrate that our method outperforms the previous confusion set based methods and shows comparable performance with the multi-modal knowledge based methods.

Introduction

Chinese spelling correction (CSC) aims to detect and correct spelling errors in Chinese. Chinese characters are ideograms, which differ from alphabetic writing, so most Chinese spelling errors involve phonetically or visually similar characters (Liu et al., 2010). These spelling errors are commonly seen in people’s writing, and are challenging for technologies like automatic speech recognition (ASR) and optical character recognition (OCR). CSC systems require both human-like comprehension and similarity knowledge to make accurate modifications.
Table 1 shows two examples of Chinese spelling errors. In the first example, the model needs to use phonetic similarity knowledge to select “ ” whose pronunciation is similar to “ ”, rather than “ ” and “ ” which can form a smooth sentence but change the original meaning of the sentence. Likewise, in the second example, the model requires visual similarity knowledge to correct “ ” to “ ”, instead of “ ” and “ ”.

Previous researchers applied language models (Yu and Li, 2014) and sequence-to-sequence (Seq2Seq) models (Li et al., 2018, Wang et al., 2019) to carry out the spelling correction task. Later, pre-trained language models like BERT (Devlin et al., 2019) emerged and achieved significant advancements in a variety of language tasks owing to their strong semantic intelligence capabilities. In CSC, it has been demonstrated that fusing similarity knowledge with pre-trained language models is effective. A knowledge fusion module is generally needed to combine similarity knowledge with a pre-trained model like BERT; this module encodes and fuses the similarity knowledge. Based on the type of knowledge they apply, we divide the existing methods into two categories: (1) multi-modal knowledge based methods, which use multi-modal knowledge including the pronunciations and glyphs of Chinese characters (Xu et al., 2021, Huang et al., 2021); (2) confusion set based methods, which use confusion sets containing phonetically and visually similar character pairs (Wang et al., 2019, Cheng et al., 2020).

However, both kinds of existing methods face challenges. The multi-modal knowledge based methods increase the size of the model to a large extent, since more parameters are needed to extract knowledge by adding various components to a pre-trained language model.
The confusion set based methods directly fuse the representation of the confusion set knowledge with the final output of a pre-trained model. Such a fusion strategy is called model-level fusion, as shown in Fig. 1(a). However, the strategy fails to make good use of the lexical knowledge in the confusion set because the fusion does not happen in the BERT encoding process. Jawahar et al. (2019) showed that BERT captures lexical-level information in lower layers, learns syntactic features in middle layers and semantic features in higher layers. The lexical-level information is diluted as the number of layers increases. Meanwhile, individual Chinese characters have rich meanings and the CSC model needs to concentrate on the correction of individual characters. Because the high-level semantic representation dilutes the lexical-level information of the original sentence, a model that fuses only at that level cannot make good use of lexical knowledge from the confusion set to focus on specific characters.

We believe that the confusion set knowledge, as a kind of lexical knowledge, should be integrated into the encoding process within BERT. This allows the model to use the lexical knowledge of the confusion set to model the semantics of the original sentence, helping it to better identify and correct errors. We can also fully exploit the confusion set knowledge so that the model achieves results comparable to large multi-modal knowledge based models without introducing a high number of parameters. The significance of the two factors combined is seen in Table 2. Both “ (jīng)” and “ (jīn)”, which are frequently used words in Chinese, are easily confused due to their similar pronunciations. The model must integrate the context semantics with the meanings of confusing candidates to determine whether the original input needs to be corrected.
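To make the difference in fusion placement concrete, the following is a minimal sketch in plain Python, not the authors' implementation. Everything in it is an assumption made for illustration: the tiny `CONFUSION_SET` (commonly confused characters, not the confusion set used in the paper), the single-head attention standing in for the adapter's attention over confusing candidates, the residual add used for fusion, and the toy "layers" that merely scale a vector in place of real BERT layers.

```python
import math

# Illustrative confusion set (an assumption, NOT the paper's data): each
# character maps to characters that are commonly confused with it.
CONFUSION_SET = {
    "在": ["再"],
    "再": ["在"],
    "的": ["地", "得"],
}


def lookup_candidates(sentence, confusion_set, max_candidates=2):
    """Return up to max_candidates confusing characters for each character."""
    return [confusion_set.get(ch, [])[:max_candidates] for ch in sentence]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, candidate_vecs):
    """Single-head scaled dot-product attention of one vector over candidates."""
    scale = math.sqrt(len(query))
    scores = [sum(q * c for q, c in zip(query, vec)) / scale
              for vec in candidate_vecs]
    weights = softmax(scores)
    return [sum(w * vec[i] for w, vec in zip(weights, candidate_vecs))
            for i in range(len(query))]


def adapter(hidden, candidate_vecs):
    """Fuse a hidden vector with its candidates (residual add, assumed form)."""
    if not candidate_vecs:
        return list(hidden)
    context = attend(hidden, candidate_vecs)
    return [h + c for h, c in zip(hidden, context)]


def encode(hidden, candidate_vecs, layers, fuse_after):
    """Run hidden through layers, applying the adapter after layer fuse_after."""
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        if i == fuse_after:
            hidden = adapter(hidden, candidate_vecs)
    return hidden


# Toy stand-ins for encoder layers: each one just doubles the vector.
layers = [lambda h: [2.0 * x for x in h]] * 3
hidden = [1.0, 0.0]
candidates = [[1.0, 0.0]]

# Fusion after the first layer: later layers keep transforming the fused vector.
early = encode(hidden, candidates, layers, fuse_after=0)
# Fusion after the last layer: the fused vector is never re-encoded.
late = encode(hidden, candidates, layers, fuse_after=len(layers) - 1)
print(lookup_candidates("我在家", CONFUSION_SET), early, late)
```

In this toy setting, the only difference between the two calls is where `adapter` runs. With `fuse_after=0` the candidate information passes through the remaining layers together with the context, which is the property the text argues for; with `fuse_after` set to the last layer it is added only to the final output, as in model-level fusion.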
To solve the above issues, we propose an Adapter-based BERT-level Confusion Set Fusion Approach (ABC-Fusion). ABC-Fusion applies a BERT-level fusion strategy to fuse the semantic information of confusing characters in the semantic encoding phase, as shown in Fig. 1(b). In the BERT-level fusion strategy, the knowledge fusion module takes part in the internal encoding process of BERT and is inserted between the BERT layers. It fuses the representation of the similarity knowledge with the representation from the lower layers of BERT and then sends the fusion result to the subsequent BERT layers, so that the similarity knowledge participates in the BERT encoding process of the original sentence. Specifically, we first retrieve the corresponding confusing characters from a confusion set for each character in a Chinese sentence. Then, in order to achieve lexical-level fusion of the original sentence and the confusing characters in the encoding phase, we design a BERT structure with a lightweight adapter inserted as a knowledge fusion structure between its bottom layers. However, the knowledge of confusing characters is not always useful for correction. To enable the model to focus on key information, we design a multi-head attention mechanism in the adapter. The adapter uses the multi-head attention mechanism to adaptively extract relevant knowledge from the confusing candidates and incorporates the information into the contextual semantic modelling process, which helps the model correct spelling errors.

There are three advantages of our approach. First, our model incorporates similarity knowledge in the semantic encoding phase through an adapter module inserted at the bottom of the model, leveraging confusing characters to help model the semantics. Second, our method changes only the internal structure of BERT, which makes it complementary to the previous model-level fusion approach.
Our model can theoretically replace the original BERT in model-level fusion approaches to produce better outcomes. Third, our model is easy to train and deploy. The confusion set used does not require additional encoders, which keeps the adapter simple and takes advantage of BERT’s own capabilities for further fusion, as the fusion results continue to be sent to the subsequent BERT layers after the adapter.

We conduct comprehensive experiments on three open benchmarks to verify the effectiveness of the proposed model. The results show that our approach outperforms BERT on CSC by a wide margin. Our approach is also superior to other confusion set based models and performs comparably to structurally complicated models that employ multi-modal information. In addition, we present full comparisons and analyses of the knowledge fusion module location and the setting of confusing candidates, showing that the representations of lower layers can better fuse lexical similarity knowledge and how confusing candidates affect the model. We also provide some examples to show the features of our model and how the adapter works on confusing candidates.

In summary, the contributions of this paper are as follows:
• We propose a novel Chinese spelling correction method, ABC-Fusion, which fuses confusion set knowledge using a specifically devised adapter-enhanced BERT model. By introducing an adapter layer inside the BERT structure, it can effectively capture the low-level lexical information which is crucial for CSC.
• We introduce a confusion set adapter to make use of the confusion set knowledge to capture the semantics in the encoding phase. Specifically, we introduce a certain number of confusing candidates for each character and apply the multi-head attention mechanism in the adapter to adaptively extract and fuse the knowledge from the confusing candidates.
• Experimental results show that our approach outperforms other confusion set based approaches and most multi-modal knowledge based approaches, which confirms that confusion set knowledge is more appropriate for fusion into the lower layers of BERT.

Section snippets

Chinese spelling correction
Chinese spelling correction has received a lot of attention over the last two decades owing to its unique and challenging nature. Early spelling correction was accomplished on the basis of human-made rules. Such rule-based approaches (Angell et al., 1983, Ren, 2001) used knowledge such as dictionaries and confusion sets to identify erroneous characters and provide similar word suggestions within that knowledge. These methods are limited by rules and ignore the role of context for error

Approach
We propose an adapter-based BERT-level confusion set fusion model called ABC-Fusion for CSC. The architecture, shown in Fig. 2, consists of two components: BERT as the correction network and the adapter as the confusion set knowledge fusion module.

Datasets
We conduct a series of experiments to validate the effectiveness of our model on CSC. The dataset statistics are summarized in Table 3. The training dataset consists of three human-annotated datasets provided in SIGHAN13 (Wu et al., 2013), SIGHAN14 (Yu et al., 2014) and SIGHAN15 (Tseng et al., 2015), and a large machine-generated annotated dataset proposed in Wang et al. (2018). There are 277,454 sentence pairs in the training dataset. We evaluate our model on three benchmark datasets from SIGHAN13,

Conclusion
In this paper, we propose an approach called ABC-Fusion for Chinese spelling correction. It combines a specifically devised adapter with BERT to fuse confusion set knowledge based on lexical-level features. We argue that the lexical similarity information from the confusion set is more suitable to be fused with the representations from lower layers in BERT.
In this way, it allows the model to take into account the semantic information of the confusing characters and the context together in

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments
This research is supported by the National Natural Science Foundation of China under grant No. 61976119.

References (30)
Angell R.C. et al. Automatic spelling correction using a trigram similarity measure. Inf. Process. Manag. (1983)
Yu L. et al. Overview of SIGHAN 2014 bake-off for Chinese spelling check
Bapna A. et al. Simple, scalable adaptation for neural machine translation
Chang C.H. A new approach for automatic Chinese spelling correction (1998)
Cheng X. et al. SpellGCN: Incorporating phonological and visual similarities into language models for Chinese spelling check
Devlin J. et al. BERT: Pre-training of deep bidirectional transformers for language understanding
Guo Z. et al. Global attention decoder for Chinese spelling error correction
Hou W. et al. Meta-adapter: Efficient cross-lingual adaptation with meta-learning
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., de Laroussilhe, Q., Gesmundo, A., Attariyan, M., Gelly, S.,...
Huang L. et al. PHMOSpell: Phonological and morphological knowledge guided Chinese spelling check
Jawahar G. et al. What does BERT learn about the structure of language?
Li, C., Chen, J., Chang, J.S., 2018. Chinese Spelling Check based on Neural Machine Translation. In: PACLIC...
Li C. et al. Exploration and exploitation: Two ways to improve Chinese spelling correction models
Li Y. et al. The past mistake is the future wisdom: Error-driven contrastive probability optimization for Chinese spell checking
Liu W. et al. Lexicon enhanced Chinese sequence labeling using BERT adapter

¹ Code is available at https://github.com/jying2023/ABC-Fusion.

© 2023 Elsevier Ltd. All rights reserved.



Keywords:

None available


Journal
Computer Speech & Language
ISSN: 0885-2308
Source: Elsevier BV