
Chinese Named Entity Recognition Augmented with Lexicon Memory

Yi Zhou, Xiao-Qing Zheng, Xuan-Jing Huang

Zhou Y, Zheng XQ, Huang XJ. Chinese named entity recognition augmented with lexicon memory. Journal of Computer Science and Technology, 2023, 38(5): 1021–1035. DOI: 10.1007/s11390-021-1153-y. CSTR: 32374.14.s11390-021-1153-y.


Funds: This work was supported by the National Key Research and Development Program of China under Grant No. 2018YFC0830900, the National Natural Science Foundation of China under Grant No. 62076068, and Shanghai Municipal Science and Technology Project under Grant No. 21511102800.
    Author Bio:

    Yi Zhou is a post-graduate student at the School of Computer Science, Fudan University, Shanghai. He received his Bachelor's degree in computer science from Fudan University, Shanghai, in 2018. His research interests include natural language processing, deep learning, and machine learning.

    Xiao-Qing Zheng is an associate professor at the School of Computer Science, Fudan University, Shanghai. He received his Ph.D. degree in computer science from Zhejiang University, Hangzhou, in 2007. He was an international faculty fellow at the Sloan School of Management, Massachusetts Institute of Technology, Boston. His research interests include natural language processing, machine learning, and the semantic Web.

    Xuan-Jing Huang is a professor at the School of Computer Science, Fudan University, Shanghai. She received her Ph.D. degree in computer science from Fudan University, Shanghai, in 1998. Her research interests include artificial intelligence, natural language processing, information retrieval, and social media processing. She served as the PC Co-Chair of EMNLP 2021, CCL 2019, CCL 2016, NLPCC 2017, and SMP 2015, an organizer of WSDM 2015, and the competition chair of CIKM 2014. She has been included in the 2020 Women in AI List and the AI 2000 Most Influential Scholar Annual List.

    Corresponding author:

    Xiao-Qing Zheng: zhengxq@fudan.edu.cn

  • Extended Abstract:
    Research Background

    Named entity recognition (NER) has long been one of the challenges in natural language processing research. Recently, deep neural networks have been applied to building NER systems and have achieved remarkable success. However, these systems still face two major difficulties in NER: a large number of unseen words and ambiguous boundaries of entity names. The surnames of Chinese person names are mostly drawn from a set of common surnames, and organization names often end with suffixes such as "公司" (company), "大学" (university), and "厂" (factory); incorporating such prefix and suffix information into neural models in the form of distributed representations is of great help in improving the recall of unseen entities. In addition, named entities usually appear as fragments (spans consisting of a varying number of words), so fragment-based recognition models augmented with the above position-dependent prefix and suffix information deserve further exploration.

    Objective

    Inspired by the concept of content-addressable retrieval from cognitive science, we aim to propose a Chinese NER model augmented with a lexicon memory. Known entity names are stored in a memory network, and at recognition time an attention mechanism applied over the memory extracts position-dependent features (especially prefix and suffix information) of the current text fragment, which are then used to improve the recall of unseen entities.

    Methods

    We propose a fragment (span) classification based Chinese NER model that fuses multi-granularity features, including characters and words. The model first uses a deep neural network to produce context-dependent representations of all possible fragments of the input text; then, for each fragment, it extracts position-dependent information related to that fragment from the lexicon memory via an attention mechanism, especially the prefix and suffix features that are helpful for entity recognition; finally, these features are combined to form the final representation of the fragment, which is fed into a classifier. The model consists of three main modules: a character-based encoder, which scans the whole input text and produces context-dependent representations; a fragment-based encoder, which extracts and fuses the position-dependent features of the current fragment through an attention mechanism over the memory and produces the final fragment representation; and a fragment classifier, which decides whether a fragment is a named entity and determines its entity type. The lexicon memory can be automatically constructed from a large amount of unlabeled text. Exploiting the inherent nesting relations among fragments, we also propose an optimization that uses a recurrent neural network to produce the representations of all possible fragments in a single scan of the input text, improving the efficiency of model training and inference. A minimal code sketch of this idea is given after this paragraph.
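    Below is a minimal PyTorch-style sketch of the fragment classification idea described above. It is an illustration only: the layer sizes, the mean-pooled fragment representation, the single-sentence batch, and the way lexicon entries are embedded are hypothetical simplifications, not the actual LEMON implementation.

    # Hypothetical, simplified sketch of fragment-based NER with a lexicon memory.
    import torch
    import torch.nn as nn

    class FragmentNER(nn.Module):
        def __init__(self, n_chars, n_lexicon, n_labels, dim=128):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, dim)
            self.encoder = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
            self.lexicon_memory = nn.Embedding(n_lexicon, dim)   # one vector per lexicon entry
            self.classifier = nn.Linear(2 * dim, n_labels)       # fragment + memory features

        def forward(self, char_ids, fragments):
            # 1) Character encoder: context-dependent representation of every character.
            ctx, _ = self.encoder(self.char_emb(char_ids))        # (1, seq_len, dim)
            logits = []
            for start, end in fragments:                          # 2) enumerate candidate fragments
                frag = ctx[0, start:end + 1].mean(dim=0)          # simple fragment representation
                # 3) Attention over the lexicon memory: "softly" retrieve lexicon features.
                scores = self.lexicon_memory.weight @ frag
                mem = torch.softmax(scores, dim=0) @ self.lexicon_memory.weight
                logits.append(self.classifier(torch.cat([frag, mem])))  # 4) classify the fragment
            return torch.stack(logits)                            # one score vector per fragment

    # Usage: score all fragments up to length 3 in a 6-character sentence.
    model = FragmentNER(n_chars=5000, n_lexicon=100, n_labels=5)
    chars = torch.randint(0, 5000, (1, 6))
    frags = [(i, j) for i in range(6) for j in range(i, min(i + 3, 6))]
    print(model(chars, frags).shape)   # (number of fragments, n_labels)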

    Results

    Experimental results show that our Chinese NER model augmented with a lexicon memory (LEMON) achieves state-of-the-art performance on four different datasets, with a particularly notable improvement in the recall of unseen entities, outperforming existing models by up to 3.2% in F1-score. The source code of the model is available at github.com/dugu9sword/LEMON.

    Conclusions

    Chinese named entities (person, location, and organization names) usually appear in some special patterns, and their prefix and suffix information is particularly useful for recognizing them. Based on this observation, we propose a fragment classification based Chinese NER model augmented with a lexicon memory. To alleviate the problem of the large number of unseen entities, we design an attention mechanism over the memory that extracts word-level and prefix/suffix information related to the current fragment and fuses it into the fragment's distributed representation in a "soft" manner. Experimental results show that the lexicon-memory-augmented model (LEMON) achieves state-of-the-art performance on multiple datasets. Although this work focuses on how to softly incorporate word-level and prefix/suffix features via a memory network during entity recognition, future work could explore how to combine the proposed LEMON model with existing large-scale pre-trained language models (such as ERNIE, XLNet, and ELMo) and their knowledge-graph-enhanced extensions (such as K-BERT) to further improve Chinese NER performance.

    Abstract:

    Inspired by the concept of content-addressable retrieval from cognitive science, we propose a novel fragment-based Chinese named entity recognition (NER) model augmented with a lexicon-based memory in which both character-level and word-level features are combined to generate better feature representations for possible entity names. Observing that the boundary information of entity names is particularly useful for locating and classifying them into pre-defined categories, position-dependent features, such as prefix and suffix, are introduced and taken into account for NER tasks in the form of distributed representations. The lexicon-based memory is built to help generate such position-dependent features and deal with the problem of out-of-vocabulary words. Experimental results show that the proposed model, called LEMON, achieves state-of-the-art performance on four different widely-used NER datasets, with an increase in F1-score of up to 3.2% over previous best models.

  • Quantum computing has attracted increasing interest in recent decades since it offers the possibility of efficiently solving important problems such as integer factorization[1], unstructured search[2], and systems of linear equations[3].

    In recent years, with the popularity of quantum computing, many companies, universities, and institutes have been actively working to develop prototypes of quantum computers. For example, in 2019, Google announced the realization of quantum supremacy with its 53-qubit quantum processor "Sycamore"[4]. In November 2021, IBM unveiled its new 127-qubit "Eagle" processor, whose scale makes reliable simulation by a classical computer infeasible and whose increased qubit count allows users to explore problems at a new level of complexity 1. In June 2022, Xanadu demonstrated a quantum computational advantage with a programmable photonic processor that realized Gaussian boson sampling on 216 squeezed modes[5]. These systems are referred to as noisy intermediate-scale quantum systems[6] and have small qubit counts, restricted connectivity, and high gate error rates. The duration of a physical quantum gate is roughly 10 ns–800 ns when the fidelity of physical gates reaches at least 99%[7]. At present, the coherence time of each physical qubit is 1 μs–150 μs, and only a limited set of gates can be realized with relatively high fidelity on a quantum device. Each quantum processor may support a specific universal set of 1-qubit and 2-qubit gates, which are called primitive gates[7]. Table 1 lists three gate sets: $G_{\rm Com}$, $G_{\rm IBM}$, and $G_{\rm Sur}$, where $G_{\rm Com}$ is a commonly used gate set[8], $G_{\rm IBM}$ is implemented by the IBM Q series[9], and $G_{\rm Sur}$ is used by the Surface-17 quantum processor[10].

    Table  1.  Three Gate Sets for Different Scenarios
    Gate Set | Gates
    $G_{\rm Com}$[8] | $\mathbf{H}$, $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, $\mathbf{S}$, $\mathbf{S}^{\dagger}$, $\mathbf{T}$, $\mathbf{T}^{\dagger}$, $\boldsymbol{R}_{\rm z}(\theta)$, $\mathbf{CX}$
    $G_{\rm IBM}$[9] | $\boldsymbol{U}_1(\theta)$, $\boldsymbol{U}_2(\alpha, \lambda)$, $\boldsymbol{U}_3(\theta, \alpha, \lambda)$, $\mathbf{CX}$
    $G_{\rm Sur}$[10] | $\mathbf{X}$, $\mathbf{Y}$, $\boldsymbol{R}_{\rm x}(\theta)$, $\boldsymbol{R}_{\rm y}(\theta)$, $\mathbf{CZ}$

    The realization of quantum algorithms relies on specific quantum compilations, which mainly focus on the number of inserted quantum gates[11, 12] or the fidelity of the compiled quantum circuits[13, 14]. However, there are various ways to physically implement and manipulate qubits on different devices. These differences lead to different communication methods and connection topologies, with each vendor implementing its own set of primitive gates. Therefore, quantum circuits have to be transplanted from one platform to another. In addition, since the gate types supported by a quantum processor are limited, quantum circuits may also need to be rewritten when some high-level gates are decomposed into low-level gates before the circuits are executed on the quantum processor.

    Converting a quantum circuit supported by one gate set into a quantum circuit supported by another gate set with respect to some rules is called quantum circuit rewriting. Usually, a rule is of the form $\mathcal{C}_{\rm p} = \mathcal{C}_{\rm s}$, where $\mathcal{C}_{\rm p}$ is a fragment of a circuit whose behavior is the same as that of the fragment $\mathcal{C}_{\rm s}$. We call $\mathcal{C}_{\rm p}$ a pattern circuit and $\mathcal{C}_{\rm s}$ a substitution circuit. In this paper, we refer to the circuit to be rewritten as the target circuit. Motivated by the aforementioned requirements, our approach consists of two key steps: the first is to identify the patterns in the target circuit, and the second is to replace them with semantically equivalent substitution circuits. For that purpose, we first introduce a new representation of quantum circuits using symbolic sequences. Unlike the traditional representation using directed acyclic graphs (DAGs) based on the execution dependencies of the circuit[15], the new representation allows us to easily identify patterns that appear non-consecutively but are reducible. In the case that a fragment of a circuit can be matched by several different rules, we encounter a replacement conflict and need to resolve it with an appropriate policy. We propose three policies for generating schedulers to cope with replacement conflicts. One policy is precise in the sense that it considers all the replacement candidates of a conflict set; in the worst case, its time complexity is exponential. For a large-scale circuit, we need to make a trade-off between the quality of the generated circuit and the time it takes. Therefore, for large-scale circuits, we propose a greedy policy to handle replacement conflicts and a stochastic policy to show that better policies exist.

    The main contributions of this paper are summarized below.

    ● We introduce a new representation of quantum circuits, which can easily identify the patterns that appear non-consecutively but are reducible in the target circuits.

    ● We present a polynomial-time dynamic programming based pattern matching and replacement algorithm.

    ● We propose three policies for generating schedulers to deal with replacement conflicts.

    ● We develop a rule library for basic optimizations.

    The rest of the paper is structured as follows. Section 2 introduces the related work. Section 3 recalls the basic notations about quantum computing. Section 4 proposes a new representation of quantum circuits. Section 5 discusses the design of the pattern matching based quantum circuit rewriting framework. Section 6 shows two case studies. Section 7 evaluates QRewriting by using the BIGD[16] benchmarks and a set of benchmark circuits[17] consisting of arithmetic circuits and implementations of multi-controlled Toffoli gates. Finally, Section 8 gives the conclusion.

    Several quantum circuit optimization compilers have recently been proposed to compile a quantum circuit to various processors. For example, Qiskit 2 and t|ket⟩[18] support generic gate sets, and Quilc 3 is tailored for the Rigetti Agave quantum processor. There are several optimizers that automatically discover patterns[19-22]. QRewriting aims to rewrite quantum circuits between different processors according to a given rule set, mainly focusing on pattern matching and replacement.

    Pattern matching is widely used in circuit optimization. For example, many algorithms have employed peephole optimization and pattern matching to optimize circuits. Peephole optimization identifies small sets of gates and replaces them with equivalent sets that have better performance[13, 23]. Exact matching is only feasible for small-scale and medium-scale circuits[24]. Heuristics are often used in large-scale circuits, but they cannot ensure optimal results[25, 26]. Prasad et al. and Soekens et al. showed how to find optimal quantum circuits for all 3-qubit functions[27, 28]. Nam et al. proposed five optimization subroutines[17]. Murali et al. developed the first multi-vendor quantum computer compiler which compiles from high-level languages to multiple real-system quantum computer prototypes, with device-specific optimizations[9]. The work of Chen et al. is the closest to ours, where a quantum circuit optimization framework based on pattern matching (PaF) was proposed[29]. It uses subgraph isomorphism to find a pattern circuit in the target quantum circuit according to a given rule description, and then replaces it with an equivalent one.

    Previous work often treats a target circuit $\mathcal{C}$ as a DAG, which is usually not unique because various gates may commute. Patterns that appear non-consecutively but are reducible cannot be directly identified by subgraph isomorphism on the DAG. Subgraph isomorphism can be combined with commutation analysis to adjust the order of commuting gates, with time complexity $O(|\mathcal{C}|^2)$, where $|\mathcal{C}|$ is the length of the circuit $\mathcal{C}$.

    In this paper, we introduce a new representation of quantum circuits, which deals with non-consecutive patterns more conveniently. For quantum circuit rewriting, we propose a polynomial-time algorithm based on dynamic programming to match and replace pattern circuits in the target circuit.

    In this section, we introduce some notions and notation of quantum computing. Let $\mathbb{Z}$ and $\mathbb{C}$ denote the sets of all integers and complex numbers, respectively.

    Classical information is stored in bits, while quantum information is stored in qubits. Besides the two basis states $\left|\mathbf{0}\right\rangle$ and $\left|\mathbf{1}\right\rangle$, a qubit can be in any linear superposition state $\left|\boldsymbol{\phi}\right\rangle = a\left|\mathbf{0}\right\rangle + b\left|\mathbf{1}\right\rangle$, where $a, b \in \mathbb{C}$ satisfy the condition $|a|^2 + |b|^2 = 1$. State $\left|\boldsymbol{\phi}\right\rangle$ is observed to be in state $\left|\mathbf{0}\right\rangle$ with probability $|a|^2$ and in state $\left|\mathbf{1}\right\rangle$ with probability $|b|^2$.

    A quantum gate acts on a collection of qubits, which are called the operation qubits of the gate. For example, the Hadamard ($\mathbf{H}$) gate is applied on one qubit, and the $\mathbf{CX}$ gate is applied on two qubits. Its behavior is described as:

    $$\mathbf{CX}(\alpha\left|\mathbf{0}\right\rangle\left|\boldsymbol{\psi}\right\rangle + \beta\left|\mathbf{1}\right\rangle\left|\boldsymbol{\phi}\right\rangle) = \alpha\left|\mathbf{0}\right\rangle\left|\boldsymbol{\psi}\right\rangle + \beta\left|\mathbf{1}\right\rangle(\mathbf{X}\left|\boldsymbol{\phi}\right\rangle).$$

    That is, we apply an $\mathbf{X}$ gate to the second qubit (called the target) if the first (the control) is in the state $\left|\mathbf{1}\right\rangle$, and the identity transformation otherwise, where $\left|\boldsymbol{\psi}\right\rangle$ and $\left|\boldsymbol{\phi}\right\rangle$ are states of the second qubit. Two other relevant gates are the 3-qubit Toffoli gate $\mathbf{CCX}$ and the double-controlled phase gate $\mathbf{CCZ}$. Likewise, the $\mathbf{CCX}$ and $\mathbf{CCZ}$ gates apply an $\mathbf{X}$ and a $\mathbf{Z}$ gate, respectively, when both control qubits are in the state $\left|\mathbf{1}\right\rangle$.

    In a quantum circuit, each line represents a wire. The wire does not necessarily correspond to a physical wire, but may correspond to the passage of time or a physical particle that moves from one location to another through space. The interested reader can find more details of these gates in the standard textbook[8]. The execution order of a quantum logical circuit is from left to right. The width of a quantum circuit refers to the number of qubits in the quantum circuit. The depth of a quantum circuit refers to the number of layers of gates that are executable in parallel. We refer to a quantum circuit with a depth of less than 100 as a small-scale circuit, a quantum circuit with a depth of more than 1000 as a large-scale circuit, and the rest as medium-scale circuits.
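    As a concrete illustration of these definitions, the following Python sketch (ours, not the paper's code) computes the width and the depth of a circuit given as a list of (gate alias, operation qubits) pairs; the depth is obtained by greedily assigning each gate to the earliest layer available on its qubits.

    def width_and_depth(circuit):
        layer_of_qubit = {}                     # deepest layer currently occupied on each qubit
        for _, qubits in circuit:
            layer = 1 + max((layer_of_qubit.get(q, 0) for q in qubits), default=0)
            for q in qubits:
                layer_of_qubit[q] = layer       # the gate occupies the next free layer on its qubits
        width = len(layer_of_qubit)             # number of qubits used
        depth = max(layer_of_qubit.values(), default=0)
        return width, depth

    # The 7-gate circuit of Fig.2(a) ("xxcccxx" on qubits q0, q1, q2) has width 3 and depth 5
    # under this layering.
    circuit = [("x", ["q2"]), ("x", ["q2"]), ("c", ["q0", "q1"]), ("c", ["q0", "q2"]),
               ("c", ["q0", "q1"]), ("x", ["q2"]), ("x", ["q0"])]
    print(width_and_depth(circuit))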

    In this section, we define a new representation of quantum circuits and the pattern matching condition, which easily identifies the patterns that appear non-consecutively but are reducible. Based on that, we will state the quantum circuit rewriting problem considered in the paper.

    Definition 1. A gate is represented by a triple $(\gamma, \sigma, \alpha)$, where

    $\gamma$ is the symbol of a gate type,

    $\sigma$ is a finite sequence of operation qubits for a gate,

    $\alpha$ is a finite sequence of rotation angles for a gate.

    A quantum circuit $\mathcal{C}$ is a sequence of triples $(\gamma_0, \sigma_0, \alpha_0)(\gamma_1, \sigma_1, \alpha_1)\ldots(\gamma_{n-1}, \sigma_{n-1}, \alpha_{n-1})$, and the length of the circuit is denoted by $|\mathcal{C}|$. The gate sequence $\Gamma_{\mathcal{C}}$ of quantum circuit $\mathcal{C}$ is a symbolic sequence of gate types obtained by projecting each element of $\mathcal{C}$ onto its first component, i.e., $\Gamma_{\mathcal{C}} =$ "$\gamma_0\gamma_1\ldots\gamma_{n-1}$". The new representation of a rule is a pair $r = (\mathcal{C}_{\rm p}, \mathcal{C}_{\rm s})$, consisting of a pattern circuit $\mathcal{C}_{\rm p}$ and a substitution circuit $\mathcal{C}_{\rm s}$. For simplicity, if a sequence is empty ($\epsilon$), we omit it. Table 2 lists some of the commonly used quantum gates, their distinct aliases, and the corresponding circuit symbols.

    Table  2.  Symbols of Gates and Distinct Aliases
    Gate | Alias
    $\mathbf{I}$ | "I"
    $\mathbf{H}$ | "h"
    $\mathbf{X}$ | "x"
    $\mathbf{Y}$ | "y"
    $\mathbf{Z}$ | "z"
    $\mathbf{T}$ | "t"
    $\mathbf{T}^{\dagger}$ | "T"
    $\mathbf{S}$ | "s"
    $\mathbf{S}^{\dagger}$ | "S"
    $\boldsymbol{R}_{\rm x}$ | "X"
    $\boldsymbol{R}_{\rm y}$ | "Y"
    $\boldsymbol{R}_{\rm z}$ | "Z"
    $\mathbf{CX}$ | "c"
    $\mathbf{CZ}$ | "C"
    $\mathbf{CCX}$ | "E"
    $\mathbf{CCZ}$ | "F"

    Example 1. As an example, we illustrate the symbolic sequence representation of the target circuit $\mathcal{C}_{\rm t}$ in Fig.2(a), which will be rewritten using the patterns in Fig.1. The new representation of the quantum circuit is

    $$\mathcal{C}_{\rm t} = (\text{"x"}, q_2)(\text{"x"}, q_2)(\text{"c"}, q_0q_1)(\text{"c"}, q_0q_2)(\text{"c"}, q_0q_1)(\text{"x"}, q_2)(\text{"x"}, q_0),$$

    and its gate sequence is represented by $\Gamma_{\rm t} = \text{"xxcccxx"}$. We can make use of the rule set $\mathcal{R} = \{r_0, r_1, r_2, r_3\}$ in Fig.1 to rewrite the circuit, where

    $$\begin{array}{l} r_0 = ((\text{"x"}, q_0)(\text{"x"}, q_0),\ (\text{"I"}, q_0)),\\ r_1 = ((\text{"c"}, q_0q_1)(\text{"c"}, q_0q_1),\ (\text{"I"}, q_0)(\text{"I"}, q_1)),\\ r_2 = ((\text{"c"}, q_0q_1)(\text{"c"}, q_1q_2)(\text{"c"}, q_0q_1),\ (\text{"c"}, q_0q_2)(\text{"c"}, q_1q_2)),\\ r_3 = ((\text{"x"}, q_1)(\text{"c"}, q_0q_1)(\text{"x"}, q_1),\ (\text{"c"}, q_0q_1)). \end{array}$$

    Figure  1.  Rule set $\mathcal{R}$ used to optimize the X and CX gates. (a) $r_0$. (b) $r_1$. (c) $r_2$. (d) $r_3$.
    Figure  2.  (a) A quantum circuit. The markers 0–6 on the gate symbols represent the order in the gate sequence. (b) and (c) are the results of the quantum circuit in (a) rewritten by schedulers $s'$ and $s''$, respectively.
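    For illustration, the target circuit and the rule set of Example 1 can be written down directly as plain Python data; the encoding below is a hypothetical one used only in these sketches, representing a gate as an (alias, qubits) pair and a rule as a (pattern, substitution) pair of circuits.

    target = [("x", ["q2"]), ("x", ["q2"]), ("c", ["q0", "q1"]), ("c", ["q0", "q2"]),
              ("c", ["q0", "q1"]), ("x", ["q2"]), ("x", ["q0"])]

    rules = {
        "r0": ([("x", ["q0"]), ("x", ["q0"])], [("I", ["q0"])]),
        "r1": ([("c", ["q0", "q1"]), ("c", ["q0", "q1"])], [("I", ["q0"]), ("I", ["q1"])]),
        "r2": ([("c", ["q0", "q1"]), ("c", ["q1", "q2"]), ("c", ["q0", "q1"])],
               [("c", ["q0", "q2"]), ("c", ["q1", "q2"])]),
        "r3": ([("x", ["q1"]), ("c", ["q0", "q1"]), ("x", ["q1"])], [("c", ["q0", "q1"])]),
    }

    # Projecting each gate onto its alias gives the symbolic gate sequence of Definition 1.
    gate_sequence = "".join(alias for alias, _ in target)
    print(gate_sequence)   # "xxcccxx", as in Example 1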

    To facilitate the description of pattern matching, we introduce the following definitions.

    Definition 2[30]. Let $\Gamma$ and $\Gamma'$ be two sequences. We say $\Gamma'$ is a subsequence of $\Gamma$ if there exist indices $0 \leqslant i_0 < \ldots < i_{|\Gamma'|-1} < |\Gamma|$ such that $\Gamma[i_k] = \Gamma'[k]$ for all $k \in [0, |\Gamma'|-1]$.

    The subsequence set is the set of distinct subsequences of the pattern circuit in the target circuit. We do not distinguish the indices from the gates to which the indices correspond in the quantum circuit. Suppose $\Gamma$ is a gate sequence. Then $\Gamma[i:j]$ denotes the fragment of $\Gamma$ from index $i$ to $j$. If $j$ is not specified, it denotes the suffix of $\Gamma$ from index $i$. Supposing that $\Gamma'$ is a subsequence of $\Gamma$, we write $\Gamma \setminus \Gamma'$ for the subsequence of $\Gamma$ obtained by removing all occurrences of $\Gamma'$.

    Example 2. Continuing Example 1, the gate sequences of the pattern circuits in rule set $\mathcal{R}$ and their subsequence sets in $\Gamma_{\rm t}$ are given as follows:

    "xx": $\{[0, 1]\}$,

    "cc": $\{[2, 4]\}$,

    "ccc": $\emptyset$,

    "xcx": $\{[1, 3, 5]\}$.

    Definition 3 (Qubit Mapping). Given two qubit sets $Q$ and $Q'$, a qubit mapping function $f$ is a bijective function between $Q$ and $Q'$.

    Definition 4 (Qubit State Independence). Let $\mathcal{C}_{\rm t}$, $\mathcal{C}_{\rm p}$ be two circuits with gate sequences $\Gamma_{\rm t}$, $\Gamma_{\rm p}$, respectively. Suppose $\Gamma_{\rm t}' = \{i_0, \ldots, i_l\}$ is a subsequence of $\Gamma_{\rm t}$ that matches $\Gamma_{\rm p}$. We say the qubit state in $\Gamma_{\rm t}'$ is independent w.r.t. $\Gamma_{\rm t}[i_0:i_l] \setminus \Gamma_{\rm t}'$ if the control qubit set of $\Gamma_{\rm t}'$ does not intersect with the target qubit set of the gates in $\Gamma_{\rm t}[i_0:i_l] \setminus \Gamma_{\rm t}'$, and vice versa.

    Definition 5 (Pattern Matching). Let $\mathcal{C}_{\rm t}$ and $\mathcal{C}_{\rm p}$ be a target circuit and a pattern circuit with gate sequences $\Gamma_{\rm t}$ and $\Gamma_{\rm p}$, respectively. We say $\mathcal{C}_{\rm p}$ matches $\mathcal{C}_{\rm t}$ if the following two conditions hold:

    $\Gamma_{\rm t}$ has a subsequence that matches $\Gamma_{\rm p}$ up to a qubit mapping function;

    the qubit sets of the subsequence and the pattern circuit satisfy the qubit mapping function and the qubit state independence condition.

    Example 3. We continue Example 2 to show the difference between the new representation of quantum circuits and the DAG representation. Suppose the gates of target circuit $\mathcal{C}_{\rm t}$ (resp. the pattern circuit of $r_1$) from left to right are named $g_0, \ldots, g_6$ (resp. $g_0', g_1'$). Fig.3(a) is the DAG representation of the circuit fragment $g_2, g_3, g_4$ and Fig.3(c) is the DAG representation of the pattern circuit of $r_1$. We can see that Fig.3(a) has no subgraph isomorphic to Fig.3(c). The gates $g_2$ and $g_3$ in Fig.3(a) are exchanged by commutation analysis to obtain Fig.3(b), which contains a subgraph isomorphic to Fig.3(c). By contrast, the symbolic sequence of the circuit $\mathcal{C}_{\rm t}$ can directly match the pattern circuit of $r_1$, i.e., the gates $g_2$ and $g_4$. The match satisfies the qubit mapping function $\{f(q_0) = q_0, f(q_1) = q_1\}$ and the qubit state independence condition. Therefore, circuit $\mathcal{C}_{\rm t}$ can be rewritten as $(\text{"x"}, q_2)(\text{"x"}, q_2)(\text{"c"}, q_0q_2)(\text{"x"}, q_2)(\text{"x"}, q_0)$.

    Figure  3.  (a) DAG representation of the circuit fragment $g_2, g_3, g_4$ of $\mathcal{C}_{\rm t}$. (b) DAG representation of the circuit fragment $g_3, g_2, g_4$ obtained by exchanging the gates $g_2$ and $g_3$ of (a). (c) Pattern circuit of the rule $r_1$.

    Definition 6. For a given target circuit $\mathcal{C}_{\rm t}$ and a rule $r = (\mathcal{C}_{\rm p}, \mathcal{C}_{\rm s})$, a replacement candidate is a triple $(D, r, e)$, where

    $D$ is a subsequence of the target circuit $\mathcal{C}_{\rm t}$ that matches the pattern circuit $\mathcal{C}_{\rm p}$,

    $r$ is a rule,

    $e \in \mathbb{Z}$ is a conflict index, with the default value being $-1$.

    Definition 7. A target circuit has a replacement conflict if an index of the target circuit appears more than once in the subsequence set.

    The replacement candidates for a replacement conflict form a conflict set. A replacement scheduler is a set of replacement candidates for different indices.

    Example 4. Continuing Example 3, we see that the subsequence set of the sequence "cc" is $\{[2, 4]\}$. The subsequence $[2, 4]$ appears non-consecutively in the sequence $\Gamma_{\rm t}$. The gate $(\text{"x"}, q_2)$ appears in the subsequence sets of both "xx" and "xcx", which means that in the target circuit different rules may match at the same index. Two schedulers are given as follows:

    $$\begin{array}{l} s' = \{([0, 1], r_0, 1),\ ([2, 4], r_1)\},\\ s'' = \{([1, 3, 5], r_3, 1),\ ([2, 4], r_1)\}. \end{array}$$

    The scheduler $s'$ (resp. $s''$) replaces the gates in the sequence $[0, 1]$ (resp. $[1, 3, 5]$) using the substitution circuit of $r_0$ (resp. $r_3$). After one of the schedulers is applied, we get the circuit in Fig.2(b) or Fig.2(c). Different schedulers result in different gate counts or depths of the rewritten circuits. The circuits rewritten by scheduler $s'$ or $s''$ have the same gate counts but depths 2 and 3, respectively. In both schedulers, the first element has the component $1$, which is an index indicating where the conflict takes place.

    We are now ready to state the following problem.

    Problem 1. Given two gate sets $G_1$, $G_2$ and a rule set $\mathcal{R}$ that expresses the equivalence of each gate in $G_1$ in terms of elements of $G_2$, how can one rewrite a quantum circuit supported by the gate set $G_1$ into a quantum circuit supported by $G_2$?

    By using the new representation of quantum circuits, we reduce the above problem to finding distinct subsequences of the pattern sequence in the target sequence up to a qubit mapping function, and we use the qubit state independence condition to filter the obtained subsequences.

    We propose a pattern matching based quantum circuit rewriting framework QRewriting. It consists of two steps: one matches the pattern circuit in the target circuit, and the other replaces it.

    We propose an algorithm based on dynamic programming to match the patterns in a rule set against a target circuit. Let $\mathcal{C}_{\rm t}$ and $\mathcal{C}_{\rm p}$ be the target and pattern circuits with gate sequences $\Gamma_{\rm t}$ and $\Gamma_{\rm p}$, respectively. We consider the problem of finding the distinct subsequences of the pattern sequence $\Gamma_{\rm p}$ in the target sequence $\Gamma_{\rm t}$. The obtained subsequences only match the gate types, and thus we need to check whether the operation qubits in the subsequences satisfy the qubit mapping function and the qubit state independence condition.

    The input of Algorithm 1 is a target circuit $\mathcal{C}_{\rm t}$ and a rule set $\mathcal{R}$, and the output is a set of replacement candidates $\mathcal{M}$. The function subseq($\Gamma_{\rm t}$, $\Gamma_{\rm p}$, $\delta$) in Algorithm 2[31] uses dynamic programming to compute the distinct subsequences of $\Gamma_{\rm t}$ that match $\Gamma_{\rm p}$ and returns the subsequence set, which is stored in $D$. The function check_qubit_condition($D$, $\mathcal{C}_{\rm t}$, $r$) checks whether the subsequences in $D$ satisfy the qubit mapping function and the qubit state independence condition.

    Algorithm 1. pattern_matching($\mathcal{C}_{\rm t}$, $\mathcal{R}$)
    Input: a quantum circuit $\mathcal{C}_{\rm t}$ and a rule set $\mathcal{R}$;
    Output: a set of replacement candidates $\mathcal{M}$;
    1 $\mathcal{M} \gets \emptyset$;
    2 Let $\Gamma_{\rm t}$ be the gate sequence of circuit $\mathcal{C}_{\rm t}$;
    3 for each $r \in \mathcal{R}$ do
    4  Let $\Gamma_{\rm p}$ be the gate sequence of the pattern circuit of $r$;
    5   $D \gets$ subseq($\Gamma_{\rm t}$, $\Gamma_{\rm p}$, $\delta$);
    6  if check_qubit_condition($D$, $\mathcal{C}_{\rm t}$, $r$) then
    7    $\mathcal{M} \gets \mathcal{M} \cup \{(u, r, -1): u \in D\}$;
    8 return $\mathcal{M}$;

    Algorithm 2. subseq($\Gamma_{\rm t}$, $\Gamma_{\rm p}$, $\delta$)
    Input: two sequences $\Gamma_{\rm t}$ and $\Gamma_{\rm p}$, and a parameter $\delta$;
    Output: the set of subsequences of $\Gamma_{\rm t}$ that match $\Gamma_{\rm p}$;
    1 Let $D$ be a sequence of length $|\Gamma_{\rm p}|+1$ in which each element is an empty set;
    2 Let $\epsilon$ be the empty sequence and $D[0] \gets \{\epsilon\}$;
    3 for $i \gets 0$ to $|\Gamma_{\rm t}|-1$ do
    4  for $j \gets \min(i, |\Gamma_{\rm p}|-1)$ downto $0$ do
    5   if $\Gamma_{\rm t}[i] = \Gamma_{\rm p}[j]$ then
    6     $D[j+1] \gets D[j+1] \cup \{u.\text{append}(i): u \in D[j]\ \text{and}\ i - u[0] < \delta\}$;
    7 return $D[|\Gamma_{\rm p}|]$;

    The input of function subseq($\Gamma_{\rm t}$, $\Gamma_{\rm p}$, $\delta$) consists of a target sequence $\Gamma_{\rm t}$, a pattern sequence $\Gamma_{\rm p}$, and a parameter $\delta$ that limits the range of indices of $\Gamma_{\rm p}$ in $\Gamma_{\rm t}$. The output is the set $D[|\Gamma_{\rm p}|]$ recording the distinct subsequences of $\Gamma_{\rm t}$ that match $\Gamma_{\rm p}$. We use the set $D[j+1]$ to record the subsequences of $\Gamma_{\rm t}$ that match $\Gamma_{\rm p}[0:j]$. If the condition $\Gamma_{\rm t}[i] = \Gamma_{\rm p}[j]$ is satisfied, there are subsequences of $\Gamma_{\rm t}[0:i]$ that match $\Gamma_{\rm p}[0:j]$. Line 6 updates the set $D[j+1]$. To find the subsequences matching $\Gamma_{\rm p}[0:j]$, we first need the subsequences matching $\Gamma_{\rm p}[0:j-1]$. The update $D[j+1] \gets D[j+1] \cup \{u.\text{append}(i): u \in D[j]\ \text{and}\ i - u[0] < \delta\}$ is the Bellman equation[31] that enables the dynamic programming algorithm. The time complexity of Algorithm 1 is $O(|\mathcal{R}| \times |\Gamma_{\rm p}| \times |\Gamma_{\rm t}|)$ and the space complexity is $O(|\Gamma_{\rm p}|)$.
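    The following Python sketch shows our reading of Algorithm 2; it is illustrative only (the set-of-tuples representation and the default window are our choices), and the qubit mapping and independence checks of Algorithm 1 are not included.

    def subseq(gamma_t: str, gamma_p: str, delta: int = 10**9):
        # D[j] holds the index tuples of gamma_t that match the prefix gamma_p[:j].
        D = [set() for _ in range(len(gamma_p) + 1)]
        D[0].add(())                                   # the empty subsequence matches the empty prefix
        for i, symbol in enumerate(gamma_t):
            # Traverse j in decreasing order so that each gate of gamma_t is used at most once.
            for j in range(min(i, len(gamma_p) - 1), -1, -1):
                if symbol == gamma_p[j]:
                    for u in D[j]:
                        if not u or i - u[0] < delta:  # enforce the window constraint
                            D[j + 1].add(u + (i,))
        return D[len(gamma_p)]

    # Symbol-level matches of "cc" in "xxcccxx" are (2, 3), (2, 4), and (3, 4); the qubit
    # mapping and independence checks (not shown) then keep only [2, 4], as in Example 2.
    print(subseq("xxcccxx", "cc"))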

    Example 5. Let us consider the quantum circuit in Fig.4(a) and the rule set $\mathcal{R} = \{r_0, r_1, r_2, r_3\}$ in Fig.1. The gate sequences of the target circuit and the pattern circuits are "xxxxcccxxxxxcccxxxxcccxxxcccxxxxxxccc", "xx", "cc", "ccc", and "xcx", respectively. The subsequence set of "cc" is $\{[4, 5], [4, 6], [5, 6], \ldots\}$. The gates corresponding to the indices $\{4, 5, 6\}$ in target circuit $\mathcal{C}_{\rm t}$ are $(\text{"c"}, q_{13}q_2)(\text{"c"}, q_9q_{14})(\text{"c"}, q_4q_{12})$. The subsequences $[4, 5]$, $[4, 6]$, and $[5, 6]$ do not satisfy the qubit mapping condition. The function check_qubit_condition($D$, $\mathcal{C}_{\rm t}$, $r$) filters the subsequences and finally yields the subsequence set of "cc" in the gate sequence of the target circuit, which is $\{[5, 12], [19, 25]\}$, highlighted with dotted lines in Fig.4(a).

    Figure  4.  (a) Quantum circuit 16QBT_05YCTFL_3. The markers 0–31 on the gate symbols represent the order in the gate sequence. (b) Quantum circuit 16QBT_05YCTFL_3 optimized by QSRewriting.

    By Algorithm 1, we obtain all the subsequences of the pattern circuits in the target circuit. To resolve replacement conflicts, we propose three conflict resolution policies. They give rise to three variants of QRewriting called QPRewriting, QGRewriting, and QSRewriting, respectively. Note that QRewriting uses the greedy policy by default. Due to the decoherence of qubits, the lifetime of qubits is very short[32]. The execution time of a quantum circuit is determined by several factors such as the depth and the gate count of the quantum circuit. Here, we mainly use depth to select the optimal replacement scheduler.

    ● The precise policy calculates all the candidates when a replacement conflict occurs.

    ● The greedy policy chooses the candidate appearing first in the target circuit among the conflict set.

    ● The stochastic policy selects a candidate from the conflict set at random for the scheduler.

    We propose an algorithm based on breadth-first search to compute the replacement scheduler, as shown in Algorithm 3. The input is a gate sequence $\Gamma_{\rm t}$ and a set of replacement candidates $\mathcal{M}$. $\mathcal{S}$ is a scheduler set, and queue $\mathcal{S}_{\rm q}$ stores the sub-schedulers. Firstly, we push an empty scheduler $\emptyset$ into queue $\mathcal{S}_{\rm q}$. Then we process queue $\mathcal{S}_{\rm q}$ in a loop (lines 4–12) until it is empty. Function next_conflict($\Gamma_{\rm t}$, $\mathcal{M}$, $s$) computes the next conflict index $e$ in $\Gamma_{\rm t}$ from the current conflict index to the end of $\Gamma_{\rm t}$. If there is no conflict at an index, we directly add the replacement candidate to $s$; otherwise, we return the index. When arriving at the end of $\Gamma_{\rm t}$, we add scheduler $s$ to $\mathcal{S}$. Line 10 calculates, according to the conflict policy, the set of candidates in $\mathcal{M}$ that conflict at index $e$. Lines 11 and 12 append each of these replacement candidates to $s$ and push the result into queue $\mathcal{S}_{\rm q}$. Finally, we calculate the depth of the replaced circuit for each scheduler and return the scheduler with the smallest depth.

    Algorithm 3. solve_conflicts($\Gamma_{\rm t}$, $\mathcal{M}$)
    Input: a gate sequence $\Gamma_{\rm t}$ and a set of replacement candidates $\mathcal{M}$;
    Output: a replacement scheduler;
    1 $\mathcal{S} \gets \emptyset$;
    2 Let $\mathcal{S}_{\rm q}$ be a scheduler queue;
    3 $\mathcal{S}_{\rm q}$.push($\emptyset$);
    4 while $\mathcal{S}_{\rm q}$ is not empty do
    5   $s \gets \mathcal{S}_{\rm q}$.pop();
    6   $e \gets$ next_conflict($\Gamma_{\rm t}$, $\mathcal{M}$, $s$);
    7  if $e = -1$ then
    8    $\mathcal{S} \gets \mathcal{S} \cup \{s\}$;
    9   continue;
    10   $V \gets$ compute the conflict set on index $e$;
    11  for each $v \in V$ do
    12    $\mathcal{S}_{\rm q}$.push($s \cup \{v\}$);
    13 return compute_depth($\mathcal{S}$);

    The time complexity depends on the conflict policy. In the worst case, when the precise policy is used, the time complexity is $O(|V|^m)$, where $m$ is the number of conflicts and $|V|$ is the size of the largest conflict set. When dealing with large-scale circuits, the precise policy is not scalable; therefore, we do not evaluate the precise policy in our experiments. The time complexity of both the greedy and the stochastic policies is $O(m|V|)$.
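    As an illustration, the greedy policy admits a particularly simple sketch: scan the candidates in order of their first index and keep a candidate only if none of its gates has already been claimed. This is one plausible reading of the policy, not the authors' code; the paper's Algorithm 3 uses a breadth-first queue so that the precise policy can also be expressed.

    def greedy_schedule(candidates):
        # A candidate is (indices, rule, conflict_index); indices are gate positions
        # in the target circuit.
        scheduler, used = [], set()
        for indices, rule, conflict in sorted(candidates, key=lambda c: c[0][0]):
            if used.isdisjoint(indices):       # no gate of this candidate is taken yet
                scheduler.append((indices, rule, conflict))
                used.update(indices)
        return scheduler

    # Conflict from Example 4: "xx" at [0, 1] and "xcx" at [1, 3, 5] both claim index 1;
    # the greedy policy keeps the candidate that appears first, i.e., [0, 1] with rule r0.
    candidates = [((1, 3, 5), "r3", 1), ((0, 1), "r0", 1), ((2, 4), "r1", -1)]
    print(greedy_schedule(candidates))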

    Example 6. Continuing Example 5, we show how to generate schedulers. Starting from the index $e = 0$, we search for the next conflicting index $e$ in $\mathcal{M}$. When $e = 18$, the conflict set is $\{([10, 18], r_0, 18), ([18, 26, 31], r_3, 18)\}$. If the precise policy is used, we extend scheduler $s$ with each of the two candidates and put the results into queue $\mathcal{S}_{\rm q}$. The generated scheduler set $\mathcal{S} = \{s', s''\}$ is then given as follows:

    $$\begin{array}{l} s' = \{([0, 22], r_0),\ ([2, 13, 17], r_3),\ ([3, 9], r_0),\ ([5, 12], r_1),\\ \qquad ([10, 18], r_0, 18),\ ([15, 23], r_0),\ ([16, 24], r_0),\ ([19, 25], r_1)\},\\ s'' = \{([0, 22], r_0),\ ([2, 13, 17], r_3),\ ([3, 9], r_0),\ ([5, 12], r_1),\\ \qquad ([18, 26, 31], r_3, 18),\ ([15, 23], r_0),\ ([16, 24], r_0),\ ([19, 25], r_1)\}. \end{array}$$

    If we use the greedy policy, the replacement candidate $([10, 18], r_0, 18)$ is selected, and the generated scheduler is $s'$. If we use the stochastic policy, one of the two candidates is selected, and the finally generated scheduler is either $s'$ or $s''$.

    For the scheduler provided by Algorithm 3, we propose a replacement algorithm, shown in Algorithm 4, which traverses the elements of the scheduler in reverse order and substitutes them according to the patterns. The input consists of a target circuit $\mathcal{C}_{\rm t}$ and a scheduler $s$. The output is the substituted circuit. Line 2 obtains the qubit mapping relationship $Q_{\rm m}$ between the subsequence of the target circuit and the pattern circuit via the qubit mapping function qubits_mapping($s_i$). Line 3 updates the gates of the target circuit with the substitution circuit one by one. If the substitution gate sequence is longer than the pattern gate sequence, the extra gates are inserted after the index where the pattern circuit appears in the target circuit. Otherwise, the redundant locations are removed from the target circuit. The time complexity is $O(m|s|)$, where $m$ is the maximum length of a pattern circuit and $|s|$ is the length of the scheduler $s$.

    Algorithm 4. substitute($\mathcal{C}_{\rm t}$, $s$)
    Input: a quantum circuit $\mathcal{C}_{\rm t}$ and a substitution scheduler $s$;
    Output: the substituted circuit $\mathcal{C}_{\rm t}$;
    1 for $i \gets |s|-1$ downto $0$ do
    2   $Q_{\rm m} \gets$ qubits_mapping($s_i$);
    3   $\mathcal{C}_{\rm t} \gets$ update($s_i$, $Q_{\rm m}$, $\mathcal{C}_{\rm t}$);
    4 return $\mathcal{C}_{\rm t}$;

    Example 7. Continuing Example 6, we take the replacement candidate $([2, 13, 17], r_3)$ as an example. We get the qubit mapping set $Q_{\rm m} = \{f(q_0) = q_3, f(q_1) = q_8\}$. The gate at index 2 of the target circuit $\mathcal{C}_{\rm t}$ is updated to the gate $(\text{"c"}, q_3q_8)$. The length of the pattern circuit is greater than that of the substitution circuit; thus, the gates at indices 13 and 17 are removed from the target circuit $\mathcal{C}_{\rm t}$.
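    A minimal sketch of the replacement step is given below (our illustration, under simplifying assumptions): each scheduler entry carries the matched indices, the substitution circuit over pattern qubits, and the qubit mapping found during matching; candidates are processed from right to left so that earlier indices remain valid, and identity gates are simply dropped.

    def substitute(circuit, scheduler):
        # Process candidates from right to left so earlier indices stay valid after deletions.
        for indices, substitution, qubit_map in sorted(scheduler, key=lambda c: c[0][0], reverse=True):
            new_gates = [(name, tuple(qubit_map[q] for q in qubits)) for name, qubits in substitution]
            first = indices[0]
            for i in reversed(indices):        # remove all matched gates (from the back)
                del circuit[i]
            circuit[first:first] = new_gates   # splice the mapped substitution circuit in
        return circuit

    # Example 3 from the paper: gates 2 and 4 ("c" on q0, q1) are replaced by identities,
    # which we simply drop here, leaving ("x",q2)("x",q2)("c",q0,q2)("x",q2)("x",q0).
    circuit = [("x", ("q2",)), ("x", ("q2",)), ("c", ("q0", "q1")),
               ("c", ("q0", "q2")), ("c", ("q0", "q1")), ("x", ("q2",)), ("x", ("q0",))]
    scheduler = [((2, 4), [], {})]             # empty substitution: identity gates omitted
    print(substitute(circuit, scheduler))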

    We develop a rule library for basic optimizations. The library is mainly used for basic reduction and quantum gate exchange[17, 33]. Note that almost all the gates implemented by quantum hardware devices are 1-qubit or 2-qubit gates; thus, our rule library mainly concerns 1-qubit and 2-qubit gates. The rules involve at most 3-qubit gates. The gate specification involves some cancellation rules for 1-qubit and 2-qubit gates, as shown in Fig.5. The commutation rules, shown in Fig.6, include transformation rules[33].

    Figure  5.  Gate cancellation rules. (a)–(n) 1-qubit gate rules. (o)–(t) 2-qubit gate rules.
    Figure  6.  Commutation gate rules. (a)–(d) 1-qubit gate rules. (e)–(j) 2-qubit gate rules.

    Sometimes, after a round of circuit rewriting, the target circuit still matches some rules. We therefore repeat the optimization and circuit rewriting for several rounds until no pattern circuit can be matched or the specified repetition bound is reached (five by default in practice). It has been found experimentally that beyond five rounds, the optimization opportunity decreases quickly. Hence, as a trade-off between optimization quality and running time, we repeat the procedure at most five times.
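    Putting the pieces together, the outer rewriting loop described above might look as follows (a sketch only; pattern_matching, solve_conflicts, and substitute are assumed to behave like Algorithms 1, 3, and 4).

    def rewrite(circuit, rules, max_rounds=5):
        for _ in range(max_rounds):
            candidates = pattern_matching(circuit, rules)      # Algorithm 1
            if not candidates:
                break                                          # nothing left to rewrite
            scheduler = solve_conflicts(circuit, candidates)   # Algorithm 3 (greedy by default)
            circuit = substitute(circuit, scheduler)           # Algorithm 4
        return circuit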

    In this section, we consider two examples: the first rewrites a circuit with three {\bf{CCZ}} gates to a circuit using the gate set G_{ {\rm{Sur}}} ; the second optimizes a circuit with a stochastic policy.

    We demonstrate QRewriting by rewriting the quantum circuit Toff-NC3[16] shown in Fig.7 to the gate set $G_{\rm Sur}$[10] using the gate decomposition rules in Fig.8. The $\mathbf{CCZ}$ gate decomposition rule[34] $r = (\mathcal{C}_{\rm p}, \mathcal{C}_{\rm s})$ is a pair, where

    Figure  7.  Quantum circuit Toff-NC3. The markers 0–8 on the gate symbols represent the order in the gate sequence.
    Figure  8.  Gate decomposition into primitives supported in the superconducting Surface-17 processor. (a)–(f) 1-qubit gate rules. (g)–(h) 2-qubit gate rules. (i) 3-qubit gate rule.
    $$\begin{array}{l} \mathcal{C}_{\rm p} = (\text{"E"}, q_0q_1q_2),\ \text{and}\\ \mathcal{C}_{\rm s} = (\text{"t"}, q_0)(\text{"t"}, q_1)(\text{"c"}, q_2q_0)(\text{"c"}, q_1q_2)(\text{"T"}, q_0)(\text{"T"}, q_2)(\text{"c"}, q_1q_0)(\text{"c"}, q_1q_2)\\ \qquad\ \ (\text{"t"}, q_0)(\text{"c"}, q_2q_0)(\text{"T"}, q_0)(\text{"t"}, q_2)(\text{"c"}, q_1q_0). \end{array}$$

    The gate sequences of the target circuit and the decomposition pattern circuit are "hEhhEhhEh" and "E", respectively. The subsequence set of $\mathcal{C}_{\rm t}$ that matches "E" is $\{[1], [4], [7]\}$. Finally, the resulting circuit is shown in Fig.9.

    Figure  9.  Quantum circuit Toff-NC3 rewritten to the Surface-17 quantum processor.

    Next, we show an example of circuit optimization using QSRewriting with the stochastic policy. The target circuit and the rule set $\mathcal{R} = \{r_0, r_1, r_2, r_3\}$ are shown in Fig.4(a) and Fig.1, respectively. The gate sequences of the target circuit and the pattern circuits are "xxxxcccxxxxxcccxxxxcccxxxcccxxxxxxccc", "xx", "cc", "ccc", and "xcx", respectively. The subsequence sets of the pattern circuits are as follows:

    "xx": $\{[3, 9], [10, 18], [0, 22], [15, 23], [16, 24]\}$,

    "cc": $\{[5, 12], [19, 25]\}$,

    "ccc": $\emptyset$,

    "xcx": $\{[2, 13, 17], [18, 26, 31]\}$.

    In Fig.4(a), we have marked the order of the elements of the subsequence sets on the circuit symbols. The replacement candidates $([10, 18], r_0, 18)$ and $([18, 26, 31], r_3, 18)$ have a conflict at index 18 of the target circuit. With the stochastic policy, either of the candidates can be chosen. Suppose the former is taken; then the generated replacement scheduler is given as follows:

    $$\begin{array}{l} s = \{([0, 22], r_0),\ ([2, 13, 17], r_3),\ ([3, 9], r_0),\ ([5, 12], r_1),\\ \qquad ([10, 18], r_0, 18),\ ([15, 23], r_0),\ ([16, 24], r_0),\ ([19, 25], r_1)\}. \end{array}$$

    Finally, we obtain the resulting circuit shown in Fig.4(b), which reduces the gate count and the depth by 49% and 20%, respectively.

    We compare QRewriting with PaF[29], a state-of-the-art quantum circuit optimization framework based on pattern matching. Note that PaF is not freely available, and thus we implement it in Python. The implementation of QRewriting in Python is available online 4. All the experiments are conducted on an Ubuntu machine with a 2.2 GHz CPU and 64 GB of memory. For the stochastic policy, we execute QSRewriting five times and take the average result; because the other policies are deterministic, we execute them only once.

    We compare QSRewriting, QGRewriting, and PaF using the BIGD[16] benchmarks and the rules shown in Fig.1, with the results shown in Fig.10. The depth and the gate count of the generated quantum circuits are used as evaluation metrics. The BIGD benchmarks are characterized by a parameter $(v_1, v_2)$ called the gate density vector[16]. The two components stand for the densities of 1-qubit and 2-qubit gates of a circuit, respectively. Supposing a quantum circuit has $n$ qubits, $N_{\rm 1q}$ (resp. $N_{\rm 2q}$) is the number of 1-qubit (resp. 2-qubit) gates, and the longest dependency chain is $l$, then $v_1 = N_{\rm 1q}/(n \times l)$ and $v_2 = 2 \times N_{\rm 2q}/(n \times l)$.
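    For concreteness, the gate density vector can be computed as follows. This is our sketch; in particular, it takes the longest dependency chain $l$ to be the per-qubit circuit depth, which is our simplifying assumption.

    def gate_density(circuit, n):
        # circuit: list of (alias, qubits) pairs; n: number of qubits of the circuit.
        depth_per_qubit, n1q, n2q = {}, 0, 0
        for _, qubits in circuit:
            n1q, n2q = n1q + (len(qubits) == 1), n2q + (len(qubits) == 2)
            layer = 1 + max((depth_per_qubit.get(q, 0) for q in qubits), default=0)
            for q in qubits:
                depth_per_qubit[q] = layer
        l = max(depth_per_qubit.values(), default=1)   # longest dependency chain (assumed = depth)
        return n1q / (n * l), 2 * n2q / (n * l)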

    Figure  10.  Comparison of the quantum circuits generated by QSRewriting, QGRewriting, and PaF optimized on the BIGD[16] benchmarks. (a) 1-qubit gate count. (b) 2-qubit gate count. (c) Total gate count. (d) Depth.

    The BIGD benchmarks include 360 circuits with a total of 129600 gates. After a PaF optimization, the gate count and the depth decrease by 66236 and 3999, respectively, within 7125 seconds. QGRewriting (resp. QSRewriting) takes 1387 (resp. 1509) seconds to rewrite these benchmark circuits, and the generated circuits further reduce the 1-qubit gate count, 2-qubit gate count, total gate count, and depth by an average of 5.2% (resp. 4.5%), 57.0% (resp. 55.9%), 17.4% (resp. 16.7%), and 26.5% (resp. 24.8%), respectively, compared with PaF. The main evaluation results are shown in Fig.10, which compares the performance of QSRewriting, QGRewriting, and PaF in terms of the 1-qubit gate count, 2-qubit gate count, total gate count, and depth of the generated circuits. The light blue bars represent the gate count (depth) of the benchmarks. The blue, red, and yellow lines represent PaF, QGRewriting, and QSRewriting, respectively. The red and yellow lines are mostly lower than the blue lines; the yellow lines are mostly obscured by the red lines, but in some places the yellow lines are visibly lower than the red ones. In Fig.10(d), we can see that in a few cases the depth of a quantum circuit increases slightly after optimization. The reason is that gates from a later layer may be moved to an earlier layer, where they conflict with the gates originally in that layer.

    In Fig.10, QSRewriting produces smaller circuits than QGRewriting on some BIGD benchmarks. For these benchmarks, we compare the gate count and depth increments of the circuits generated by QGRewriting over those generated by QSRewriting, as shown in Fig.11(a) and Fig.11(b), respectively. In Fig.11, the light blue bars represent the total gate count (depth) of the circuits generated by QSRewriting, and the blue parts are the increments in gate count (depth) that QGRewriting has over QSRewriting. The greedy policy is deterministic, and the stochastic policy shows that better strategies exist. In Fig.12, we show a comparison of the time cost of the three methods on the BIGD benchmarks, with PaF in blue, QGRewriting in red, and QSRewriting in yellow. It is clear that QGRewriting and QSRewriting use less time, as both the red and yellow lines are lower than the blue lines in Fig.12. On average, both QGRewriting and QSRewriting are about five times faster than PaF on the BIGD benchmarks.

    Figure  11.  Comparison of the increments of the generated circuits on which QSRewriting outperforms QGRewriting both optimized on the BIGD[16] benchmarks. (a) Total gate count. (b) Depth.
    Figure  12.  Comparison of the time cost for QSRewriting, QGRewriting, and PaF optimized on the BIGD[16] benchmarks.

    Now we consider rewriting a set of benchmark circuits[17], consisting of arithmetic circuits and implementations of multi-controlled Toffoli gates, to the Surface-17 processor, as shown in Tables 3 and 4. The set of benchmark circuits uses the commonly used gate set $G_{\rm Com}$ (see Table 1) and contains 33 circuits with a total of 201554 gates. The gate set $G_{\rm Sur}$ supported by the Surface-17 processor limits 1-qubit gates to $\boldsymbol{R}_{\rm x}$ and $\boldsymbol{R}_{\rm y}$ rotations and the 2-qubit gate to $\mathbf{CZ}$; more specifically, rotation angles of $\pm\pi/4$, $\pm\pi/2$, and $\pm\pi$ are used in the decomposition, as shown in Fig.8. Thus, the rewriting of the benchmark circuits is from gate set $G_{\rm Com}$ to gate set $G_{\rm Sur}$, and the gate count (resp. depth) and time cost for each phase are shown in detail in Table 3 (resp. Table 4). Note that the rewriting of the benchmarks simply uses the greedy policy since no replacement conflicts arise. We calculate the reduction rate of gate count (resp. depth) obtained by optimizing each rewritten benchmark, as shown in the last column of Table 3 (resp. Table 4). The optimization of the rewritten circuits leads to a reduction of up to 52% in gate count and 49% in depth. However, there is a price to pay. For example, for the benchmark circuit "GF($2^{163}$)-Mult" in the last row of Table 3, with millions of gates, the rewriting without optimization takes about 15 minutes (see the last row of column $t_1$), while the rewriting with optimization may take about two hours (see the last row of column $t_2$), depending on the size of the rule library and the structure of the quantum circuit.

    Table  3.  Comparison of Gate Count of Circuits Generated by QRewriting from the Benchmark Circuits[17]
    Benchmark n N N_0 t_0 N_1 t_1 N_2 t_2 Δ (%)
    Toff-NC3 5 9 45 0.00 135 0.03 80 0.50 40.74
    Toff-Barenco3 5 10 58 0.00 174 0.07 101 0.52 41.95
    Mod 54 5 15 63 0.00 187 0.09 89 0.57 52.41
    Toff-NC4 7 15 75 0.00 225 0.10 134 0.89 40.44
    Toff-Barenco4 7 18 114 0.00 342 0.19 198 1.12 42.11
    Toff-NC5 9 21 105 0.00 315 0.18 188 1.34 40.32
    Toff-Barenco5 9 26 170 0.01 510 0.32 296 1.70 41.96
    VBE-Adder3 10 30 150 0.00 450 0.34 266 1.51 40.89
    GF( 2^{4} )-Mult 12 33 225 0.01 675 0.58 388 2.68 42.52
    Mod-Mult55 9 35 119 0.00 341 0.20 211 1.04 38.12
    GF( 2^{5} )-Mult 15 47 347 0.01 1041 0.80 601 3.74 42.27
    CSLA-MUX3 15 50 170 0.01 510 0.44 315 2.23 38.24
    Toff-NC10 19 51 255 0.01 765 0.49 458 3.31 40.13
    GF( 2^{6} )-Mult 18 63 495 0.02 1485 1.19 854 5.36 42.49
    Toff-Barenco10 19 66 450 0.01 1350 0.95 786 4.76 41.78
    RC-Adder6 14 68 200 0.01 584 0.93 361 2.66 38.18
    Mod-Red21 11 74 278 0.01 786 0.93 463 3.39 41.09
    GF( 2^{7} )-Mult 21 81 669 0.02 2007 1.61 1153 7.37 42.55
    CSUM-MUX9 30 84 420 0.01 1204 1.57 721 3.98 40.12
    QCLA-Com7 24 95 443 0.01 1299 1.02 778 5.98 40.11
    QCLA-Adder10 36 113 521 0.01 1563 1.31 957 7.29 38.77
    GF( 2^{8} )-Mult 24 115 883 0.02 2649 2.35 1516 12.95 42.77
    GF( 2^{9} )-Mult 27 123 1095 0.03 3285 2.72 1885 12.09 42.62
    GF( 2^{10} )-Mult 30 147 1347 0.03 4041 3.36 2316 14.90 42.69
    QCLA-Mod7 26 176 884 0.02 2638 2.06 1570 12.24 40.49
    Adder8 24 216 900 0.02 2676 6.79 1623 13.00 39.35
    GF( 2^{16} )-Mult 48 363 3435 0.13 10305 9.44 5865 38.96 43.09
    Mod-Adder1024 28 865 4285 0.09 12855 18.19 7403 57.77 42.41
    GF( 2^{32} )-Mult 96 1305 13593 0.30 40779 38.71 23069 157.25 43.43
    GF( 2^{64} )-Mult 192 4539 53691 1.17 161073 146.63 91065 635.20 43.46
    GF( 2^{128} )-Mult 384 17275 213883 5.59 641649 584.97 362429 3656.48 43.52
    GF( 2^{131} )-Mult 393 18333 224265 5.90 672795 616.90 379766 3927.99 43.55
    GF( 2^{163} )-Mult 489 27705 346533 9.67 1039599 945.51 587034 7452.82 43.53
    Note: n: the number of qubits. N: the gate count of the circuit. N_0: the gate count of the circuit after decomposition without optimization on gate set G_Com. N_1: the gate count of the circuit after rewriting without optimization on gate set G_Sur. N_2: the gate count of the circuit after rewriting with optimization on gate set G_Sur. t_i (i = 0, 1, 2): running time in seconds. Δ: (N_1 − N_2)/N_1 × 100%.
    Table  4.  Comparison of Depth of Circuits Generated by QRewriting from the Benchmark Circuits[17]
    Benchmark n d d_0 t_0 d_1 t_1 d_2 t_2 Δ (%)
    Toff-NC3 5 7 23 0.00 64 0.03 42 0.50 34.38
    Toff-Barenco3 5 9 31 0.00 86 0.07 52 0.52 39.53
    Mod 54 5 15 36 0.00 97 0.09 49 0.57 49.48
    Toff-NC4 7 11 38 0.00 104 0.10 67 0.89 35.58
    Toff-Barenco4 7 17 61 0.00 166 0.19 102 1.12 38.55
    Toff-NC5 9 15 53 0.00 144 0.18 92 1.34 36.11
    Toff-Barenco5 9 25 91 0.01 246 0.32 152 1.70 38.21
    VBE-Adder3 10 20 70 0.00 194 0.34 113 1.51 41.75
    GF( 2^{4} )-Mult 12 17 85 0.01 236 0.58 145 2.68 38.56
    Mod-Mult55 9 14 43 0.00 118 0.20 80 1.04 32.20
    GF( 2^{5} )-Mult 15 20 111 0.01 310 0.80 187 3.74 39.68
    CSLA-MUX3 15 17 59 0.01 166 0.44 107 2.23 35.54
    Toff-NC10 19 35 128 0.01 344 0.49 217 3.31 36.92
    GF( 2^{6} )-Mult 18 25 139 0.02 384 1.19 235 5.36 38.80
    Toff-Barenco10 19 65 241 0.01 646 0.95 402 4.76 37.77
    RC-Adder6 14 28 93 0.01 261 0.93 166 2.66 36.40
    Mod-Red21 11 43 141 0.01 383 0.93 238 3.39 37.86
    GF( 2^{7} )-Mult 21 29 166 0.02 458 1.61 280 7.37 38.86
    CSUM-MUX9 30 15 53 0.01 147 1.57 96 3.98 34.69
    QCLA-Com7 24 15 70 0.01 192 1.02 115 5.98 40.10
    QCLA-Adder10 36 15 64 0.01 182 1.31 111 7.29 39.01
    GF( 2^{8} )-Mult 24 39 199 0.02 544 2.35 335 12.95 38.42
    GF( 2^{9} )-Mult 27 36 219 0.03 606 2.72 367 12.09 39.44
    GF( 2^{10} )-Mult 30 40 246 0.03 680 3.36 412 14.90 39.41
    QCLA-Mod7 26 39 172 0.02 487 2.06 284 12.24 41.68
    Adder8 24 55 191 0.02 527 6.79 315 13.00 40.23
    GF( 2^{16} )-Mult 48 71 415 0.13 1136 9.44 699 38.96 38.47
    Mod-Adder1024 28 521 2218 0.09 6397 18.19 3775 57.77 40.99
    GF( 2^{32} )-Mult 96 137 849 0.30 2324 38.71 1447 157.25 37.74
    GF( 2^{64} )-Mult 192 263 1711 1.17 4688 146.63 2856 635.20 39.08
    GF( 2^{128} )-Mult 384 517 3437 5.59 9420 584.97 5750 3656.48 38.96
    GF( 2^{131} )-Mult 393 537 3526 5.90 9658 616.90 5902 3927.99 38.89
    GF( 2^{163} )-Mult 489 665 4390 9.67 12026 945.51 7310 7452.82 39.22
    Note: n: the number of qubits. d: the depth of the circuit. d_0: the depth of the circuit after decomposition without optimization on gate set G_Com. d_1: the depth of the circuit after rewriting without optimization on gate set G_Sur. d_2: the depth of the circuit after rewriting with optimization on gate set G_Sur. t_i (i = 0, 1, 2): running time in seconds. Δ: (d_1 − d_2)/d_1 × 100%.

    We introduced a new representation of quantum circuits, which reduces the pattern matching of quantum circuits to the problem of finding distinct subsequences. We presented an algorithm based on dynamic programming to match the pattern circuits in the target circuit. To resolve replacement conflicts, we proposed three policies for generating a replacement scheduler, together with a polynomial-time replacement algorithm. We developed a rule library for basic optimizations and applied it to rewrite the benchmarks consisting of arithmetic circuits and implementations of multi-controlled Toffoli gates to the Surface-17 processor. Compared with the existing method PaF on the BIGD benchmarks, QRewriting further reduces the depth (resp. gate count) by 26.5% (resp. 17.4%), which demonstrates the effectiveness of the proposed method.

    Acknowledgement: The authors would like to thank the anonymous reviewers for their valuable comments.
  • Figure  1.   Illustrative instances demonstrate that the position-dependent features can benefit the NER task. For the instances involving “Microsoft” and “Rome”, the semantic information of the words alone is enough for the recognition because their word embeddings are close to those of their kinds. Conversely, lesser-known individuals such as “Wang Yi” or organizations like “Linzhou University” often encounter out-of-vocabulary issues. The position-dependent information (e.g., the prefix “Wang” or the suffix “University”) could be useful for the recognition.

    Figure  2.   The LEMON architecture contains three main modules: a character encoder, a fragment encoder, and a lexicon memory. For the matched results, we use the letter "E" to denote an exact match, "P" a prefix match, and "S" a suffix match. ORG: organization; PER: person; LOC: location; GPE: geographic entity; NON: none.

    Figure  3.   Three different types of features (character, word, and part-of-speech) used to represent each character.

    Figure  4.   Example heat map. Note that the attentions are sharp for those words particularly useful for the NER tasks.

    Figure  5.   F1-scores versus different training epochs.

    Figure  6.   F1-scores versus different values of \gamma (the decoding threshold). The horizontal line represents the results yielded by the models without the decoding step.

    Table  1   Statistics of Datasets

    Dataset #Train (×10³) #Dev (×10³) #Test (×10³) Domain
    OntoNotes-4[40] 15.7 4.30 4.30 News
    MSRA[41] 46.4 -- 4.40 News
    Weibo NER[29, 42] 1.4 0.27 0.27 Social media
    Resume[12] 3.8 0.46 0.48 Resume
    Note: “#Train”, “#Dev”, and “#Test” denote the sizes of training, development, and test datasets, respectively. -- means not available.

    Table  2   Results on the OntoNotes-4 Development Set with Different Network Architectures

    Model | Baseline: P (%), R (%), F1 (%) | Transformer: P (%), R (%), F1 (%) | Bi-RNN: P (%), R (%), F1 (%)
    Gold BOW 72.40 62.03 66.81 −− −− −− 73.60 69.08 71.27
    FOFE 75.52 64.86 69.78 64.35 54.04 58.74 76.93 70.43 73.54
    Bi-RNN 73.68 69.74 71.66 59.92 54.87 57.28 71.51 73.66 72.57
    BOW + Lex 78.77 70.40 74.35 76.73 73.48 75.07 78.27 75.34 76.78
    FOFE + Lex 77.33 71.90 74.52 79.92 72.65 76.11 79.49 73.77 76.53
    Bi-RNN + Lex 77.40 74.39 75.87 79.62 73.87 76.64 81.12 75.18 78.04
    Auto BOW 76.67 56.24 64.88 −− −− −− 75.39 61.36 66.92
    FOFE 71.66 58.21 64.24 73.17 61.75 66.98 76.20 61.65 68.16
    Bi-RNN 74.60 63.67 68.70 72.05 63.52 67.52 76.73 63.70 69.61
    BOW + Lex 76.33 64.75 70.06 73.96 64.69 69.02 78.42 67.06 72.30
    FOFE + Lex 77.24 63.91 69.95 78.46 62.93 69.85 76.24 68.76 72.31
    Bi-RNN + Lex 77.62 66.32 71.53 76.79 67.09 71.61 76.57 69.54 72.89
    Note: The word “gold” in the row heading denotes that gold segmentation and part-of-speech results are used, and “auto” denotes they are automatically produced by the THULAC toolkit. The best results are highlighted in bold fonts.

    Table  3   Results on the OntoNotes-4 Development Set Using Different Features

Model Features Ground Truth (P/R/F1, %) Automatically Labelled (P/R/F1, %)
    NCRF Char 66.37 60.21 63.14 -- -- --
    Char + Seg 70.58 69.96 70.27 70.77 63.33 66.85
    Char + Pos 71.81 74.48 73.12 70.20 70.26 70.23
    Char + Seg + Pos 75.63 72.35 73.08 72.88 68.18 70.45
    Without Lex Char 67.60 55.03 60.67 -- -- --
    Char + Seg 72.16 66.09 68.99 70.48 62.65 66.33
    Char + Pos 74.39 65.44 69.63 72.87 63.73 67.99
    Char + Seg + Pos 74.97 72.23 73.58 76.29 64.43 69.86
    Lex Char 77.27 60.73 68.01 -- -- --
    Char + Seg 78.40 70.75 74.38 76.11 64.63 69.91
    Char + Pos 77.71 72.35 74.93 77.46 66.89 71.79
    Char + Seg + Pos 78.70 74.95 76.78 76.41 68.61 72.30
    Note: The best results are highlighted in bold fonts.

    Table  4   Results on the MSRA Dataset

    Model P (%) R (%) F1 (%)
    Chen et al.[48] 91.22 81.71 86.20
    Zheng et al.[49] 92.20 90.08 91.18
    Lu et al.[50] -- -- 87.94
    Dong et al.[51] 91.28 90.62 90.95
    Zhang and Yang[12] 93.57 92.79 93.18
    LEMON 95.40 91.77 93.55
    Note: The best results are highlighted in bold.

    Table  5   Results on the Resume NER Dataset

    Model P (%) R (%) F1 (%)
    Word 93.72 93.44 93.58
    Char 93.66 93.31 93.48
    Word + Char + BiChar 94.07 94.42 94.24
    Char + BiChar + Softword 94.53 94.29 94.41
    Zhang and Yang[12] 94.81 94.11 94.46
    LEMON 95.59 94.07 94.82
Note: The models indicated with “†” are those in which the sequence labeling model (LSTM + CRF) was used. The best results are highlighted in bold.

    Table  6   Results on the OntoNotes-4 Dataset

    Model P (%) R (%) F1 (%)
    Wang et al.[52] 76.43 72.32 74.32
    Che et al.[53] 77.71 72.51 75.02
    Yang et al.[30] 72.98 80.15 76.40
    Zhang and Yang[12] 76.35 71.56 73.88
LEMON† 79.27 78.29 78.78
    LEMON 80.61 71.05 75.53
Note: Models marked with “†” are those in which the gold word segmentation results were used. The best results are highlighted in bold.

    Table  7   Results on the Weibo NER Dataset

    Model P (%) R (%) F1 (%)
    Peng and Dredze[29] -- -- 58.99
    He and Sun[54] -- -- 58.23
    Zhang and Yang[12] -- -- 58.79
    LEMON 70.86 55.42 62.19
    Note: The best results are highlighted in bold.

Table  8   Recall of Out-of-Vocabulary Entities on OntoNotes-4

    Category Segmentation Lexicon R (%)
    Organization Incorrect w/o 48.7
    Correct w/o 65.8
    Correct with 67.1
    Location Incorrect w/o 14.8
    Correct w/o 14.3
    Correct with 29.7
    Person Incorrect w/o 69.6
    Correct w/o 89.3
    Correct with 93.1
    GPE Incorrect w/o 38.5
    Correct w/o 47.8
    Correct with 87.4
    None Incorrect w/o 47.9
    Correct w/o 68.9
    Correct with 81.0
Note: We report in this table the recall of out-of-vocabulary entities in different situations where the word segmentation results are correct or incorrect and the lexicon is used (“with”) or not (“w/o”). The best results are highlighted in bold.
[1]

    Kapetanios E, Tatar D, Sacarea C. Natural Language Processing: Semantic Aspects. CRC Press, 2013. DOI: 10.1201/b15472.

    [2]

    Lin D K, Wu X Y. Phrase clustering for discriminative learning. In Proc. the 2009 Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Aug. 2009, pp.1030–1038.

    [3]

    Nothman J, Ringland N, Radford W, Murphy T, Curran J R. Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence, 2013, 194: 151–175. DOI: 10.1016/j.artint.2012.03.006.

    [4]

    Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 2011, 12: 2493–2537.

    [5]

    Huang Z H, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging. arXiv: 1508.01991, 2015. https://arxiv.org/abs/1508.01991, Oct. 2023.

    [6]

    Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. In Proc. the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun. 2016, pp.260–270. DOI: 10.18653/v1/N16-1030.

    [7]

    Huang S, Sun X, Wang H F. Addressing domain adaptation for Chinese word segmentation with global recurrent structure. In Proc. the 8th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Nov. 2017, pp.184–193. DOI: 10.1007/978-3-030-01716-3_3.

    [8]

    Chen L Z, Moschitti A. Learning to progressively recognize new named entities with sequence to sequence models. In Proc. the 27th International Conference on Computational Linguistics, Aug. 2018, pp.2181–2191.

    [9]

    Zhang C Y, Bengio S, Hardt M, Recht B, Vinyals O. Understanding deep learning requires rethinking generalization. arXiv: 1611.03530, 2017. https://arxiv.org/abs/1611.03530, Oct. 2023.

    [10]

    He J Z, Wang H F. Chinese named entity recognition and word segmentation based on character. In Proc. the 6th SIGHAN Workshop on Chinese Language Processing, Jan. 2008, pp.128–132.

    [11]

    Liu Z X, Zhu C H, Zhao T J. Chinese named entity recognition with a sequence labeling approach: Based on characters, or based on words? In Proc. the 2010 Advanced Intelligent Computing Theories and Applications, and the 6th International Conference on Intelligent Computing, Aug. 2010, pp.634–640.

    [12]

    Zhang Y, Yang J. Chinese NER using lattice LSTM. In Proc. the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), July 2018, pp.1554–1564. DOI: 10.18653/v1/P18-1144.

    [13]

    Li J, Sun A X, Han J L, Li C L. A survey on deep learning for named entity recognition. IEEE Trans. Knowledge and Data Engineering, 2022, 34(1): 50–70. DOI: 10.1109/TKDE.2020.2981314.

    [14]

Xu M B, Jiang H, Watcharawittayakul S. A local detection approach for named entity recognition and mention detection. In Proc. the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), July 2017, pp.1237–1247. DOI: 10.18653/v1/P17-1114.

    [15]

Chiu J P C, Nichols E. Named entity recognition with bidirectional LSTM-CNNs. Trans. Association for Computational Linguistics, 2016, 4: 357–370. DOI: 10.1162/tacl_a_00104.

    [16]

    Hassabis D, Kumaran D, Summerfield C, Botvinick M. Neuroscience-inspired artificial intelligence. Neuron, 2017, 95(2): 245–258. DOI: 10.1016/j.neuron.2017.06.011.

    [17]

    Hopfield J J. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the United States of America, 1982, 79(8): 2554–2558. DOI: 10.1073/pnas.79.8.2554.

    [18]

    Anderson J R, Bothell D, Byrne M D, Douglass S, Lebiere C, Qin Y L. An integrated theory of the mind. Psychological Review, 2004, 111(4): 1036–1060. DOI: 10.1037/0033-295X.111.4.1036.

    [19]

Shallice T. From Neuropsychology to Mental Structure. Cambridge University Press, 1988. DOI: 10.1017/CBO9780511526817.

    [20]

    Yadav V, Bethard S. A survey on recent advances in named entity recognition from deep learning models. In Proc. the 27th International Conference on Computational Linguistics, Aug. 2018, pp.2145–2158.

    [21]

    Zhang S L, Jiang H, Xu M B, Hou J F, Dai L R. The fixed-size ordinally-forgetting encoding method for neural network language models. In Proc. the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), July 2015, pp.495–500. DOI: 10.3115/v1/P15-2081.

    [22]

    Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv: 1409.0473, 2016. https://arxiv.org/abs/1409.0473, Oct. 2023.

    [23]

Luong T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation. In Proc. the 2015 Conference on Empirical Methods in Natural Language Processing, Sept. 2015, pp.1412–1421. DOI: 10.18653/v1/D15-1166.

    [24]

    Rei M, Crichton G, Pyysalo S. Attending to characters in neural sequence labeling models. In Proc. the 26th International Conference on Computational Linguistics: Technical Papers, Dec. 2016, pp.309–318.

    [25]

    Xu G H, Wang C Y, He X F. Improving clinical named entity recognition with global neural attention. In Proc. the 2nd International Joint Conference on Web and Big Data, July 2018, pp.264–279. DOI: 10.1007/978-3-319-96893-3_20.

    [26]

    Zhang Q, Fu J L, Liu X Y, Huang X J. Adaptive co-attention network for named entity recognition in tweets. In Proc. the 32nd AAAI Conference on Artificial Intelligence and the 30th Innovative Applications of Artificial Intelligence Conference and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, Feb. 2018, pp.5674–5681.

    [27]

    Weston J, Chopra S, Bordes A. Memory networks. arXiv: 1410.3916, 2015. https://arxiv.org/abs/1410.3916, Oct. 2023.

    [28]

    Hammerton J. Named entity recognition with long short-term memory. In Proc. the 7th Conference on Natural Language Learning at HLT-NAACL 2003, May 2003, pp.172–175.

    [29]

    Peng N Y, Dredze M. Improving named entity recognition for Chinese social media with word segmentation representation learning. In Proc. the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Aug. 2016, pp.149–155. DOI: 10.18653/v1/P16-2025.

    [30]

    Yang J, Teng Z Y, Zhang M S, Zhang Y. Combining discrete and neural features for sequence labeling. In Proc. the 17th International Conference on Computational Linguistics and Intelligent Text Processing, Mar. 2018, pp.140–154. DOI: 10.1007/978-3-319-75477-2_9.

    [31]

    Xue N W, Shen L B. Chinese word segmentation as LMR tagging. In Proc. the 2nd SIGHAN Workshop on Chinese Language Processing, July 2003, pp.176–179. DOI: 10.3115/1119250.1119278.

    [32]

    Elman J L. Finding structure in time. Cognitive Science, 1990, 14(2): 179–211. DOI: 10.1207/s15516709cog1402_1.

    [33]

Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8): 1735–1780. DOI: 10.1162/neco.1997.9.8.1735.

    [34]

    Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L, Polosukhin I. Attention is all you need. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.6000–6010. DOI: 10.5555/3295222.3295349.

    [35]

    Joulin A, Grave E, Bojanowski P, Mikolov T. Bag of tricks for efficient text classification. In Proc. the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Apr. 2017, pp.427–431.

    [36]

    Conneau A, Kruszewski G, Lample G, Barrault L, Baroni M. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties. In Proc. the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), July 2018, pp.2126–2136. DOI: 10.18653/v1/P18-1198.

    [37]

    Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. In Proc. the 26th International Conference on Neural Information Processing Systems, Dec. 2013, pp.3111–3119.

    [38]

    Sukhbaatar S, Szlam A, Weston J, Fergus R. End-to-end memory networks. In Proc. the 28th International Conference on Neural Information Processing Systems, Dec. 2015, pp.2440–2448.

    [39]

    Lin T Y, Goyal P, Girshick R, He K M, Dollár P. Focal loss for dense object detection. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.2999–3007. DOI: 10.1109/ICCV.2017.324.

    [40]

    Weischedel R, Palmer M, Marcus M, Hovy E, Pradhan S, Ramshaw L, Xue N W, Taylor A, Kaufman J, Franchini M, El-Bachouti M, Belvin R, Houston A. OntoNotes release 4.0. Linguistic Data Consortium, Philadelphia, 2011. DOI: 10.35111/gfjf-7r50.

    [41]

    Levow G A. The third international Chinese language processing Bakeoff: Word segmentation and named entity recognition. In Proc. the 5th SIGHAN Workshop on Chinese Language Processing, July 2006, pp.108–117.

    [42]

    Peng N Y, Dredze M. Named entity recognition for Chinese social media with jointly trained embeddings. In Proc. the 2015 Conference on Empirical Methods in Natural Language Processing, Sept. 2015, pp.548–554. DOI: 10.18653/v1/D15-1064.

    [43]

    Sun M S, Chen X X, Zhang K X, Guo Z P, Liu Z Y. THULAC: An efficient lexical analyzer for Chinese. Technical Report, Tsinghua University, 2016. https://nlp.csai.tsinghua.edu.cn/project/thulac/, Oct. 2023. (in Chinese)

    [44]

    Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z M, Desmaison A, Antiga L, Lerer A. Automatic differentiation in PyTorch. In Proc. the 31st Conference on Neural Information Processing Systems, Dec. 2017.

    [45]

    Kingma D P, Ba J. Adam: A method for stochastic optimization. In Proc. the 3rd International Conference on Learning Representations, May 2015.

    [46]

    Devlin J, Chang M W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Jun. 2019, pp.4171–4186. DOI: 10.18653/v1/N19-1423.

    [47]

    Yang J, Zhang Y. NCRF++: An open-source neural sequence labeling toolkit. In Proc. the ACL 2018, System Demonstrations, July 2018, pp.74–79. DOI: 10.18653/v1/P18-4013.

    [48]

    Chen A T, Peng F C, Shan R, Sun G. Chinese named entity recognition with conditional probabilistic models. In Proc. the 5th SIGHAN Workshop on Chinese Language Processing, July 2006, pp.173–176.

    [49]

    Zhang S X, Qin Y, Wen J, Wang X J. Word segmentation and named entity recognition for SIGHAN Bakeoff3. In Proc. the 5th SIGHAN Workshop on Chinese Language Processing, July 2006, pp.158–161.

    [50]

    Lu Y, Zhang Y, Ji D H. Multi-prototype Chinese character embedding. In Proc. the 10th International Conference on Language Resources and Evaluation, May 2016, pp.855–859.

    [51]

    Dong C H, Zhang J J, Zong C Q, Hattori M, Di H. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In Natural Language Understanding and Intelligent Applications, Lin C Y, Xue N W, Zhao D Y, Huang X J, Feng Y S (eds.), Springer, 2016, pp.239–250. DOI: 10.1007/978-3-319-50496-4_20.

    [52]

    Wang M Q, Che W X, Manning C D. Effective bilingual constraints for semi-supervised learning of named entity recognizers. In Proc. the 27th AAAI Conference on Artificial Intelligence, July 2013, pp.919–925.

    [53]

    Che W X, Wang M Q, Manning C D, Liu T. Named entity recognition with bilingual constraints. In Proc. the 2013 Conf. the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun. 2013, pp.52–62.

    [54]

    He H F, Sun X. A unified model for cross-domain and semi-supervised named entity recognition in Chinese social media. In Proc. the 31st AAAI Conference on Artificial Intelligence, Feb. 2017, pp.3216–3222.

    [55]

    Zhang Z Y, Han X, Liu Z Y, Jiang X, Sun M S, Liu Q. ERNIE: Enhanced language representation with informative entities. In Proc. the 57th Annual Meeting of the Association for Computational Linguistics, July 2019, pp.1441–1451. DOI: 10.18653/v1/P19-1139.

    [56]

    Yang Z L, Dai Z H, Yang Y M, Carbonell J, Salakhutdinov R, Le Q V. XLNet: Generalized autoregressive pretraining for language understanding. In Proc. the 33rd International Conference on Neural Information Processing Systems, Dec. 2019, pp.5753–5763.

    [57]

    Peters M E, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L. Deep contextualized word representations. In Proc. the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Jun. 2018, pp.2227–2237. DOI: 10.18653/v1/N18-1202.

    [58]

    Liu W J, Zhou P, Zhao Z, Wang Z R, Ju Q, Deng H T, Wang P. K-BERT: Enabling language representation with knowledge graph. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(3): 2901–2908. DOI: 10.1609/aaai.v34i03.5681.


Publication History
  • Received: 2020-11-11
  • Accepted: 2021-12-06
  • Published online: 2023-06-19
  • Issue date: 2023-09-14
