Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018YFC0830900, the National Natural Science Foundation of China under Grant No. 62076068, and the Shanghai Municipal Science and Technology Project under Grant No. 21511102800.
Abstract: Inspired by the concept of content-addressable retrieval from cognitive science, we propose a novel fragment-based Chinese named entity recognition (NER) model augmented with a lexicon-based memory, in which character-level and word-level features are combined to generate better feature representations for possible entity names. Observing that the boundary information of entity names is particularly useful for locating them and classifying them into pre-defined categories, we introduce position-dependent features, such as prefixes and suffixes, and take them into account for NER in the form of distributed representations. The lexicon-based memory is built to help generate such position-dependent features and to deal with the problem of out-of-vocabulary words. Experimental results show that the proposed model, called LEMON, achieves state-of-the-art performance, with an increase in F1-score of up to 3.2% over previous state-of-the-art models on four widely used NER datasets.
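To make the idea of fragments and position-dependent lexicon cues concrete, here is a minimal symbolic sketch. It is not the LEMON model: the abstract describes learned distributed representations, whereas this code only enumerates character spans and looks them up in a toy lexicon. The function names, the `max_len` parameter, the feature names, and the example lexicon are all assumptions made for illustration.

```python
def candidate_fragments(sentence, max_len=4):
    """Yield (start, end, text) for every character span of up to max_len characters."""
    for start in range(len(sentence)):
        last_end = min(start + max_len, len(sentence))
        for end in range(start + 1, last_end + 1):
            yield start, end, sentence[start:end]


def position_features(fragment, lexicon):
    """Symbolic stand-ins for position-dependent features of one fragment.

    A real model would map these cues to dense vectors; here they are booleans.
    """
    return {
        # The whole fragment is a known word.
        "in_lexicon": fragment in lexicon,
        # Some lexicon word begins with the fragment's first character.
        "prefix_seen": any(w.startswith(fragment[0]) for w in lexicon),
        # Some lexicon word ends with the fragment's last character.
        "suffix_seen": any(w.endswith(fragment[-1]) for w in lexicon),
    }


# Hypothetical toy lexicon and sentence ("Nanjing Yangtze River Bridge").
lexicon = {"南京", "南京市", "长江", "大桥", "长江大桥"}
sentence = "南京市长江大桥"

# Fragments that are themselves lexicon entries, in left-to-right order.
matches = [(s, e, t) for s, e, t in candidate_fragments(sentence) if t in lexicon]
```

Even for a fragment that is not in the lexicon, such as "京市", the suffix cue still fires because "南京市" ends with "市". This is the kind of boundary evidence that can remain available for out-of-vocabulary spans.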
Funding: Supported by the National Natural Science Foundation of China (No. 60773147), the National High-Tech Research and Development (863) Program of China (No. 2006AA01Z111), and the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList).
Abstract: Content-addressable storage (CAS) is a promising technology for improving storage efficiency as well as access throughput. Currently, many CAS products are implemented at the block level, which results in the loss of file information. As a result, some sophisticated optimizations, such as accurate file prefetching, cannot be achieved. This paper presents a file-aware block-level storage system combined with the CAS function. In contrast to some existing file-level CAS systems, this system is transparent to upper-level applications, including the operating system and the file system. These features are achieved by using smart-disk technologies to help the storage system learn the file-system layout. A prototype was implemented on an open-source virtual machine (VM) with Windows XP as the guest operating system. Tests show that this combination significantly reduces the size of the VM image file and improves storage performance by discarding unused blocks and using a simple file-level prefetching strategy.
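The block-level CAS idea can be sketched in a few lines. This is a toy illustration, not the system described in the abstract. The `BlockStore` class, its method names, and the choice of SHA-256 as the content address are assumptions. The `discard` method only mimics the effect of dropping blocks that the file system no longer uses; it does not model how a smart disk would learn the layout.

```python
import hashlib


class BlockStore:
    """Toy content-addressable block store: a block's address is its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}   # digest -> block bytes (each distinct block stored once)
        self.volume = {}   # logical block number -> digest

    def write(self, lbn, data):
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same digest, so it is stored only once.
        self.blocks.setdefault(digest, data)
        self.volume[lbn] = digest

    def read(self, lbn):
        return self.blocks[self.volume[lbn]]

    def discard(self, lbn):
        """Drop a logical block that is no longer in use, then free unreferenced content."""
        self.volume.pop(lbn, None)
        live = set(self.volume.values())
        self.blocks = {d: b for d, b in self.blocks.items() if d in live}


store = BlockStore()
store.write(0, b"A" * 4096)
store.write(1, b"A" * 4096)   # duplicate of block 0: no extra space used
store.write(2, b"B" * 4096)
```

After these three writes the store holds two physical blocks for three logical ones. Calling `store.discard(2)` would leave a single physical block.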
Funding: Supported in part by IMC corporation, Slovakia, under EU Project BRAINE (Grant 876967).
Abstract: This paper studies known indexing structures from a new point of view: minimisation of data exchange between an IoT device acting as a blockchain client and the blockchain server running a protocol suite that includes two Guy Fawkes protocols, PLS and SLVP. The PLS blockchain is not a cryptocurrency instrument; it is an immutable ledger offering guaranteed non-repudiation to low-power clients without the use of public-key cryptography. The novelty of the situation lies in the fact that every PLS client has to obtain a proof of absence for all blocks of the chain to which its counterparty does not contribute, and we show that this is possible without traversing the block's Merkle tree. As our ground case, we derive theoretically the weight statistics of a leaf path on a sparse Merkle tree. Using the theory, we quantify the communication cost of a client interacting with the blockchain. We show that large savings can be achieved by providing a bitmap index of the tree compressed using Tunstall's method. We further show that even in the case of correlated access, as when two IoT devices post messages for each other in consecutive blocks, it is possible to prevent compression degradation by re-randomising the IDs using a pseudorandom bijective function. We propose a low-cost function of this kind and evaluate its quality by simulation, using the avalanche criterion.
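The last two sentences can be illustrated with a small sketch of a bijective mixing function on 32-bit IDs, together with an avalanche measurement. This is not the function proposed in the paper, which the abstract does not specify. The xorshift-and-multiply construction, the odd constant `M`, and the names `mix`, `unmix`, and `avalanche` are assumptions chosen for illustration. Each step is invertible modulo 2^32, so the whole map is a bijection.

```python
import random

MASK = 0xFFFFFFFF            # work on 32-bit IDs
M = 0x045D9F3B               # arbitrary odd constant (assumption); odd => invertible mod 2**32
M_INV = pow(M, -1, 1 << 32)  # modular inverse (Python 3.8+)


def mix(x):
    """Bijective scrambling of a 32-bit ID: xorshift and odd multiply, repeated."""
    x ^= x >> 16
    x = (x * M) & MASK
    x ^= x >> 16
    x = (x * M) & MASK
    x ^= x >> 16
    return x


def unmix(y):
    """Inverse of mix: a 16-bit xorshift on 32 bits is its own inverse."""
    y ^= y >> 16
    y = (y * M_INV) & MASK
    y ^= y >> 16
    y = (y * M_INV) & MASK
    y ^= y >> 16
    return y


def avalanche(samples=200, seed=0):
    """Average fraction of output bits that flip when one input bit is flipped."""
    rng = random.Random(seed)
    flipped = 0
    for _ in range(samples):
        x = rng.getrandbits(32)
        for bit in range(32):
            diff = mix(x) ^ mix(x ^ (1 << bit))
            flipped += bin(diff).count("1")
    return flipped / (samples * 32 * 32)


score = avalanche()
```

Under the avalanche criterion, a good mixer gives a `score` close to 0.5: flipping any single input bit changes about half of the output bits. Because `mix` is a bijection, distinct IDs stay distinct after re-randomisation, and `unmix` recovers the original ID.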
Abstract: A novel cascaded charge-sharing technique for content-addressable memories (CAMs) is presented, which not only effectively reduces the match-line (ML) power by using a pre-select circuit, but also achieves a high search speed. Pre-layout simulation results show a 75.9% reduction in the energy-delay product (EDP) of the MLs relative to the traditional precharge-high ML scheme and a 41.3% reduction relative to the segmented ML method. Based on this technique, a test chip of a 64-word × 144-bit ternary CAM (TCAM) is implemented in a 0.18-μm 1.8-V CMOS process, achieving a 1.0 ns search delay and 4.81 fJ/bit/search for the MLs.
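For readers unfamiliar with ternary CAMs, the following sketch shows only the logical behaviour of a TCAM search. It says nothing about the charge-sharing circuit or the pre-select scheme in the abstract. The `tcam_search` function, the (value, care mask) encoding of stored words, and the 8-bit example entries are assumptions for illustration. Real hardware compares all words in parallel on their match lines, whereas this code loops over them.

```python
def tcam_search(entries, key):
    """Return the indices of all stored words that match `key`.

    Each entry is (value, care_mask). A bit is compared only where care_mask
    has a 1; bits where care_mask is 0 are "don't care" (the ternary X state).
    """
    return [
        index
        for index, (value, care_mask) in enumerate(entries)
        if ((key ^ value) & care_mask) == 0
    ]


# Hypothetical 8-bit table (the test chip in the abstract uses 144-bit words).
entries = [
    (0b10100000, 0b11110000),  # 1010XXXX: only the top four bits matter
    (0b10101100, 0b11111111),  # 10101100: exact match required
    (0b00000000, 0b00000000),  # XXXXXXXX: matches every key
]

all_match = tcam_search(entries, 0b10101100)
prefix_match = tcam_search(entries, 0b10100001)
default_only = tcam_search(entries, 0b01010101)
```

A word whose match line stays asserted corresponds to an index in the returned list. In a real TCAM a priority encoder would typically select one of the matching words.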