Query expansion with a thesaurus is a useful technique in modern information retrieval (IR). In this paper, a query expansion method for Chinese IR based on a decaying co-occurrence model is proposed and implemented. The model extends the traditional co-occurrence model with a decaying factor that reduces the mutual information as the distance between terms increases. Experimental results on the TREC-9 collections show that this query expansion method yields significant improvements over IR without query expansion.
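The decaying co-occurrence idea can be sketched as follows. This is a minimal illustration, not the authors' model: the exponential decay `exp(-alpha * gap)`, the window size, the PMI-style score and the helper names (`decayed_cooccurrence`, `decayed_mi`, `expand`) are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def decayed_cooccurrence(
    docs: Sequence[Sequence[str]], window: int = 5, alpha: float = 0.5
) -> Tuple[Dict[Pair, float], Counter, int]:
    """Count term pairs within `window` positions, weighting each hit by exp(-alpha * gap).

    Adjacent terms (gap 0) contribute 1.0; farther pairs contribute less (assumed decay form).
    """
    pair: Dict[Pair, float] = {}
    freq: Counter = Counter()
    total = 0
    for tokens in docs:
        freq.update(tokens)
        total += len(tokens)
        for i, a in enumerate(tokens):
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                b = tokens[j]
                if a == b:
                    continue
                key = (a, b) if a < b else (b, a)
                gap = j - i - 1  # number of tokens between a and b
                pair[key] = pair.get(key, 0.0) + math.exp(-alpha * gap)
    return pair, freq, total


def decayed_mi(a: str, b: str, pair: Dict[Pair, float], freq: Counter, total: int) -> float:
    """Pointwise mutual information computed from the decayed co-occurrence weight."""
    key = (a, b) if a < b else (b, a)
    weight = pair.get(key, 0.0)
    if weight == 0.0:
        return float("-inf")
    return math.log(weight * total / (freq[a] * freq[b]))


def expand(term: str, pair: Dict[Pair, float], freq: Counter, total: int, top_n: int = 2) -> List[str]:
    """Return the top_n partner terms ranked by decayed MI with `term` (hypothetical helper)."""
    partners = sorted({x for k in pair if term in k for x in k if x != term})
    return sorted(partners, key=lambda t: decayed_mi(term, t, pair, freq, total), reverse=True)[:top_n]
```

With `alpha = 0` the weights reduce to plain windowed co-occurrence counts, so the decay factor is the only thing separating this sketch from the traditional model.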
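Querying is the most important operation of a database system, and query performance directly determines application response speed and user experience. When multiple queries run in parallel, they compete for or share database system resources, producing query interaction (QI), which is a major factor affecting query performance; accurately measuring QI is key to choosing suitable execution plans and improving query performance. QI changes dynamically as the operations in a query execute, but existing measurement methods consider only the resource usage at the moment a new query joins and ignore how resource usage changes during query execution, so their measurements are inaccurate. To address this, this paper proposes a query-mix time-series heterogeneous graph to describe how QI within a query mix changes over time; a Time-Aware Multi-edge Type Weight Calculation (TA-MTWC) model that computes the edge weights between operation nodes of the heterogeneous graph at any execution moment, capturing the dynamic changes of QI over time; and a Query-mix Time-series Heterogeneous Graph Classification (QTHGC) model, which uses a Long Short-Term Memory (LSTM) network to learn the temporal relationships among graph representations at multiple moments and selects execution plans for parallel queries. Experiments on PostgreSQL show that the average accuracy of QTHGC is 51.2% higher than that of the query optimizer and 2.87% higher than that of the latest existing model, QHGC.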
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52272440) and the Suzhou Science Foundation (Grant Nos. SYG202323, ZXL2022027).
Abstract: Continual learning fault diagnosis (CLFD) has gained growing interest in mechanical systems for its ability to accumulate and transfer knowledge in dynamic fault diagnosis scenarios. However, existing CLFD methods typically assume balanced task distributions, neglecting the long-tailed nature of real-world fault occurrences, where certain faults dominate while others are rare. Because of the long-tailed distribution across mechanical conditions, excessive attention is paid to the dominant types, degrading performance on the rarer ones. In this paper, decoupling incremental classifier and representation learning (DICRL) is proposed to address the dual challenges of catastrophic forgetting introduced by incremental tasks and of bias in long-tailed CLFD (LT-CLFD). The core innovation lies in structurally decoupling incremental classifier learning from representation learning. An instance-balanced sampling strategy is employed to learn more discriminative deep representations from new data together with the exemplars selected by the herding algorithm. Then, the previous classifiers are frozen to prevent damage to representation learning during backpropagation. A cosine normalization classifier with learnable weight scaling is trained using a class-balanced sampling strategy to enhance classification accuracy. Experimental results on multiple benchmarks show that DICRL outperforms existing continual learning methods, with superior performance and robustness in both LT-CLFD and conventional CLFD. DICRL effectively tackles both catastrophic forgetting and long-tailed distribution in CLFD, enabling more reliable fault diagnosis in industrial applications.
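The three mechanisms named above (herding-based exemplar selection, a cosine-normalized classifier with a scale factor, and class-balanced sampling) can be illustrated in plain Python. This is a hedged sketch: herding follows the iCaRL-style greedy rule, and the function names and the fixed `scale` argument are illustrative assumptions; in DICRL the scale would be a learned parameter and the features would come from the trained network.

```python
import math
import random
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / n for x in v]


def herding_select(features: Sequence[Sequence[float]], m: int) -> List[int]:
    """iCaRL-style herding: greedily pick exemplars whose running mean tracks the class mean."""
    feats = [l2_normalize(f) for f in features]
    dim = len(feats[0])
    mu = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    chosen: List[int] = []
    running = [0.0] * dim
    for k in range(1, min(m, len(feats)) + 1):
        best, best_dist = -1, float("inf")
        for i, f in enumerate(feats):
            if i in chosen:
                continue
            # squared distance between the class mean and the exemplar mean if f were added
            dist = sum((mu[d] - (running[d] + f[d]) / k) ** 2 for d in range(dim))
            if dist < best_dist:
                best, best_dist = i, dist
        chosen.append(best)
        running = [running[d] + feats[best][d] for d in range(dim)]
    return chosen


def cosine_logits(feature: Sequence[float], weights: Sequence[Sequence[float]], scale: float) -> List[float]:
    """Cosine-normalized classifier: logit_j = scale * cos(feature, w_j)."""
    f = l2_normalize(feature)
    return [scale * sum(a * b for a, b in zip(f, l2_normalize(w))) for w in weights]


def class_balanced_batch(labels: Sequence[int], batch_size: int, rng: random.Random) -> List[int]:
    """Draw a class uniformly, then one of its instances, so tail classes appear as often as head ones."""
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    classes = sorted(by_class)
    return [rng.choice(by_class[rng.choice(classes)]) for _ in range(batch_size)]
```

Instance-balanced sampling, used for the representation stage, is simply uniform sampling over all indices; the class-balanced sampler differs only in first drawing a class uniformly.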
Funding: Sichuan TCM Culture Coordinated Development Research Center Project (2023XT131); National Key Science and Technology Project of China (2023ZD0509405); National Natural Science Foundation of China (82174236).
Abstract: Objective: To address the dual challenges of long-tail distribution and feature sparsity in traditional Chinese medicine (TCM) syndrome differentiation in real clinical settings, we propose a data-efficient learning framework enhanced by knowledge graphs. Methods: We developed Agent-GNN, a three-stage decoupled learning framework, and validated it on the Traditional Chinese Medicine Syndrome Diagnosis (TCM-SD) dataset, which contains 54,152 clinical records across 148 syndrome categories. First, we constructed a comprehensive medical knowledge graph encoding the complete TCM reasoning system. Second, we proposed a Functional Patient Profiling (FPP) method that uses large language models (LLMs) combined with Graph Retrieval-Augmented Generation (RAG) to extract structured symptom-etiology-pathogenesis subgraphs from medical records. Third, we employed heterogeneous graph neural networks to learn structured combination patterns explicitly. We compared our method against multiple baselines, including BERT, ZY-BERT, ZY-BERT+Know, GAT, and GPT-4 Few-shot, using the macro-F1 score as the primary evaluation metric. Ablation experiments were also conducted to validate the contribution of each key component to model performance. Results: Agent-GNN achieved an overall macro-F1 score of 72.4%, an improvement of 8.7 percentage points over ZY-BERT+Know (63.7%), the strongest of the traditional baselines. For long-tail syndromes with fewer than 10 samples, Agent-GNN reached a macro-F1 score of 58.6%, compared with 39.3% for ZY-BERT+Know and 41.2% for GPT-4 Few-shot, representing relative improvements of 49.2% and 42.2%, respectively. Ablation experiments confirmed that explicitly modeling etiology-pathogenesis nodes contributed 12.4 percentage points to this long-tail syndrome performance. Conclusion: This study proposes Agent-GNN, a knowledge-graph-enhanced framework that effectively addresses the long-tail distribution challenge in TCM syndrome differentiation. By explicitly modeling manifestation-mechanism-essence patterns through structured knowledge graphs, our approach achieves superior performance in data-scarce scenarios while providing interpretable reasoning paths for intelligent TCM diagnosis.
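Because macro-F1 is the primary metric, a short sketch shows why it exposes long-tail failures that accuracy hides. The implementation below is a generic definition (per-class F1 averaged without weights), and the toy labels are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) over all labels seen."""
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted wrongly
            fn[t] += 1  # true class t was missed
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

A classifier that always predicts the head class reaches 90% accuracy on nine "head" samples and one "tail" sample, but a macro-F1 of only 9/19 ≈ 0.474, because the tail class contributes an F1 of zero.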
Abstract: For small devices such as PDAs and mobile phones, formulating relational database queries is not as simple as on conventional devices such as personal computers and laptops. Because of the restricted size and resources of these devices, most current work limits the queries users can pose to ones predetermined by developers, which limits the ability of these devices to support robust queries. Hence, this paper proposes a universal-relation-based database query language targeted at small devices. The language allows relational database queries to be formulated with minimal query terms. The formulation and structure of the language are described, and usability test results are presented to support its effectiveness.
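The universal relation idea can be sketched as a translator that accepts only attribute names and works out the joins itself. Everything here is hypothetical: the three-table schema, the `universal_query` helper, and the shortest-join-path heuristic are illustrative assumptions, not the language proposed in the paper. Like any universal relation interface, it relies on attributes with the same name meaning the same thing across relations, which is what makes `NATURAL JOIN` safe.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical schema for illustration: relation name -> attribute names.
SCHEMA: Dict[str, Set[str]] = {
    "student": {"sid", "sname"},
    "enrolled": {"sid", "cid"},
    "course": {"cid", "title"},
}


def _path(schema: Dict[str, Set[str]], src: str, dst: str) -> List[str]:
    """Shortest chain of relations linked by shared attributes (breadth-first search)."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        r = queue.popleft()
        if r == dst:
            break
        for other in schema:
            if other not in prev and schema[r] & schema[other]:
                prev[other] = r
                queue.append(other)
    if dst not in prev:
        raise ValueError(f"no join path from {src} to {dst}")
    path: List[str] = []
    node: Optional[str] = dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def universal_query(attrs: List[str], schema: Dict[str, Set[str]] = SCHEMA) -> str:
    """Translate an attribute-only request into SQL over the relations needed to connect them."""
    homes = []
    for a in attrs:
        home = next((r for r in schema if a in schema[r]), None)
        if home is None:
            raise ValueError(f"unknown attribute: {a}")
        homes.append(home)
    chosen: List[str] = []
    for h in homes:
        # each path starts at homes[0], so every new relation joins onto one already chosen
        for r in _path(schema, homes[0], h):
            if r not in chosen:
                chosen.append(r)
    return f"SELECT {', '.join(attrs)} FROM {' NATURAL JOIN '.join(chosen)}"
```

The user types only `sname` and `title`; the `enrolled` table that links them is added automatically.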
Funding: Supported by the Social Science Planning Foundation of Chongqing (Grant No.: 2011QNCB28)
Abstract: Purpose: Existing research on predicting queries with news intent has tried to extract classification features from external knowledge bases; this paper shows how features extracted from query logs can be applied to identify news queries automatically, without using any external resources. Design/methodology/approach: First, we manually labeled 1,220 news queries from Sogou.com. Based on an analysis of these queries, we identified three features of news queries in terms of query content, time of query occurrence and user click behavior. We then used 12 effective features proposed in the literature as a baseline and conducted experiments with a support vector machine (SVM) classifier. Finally, we compared the impact of the features used in this paper on the identification of news queries. Findings: Compared with the baseline features, the F-score improved from 0.6414 to 0.8368 after adding the three newly identified features, among which the burst point (bst) was the most effective for predicting news queries. In addition, query expression (qes) was more useful than query terms, and among the click behavior-based features, news URL was the most effective. Research limitations: Analyses based on features extracted from query logs may produce limited results. Moreover, the segmentation tool used in this study has been applied more widely to long texts than to short queries. Practical implications: The research will help general-purpose search engines address search intents for news events. Originality/value: Our approach offers a new perspective on recognizing queries with news intent without relying on large news corpora such as blogs or Twitter.
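The burst-point feature can be approximated with a simple threshold rule over daily query counts. This is an assumed definition for illustration only; the abstract does not specify how bst is computed, and the window, the multiplier `k` and the standard-deviation floor are arbitrary choices.

```python
import statistics
from typing import List, Sequence


def burst_days(daily_counts: Sequence[int], window: int = 7, k: float = 2.0) -> List[int]:
    """Flag days whose query count exceeds the trailing mean by k standard deviations.

    The std is floored at 1.0 so a nearly flat history does not flag tiny wiggles.
    """
    flagged = []
    for day in range(len(daily_counts)):
        history = daily_counts[max(0, day - window):day]
        if len(history) < 2:
            continue  # not enough history to estimate a baseline
        mean = statistics.mean(history)
        std = max(statistics.pstdev(history), 1.0)
        if daily_counts[day] > mean + k * std:
            flagged.append(day)
    return flagged
```

A query whose volume jumps from about 10 per day to 80 is flagged on the spike day only; the day after, the spike inflates the trailing baseline and the count falls back below the threshold.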
Funding: National Key Research and Development Program of China, Grant/Award Numbers: 2022YFB3103900, 2023YFB3106504; Major Key Project of PCL, Grant/Award Numbers: PCL2022A03, PCL2023A09; Shenzhen Basic Research, Grant/Award Number: JCYJ20220531095214031; Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Grant/Award Number: 2022B1212010005; Shenzhen International Science and Technology Cooperation Project, Grant/Award Number: GJHZ20220913143008015; Natural Science Foundation of Guangdong Province, Grant/Award Number: 2023A1515011959; Shenzhen-Hong Kong Jointly Funded Project, Grant/Award Number: SGDX20230116091246007; Shenzhen Science and Technology Program, Grant/Award Numbers: RCBS20221008093131089, ZDSYS20210623091809029.
Abstract: Real-world data often exhibit an imbalanced, long-tailed distribution, which leads to poor performance in neural network-based classification. Existing methods mainly tackle this problem by reweighting the loss function or rebalancing the classifier. However, one crucial aspect overlooked by previous studies is the imbalanced feature space caused by the imbalanced angle distribution. In this paper, the authors highlight the importance of the angle distribution for achieving a balanced feature space, which is essential for improving model performance under long-tailed distributions. Nevertheless, balancing both the classifier norms and the angle distribution is challenging because of problems such as the low feature norm. To tackle these challenges, the authors first analyse the classifier and feature space thoroughly by decoupling the classification logits into three key components: the classifier norm (i.e. the magnitude of the classifier vector), the feature norm (i.e. the magnitude of the feature vector), and the cosine similarity between the classifier vector and the feature vector. In this way, the authors track how each component changes during training and reveal three critical problems to be solved: the imbalanced angle distribution, the lack of feature discrimination, and the low feature norm. Drawing on this analysis, the authors propose a novel loss function that incorporates hyperspherical uniformity, an additive angular margin, and feature norm regularisation. Each component of the loss function addresses a specific problem, and together they yield a balanced classifier and feature space. The authors conduct extensive experiments on three popular benchmark datasets: CIFAR-10/100-LT, ImageNet-LT, and iNaturalist 2018. The experimental results demonstrate that the authors' loss function outperforms several previous state-of-the-art methods on imbalanced, long-tailed datasets, improving upon the best-performing baselines on CIFAR-100-LT by 1.34, 1.41, 1.41 and 1.33, respectively.
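The three-way decomposition of a logit, z_j = ‖w_j‖ · ‖f‖ · cos θ_j, and the additive angular margin can be written out directly. This is a minimal sketch, assuming a plain dot-product classifier; the margin and scale defaults are illustrative values, and the hyperspherical-uniformity and feature-norm terms of the proposed loss are not shown.

```python
import math
from typing import List, Sequence, Tuple


def vec_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def decompose_logit(w: Sequence[float], f: Sequence[float]) -> Tuple[float, float, float]:
    """Return (classifier norm, feature norm, cosine); their product equals the raw logit w·f."""
    wn, fn = vec_norm(w), vec_norm(f)
    cos = sum(a * b for a, b in zip(w, f)) / (wn * fn)
    return wn, fn, cos


def additive_angular_margin_logits(
    weights: Sequence[Sequence[float]],
    f: Sequence[float],
    target: int,
    margin: float = 0.5,
    scale: float = 30.0,
) -> List[float]:
    """ArcFace-style logits: scale * cos(theta_j), with the target angle increased by `margin`."""
    out = []
    for j, w in enumerate(weights):
        _, _, cos = decompose_logit(w, f)
        theta = math.acos(max(-1.0, min(1.0, cos)))  # clamp for floating-point safety
        if j == target:
            theta = min(theta + margin, math.pi)  # simplification: cap the angle at pi
        out.append(scale * math.cos(theta))
    return out
```

Because only the angle enters the margin logits, norms no longer dominate the decision, which is why the norm terms need their own regularisation.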
Funding: Supported by the Special Project of Henan Provincial Key Research, Development and Promotion (Key Science and Technology Program) under Grant 252102210154, and in part by the National Natural Science Foundation of China under Grant 62403437.
Abstract: Ride-hailing (e.g., DiDi and Uber) has become an important tool for modern urban mobility. To improve the utilization efficiency of ride-hailing vehicles, a novel query method called Approachable k-nearest neighbor (A-kNN) has recently been proposed in industry. Unlike traditional kNN queries, A-kNN considers not only road network distance but also the availability status of vehicles: even vehicles carrying passengers can be considered candidates for dispatch if their destinations are near the requester's location. Because of its structural characteristics, the V-Tree-based query method can efficiently find the k nearest moving objects in a road network, and it is currently a popular query solution in ride-hailing services. However, when the vertices to be queried are close in the graph but distant in the index, the V-Tree-based method must traverse numerous irrelevant subgraphs, which makes it less efficient at processing A-kNN queries. To address this issue, we optimize the V-Tree-based method and propose a novel index structure, the Path-Accelerated V-Tree (PAV-Tree), which improves query performance by introducing shortcuts. Leveraging this index, we introduce a novel query optimization algorithm, PAV-A-kNN, specifically designed to process A-kNN queries efficiently. Experimental results show that PAV-A-kNN answers queries 2.2–15 times faster than baseline methods, with microsecond-level latency.
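The A-kNN selection rule described above can be sketched on a toy road network. The cost model is an assumption, not the paper's definition: an empty vehicle costs its network distance to the requester, while an occupied vehicle qualifies only if its destination lies within `radius` of the requester and then costs its remaining trip plus the distance from its destination. The sketch runs plain Dijkstra searches on an assumed undirected graph; it does not model V-Tree or PAV-Tree indexing.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g


def dijkstra(g: Graph, src: str) -> Dict[str, float]:
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def a_knn(g: Graph, request: str, vehicles: Dict[str, Tuple[str, Optional[str]]],
          k: int, radius: float) -> List[str]:
    """vehicles maps id -> (current vertex, destination vertex or None if empty)."""
    to_req = dijkstra(g, request)  # symmetric distances, since g is undirected
    scored = []
    for vid, (loc, dest) in vehicles.items():
        if dest is None:
            cost = to_req.get(loc)  # empty vehicle drives straight to the requester
        elif to_req.get(dest, float("inf")) <= radius:
            trip = dijkstra(g, loc).get(dest)  # occupied vehicle finishes its current trip first
            cost = None if trip is None else trip + to_req[dest]
        else:
            continue  # occupied and its destination is too far from the requester
        if cost is not None:
            scored.append((cost, vid))
    scored.sort()
    return [vid for _, vid in scored[:k]]
```

On the line A–B–C–D with unit edges and a request at A, an occupied car at C dropping off at B (cost 2) ranks ahead of an empty car at D (cost 3).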
Funding: Supported by the Sichuan Science and Technology Program (2024YFHZ0161).
Abstract: Time-series databases (TSDBs) are essential for managing large-scale time-series data in fields such as finance, IoT, and agriculture. However, traditional query optimization methods, such as dynamic programming, struggle with high computational complexity and inaccurate cost estimates. This paper proposes a novel query optimization module for TSDBs based on reinforcement learning (RL), specifically Deep Q-Networks (DQN) and Double Deep Q-Networks (DDQN). These algorithms dynamically learn optimal join orders based on query workloads and join costs. Experiments show that the RL-based methods achieve better optimization performance and stability than traditional heuristics, especially under complex cost models. This work highlights the potential of RL for improving query optimization in TSDBs.
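The difference between the two algorithms lies in how the bootstrap target is computed. The sketch below uses hypothetical dictionaries in place of Q-networks; in a join-ordering setting a state might encode the tables joined so far, an action the next table to join, and the reward the negative join cost, but these encodings are assumptions rather than the paper's design.

```python
from typing import Dict

# Hypothetical tabular stand-in for the Q-networks: state -> {action: Q-value}.
QTable = Dict[str, Dict[str, float]]


def dqn_target(reward: float, next_state: str, q_target: QTable, gamma: float, done: bool) -> float:
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target[next_state].values())


def ddqn_target(reward: float, next_state: str, q_online: QTable, q_target: QTable,
                gamma: float, done: bool) -> float:
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    actions = q_online[next_state]
    best = max(actions, key=actions.get)
    return reward + gamma * q_target[next_state][best]
```

With target values `{"join_a": 5.0, "join_b": 0.5}` and online values `{"join_a": 1.0, "join_b": 2.0}`, DQN bootstraps from the possibly overestimated `join_a` entry, while DDQN evaluates the online network's choice `join_b`; decoupling selection from evaluation is how Double DQN reduces overestimation bias.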
Funding: Supported by the Sichuan Science and Technology Program (2024YFHZ0161).
Abstract: Cardinality estimation is crucial for query optimization, but traditional methods struggle with complex queries. We propose LW-CQ, a lightweight machine-learning-based algorithm that improves cardinality estimation for complex queries by enhancing the LW-XGB method. LW-CQ introduces four feature-level improvements and extends support to disjunctive queries and LIKE predicates. Experimental results show that LW-CQ achieves competitive accuracy while significantly reducing training and inference time, making it a promising solution for real-world database applications.
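The abstract does not name its accuracy metric, but cardinality estimators are commonly compared with the q-error, the factor by which an estimate is off in either direction. A minimal sketch, assuming the usual convention of flooring both values at one row; the `summarize` helper is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def q_error(estimate: float, actual: float) -> float:
    """Symmetric ratio error; both values floored at 1 row so empty results stay finite."""
    est, act = max(estimate, 1.0), max(actual, 1.0)
    return max(est / act, act / est)


def summarize(estimates: Sequence[float], actuals: Sequence[float]) -> Dict[str, float]:
    """Median and worst-case q-error over a workload."""
    errs = sorted(q_error(e, a) for e, a in zip(estimates, actuals))
    return {"median": statistics.median(errs), "max": errs[-1]}
```

Over- and underestimates by the same factor score identically, which matters because both can push the optimizer toward a poor join order.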
Funding: Supported by the National Key R&D Program of China (Grant No. 2022YFC3801700), the National Natural Science Foundation of China (Grant No. 62472052), and the Xinjiang Production and Construction Corps Key Laboratory of Computing Intelligence and Network Information Security (Grant No. CZ002702-3).
Abstract: To protect the privacy of both the querying user and the database, several QKD-based quantum private query (QPQ) protocols have been proposed. One example is the protocol of Zhou et al., in which the user prepares the initial quantum states and derives each key bit by comparing the initial quantum state with the outcome state returned by the database in ctrl or shift mode, instead of announcing two non-orthogonal qubits as other protocols do, which may leak part of the secret information. To some extent, this strengthens the security of the database and the privacy of the user. Unfortunately, we find that in this protocol a dishonest user could obtain, by means of unambiguous state discrimination, much more database information than was analyzed in Zhou et al.'s original work. To strengthen database security, we improve the protocol by modifying, in various ways, the information returned by the database. The analysis indicates that the security of the improved protocols is greatly enhanced.
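Abstract: Focusing on small and medium-sized enterprises (SMEs), this paper examines how to batch-generate accounting vouchers with the Excel Power Query tool. It first analyzes the current state of voucher processing in SMEs. It then compares computerized bookkeeping by manual data entry (hereafter "manual entry") with batch generation of vouchers in Excel Power Query, and sets out the advantages of Power Query at each stage of data processing. Finally, it describes the specific steps for batch-generating vouchers with the tool and demonstrates the results with a real-world case.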
Funding: Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11).
Abstract: To solve the query processing correctness problem in semantic-based relational data integration, the semantics of SPARQL (simple protocol and RDF query language) queries is defined. During query rewriting, all relevant tables are found and decomposed into minimal connectable units. These units are joined according to the semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transformation are presented, and their computational complexity is discussed. In the worst case, the query decomposition algorithm runs in O(n^2) time and the query rewriting algorithm requires O(nm) time. Experiments verify the performance of the algorithms: when the query length is less than 8, the query processing algorithms provide satisfactory performance.
Funding: The National Natural Science Foundation of China (No. 61070170), the Natural Science Foundation of Higher Education Institutions of Jiangsu Province (No. 11KJB520017), and the Suzhou Application Foundation Research Project (No. SYG201238).
Abstract: The current resource description framework link-traversal-based query execution (RDF-LTE) approach can answer only some types of SPARQL (simple protocol and resource description framework query language) queries. Using concrete SPARQL queries, this paper discusses how the execution order of triple patterns affects query results and cost. It also analyzes two properties of the web of linked data: missing backward links and missing contingency solutions. It then proposes three heuristic principles for logical query plan optimization: the filtered basic graph pattern (FBGP) principle, the triple pattern chain principle, and the seed URIs principle. These principles help reduce intermediate solutions and increase the types of queries that can be answered. An evaluation of the proposed approach confirms its effectiveness and feasibility, and the experimental results show that more query results can be returned at lower cost. This enables users to exploit the full potential of the web of linked data.
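Two of the three principles, the seed URIs principle and the triple pattern chain principle, can be read as a greedy ordering rule. The sketch below shows one plausible interpretation of that rule. The tuple encoding of triple patterns, the `?`/`<...>` conventions, and the tie-breaking by number of bound positions are all hypothetical choices, and the FBGP (filter) principle is omitted. The sketch first picks a pattern with a URI in subject or object position, because link traversal needs a URI to dereference first. It then repeatedly picks a pattern that shares a variable with the patterns already chosen.

```python
from typing import List, Set, Tuple

# A triple pattern as (subject, predicate, object); "?x" marks a variable
# and "<...>" marks a URI. This encoding is hypothetical.
Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def is_uri(term: str) -> bool:
    return term.startswith("<") and term.endswith(">")


def order_triple_patterns(patterns: List[Triple]) -> List[Triple]:
    """Greedy ordering: start from a seed URI, then follow shared variables."""
    remaining = list(patterns)
    ordered: List[Triple] = []
    bound: Set[str] = set()

    def boundness(tp: Triple) -> int:
        # Positions that are constants or variables bound by earlier patterns.
        return sum(1 for t in tp if not is_var(t) or t in bound)

    while remaining:
        if not ordered:
            # Seed principle: prefer a pattern whose subject or object is a URI.
            candidates = [tp for tp in remaining if is_uri(tp[0]) or is_uri(tp[2])]
        else:
            # Chain principle: prefer patterns sharing a variable with bound ones.
            candidates = [tp for tp in remaining
                          if any(is_var(t) and t in bound for t in tp)]
        nxt = max(candidates or remaining, key=boundness)
        ordered.append(nxt)
        remaining.remove(nxt)
        bound.update(t for t in nxt if is_var(t))
    return ordered


query = [
    ("?friend", "<foaf:name>", "?name"),
    ("?person", "<foaf:knows>", "?friend"),
    ("<ex:alice>", "<foaf:knows>", "?person"),
]
for tp in order_triple_patterns(query):
    print(tp)
```

For this example query, the seed pattern containing `<ex:alice>` comes first. The two remaining patterns then follow the variable chain `?person` → `?friend`. Executing them in this order means every pattern after the first can reuse bindings that earlier patterns have already produced.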
Abstract: Query expansion with a thesaurus is a useful technique in modern information retrieval (IR). This paper proposes and implements a query expansion method for Chinese IR based on a decaying co-occurrence model. The model extends the traditional co-occurrence model with a decaying factor that reduces the mutual information between two terms as the distance between them increases. Experimental results on the TREC-9 collections show that this query expansion method yields significant improvements over IR without query expansion.
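The abstract does not give the exact decay function, so the following is only a minimal sketch. It assumes that each co-occurrence within a window contributes `exp(-alpha * (d - 1))` instead of 1, where `d` is the distance between the two terms. It also assumes that mutual information is computed as pointwise mutual information over these decayed counts. The window size, `alpha`, and the top-k expansion step are hypothetical parameters, and the input documents are assumed to be word-segmented already.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def decayed_mi(docs: List[List[str]], window: int = 5,
               alpha: float = 0.5) -> Dict[Pair, float]:
    """Pointwise mutual information with distance-decayed co-occurrence counts.

    A co-occurrence at distance d contributes exp(-alpha * (d - 1)) instead
    of 1, so adjacent terms count fully and distant ones count less.
    """
    term_freq: Counter = Counter()
    pair_weight: Counter = Counter()
    total = 0
    for tokens in docs:  # tokens: an already word-segmented document
        term_freq.update(tokens)
        total += len(tokens)
        for i, a in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                b = tokens[j]
                if a != b:
                    pair = (a, b) if a < b else (b, a)
                    pair_weight[pair] += math.exp(-alpha * (j - i - 1))
    return {
        (a, b): math.log(total * w / (term_freq[a] * term_freq[b]))
        for (a, b), w in pair_weight.items()
    }


def expand_query(query: List[str], scores: Dict[Pair, float], k: int = 2) -> List[str]:
    """Append the k highest-scoring co-occurring terms for each query term."""
    expanded = list(query)
    for q in query:
        related = [(s, b if a == q else a) for (a, b), s in scores.items() if q in (a, b)]
        related.sort(reverse=True)
        for _, term in related[:k]:
            if term not in expanded:
                expanded.append(term)
    return expanded


docs = [["query", "expansion", "retrieval"], ["query", "retrieval", "model"]]
print(expand_query(["query"], decayed_mi(docs, window=3), k=1))
```

With `alpha = 0`, every pair inside the window counts equally and the model reduces to a plain windowed co-occurrence count. A larger `alpha` penalizes distant pairs more strongly, which matches the abstract's description of mutual information decreasing with term distance.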
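Abstract: Querying is the primary operation of a database system, and query performance directly determines application response time and user experience. When multiple queries run concurrently, they compete for or share database system resources. This produces query interaction (QI), a major factor affecting query performance, and accurately measuring QI is key to choosing suitable execution plans and improving query performance. QI changes dynamically as the operations in a query execute. However, existing measurement methods consider only resource usage at the moment a new query joins and ignore how resources change during execution, so their measurements are inaccurate. To address this problem, this paper makes three proposals. First, a query-mix time-series heterogeneous graph describes how QI within a query mix evolves over time. Second, a Time-Aware Multi-edge Type Weight Calculation (TA-MTWC) model computes the edge weights between operation nodes of the heterogeneous graph at any execution moment, capturing the temporal dynamics of QI. Third, a Query-mix Time-series Heterogeneous Graph Classification (QTHGC) model uses a Long Short-Term Memory (LSTM) network to learn temporal relationships among graph representations at multiple time points and selects execution plans for concurrent queries. Experiments on PostgreSQL show that the average accuracy of QTHGC is 51.2% higher than that of the query optimizer and 2.87% higher than that of QHGC, the latest existing model.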