This paper proposes an optimistic checkpoint mechanism for dynamic grids (OCM4G) based on job characteristics and resource availability. Using knowledge of both, it determines whether to checkpoint a given job running on a given resource node and establishes optimal aperiodic checkpoint intervals. We evaluate OCM4G on a real grid environment (ChitlaGrid); the results show that OCM4G achieves better performance than both periodic checkpointing and the analytical method of calculating aperiodic checkpoint intervals.
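For context on the periodic baseline mentioned in this abstract, a fixed checkpoint interval is commonly sized with Young's first-order approximation, T ≈ sqrt(2 · C · MTBF), where C is the cost of one checkpoint and MTBF is the mean time between failures. The sketch below shows only that baseline; it is not OCM4G's method, and `should_checkpoint` is a hypothetical rule added purely for illustration.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the periodic checkpoint interval."""
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def should_checkpoint(job_runtime_s: float, checkpoint_cost_s: float, mtbf_s: float) -> bool:
    """Hypothetical rule (an assumption, not from the paper):
    skip checkpointing when the job is shorter than one interval."""
    return job_runtime_s > young_interval(checkpoint_cost_s, mtbf_s)
```

An aperiodic scheme such as the one the abstract describes would replace the single fixed interval with intervals that vary over the lifetime of the job and the resource.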
Resource oversubscription optimizes the utilization of computing resources. Many well-known virtual machine monitors (VMMs), such as Xen and KVM, adopt this approach to help maximize the yield of cloud datacenters. That is, with proper resource oversubscription strategies, more virtual machines (VMs) can be supported by limited resources. However, performance interference among VMs hosted on the same physical machine (PM) exists in cloud environments and is probably aggravated by resource oversubscription strategies, which aim to place more VMs on the same PM. In this paper, we present Sponge, a resource oversubscription strategy targeting cloud platforms. Sponge mitigates performance interference among oversubscribed co-hosted VMs. Sponge also provides a VM association strategy that each PM applies on a best-effort basis. We performed our evaluation on a virtual datacenter simulated with Xen. The results show that Sponge improves resource utilization and enables each VM to meet its performance requirement even when it is hosted with other VMs on the same PM.
Dependability analysis is an important step in designing and analyzing safety computer systems and protection systems. Introducing multiprocessors and virtual machines increases the complexity, diversity, and dynamics of system faults, in particular for software-induced failures, with an impact on overall dependability. Moreover, it is very difficult for a safety system to operate successfully in every active phase, since failure rates differ greatly between hardware-induced and software-induced failures. To handle these difficulties and achieve an accurate dependability evaluation that consistently reflects the construct it measures, this paper develops a new formalism derived from dynamic fault graphs (DFG). DFG exploits the concept of a system event as a sequence of fault states to represent dynamic behaviors, which allows probabilistic measures to be computed at each timestamp at which a change occurs. The approach automatically combines reliability analysis with the system dynamics. We describe how the proposed methodology drives the overall system dependability analysis through the phases of modeling, structural discovery, and probability analysis, and we discuss it using an example of a virtual computing system.
The delta-based accumulative iterative computation (DAIC) model was recently proposed to support iterative algorithms in either a synchronous or an asynchronous way. However, the synchronous and asynchronous DAIC models each perform well only under certain conditions and perform poorly under others, because of either high synchronization cost or many redundant activations. As a result, the overall performance of both DAIC models suffers from the serious network jitter and load jitter caused by multi-tenancy in the cloud. In this paper, we develop a system, namely Hyblter, to guarantee the performance of iterative algorithms under different conditions. Through an adaptive execution model selection scheme, it can efficiently switch between the synchronous and asynchronous DAIC models to adapt to different conditions, always obtaining the best performance in the cloud. Experimental results show that our approach can improve the performance of current solutions by up to 39.0%.
Fundamentally, a semantic grid database is about bringing globally distributed databases together to coordinate resource sharing and problem solving, with information given well-defined meaning. DartGrid II is an implemented database grid system whose goal is to provide a semantic solution for integrating database resources on the Web. Although many algorithms have been proposed for optimizing query processing to minimize the cost and/or response time of answering a query in a distributed database system, the database grid query optimization problem is fundamentally different from traditional distributed query optimization. These differences are shown to be consequences of the autonomy and heterogeneity of database nodes in a database grid. Query optimization therefore faces more challenges in a database grid than in a traditional distributed database. Following this observation, the design of a query optimizer in DartGrid II is presented, and a heuristic, dynamic, and parallel query optimization approach to processing queries in a database grid is proposed. A set of semantic tools supporting relational database integration and semantic-based information browsing has also been implemented to realize the above vision.
Reachability queries play a vital role in many graph analysis tasks. Previous research proposed many methods to efficiently answer reachability queries between vertex pairs. Since many real graphs are labeled graphs, there is high demand for Label-Constrained Reachability (LCR) queries, in which the constraint includes a set of labels in addition to the vertex pair. Recent research proposed several methods for answering LCR queries that require some of the labels specified in the constraint to appear in the path. Besides a label set, the query constraint may be a sequence of ordered labels, giving OLCR (Ordered-Label-Constrained Reachability) queries, which retrieve paths matching a sequence of labels. Currently, no solutions are available for OLCR. Here, we propose DHL, a novel Bloom-filter-based indexing technique for answering OLCR queries. DHL is first used to check reachability between a vertex pair; if the answer is not a definite no, a constrained DFS is then performed. We show that DHL has a bounded false positive rate and saves considerable indexing time and space. Extensive experiments on 10 real-life graphs and 12 synthetic graphs demonstrate that DHL achieves about 4.8-22.5 times smaller index space and 4.6-114 times less index construction time than two state-of-the-art techniques for LCR queries, while achieving comparable query response time. The results also show that our algorithm can answer OLCR queries effectively.
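The two-stage query flow in this abstract (a Bloom-filter check that can only say "maybe" or "definitely not", followed by a constrained DFS) can be sketched as follows. This is a minimal illustration and not the DHL index itself: `TinyBloom`, `build_index`, and `olcr` are hypothetical names, the filter stores plain per-vertex reachability, and the sketch assumes that a path "matches" when its edge labels equal the query sequence exactly.

```python
import hashlib


class TinyBloom:
    """A small Bloom filter: no false negatives, some false positives."""

    def __init__(self, bits=64, hashes=3):
        self.bits, self.hashes, self.mask = bits, hashes, 0

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.mask |= 1 << p

    def might_contain(self, item):
        return all((self.mask >> p) & 1 for p in self._positions(item))


def build_index(graph):
    """graph: vertex -> list of (edge_label, neighbor). One filter per source vertex."""
    index = {}
    for src in graph:
        bloom, stack, seen = TinyBloom(), [src], {src}
        while stack:
            v = stack.pop()
            bloom.add(v)
            for _, w in graph.get(v, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        index[src] = bloom
    return index


def olcr(graph, index, src, dst, labels):
    """True if some path src -> dst has edge labels equal to `labels`, in order."""
    if src not in index or not index[src].might_contain(dst):
        return False  # the filter gives a definite "no"; skip the search
    stack, seen = [(src, 0)], {(src, 0)}
    while stack:  # DFS over (vertex, number of labels matched so far)
        v, i = stack.pop()
        if i == len(labels):
            if v == dst:
                return True
            continue
        for label, w in graph.get(v, []):
            if label == labels[i] and (w, i + 1) not in seen:
                seen.add((w, i + 1))
                stack.append((w, i + 1))
    return False
```

Because the filter never produces a false negative, a negative answer can be returned immediately; a false positive costs only an unnecessary DFS, which still returns the correct result.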
The emergence of the semantic web will result in an enormous amount of knowledge base resources on the web. In this paper, a generic Knowledge Base Grid Architecture (KB-Grid) for building large-scale knowledge systems on the semantic web is presented. KB-Grid suggests a paradigm that emphasizes how to organize, discover, utilize, and manage web knowledge base resources. Four principal components are under development: a semantic browser for retrieving and browsing semantically enriched information, a knowledge server acting as the web container for knowledge, an ontology server for managing web ontologies, and a knowledge base directory server acting as the registry and catalog of KBs. A referential model of knowledge service and the mechanisms required for semantic communication within KB-Grid are also defined. To verify the design rationale underlying KB-Grid, an implementation for Traditional Chinese Medicine (TCM) is described.
General-purpose graphics processing units (GPGPUs) can considerably improve computing performance for regular applications. However, irregular memory access exists in many applications, and the benefits of graphics processing units (GPUs) are less substantial for irregular applications. In recent years, several studies have presented solutions to remove static irregular memory access, but eliminating dynamic irregular memory access in software remains a serious challenge. A pure software solution without hardware extensions or offline profiling is proposed to eliminate dynamic irregular memory access, especially indirect memory access. Data reordering and index redirection are suggested to reduce the number of memory transactions, thereby improving the performance of GPU kernels. To improve the efficiency of data reordering, the reordering operation is offloaded to the GPU, which reduces overhead and data transfer. By concurrently executing the compute unified device architecture (CUDA) streams of the data reordering and data processing kernels, the overhead of data reordering can be reduced further. After these optimizations, the volume of memory transactions can be reduced by 16.7%-50% compared with CUSPARSE-based benchmarks, and the performance of irregular kernels can be improved by 9.64%-34.9% on an NVIDIA Tesla P4 GPU.
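The idea of data reordering with index redirection for an indirect access `data[indices[i]]` can be illustrated on the CPU as below. This is a sketch under stated assumptions, not the paper's GPU implementation: the function names are hypothetical, elements are packed in first-touch order, and `count_transactions` is only a rough stand-in for memory transactions (it counts distinct fixed-size segments touched by each group of consecutive accesses).

```python
def reorder_and_redirect(data, indices):
    """Pack the accessed elements contiguously in first-touch order.

    Returns (reordered, redirected) such that
    reordered[redirected[i]] == data[indices[i]] for every i.
    """
    new_pos = {}      # old index -> position in the reordered array
    reordered = []
    redirected = []
    for old in indices:
        if old not in new_pos:
            new_pos[old] = len(reordered)
            reordered.append(data[old])
        redirected.append(new_pos[old])
    return reordered, redirected


def count_transactions(indices, segment=4):
    """Rough proxy: distinct segments touched per group of `segment` accesses."""
    total = 0
    for start in range(0, len(indices), segment):
        group = indices[start:start + segment]
        total += len({i // segment for i in group})
    return total


data = list(range(100, 116))
indices = [12, 0, 7, 12, 3, 9, 0, 15]
reordered, redirected = reorder_and_redirect(data, indices)
before = count_transactions(indices)
after = count_transactions(redirected)
```

In this toy input the scattered accesses touch 6 segments before reordering and 3 after, which is the effect the abstract attributes to the technique; the actual savings depend on the access pattern and hardware.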
Knowledge graph (KG) representation learning aims to map entities and relations into a low-dimensional representation space and shows significant potential in many tasks. Existing approaches fall into two categories: (1) graph-based approaches encode KG elements into vectors using structural score functions; (2) text-based approaches embed text descriptions of entities and relations via pre-trained language models (PLMs), further fine-tuned with triples. We argue that graph-based approaches struggle with sparse data, while text-based approaches face challenges with complex relations. To address these limitations, we propose a unified Text-Augmented Attention-based Recurrent Network that bridges the gap between graph and natural language. Specifically, we employ a graph attention network based on local influence weights to model local structural information and use PLM-based prompt learning to learn textual information, enhanced by a mask-reconstruction strategy based on global influence weights and by textual contrastive learning for improved robustness and generalizability. In addition, to effectively model multi-hop relations, we propose a novel semantic-depth guided path extraction algorithm and integrate cross-attention layers into recurrent neural networks to facilitate learning long-term relation dependencies and to offer an adaptive attention mechanism for varied-length information. Extensive experiments demonstrate that our model outperforms existing models on KG completion and question-answering tasks.
Knowledge graphs (KGs) are pivotal for effectively organizing and managing structured information across various applications. Financial KGs have been successfully employed in advancing applications such as audit, anti-fraud, and anti-money laundering. Despite this success, the construction of Chinese financial KGs has seen limited research because of their complex semantics. A significant challenge is the overlapping triples problem, where an entity takes part in multiple relations within a sentence, which hampers extraction accuracy: more than 39% of the triples in Chinese datasets are overlapping triples. To address this, we propose the Entity-type-Enriched Cascaded Neural Network (E²CNN), which leverages special tokens for entity boundaries and types. E²CNN ensures consistency in entity types and excludes specific relations, mitigating the overlapping triples problem and enhancing relation extraction. In addition, we introduce FINCORPUS.CN, an available Chinese financial dataset annotated from the annual reports of 2,000 companies and containing 48,389 entities and 23,368 triples. Experimental results on the DUIE dataset and FINCORPUS.CN underscore E²CNN's superiority over state-of-the-art models.
Container-based virtualization is increasingly popular in cloud computing because of its efficiency and flexibility. Isolation is a fundamental property of containers, and weak isolation can cause significant performance degradation and security vulnerabilities. However, existing work has rarely discussed the isolation problems of the system log, which is critical for monitoring and maintaining containerized applications. In this paper, we present a detailed isolation analysis of the system log in current container environments. First, we find several system log isolation problems that can significantly affect system usability, security, and efficiency. For example, the system log accidentally exposes information about the host and co-resident containers to a container, causing information leakage. Second, we reveal that the root cause of these isolation problems is that containers share the global log configuration, the same log storage, and the global log view. To address these problems, we design and implement a system named private logs (POGs). POGs provides each container with its own log configuration and stores logs individually for each container, avoiding the sharing of log configuration and log storage, respectively. In addition, POGs enables a private log view that helps distinguish which container each log belongs to. The experimental results show that POGs can effectively enhance system log isolation for containers with negligible performance overhead.
Graph neural networks (GNNs) have gained traction and have been applied to various graph-based data analysis tasks because of their high performance. However, a major concern is their robustness, particularly when they face graph data that has been deliberately or accidentally polluted with noise. This makes it challenging to learn robust GNNs under noisy conditions. To address this issue, we propose a novel framework called Soft-GNN, which mitigates the influence of label noise by adapting the data used in training. Our approach employs a dynamic data utilization strategy that estimates adaptive weights based on prediction deviation, local deviation, and global deviation. By making better use of significant training samples and reducing the impact of label noise through dynamic data selection, GNNs are trained to be more robust. We evaluate the performance, robustness, generality, and complexity of our model on five real-world datasets, and our experimental results demonstrate the superiority of our approach over existing methods.
Although many graph processing systems have been proposed, graphs in the real world are often dynamic, and it is important to keep the results of graph computation up to date. Incremental computation has been demonstrated to be an efficient way to update calculated results. Recently, many incremental graph processing systems have been proposed that handle dynamic graphs asynchronously and achieve better performance than synchronous processing. However, these solutions still suffer from sub-optimal convergence speed because of their slow propagation of important vertex states (important to convergence speed) and poor locality. To solve these problems, we propose a novel graph processing framework. It introduces a dynamic partition method that gathers the important vertices for high locality and then uses a priority-based scheduling algorithm to assign them a higher priority for an effective processing order. In this way, it reduces the number of updates and increases locality, thereby reducing the convergence time. Experimental results show that our method reduces the number of updates by 30% and the total execution time by 35% compared with state-of-the-art systems.
With the rapid growth of cloud technologies, a large number of web applications have been developed and deployed, and these applications run in clouds. Owing to the scalability provided by clouds, a single web application may be visited concurrently by millions or even billions of users. Thus, the testing and performance evaluation of these applications are increasingly important. User-model-based evaluation can significantly reduce the manual work required and makes it possible to determine the performance of applications under real runtime environments. Hence, it has become one of the most popular evaluation methods in both industry and academia. Significant efforts have focused on building different kinds of models by mining web access logs, such as Markov models and the Customer Behavior Model Graph (CBMG). This paper proposes a new kind of model, named the User Representation Model Graph (URMG), which is built on CBMG. It uses an algorithm to refine CBMG and optimizes the evaluation execution process. Based on this model, an automatic testing and evaluation system for web applications is designed, implemented, and deployed in our test cloud; it can execute all of the analysis and testing operations using only web access logs. In our system, the error rate caused by random access to applications in the execution phase is also reduced, and the results show that the error rate of the evaluation based on URMG is 50% lower than that of the evaluation based on CBMG.
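As background for the CBMG that URMG refines, a CBMG can be viewed as a set of states with transition probabilities estimated from user sessions in the access log. The sketch below builds such a transition table; it is an illustrative assumption rather than the paper's algorithm, the function name `build_cbmg` is hypothetical, and sessions are assumed to be already extracted from the log as ordered lists of page states.

```python
from collections import defaultdict


def build_cbmg(sessions):
    """Estimate state-to-state transition probabilities from sessions.

    Each session is an ordered list of states; synthetic "entry" and
    "exit" states (an assumption of this sketch) bracket every session.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        path = ["entry", *session, "exit"]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in counts.items()
    }


sessions = [
    ["home", "search", "cart"],
    ["home", "search"],
    ["home", "cart"],
]
cbmg = build_cbmg(sessions)
```

A load generator can then walk this table from `entry`, picking each next state at random according to the stored probabilities, to replay synthetic sessions against the application under test.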
Blockchain has recently emerged as a research trend, with potential applications in a broad range of industries and contexts. One particularly successful blockchain technology is the smart contract, which is widely used in commercial settings (e.g., high-value financial transactions). This, however, has security implications because of the potential to benefit financially from a security incident (e.g., identification and exploitation of a vulnerability in the smart contract or its implementation). Among blockchain platforms, Ethereum is the most active and prominent. Hence, in this paper, we systematically review existing research efforts on Ethereum smart contract security published between 2015 and 2019. Specifically, we focus on how smart contracts can be maliciously exploited and targeted, covering security issues of the contract program model, vulnerabilities in the program, and safety considerations introduced by the program execution environment. We also identify potential research opportunities and a future research agenda.
With the increasing amount of data, there is an urgent need for efficient sorting algorithms to process large data sets. Hardware sorting algorithms have attracted much attention because they can take advantage of the parallelism of different hardware. However, traditional hardware sort accelerators suffer from the “memory wall” problem because of their multiple rounds of data transmission between the memory and the processor. In this paper, we use the in-situ processing ability of the ReRAM crossbar to design a new ReCAM array that can perform matrix-vector multiplication and vector-scalar comparison in the same array simultaneously. Using this ReCAM array, we present ReCSA, the first dedicated ReCAM-based sort accelerator. Besides the hardware design, we also develop algorithms that maximize memory utilization and minimize memory exchanges to improve sorting performance. The sorting algorithm in ReCSA can process various data types, such as integers, floats, doubles, and strings. We also present experiments that evaluate performance and energy efficiency against state-of-the-art sort accelerators. The experimental results show that ReCSA achieves 90.92×, 46.13×, 27.38×, 84.57×, and 3.36× speedups over CPU-, GPU-, FPGA-, NDP-, and PIM-based platforms, respectively, when processing numeric data sets. ReCSA also achieves 24.82×, 32.94×, and 18.22× performance improvements when processing string data sets compared with CPU-, GPU-, and FPGA-based platforms, respectively.
Graph models have been widely applied in document summarization by using sentences as graph nodes and the similarity between sentences as edges. In this paper, a novel graph model for document summarization is presented that uses not only the relevance of sentences but also the relevance of the phrases contained in them. In short, we construct a phrase-sentence two-layer graph structure model (PSG) to summarize documents. We use this model for both generic document summarization and query-focused summarization. The experimental results show that our model greatly outperforms existing work.
As one of the early COVID-19 outbreak areas, China attracted the attention of the global news media at the beginning of 2020. During the epidemic, Chinese people united and actively fought against it. However, in the eyes of the international public, the situation reported about China was not optimistic. To better understand how the international public portrays China, especially during the epidemic, we present a case study using big data technology. We aim to answer three questions: (1) What did the international media focus on during the COVID-19 epidemic? (2) What is the media's tone when reporting on China? (3) What is the media's attitude when talking about China? In detail, we crawled more than 280,000 news items from 57 mainstream media agencies in 22 countries and made some interesting observations. For example, international media paid more attention to Chinese livelihood during the COVID-19 epidemic. In March and April, “progress of Chinese vaccines,” “specific drugs and treatments,” and “virus outbreak in U.S.” became the media's most common topics. In terms of news attitude, Cuba, Malaysia, and Venezuela had a positive attitude toward China, while France, Canada, and the United Kingdom had a negative attitude. Our study can help in understanding China's image in the eyes of the international media and provides a sound basis for image analysis.
Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) often exploit page migration to take full advantage of the different memory media. Most previous proposals migrate data at a granularity of 4 KB pages and thus waste memory bandwidth and DRAM resources. In this paper, we propose Mocha, a non-hierarchical architecture that organizes DRAM and NVM physically in a flat address space but manages them as a cache/memory hierarchy. Since the commercial NVM device, the Intel Optane DC Persistent Memory Module (DCPMM), actually accesses the physical media at a granularity of 256 bytes (an Optane block), we manage the DRAM cache at a 256-byte granularity to match this feature of Optane. This design not only enables fine-grained data migration and management for the DRAM cache but also avoids write amplification on Intel Optane DCPMM. We also create an Indirect Address Cache (IAC) in the Hybrid Memory Controller (HMC) and propose a reverse address mapping table in the DRAM to speed up address translation and cache replacement. Moreover, we exploit a utility-based caching mechanism to filter cold blocks in the NVM and further improve the efficiency of the DRAM cache. We implement Mocha in an architectural simulator. Experimental results show that, compared with HSCC, a typical hybrid memory architecture, Mocha improves application performance by 8.2% on average (up to 24.6%) and reduces energy consumption by 6.9% and data migration traffic by 25.9% on average.
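The granularity argument in this abstract can be made concrete with a little arithmetic: a 4 KB page holds 16 blocks of 256 bytes, so migrating whole pages moves far more data than migrating only the blocks that are touched. The sketch below is a back-of-the-envelope illustration, not Mocha's mechanism; the function names and the sample addresses are hypothetical.

```python
PAGE_SIZE = 4096                            # bytes per page
BLOCK_SIZE = 256                            # bytes per Optane block
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE   # 16


def split_address(addr):
    """Split a byte address into (page number, block within page, offset within block)."""
    page = addr // PAGE_SIZE
    block_in_page = (addr % PAGE_SIZE) // BLOCK_SIZE
    offset = addr % BLOCK_SIZE
    return page, block_in_page, offset


def migration_bytes(addresses, granularity):
    """Bytes moved if every touched unit of `granularity` bytes is migrated once."""
    units = {a // granularity for a in addresses}
    return len(units) * granularity


touched = [0, 300, 5000]                    # hypothetical byte addresses
page_traffic = migration_bytes(touched, PAGE_SIZE)
block_traffic = migration_bytes(touched, BLOCK_SIZE)
```

For these three accesses, page-granularity migration moves two full pages while block-granularity migration moves three 256-byte blocks; real workloads with better spatial locality narrow that gap.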
Funding (OCM4G paper): Supported by the National Natural Science Foundation of China (90412010, 60603058, and 60673174), the Ministry of Education of China, and the Program for New Century Excellent Talents in University (NCET-07-0334).
Funding (Sponge paper): Supported by the National Science Foundation of China under grant No. 61232008 and the National 863 Hi-Tech Research and Development Program under grants No. 2013AA01A208 and 2015AA011402.
Funding (dynamic fault graph paper): This work was supported in part by the National Natural Science Foundation of China under grant No. 61272411 and the National 973 Basic Research Program of China under grant No. 2014CB340600.
Funding (Hyblter paper): This paper was supported by the National Natural Science Foundation of China (Grant Nos. 61272408, 61322210), the National High-tech Research and Development Program of China (863 Program) (2012AA010905), the CCCPC Youth Talent Plan, and the Doctoral Fund of Ministry of Education of China (20130142110048).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61932004 and 62072205).
Abstract: Reachability queries play a vital role in many graph analysis tasks. Previous research proposed many methods to efficiently answer reachability queries between vertex pairs. Since many real graphs are labeled graphs, there is high demand for Label-Constrained Reachability (LCR) queries, in which the constraint includes a set of labels in addition to the vertex pair. Recent research proposed several methods for answering LCR queries that require some labels specified in the constraint to appear in the path. Besides a label set, the query constraint may be an ordered sequence of labels; such Ordered-Label-Constrained Reachability (OLCR) queries retrieve paths matching a sequence of labels. Currently, no solutions are available for OLCR. Here, we propose DHL, a novel Bloom filter based indexing technique for answering OLCR queries. DHL can be used to check reachability between vertex pairs. If the answer is not a definite no, a constrained DFS is performed. So, we employ DHL followed by a constrained DFS to answer OLCR queries. We show that DHL has a bounded false positive rate and is powerful in saving indexing time and space. Extensive experiments on 10 real-life graphs and 12 synthetic graphs demonstrate that DHL achieves about 4.8-22.5 times smaller index space and 4.6-114 times less index construction time than two state-of-the-art techniques for LCR queries, while achieving comparable query response time. The results also show that our algorithm can answer OLCR queries effectively.
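The filter-then-verify pattern described in this abstract can be illustrated with a minimal Python sketch. This is not the DHL index itself: the per-vertex Bloom filter built by exhaustive traversal, the names `build_index` and `olcr_query`, the filter size, and the assumption that an OLCR query asks for a path whose edge labels equal the given sequence exactly are all hypothetical choices made for illustration.

```python
import hashlib
from collections import defaultdict

NUM_BITS = 64    # illustrative Bloom filter width per vertex (assumption)
NUM_HASHES = 2   # illustrative number of hash functions (assumption)


def _bits(vertex):
    """Return a bit mask with up to NUM_HASHES bits set for a vertex id."""
    mask = 0
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{vertex}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % NUM_BITS)
    return mask


def build_index(edges):
    """Build adjacency lists and one Bloom filter of reachable vertices per vertex.

    edges: iterable of (src, label, dst). The exhaustive traversal used here is
    only for illustration; it is not how DHL constructs its index.
    """
    adj = defaultdict(list)
    vertices = set()
    for src, label, dst in edges:
        adj[src].append((label, dst))
        vertices.update((src, dst))
    filters = {}
    for v in vertices:
        mask, seen, stack = _bits(v), {v}, [v]
        while stack:
            u = stack.pop()
            for _, w in adj.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
                    mask |= _bits(w)
        filters[v] = mask
    return adj, filters


def olcr_query(adj, filters, source, target, labels):
    """Answer whether some path source -> target spells out `labels` in order."""
    need = _bits(target)
    if (filters.get(source, 0) & need) != need:
        return False  # the filter never gives false negatives: definite "no"
    # The filter said "maybe": verify with a label-constrained DFS.
    seen, stack = set(), [(source, 0)]
    while stack:
        v, i = stack.pop()
        if i == len(labels):
            if v == target:
                return True
            continue
        if (v, i) in seen:
            continue
        seen.add((v, i))
        for label, w in adj.get(v, ()):
            if label == labels[i]:
                stack.append((w, i + 1))
    return False
```

Calling `build_index` on a list of `(src, label, dst)` triples and then `olcr_query(adj, filters, s, t, ["x", "y"])` first consults the filter and only falls back to the DFS when the filter cannot rule the pair out, which is the division of labor the abstract describes.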
Abstract: The emergence of the semantic web will result in an enormous amount of knowledge base resources on the web. In this paper, a generic Knowledge Base Grid Architecture (KB-Grid) for building large-scale knowledge systems on the semantic web is presented. KB-Grid suggests a paradigm that emphasizes how to organize, discover, utilize, and manage web knowledge base resources. Four principal components are under development: a semantic browser for retrieving and browsing semantically enriched information, a knowledge server acting as the web container for knowledge, an ontology server for managing web ontologies, and a knowledge base directory server acting as the registry and catalog of KBs. A referential model of knowledge service and the mechanisms required for semantic communication within KB-Grid are also defined. To verify the design rationale underlying KB-Grid, an implementation of Traditional Chinese Medicine (TCM) is described.
Funding: Project supported by the National Key Research and Development Program of China (No. 2018YFB1003500).
Abstract: General purpose graphics processing units (GPGPUs) can be used to improve computing performance considerably for regular applications. However, irregular memory access exists in many applications, and the benefits of graphics processing units (GPUs) are less substantial for irregular applications. In recent years, several studies have presented solutions to remove static irregular memory access. However, eliminating dynamic irregular memory access with software remains a serious challenge. A pure software solution without hardware extensions or offline profiling is proposed to eliminate dynamic irregular memory access, especially indirect memory access. Data reordering and index redirection are suggested to reduce the number of memory transactions, thereby improving the performance of GPU kernels. To improve the efficiency of data reordering, the reordering operation is offloaded to the GPU to reduce overhead and data transfer. By concurrently executing the compute unified device architecture (CUDA) streams of data reordering and the data processing kernel, the overhead of data reordering can be reduced. After these optimizations, the volume of memory transactions can be reduced by 16.7%-50% compared with CUSPARSE-based benchmarks, and the performance of irregular kernels can be improved by 9.64%-34.9% using an NVIDIA Tesla P4 GPU.
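The idea of data reordering with index redirection can be sketched on the CPU in plain Python. This is an illustrative model, not the paper's GPU implementation: the function names, the first-access packing order, and the transaction model (counting distinct 32-element segments touched by each group of 32 consecutive accesses) are assumptions introduced here.

```python
def reorder_and_redirect(data, idx):
    """Pack the accessed elements contiguously and rewrite the index array.

    Elements are placed in first-access order (an assumed policy), so that
    reordered[new_idx[k]] == data[idx[k]] for every access k.
    """
    position = {}    # old index -> new index in the reordered array
    reordered = []
    new_idx = []
    for i in idx:
        if i not in position:
            position[i] = len(reordered)
            reordered.append(data[i])
        new_idx.append(position[i])
    return reordered, new_idx


def count_transactions(idx, warp=32, segment=32):
    """Toy cost model: distinct segments touched per warp-sized group of accesses."""
    total = 0
    for start in range(0, len(idx), warp):
        total += len({i // segment for i in idx[start:start + warp]})
    return total


# Example: a strided indirect access pattern over a 1024-element array.
data = [v * 10 for v in range(1024)]
idx = [(k * 37) % 1024 for k in range(64)]
reordered, new_idx = reorder_and_redirect(data, idx)
before = count_transactions(idx)
after = count_transactions(new_idx)
```

Under this toy model, the gathered values are unchanged while `after` is smaller than `before`, because neighboring accesses now fall into the same segment. On a real GPU the benefit depends on the hardware's actual coalescing rules and on the cost of the reordering pass itself.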
Funding: Supported in part by the National Key R&D Program of China (2020AAA0108501).
Abstract: Knowledge graph (KG) representation learning aims to map entities and relations into a low-dimensional representation space, showing significant potential in many tasks. Existing approaches fall into two categories: (1) Graph-based approaches encode KG elements into vectors using structural score functions. (2) Text-based approaches embed text descriptions of entities and relations via pre-trained language models (PLMs), further fine-tuned with triples. We argue that graph-based approaches struggle with sparse data, while text-based approaches face challenges with complex relations. To address these limitations, we propose a unified Text-Augmented Attention-based Recurrent Network, bridging the gap between graph and natural language. Specifically, we employ a graph attention network based on local influence weights to model local structural information and utilize PLM-based prompt learning to learn textual information, enhanced by a mask-reconstruction strategy based on global influence weights and textual contrastive learning for improved robustness and generalizability. In addition, to effectively model multi-hop relations, we propose a novel semantic-depth guided path extraction algorithm and integrate cross-attention layers into recurrent neural networks to facilitate learning long-term relation dependencies and offer an adaptive attention mechanism for varied-length information. Extensive experiments demonstrate that our model outperforms existing models across KG completion and question-answering tasks.
Funding: Supported in part by the National Key R&D Program of China (Grant No. 2020AAA0108501).
Abstract: Knowledge Graphs (KGs) are pivotal for effectively organizing and managing structured information across various applications. Financial KGs have been successfully employed in advancing applications such as audit, anti-fraud, and anti-money laundering. Despite their success, the construction of Chinese financial KGs has seen limited research due to the complex semantics. A significant challenge is the overlap triples problem, where entities feature in multiple relations within a sentence, hampering extraction accuracy: more than 39% of the triples in Chinese datasets exhibit overlap. To address this, we propose the Entity-type-Enriched Cascaded Neural Network (E²CNN), leveraging special tokens for entity boundaries and types. E²CNN ensures consistency in entity types and excludes specific relations, mitigating the overlap triples problem and enhancing relation extraction. In addition, we introduce FINCORPUS.CN, an available Chinese financial dataset annotated from the annual reports of 2,000 companies, containing 48,389 entities and 23,368 triples. Experimental results on the DUIE dataset and FINCORPUS.CN underscore E²CNN's superiority over state-of-the-art models.
Funding: Supported by the National Key R&D Program (2022YFB4500704) and the National Natural Science Foundation of China (Grant No. 62032008).
Abstract: Container-based virtualization is increasingly popular in cloud computing due to its efficiency and flexibility. Isolation is a fundamental property of containers, and weak isolation can cause significant performance degradation and security vulnerabilities. However, existing work has rarely discussed the isolation problems of the system log, which is critical for monitoring and maintaining containerized applications. In this paper, we present a detailed isolation analysis of the system log in current container environments. First, we find several system log isolation problems that can significantly affect system usability, security, and efficiency. For example, the system log accidentally exposes information about the host and co-resident containers to a container, causing information leakage. Second, we reveal that the root cause of these isolation problems is that containers share the global log configuration, the same log storage, and the global log view. To address these problems, we design and implement a system named private logs (POGs). POGs provides each container with its own log configuration and stores logs individually for each container, avoiding log configuration and storage sharing, respectively. In addition, POGs enables a private log view to help distinguish which container the logs belong to. The experimental results show that POGs can effectively enhance system log isolation for containers with negligible performance overhead.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62127808).
Abstract: Graph neural networks (GNNs) have gained traction and have been applied to various graph-based data analysis tasks due to their high performance. However, a major concern is their robustness, particularly when faced with graph data that has been deliberately or accidentally polluted with noise. This presents a challenge in learning robust GNNs under noisy conditions. To address this issue, we propose a novel framework called Soft-GNN, which mitigates the influence of label noise by adapting the data utilized in training. Our approach employs a dynamic data utilization strategy that estimates adaptive weights based on prediction deviation, local deviation, and global deviation. By better utilizing significant training samples and reducing the impact of label noise through dynamic data selection, GNNs are trained to be more robust. We evaluate the performance, robustness, generality, and complexity of our model on five real-world datasets, and our experimental results demonstrate the superiority of our approach over existing methods.
Funding: The National Natural Science Foundation of China (Grant No. 61702202) and the China Postdoctoral Science Foundation Funded Project (2017M610477 and 2017T100555).
Abstract: Although many graph processing systems have been proposed, graphs in the real world are often dynamic, and it is important to keep the results of graph computation up-to-date. Incremental computation has been demonstrated to be an efficient way to update calculated results. Recently, many incremental graph processing systems have been proposed that handle dynamic graphs in an asynchronous way and achieve better performance than those that process them synchronously. However, these solutions still suffer from sub-optimal convergence speed due to their slow propagation of important vertex states (important to convergence speed) and poor locality. To solve these problems, we propose a novel graph processing framework. It introduces a dynamic partition method to gather the important vertices for high locality, and then uses a priority-based scheduling algorithm to assign them a higher priority for an effective processing order. By these means, it reduces the number of updates and increases locality, thereby reducing the convergence time. Experimental results show that our method reduces the number of updates by 30% and the total execution time by 35%, compared with state-of-the-art systems.
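Priority-based scheduling in asynchronous graph computation can be illustrated with a small Python sketch. It is not the framework from this abstract: it assumes a delta-accumulative, unnormalized PageRank as the workload and uses the size of a vertex's pending delta as its priority, and the name `prioritized_pagerank` and all parameters are hypothetical.

```python
import heapq


def prioritized_pagerank(out_edges, damping=0.85, tol=1e-6):
    """Asynchronous delta-accumulative PageRank with largest-delta-first scheduling.

    out_edges: dict mapping a vertex to the list of vertices it links to.
    Returns (rank, updates), where updates counts processed vertex activations.
    """
    vertices = set(out_edges) | {w for ws in out_edges.values() for w in ws}
    rank = {v: 0.0 for v in vertices}
    delta = {v: 1.0 - damping for v in vertices}   # pending, not yet applied
    heap = [(-delta[v], v) for v in vertices]      # max-heap via negated keys
    heapq.heapify(heap)
    updates = 0
    while heap:
        neg_priority, v = heapq.heappop(heap)
        pending = delta[v]
        if pending < tol or -neg_priority != pending:
            continue  # negligible delta, or a stale heap entry for v
        rank[v] += pending          # apply the accumulated change
        delta[v] = 0.0
        updates += 1
        targets = out_edges.get(v, ())
        for w in targets:           # propagate the change to the neighbors
            delta[w] += damping * pending / len(targets)
            if delta[w] >= tol:
                heapq.heappush(heap, (-delta[w], w))
    return rank, updates
```

The returned `updates` counter makes it possible to compare this processing order against, for example, a round-robin order on the same graph, which is the kind of update-count comparison the abstract reports. Stale heap entries are skipped lazily rather than removed, a common simplification when using `heapq`.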
Funding: Supported by the National Natural Science Foundation of China (No. 61232008), the National High-Tech Research and Development (863) Program of China (Nos. 2013AA01A213 and 2013AA01A208), the Chinese Universities Scientific Fund (No. 2013TS094), and the Guangzhou Science and Technology Program (No. 2012Y2-00040).
Abstract: To keep pace with the rapid growth of cloud technologies, a large number of web applications have been developed and deployed, and these applications are being run in clouds. Due to the scalability provided by clouds, a single web application may be concurrently visited by millions or even billions of users. Thus, the testing and performance evaluation of these applications are increasingly important. User model based evaluations can significantly reduce the manual work required and can enable us to determine the performance of applications under real runtime environments. Hence, they have become one of the most popular evaluation methods in both industry and academia. Significant efforts have focused on building different kinds of models by mining web access logs, such as Markov models and the Customer Behavior Model Graph (CBMG). This paper proposes a new kind of model, named the User Representation Model Graph (URMG), which is built on CBMG. It uses an algorithm to refine CBMG and optimizes the evaluation execution process. Based on this model, an automatic testing and evaluation system for web applications is designed, implemented, and deployed in our test cloud; it is able to execute all of the analysis and testing operations using only web access logs. In our system, the error rate caused by random access to applications in the execution phase is also reduced, and the results show that the error rate of evaluations based on URMG is 50% lower than that of evaluations based on CBMG.
Funding: This work was supported by the National Key Research and Development (R&D) Plan of China (2019YFB2101700), the Science and Technology Program of Guangzhou (201902020016), the Shenzhen Fundamental Research Program (JCYJ20170413114215614), the Guangdong Provincial Science and Technology Plan Project (2017B010124001), and the Guangdong Provincial Key R&D Plan Project (2019B010139001).
Abstract: Blockchain has recently emerged as a research trend, with potential applications in a broad range of industries and contexts. One particularly successful blockchain technology is the smart contract, which is widely used in commercial settings (e.g., high-value financial transactions). This, however, has security implications due to the potential to financially benefit from a security incident (e.g., identification and exploitation of a vulnerability in the smart contract or its implementation). Among smart contract platforms, Ethereum is the most active and prominent. Hence, in this paper, we systematically review existing research efforts on Ethereum smart contract security published between 2015 and 2019. Specifically, we focus on how smart contracts can be maliciously exploited and targeted, covering security issues of the contract programming model, vulnerabilities in the program, and safety considerations introduced by the program execution environment. We also identify potential research opportunities and a future research agenda.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61832006, 62072195, and 61825202).
Abstract: With the increasing amount of data, there is an urgent need for efficient sorting algorithms to process large data sets. Hardware sorting algorithms have attracted much attention because they can take advantage of the parallelism of different hardware. However, traditional hardware sort accelerators suffer from the “memory wall” problem because of their multiple rounds of data transmission between the memory and the processor. In this paper, we utilize the in-situ processing ability of the ReRAM crossbar to design a new ReCAM array that can process the matrix-vector multiplication operation and the vector-scalar comparison in the same array simultaneously. Using this ReCAM array, we present ReCSA, the first dedicated ReCAM-based sort accelerator. Besides hardware designs, we also develop algorithms to maximize memory utilization and minimize memory exchanges to improve sorting performance. The sorting algorithm in ReCSA can process various data types, such as integers, floats, doubles, and strings. We also present experiments to evaluate the performance and energy efficiency against state-of-the-art sort accelerators. The experimental results show that ReCSA achieves 90.92×, 46.13×, 27.38×, 84.57×, and 3.36× speedups over CPU-, GPU-, FPGA-, NDP-, and PIM-based platforms, respectively, when processing numeric data sets. ReCSA also achieves 24.82×, 32.94×, and 18.22× performance improvements when processing string data sets compared with CPU-, GPU-, and FPGA-based platforms, respectively.
Abstract: Graph models have been widely applied in document summarization by using sentences as the graph nodes and the similarity between sentences as the edges. In this paper, a novel graph model for document summarization is presented that utilizes not only the relevance between sentences but also the relevance information of the phrases contained in the sentences. In short, we construct a phrase-sentence two-layer graph structure model (PSG) to summarize document(s). We use this model for generic document summarization and query-focused summarization. The experimental results show that our model greatly outperforms existing work.
Abstract: As one of the early COVID-19 epidemic outbreak areas, China attracted the global news media's attention at the beginning of 2020. During the epidemic period, Chinese people united and actively fought against the epidemic. However, in the eyes of the international public, the situation reported about China is not optimistic. To better understand how the international public portrays China, especially during the epidemic, we present a case study using big data technology. We aim to answer three questions: (1) What has the international media focused on during the COVID-19 epidemic period? (2) What is the media's tone when they report on China? (3) What is the media's attitude when talking about China? In detail, we crawled more than 280,000 pieces of news from 57 mainstream media agencies in 22 countries and made some interesting observations. For example, international media paid more attention to Chinese livelihood during the COVID-19 epidemic period. In March and April, “progress of Chinese vaccines,” “specific drugs and treatments,” and “virus outbreak in U.S.” became the media's most common topics. In terms of news attitude, Cuba, Malaysia, and Venezuela had a positive attitude toward China, while France, Canada, and the United Kingdom had a negative attitude. Our study can help understand China's image in the eyes of the international media and provide a sound basis for image analysis.
Funding: Supported jointly by the National Key Research and Development Program of China (No. 2022YFB4500303) and the National Natural Science Foundation of China (NSFC) (Grant Nos. 62072198, 61832006, 61825202, 61929103).
Abstract: Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) often exploit page migration technologies to take full advantage of the different memory media. Most previous proposals migrate data at a granularity of 4 KB pages, and thus waste memory bandwidth and DRAM resources. In this paper, we propose Mocha, a non-hierarchical architecture that organizes DRAM and NVM in a flat physical address space but manages them as a cache/memory hierarchy. Since the commercial NVM device, Intel Optane DC Persistent Memory Modules (DCPMM), actually accesses the physical media at a granularity of 256 bytes (an Optane block), we manage the DRAM cache at a 256-byte granularity to match this feature of Optane. This design not only enables fine-grained data migration and management for the DRAM cache, but also avoids write amplification for Intel Optane DCPMM. We also create an Indirect Address Cache (IAC) in the Hybrid Memory Controller (HMC) and propose a reverse address mapping table in the DRAM to speed up address translation and cache replacement. Moreover, we exploit a utility-based caching mechanism to filter cold blocks in the NVM and further improve the efficiency of the DRAM cache. We implement Mocha in an architectural simulator. Experimental results show that Mocha can improve application performance by 8.2% on average (up to 24.6%), and reduce energy consumption by 6.9% and data migration traffic by 25.9% on average, compared with a typical hybrid memory architecture, HSCC.