4 articles found
Benchmarking in-memory database
1
Authors: Cheqing JIN, Yangxin KONG, Qiangqiang KANG, Weining QIAN, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2016, Issue 6, pp. 1067-1081 (15 pages)
We have witnessed exciting developments in RAM technology over the past decade. Memory sizes have grown rapidly while prices continue to fall, so it is now feasible to deploy large amounts of RAM in a computer system. Several companies and research institutions have devoted substantial resources to developing in-memory databases (IMDBs), which execute queries after loading data into (virtual) memory in advance. The proliferation of in-memory databases compels us to test and evaluate their performance objectively and fairly. Although existing database benchmarks such as the Wisconsin benchmark and the TPC-X series have achieved great success, they are not suitable for in-memory databases because they do not consider the unique characteristics of an IMDB. In this study, we propose MemTest, a novel benchmark that addresses some major characteristics of an in-memory database. The benchmark defines specific metrics covering processing time, compression ratio, minimal memory space, and column strength of an in-memory database. We design a data model based on inter-bank transaction applications, and a data generator that supports uniform and skewed data distributions. The MemTest workload includes a set of queries and transactions designed around the metrics and data model. Finally, we illustrate the efficacy of MemTest through implementations on two different in-memory databases.
Keywords: benchmark, in-memory database, memory
A retrospective of knowledge graphs (cited by: 35)
2
Authors: Jihong YAN, Chengyu WANG, Wenliang CHENG, Ming GAO, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2018, Issue 1, pp. 55-74 (20 pages)
Information on the Internet is fragmented and spread across different data sources, which makes automatic knowledge harvesting and understanding formidable for machines, and even for humans. Knowledge graphs have become prevalent in both industry and academia in recent years as one of the most efficient and effective knowledge integration approaches. Techniques for knowledge graph construction can mine information from structured, semi-structured, or even unstructured data sources, and finally integrate the information into knowledge, represented in a graph. Furthermore, a knowledge graph is able to organize information in an easy-to-maintain, easy-to-understand and easy-to-use manner. In this paper, we summarize techniques for constructing knowledge graphs. We review the existing knowledge graph systems developed by both academia and industry. We discuss in detail the process of building knowledge graphs, and survey state-of-the-art techniques for automatic knowledge graph checking and expansion via logical inference and reasoning. We also review the issues of graph data management by introducing the knowledge data models and graph databases, especially from a NoSQL point of view. Finally, we give an overview of current knowledge graph systems and discuss future research directions.
Keywords: knowledge graph, knowledge base, information extraction, logical reasoning, graph database
Time-aware conversion prediction (cited by: 1)
3
Authors: Wendi JI, Xiaoling WANG, Feida ZHU. Frontiers of Computer Science (SCIE, EI, CSCD), 2017, Issue 4, pp. 702-716 (15 pages)
The importance of product recommendation has been well recognized as a central task in business intelligence for e-commerce websites. Interestingly, what has received less attention is the fact that different products take different time periods for conversion. "Conversion" here refers to a more general set of pre-defined actions, including, for example, purchases or registrations in recommendation and advertising systems. The mismatch between a product's actual conversion period and the application's target conversion period has been the subtle culprit compromising many existing recommendation algorithms. The challenging question of which products should be recommended for a given time period to maximize conversion is what has motivated us in this paper to propose a rank-based time-aware conversion prediction model (rTCP), which considers both recommendation relevance and conversion time. We adopt lifetime models from survival analysis to model the conversion time and personalize the temporal prediction by incorporating context information such as user preference. A novel mixture lifetime model is proposed to further accommodate the complexity of conversion intervals. Experimental results on two real-world data sets illustrate the high goodness of fit of our proposed model rTCP and demonstrate its effectiveness in time-aware conversion rate prediction for advertising and product recommendation.
Keywords: conversion time, survival analysis, product recommendation, advertising
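The idea of weighing recommendation relevance against conversion time can be illustrated with a minimal sketch. This is not the rTCP model from the abstract: it assumes a plain exponential lifetime model per product, and the names `conversion_probability`, `rank_products`, the relevance scores, and the daily rates are all hypothetical values chosen for illustration.

```python
import math


def conversion_probability(rate, horizon_days):
    """P(T <= horizon) for an exponential lifetime with the given daily rate.

    The exponential model is an assumption of this sketch, not the mixture
    lifetime model the abstract proposes.
    """
    return 1.0 - math.exp(-rate * horizon_days)


def rank_products(products, horizon_days):
    """Rank (name, relevance, rate) tuples by relevance times the probability
    of converting within the target period, highest score first."""
    scored = [
        (name, relevance * conversion_probability(rate, horizon_days))
        for name, relevance, rate in products
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical catalogue: a slow-converting but highly relevant product and
# a fast-converting but less relevant one.
products = [("laptop", 0.9, 0.02), ("snack", 0.6, 0.5)]

short_term = rank_products(products, horizon_days=1)
long_term = rank_products(products, horizon_days=60)
```

With a one-day target period the fast-converting product ranks first, while a sixty-day period favors the more relevant but slower-converting one, which is the kind of mismatch the abstract describes.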
MapReduce-based entity matching with multiple blocking functions (cited by: 1)
4
Authors: Cheqing JIN, Jie CHEN, Huiping LIU. Frontiers of Computer Science (SCIE, EI, CSCD), 2017, Issue 5, pp. 895-911 (17 pages)
Entity matching, which aims at finding records that belong to the same real-world object, has been studied for decades. To avoid verifying every pair of records in a massive data set, a common method, known as the blocking-based method, selects a small proportion of record pairs for verification at a far lower cost than O(n^2), where n is the size of the data set. Furthermore, executing multiple blocking functions independently is critical, since many more matching records can be found in this way, so that the quality of the query result can be improved significantly. It is popular to use the MapReduce (MR) framework to improve the performance and scalability of some complicated queries by running many map (or reduce) tasks in parallel. However, entity matching on the MapReduce framework is non-trivial due to two inevitable challenges: load balancing and pair deduplication. In this paper, we propose a novel solution, called MrEm, to handle these challenges with the support of multiple blocking functions. Although existing work can deal with load balancing and pair deduplication separately, it still cannot deal with both challenges at the same time. Theoretical analysis and experimental results on real and synthetic data sets illustrate the high effectiveness and efficiency of our proposed solutions.
Keywords: entity matching, MapReduce, load balancing, pair deduplication
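The blocking idea in this abstract can be sketched on a single machine. The following is a minimal illustration only, not the authors' MapReduce solution and with no load balancing: the function `candidate_pairs`, the sample records, and the two blocking functions are hypothetical. It shows how several blocking functions each propose candidate pairs, and how a set of ordered pairs removes the duplicates produced when two records share a block under more than one function.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, blocking_functions):
    """Return the deduplicated set of record-id pairs that share a block
    under at least one blocking function."""
    pairs = set()
    for block_fn in blocking_functions:
        # Group record ids by the key this blocking function assigns.
        blocks = defaultdict(list)
        for rec_id, rec in records.items():
            blocks[block_fn(rec)].append(rec_id)
        # Only records inside the same block are compared.
        for ids in blocks.values():
            for a, b in combinations(sorted(ids), 2):
                # Storing (smaller id, larger id) in a set deduplicates
                # pairs proposed by more than one blocking function.
                pairs.add((a, b))
    return pairs


# Hypothetical records and blocking functions.
records = {
    1: {"name": "John Smith", "zip": "10001"},
    2: {"name": "Jon Smith", "zip": "10001"},
    3: {"name": "Jane Doe", "zip": "94105"},
    4: {"name": "John Smyth", "zip": "60601"},
}


def by_zip(rec):
    return rec["zip"]


def by_name_prefix(rec):
    return rec["name"][:2].lower()


pairs = candidate_pairs(records, [by_zip, by_name_prefix])
```

Here the pair (1, 2) is proposed by both blocking functions but kept once, and only three of the six possible pairs are passed on for verification.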