Funding: The National High Technology Research and Development Program of China (No. 2007AA01Z309); the National Natural Science Foundation of China (No. 60803160, No. 60873030).
Abstract: In data-stream or web scenarios where data arrive at highly variable and unpredictable rates, a good join algorithm should be able to "hide" the delays by continuing to output join results. Non-blocking algorithms allow some tuples to be flushed to disk, with the goal of producing results continuously while data transmission is suspended. State-of-the-art algorithms, however, struggle under the constraint of limited allocated memory. To make better use of memory, a novel non-blocking join algorithm based on hash-merge is proposed for improving query response times. A reduced data structure for in-memory tuples improves memory utilization. A replacement selection tree is used to adjust memory by expanding or shrinking the size of the tree, and it separates one external join transaction into multiple subtasks. In addition, a cost model that estimates task output rate is proposed to select the on-disk portion that promises to produce results fastest in the external join stage. Experiments show that the technique, with far less memory, delivers results faster than three non-blocking join algorithms (XJoin, HMJ and RPJ), with up to an almost two-fold improvement on a reliable network and an order-of-magnitude improvement on an unreliable network in terms of the number of reported tuples.
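As a rough illustration of the replacement selection idea mentioned in the abstract, the following is a minimal sketch of textbook replacement selection for generating sorted runs under a fixed memory budget. It is not the paper's algorithm: the function name `replacement_selection`, the `memory_size` parameter and the use of a binary heap in place of a selection tree are assumptions made for illustration, and the memory budget here is fixed rather than expanding or shrinking.

```python
import heapq
from typing import Iterable, Iterator, List

_END = object()  # sentinel marking the end of the input


def replacement_selection(items: Iterable[int], memory_size: int) -> Iterator[List[int]]:
    """Yield sorted runs while buffering at most `memory_size` items (hypothetical helper)."""
    if memory_size < 1:
        raise ValueError("memory_size must be at least 1")
    source = iter(items)

    # Heap entries are (run_number, value); values tagged for the next run
    # sort after everything that still belongs to the current run.
    heap = []
    for value in source:
        heap.append((0, value))
        if len(heap) == memory_size:
            break
    heapq.heapify(heap)

    current_run, run = 0, []
    while heap:
        run_no, value = heapq.heappop(heap)
        if run_no != current_run:
            # The current run is exhausted: emit it and start the next one.
            yield run
            run, current_run = [], run_no
        run.append(value)

        incoming = next(source, _END)
        if incoming is not _END:
            # A value smaller than the one just written cannot extend the
            # current run, so it is held back for the next run.
            target = run_no if incoming >= value else run_no + 1
            heapq.heappush(heap, (target, incoming))
    if run:
        yield run


runs = list(replacement_selection([5, 1, 9, 3, 7, 2, 8, 4, 6], memory_size=3))
print(runs)
```

With three buffered items the nine input values come out as two sorted runs, the first of which is longer than the buffer. That property is what makes replacement selection attractive when memory is scarce.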
Funding: Funded by the Environmental Seed Arrival and Interspecific Associations in Seedling Sciences Program of the Smithsonian Institution; the National Science Foundation (DEB-0075102, DEB-0823728, DEB-0640386, DEB-1242622, DEB-1464389); the Andrew Mellon Foundation; The Ohio State University; and Yale University.
Abstract: Background: The full lifespan of long-lived trees includes a seedling phase, during which a seed germinates and grows to a size large enough to be measured in forest inventories. Seedling populations are usually studied separately from adult trees, and the seedling lifespan, from seed to sapling, is poorly known. In the 50-ha Barro Colorado forest plot, we started intensive censuses of seeds and seedlings in 1994 in order to merge seedling and adult demography and document complete lifespans. Methods: In 17 species abundant in seedling censuses, we subdivided populations into six size classes from seed to 1 cm dbh, including seeds plus five seedling stages. The smallest seedling class was subdivided by age. Censuses in two consecutive years provided transition matrices describing the probability that a seedling in one stage moved to another one year later. For each species, we averaged the transition matrix across 25 censuses and used it to project the seedling lifespan, from seed until 1 cm dbh or death. Results: The predicted mean survival rate of seeds to 1 cm dbh varied 1000-fold across species, from 2.9 × 10^(−6) to 4.4 × 10^(−3); the median was 2.0 × 10^(−4). The seedling lifespan, or the average time it takes a seed to grow to 1 cm dbh, varied across species from 5.1 to 53.1 years, with a median of 20.3 years. In the median species, the fastest-growing 10% of seeds would reach 1 cm dbh in 9.0 years, and the slowest 10% in 34.6 years. Conclusions: Combining seedling results with our previous study of lifespan after 1 cm dbh, we estimate that the focal species have full lifespans varying from 41 years in a gap-demanding pioneer to 320 years in one shade-tolerant species. Lifetime demography can contribute precise survival rates and lifespans to forestry models.
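The projection step described in the Methods can be sketched as a small stage-structured calculation. The example below is only an illustration under stated assumptions: it uses two stages instead of the study's six, every probability in it is made up rather than taken from the census data, and the names `project_cohort`, `transition` and `graduate` are hypothetical. It follows a cohort of seeds year by year, accumulating the fraction that reaches the target size and the mean time taken by those that do.

```python
def project_cohort(transition, graduate, years=500):
    """Project a cohort that starts entirely in stage 0 (seeds).

    transition[i][j]: annual probability of moving from stage i to stage j.
    graduate[i]:      annual probability that stage i reaches the target size.
    Whatever probability is left over in a row is treated as mortality.
    Returns (fraction reaching the target, mean years among those that reach it).
    """
    n = len(transition)
    state = [1.0] + [0.0] * (n - 1)
    reached = 0.0
    weighted_years = 0.0
    for year in range(1, years + 1):
        # Individuals graduating during this year, counted at year's end.
        grads = sum(state[i] * graduate[i] for i in range(n))
        reached += grads
        weighted_years += year * grads
        # Survivors that remain below the target size move between stages.
        state = [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]
    mean_years = weighted_years / reached if reached > 0 else float("nan")
    return reached, mean_years


# Made-up numbers: 10% of seeds become seedlings after one year; each year a
# seedling stays a seedling with probability 0.5, reaches the target size with
# probability 0.25, and dies otherwise.
transition = [[0.0, 0.1],
              [0.0, 0.5]]
graduate = [0.0, 0.25]

survival, mean_years = project_cohort(transition, graduate)
print(survival, mean_years)
```

For these illustrative inputs the result can be checked by hand: a seedling eventually graduates with probability 0.25 / (1 − 0.5) = 0.5, so 0.1 × 0.5 = 0.05 of seeds reach the target, after one year as a seed plus an average of two years as a seedling.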
Funding: National High-tech R&D Program (2015AA015307); the National Natural Science Foundation of China (Grant Nos. 61702189, 61432006 and 61672232); Youth Science and Technology "Yang Fan" Program of Shanghai (17YF1427800).
Abstract: The log-structured merge tree (LSM-tree) has been adopted by many distributed storage systems. It decomposes a large database into multiple parts: an in-writing part and several read-only ones. Records are first written into a memory-optimized structure and then periodically compacted into on-disk structures. This achieves high write throughput, but it has the side effect that read requests have to go through multiple structures to find the required record. In a distributed database system, the different parts of the LSM-tree are stored in a distributed fashion. As a result, a server in the query layer has to issue multiple network requests to pull data items from the underlying storage layer. To address this, this work proposes a precise data access strategy that includes an efficient structure with low maintenance overhead, designed to test whether a record exists in the in-writing part of the LSM-tree, and a lease-based synchronization strategy that maintains consistent copies of the structure on remote query servers. We further prove that the technique works robustly while the LSM-tree is reorganizing multiple structures in the backend. It is also fault-tolerant: it can recover the structures used in data access after node failures. Experiments using the YCSB benchmark show that the solution achieves a 6x throughput improvement over existing methods.
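The read-path problem described above can be illustrated with a toy, single-process sketch. Everything here is an assumption made for illustration: `ToyLSM` is a hypothetical class, plain dictionaries stand in for the memory-optimized and on-disk structures, and an ordinary Python set stands in for the paper's existence-test structure, whose actual design is not given in the abstract. Lease-based synchronization and fault recovery are not modeled.

```python
class ToyLSM:
    """Toy LSM-tree: one in-writing part plus read-only runs (newest first)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # in-writing part
        self.runs = []       # read-only parts, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Freeze the in-writing part into a new read-only run.
            self.runs.insert(0, self.memtable)
            self.memtable = {}

    def get(self, key, in_writing_keys=None):
        """Return (value, number of structures probed).

        `in_writing_keys`, if given, is a query-side copy of the keys held in
        the in-writing part; a miss there lets the reader skip that probe.
        """
        probes = 0
        skip_in_writing = in_writing_keys is not None and key not in in_writing_keys
        if not skip_in_writing:
            probes += 1
            if key in self.memtable:
                return self.memtable[key], probes
        for run in self.runs:
            probes += 1
            if key in run:
                return run[key], probes
        return None, probes


db = ToyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 9), ("d", 4)]:
    db.put(k, v)

known = set(db.memtable)           # stand-in for the synchronized structure
print(db.get("b"))                 # probes the in-writing part and both runs
print(db.get("b", known))          # skips the in-writing part
print(db.get("d", known))          # found in the in-writing part directly
```

In a distributed deployment each probe would correspond to a network request, which is why avoiding even one of them per read can matter.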
Abstract: In an asynchronous cooperative editing workflow for a structured document, each co-author receives, in the different phases of the editing process, a copy of the document into which to insert his or her contribution. For confidentiality reasons, this copy may be only a partial replica containing only those parts of the (global) document that are of demonstrated interest to the co-author in question. Note that some parts may be of demonstrated interest to more than one co-author; these parts will therefore be accessible concurrently. At synchronization time (e.g. at the end of an asynchronous editing phase of the process), we want to merge the contributions of all authors into a single document. Because editing is asynchronous and some document parts may be accessed concurrently, conflicts may arise and make the partial replicas unmergeable in their entirety: they are inconsistent, meaning that they contain conflicting parts. The purpose of this paper is to propose a so-called consensus merging approach for such partial replicas using tree automata. Specifically, from the updates of the partial replicas, we build a tree automaton that accepts exactly the consensus documents. These documents are the maximal conflict-free prefixes of the merged partial replicas.
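Abstract: To improve traffic efficiency and occupant comfort in freeway merging areas while ensuring safety, this paper proposes a merging-sequence optimization and trajectory planning method for connected and autonomous vehicles (CAVs) on freeways, targeting heterogeneous traffic flow in which human-driven vehicles (HDVs) and CAVs are mixed. First, taking vehicle travel time and delay as indicators of traffic efficiency in the merging area, a merging-sequence optimization function is established, and the Monte Carlo Tree Search (MCTS) algorithm is adopted and adapted to obtain the optimal merging sequence. Second, based on the merging sequence, a CAV merging trajectory planning function that minimizes acceleration and jerk (Minimize Acceleration and Jerk Trajectory Planning, MAJTP) is established, and optimal control theory is applied to derive an analytical solution for the optimal longitudinal vehicle trajectory, yielding a cooperative control method for CAVs in freeway merging areas. Finally, the SUMO software and Python libraries are used together to validate the proposed method through traffic simulation. The simulation results show that, at CAV penetration rates of 0.2, 0.4, 0.6 and 0.8, the MCTS-based merging-sequence optimization method reduces cumulative delay by 5.75%, 8.84%, 12.24% and 11.06%, respectively, compared with the first-in-first-out (FIFO) algorithm. Compared with the Minimize Acceleration Trajectory Planning (MATP) method, the MAJTP method yields an average jerk closer to zero, which improves occupant comfort and verifies the effectiveness of the method. The results can provide theoretical support for research on traffic operation and control in freeway merging areas.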
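Abstract: Incremental association rule algorithms based on the Can tree (canonical order tree) suffer in big data environments from excessive space consumption by the tree structure, poor frequent pattern mining efficiency, and insufficient parallelization performance on MapReduce clusters. To address these problems, a parallel incremental association rule mining algorithm improved with rough sets and merge pruning, MR-PARIRM (MapReduce-based parallel association rules incremental mining algorithm using rough set and merge pruning), is proposed. First, a rough-set-based similar item merging strategy, RS-SIM (rough set based similar item merge), is designed to merge similar items in the dataset, and the Can tree is constructed from the merged data, which reduces the space consumed by the tree structure. Second, a merge pruning strategy, MPS (merge pruning strategy), is proposed to prune and merge the propagation paths in the tree structure, speeding up frequent itemset mining by compressing the frequent pattern search space. Finally, a dynamic scheduling strategy, DSS (dynamic scheduling strategy), dynamically schedules the computing tasks in a heterogeneous MapReduce cluster, achieving load balancing and effectively improving the parallel computing capability of the cluster. Experimental simulation results show that MR-PARIRM performs relatively well in big data environments and is suitable for parallel processing of large-scale data.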
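For readers unfamiliar with the Can tree, the sketch below shows only the basic data structure, under stated assumptions: items in each transaction are arranged in a fixed canonical order (alphabetical here) that does not depend on item frequencies, so new transactions can be inserted without reordering existing paths. The names `CanNode`, `insert_transaction`, `prefix_count` and `count_nodes` are hypothetical, and none of RS-SIM, MPS, DSS or the MapReduce parallelization is represented.

```python
class CanNode:
    """One node of a prefix tree whose paths follow a fixed canonical item order."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0       # number of transactions passing through this node
        self.children = {}


def insert_transaction(root, transaction):
    """Insert one transaction, with its items sorted into canonical order."""
    node = root
    for item in sorted(set(transaction)):
        node = node.children.setdefault(item, CanNode(item))
        node.count += 1


def prefix_count(root, items):
    """Count transactions whose canonical ordering starts with exactly `items`."""
    node = root
    for item in sorted(set(items)):
        node = node.children.get(item)
        if node is None:
            return 0
    return node.count


def count_nodes(root):
    """Number of item nodes in the tree (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in root.children.values())


root = CanNode()
for t in [["b", "a", "c"], ["a", "b"], ["d", "a"]]:
    insert_transaction(root, t)
nodes_before = count_nodes(root)

# Incremental update: a later batch is inserted without touching existing paths.
insert_transaction(root, ["c", "b"])
nodes_after = count_nodes(root)

print(nodes_before, nodes_after, prefix_count(root, ["b", "a"]))
```

Because the ordering is independent of frequency, transactions that share few leading items share few nodes, which is one way the tree can grow large. That space cost is the problem the abstract says the merging and pruning strategies are meant to reduce.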