Accurate capacity and State of Charge (SOC) estimation are crucial for ensuring the safety and longevity of lithium-ion batteries in electric vehicles. This study examines ten machine learning architectures, including the Deep Belief Network (DBN), Bidirectional Recurrent Neural Network (BiDirRNN), Gated Recurrent Unit (GRU), and others, using the NASA B0005 dataset of 591,458 instances. Results indicate that the DBN excels in capacity estimation, achieving orders-of-magnitude lower error values and explaining over 99.97% of the predicted variable's variance. When computational efficiency is paramount, the Deep Neural Network (DNN) offers a strong alternative, delivering near-competitive accuracy with significantly reduced prediction times. The GRU achieves the best overall performance for SOC estimation, attaining an R² of 0.9999, while the BiDirRNN provides a marginally lower error at a slightly higher computational speed. In contrast, Convolutional Neural Networks (CNN) and Radial Basis Function Networks (RBFN) exhibit relatively high error rates, making them less viable for real-world battery management. Analyses of error distributions reveal that the top-performing models cluster most predictions within tight bounds, limiting the risk of overcharging or deep discharging. These findings highlight the trade-off between accuracy and computational overhead, offering valuable guidance for battery management system (BMS) designers seeking optimal performance under constrained resources. Future work may further explore advanced data augmentation and domain adaptation techniques to enhance these models' robustness in diverse operating conditions.
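The ranking metrics quoted above (error values and explained variance) correspond to RMSE and the coefficient of determination R²; a minimal pure-Python sketch, with illustrative SOC values rather than data from the NASA B0005 set:

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: fraction of variance explained."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy SOC trace (fractions of full charge) and a near-perfect prediction.
soc_true = [0.95, 0.90, 0.84, 0.77, 0.70, 0.62]
soc_pred = [0.951, 0.899, 0.841, 0.769, 0.701, 0.619]
```

An R² of 0.9999, as reported for the GRU, means only 0.01% of the SOC variance is left unexplained by the model.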
Many datasets in E-commerce have rich information about items and the users who purchase or rate them. This information can enable advanced machine learning algorithms to extract and assign user sentiments to various aspects of the items, leading to more sophisticated and justifiable recommendations. However, most Collaborative Filtering (CF) techniques rely mainly on the overall preferences of users toward items, and there is a lack of a conceptual and computational framework that enables an understandable aspect-based AI approach to recommending items to users. In this paper, we propose concepts and computational tools that can sharpen the logic of recommendations and that rely on users' sentiments along various aspects of items. These concepts include: the sentiment of a user towards a specific aspect of a specific item, the emphasis that a given user places on a specific aspect in general, the popularity and controversy of an aspect among groups of users, clusters of users emphasizing a given aspect, clusters of items that are popular among a group of users, and so forth. The framework introduced in this study is developed in terms of user emphasis, aspect popularity, aspect controversy, and user and item similarity. Towards this end, we introduce the Aspect-Based Collaborative Filtering Toolbox (ABCFT), where the tools are all developed based on the three-index sentiment tensor, the indices being the user, item, and aspect. The toolbox computes solutions to the questions alluded to above. We illustrate the methodology using a hotel review dataset with around 6000 users, 400 hotels, and 6 aspects.
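A minimal sketch of three of the toolbox's quantities over the three-index sentiment tensor; the users, hotels, and sentiment values below are hypothetical, not the paper's dataset:

```python
from statistics import mean, pstdev

# Illustrative sparse sentiment tensor S[user][item][aspect], values in [-1, 1].
S = {
    "alice": {"hotel_a": {"cleanliness": 0.9, "location": 0.4},
              "hotel_b": {"cleanliness": 0.7}},
    "bob":   {"hotel_a": {"cleanliness": -0.8, "location": 0.5}},
}

def user_emphasis(S, user, aspect):
    """Fraction of the user's reviews that mention the given aspect."""
    reviews = S[user].values()
    return sum(aspect in a for a in reviews) / len(reviews)

def aspect_popularity(S, aspect):
    """Mean sentiment along one aspect, across all users and items."""
    vals = [a[aspect] for u in S.values() for a in u.values() if aspect in a]
    return mean(vals)

def aspect_controversy(S, aspect):
    """Spread of sentiment along one aspect (population std. dev.)."""
    vals = [a[aspect] for u in S.values() for a in u.values() if aspect in a]
    return pstdev(vals)
```

These per-aspect statistics are the building blocks from which clusters of users and items can then be derived.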
The cloud cover ratio is an important factor affecting the quality of a satellite image; cloud detection from satellite images is therefore a necessary step in assessing image quality. A study on cloud detection from the visual band of a satellite image is developed. Firstly, we consider the differences between cloud and ground, including high grey level, good continuity of grey level, the area of the cloud region, and the variance of local fractal dimension (VLFD) of the cloud region, and we propose a single-cloud-region detection method. Secondly, by introducing a reference satellite image and comparing the variance in the dimensions corresponding to the reference and tested images, we describe a method that detects multiple cloud regions and determines whether or not cloud exists in an image. The performance of the proposed method is demonstrated using several Ikonos images.
Lack of temperature sensation in a myoelectric prosthetic hand limits the daily activities of amputees. To this end, a noninvasive temperature sensation method is proposed to train amputees to sense temperature through psychophysical sensory substitution. In this study, 22 healthy participants and 5 amputee participants took part. The study lasted 31 days, with five test steps according to the Leitner technique. An adjustable-temperature mug and a Peltier element were used to change the temperature of the water/phantom digits and induce temperature sensations in participants. A Virtual Reality (VR) headset was employed to isolate participants from their surroundings and to display colors. The statistical analysis is based on participants' responses collected with a questionnaire. Chi-square tests showed that participants answered significantly correctly using the Leitner technique (P value < 0.05). A repeated-measures ANOVA showed that the duration of numbness felt by participants differed significantly (P value < 0.001). Participants remembered the lowest and highest temperatures significantly better than other temperatures (P value < 0.001). Furthermore, the well-trained amputee participant who used the prosthesis in practice could identify an object's temperature with 72.58% accuracy after experiencing the color-temperature mapping only once.
This study introduces a novel single-layer meshless method, the space-time collocation method based on multiquadric radial basis functions (MQ-RBF), for solving the Benjamin-Bona-Mahony-Burgers (BBMB) equation. By reconstructing the time variable as a space variable, this method establishes a combined space-time structure that eliminates the two-step computational process required in traditional grid methods. By introducing a shape-parameter-optimized MQ-RBF, high-precision discretization of the nonlinear, dispersive, and dissipative terms in the BBMB equation is achieved. The numerical experiment section validates the effectiveness of the proposed method through three benchmark examples. The method shows significant advantages in computational efficiency, providing a new numerical tool for engineering applications in fields such as shallow-water wave dynamics.
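The space-time reconstruction can be stated compactly: treating t as an extra spatial coordinate, the solution is sought as a single MQ-RBF expansion over combined space-time nodes (x_j, t_j), where c is the shape parameter being optimized:

```latex
u(x,t) \approx \sum_{j=1}^{N} \lambda_j\, \varphi\!\left( \left\| (x,t) - (x_j, t_j) \right\| \right),
\qquad
\varphi(r) = \sqrt{r^{2} + c^{2}}
```

Collocating this expansion at the space-time nodes yields one algebraic system for the coefficients λ_j, which is what removes the separate time-stepping stage of traditional grid methods.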
Spatial Data Intelligence (SDI) encompasses acquiring, storing, analyzing, mining, and visualizing spatial data to gain insights into the physical world and uncover valuable knowledge. These understandings and knowledge play a crucial role in connecting physical and virtual realms, such as in developing a City Metaverse (CM) aimed at enhancing and optimizing modern urban environments. The advancement of the CM holds immense potential to benefit urban dwellers, making research on SDI an increasingly prominent area of focus. This paper contributes significantly by organizing the relevant research and technologies within a coherent framework. Firstly, we identify SDI technologies capable of collecting real-world information to construct a virtual CM. Subsequently, we delve into the technologies that can be integrated with SDI to facilitate interaction with and management of actual cities from the virtual perspective. Additionally, we emphasize the effectiveness and potential of these methods in practical applications. Lastly, we conclude our survey by discussing emerging challenges associated with technological progress, the industrial chain, legal and regulatory aspects, and ethical and moral considerations.
Software engineering has been embraced by almost all industries to promote work efficiency, improve user experience, or cut costs. In line with this, education on software engineering should be made more adaptable to meet the needs of industries. The industry-university-research (IUR) collaboration project, initially designed to reinforce the association between universities and enterprises, brings added value to this end. In this paper, an IUR collaboration project on tele-rehabilitation is presented as an example of education practice, with emphasis on ways of analyzing users' needs, converting users' needs into infrastructure design, decomposing a project into tasks, and so on. The project has been used both as student assignments and as case studies in software engineering courses, where students were motivated to deal with real medical problems from an engineering perspective. Introducing the IUR collaboration project helped students build an engineering-oriented mindset in addition to improving their R&D ability in software engineering.
State machine replication has been widely used in modern cluster-based database systems. Most commonly deployed configurations adopt a Raft-like consensus protocol, which has a single strong leader that replicates the log to other followers. Since followers can handle read requests and many real workloads are read-intensive, the recovery speed of a crashed follower may significantly impact throughput. Different from traditional database recovery, a recovering follower needs to repair its local log first. The original Raft protocol takes many network round trips to compare logs between the leader and the crashed follower. To reduce round trips, one optimization is to truncate the follower's uncertain log entries behind the latest local commit point, and then directly fetch all committed log entries from the leader in one round trip. However, if the commit point is not persisted, the recovering follower has to fetch the whole log from the leader. In this paper, we propose an accurate and efficient log repair (AELR) algorithm for follower recovery. AELR is more robust and resilient to follower failure, and it needs only one network round trip to fetch the minimum number of log entries for follower recovery. This approach is implemented in the open-source database system OceanBase. We experimentally show that a system adopting AELR performs well in terms of recovery time.
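A minimal sketch of the one-round-trip optimization described above, with illustrative data structures rather than OceanBase's actual implementation:

```python
def repair_follower_log(follower_log, commit_point, leader_committed):
    """Repair a crashed follower's log in one round trip.

    follower_log: list of (term, payload) entries on the crashed follower.
    commit_point: index of the last locally persisted committed entry (-1 if none).
    leader_committed: the leader's committed log, as (term, payload) entries.
    """
    # Entries behind the persisted commit point are known safe; anything
    # after it may diverge from the leader and is truncated.
    repaired = follower_log[: commit_point + 1]
    # One "round trip": fetch exactly the committed entries still missing.
    repaired.extend(leader_committed[len(repaired):])
    return repaired
```

If the commit point was never persisted (-1), the slice is empty and the follower must fetch the whole committed log, which is precisely the worst case the abstract mentions.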
Filling techniques are often used in the restoration of images. Yet existing filling approaches either have high computational costs or present problems such as filling holes redundantly. This paper proposes a novel algorithm for filling holes and regions of images. The proposed algorithm combines the advantages of both the parity-check filling approach and the region-growing inpainting technique. Pairing points of the region's boundary are used to search and fill the region, and the scanning range of the filling method is confined to the target regions. The proposed method does not require additional working memory or assistant colors, and it can correctly fill any complex contour. Experimental results show that, compared with other approaches, the proposed algorithm fills regions faster and at lower computational cost.
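The region-growing half of the combination can be sketched as a plain breadth-first fill; the parity-check pairing-point search itself is not reproduced here:

```python
from collections import deque

def region_grow_fill(image, seed, fill):
    """Fill the 4-connected region of `image` containing `seed` with `fill`.
    image: mutable list of lists of pixel labels; seed: (row, col)."""
    rows, cols = len(image), len(image[0])
    target = image[seed[0]][seed[1]]
    if target == fill:
        return image          # nothing to do; avoids an infinite loop
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        if 0 <= r < rows and 0 <= c < cols and image[r][c] == target:
            image[r][c] = fill
            # Grow into the four neighbours of the just-filled pixel.
            queue.extend([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)])
    return image
```

Plain region growing visits each pixel of the target region once, which is why the paper's contribution focuses on bounding the scan range and avoiding redundant fills rather than on the traversal itself.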
The modern in-memory database (IMDB) can support highly concurrent on-line transaction processing (OLTP) workloads and generate massive transactional logs per second. Quorum-based replication protocols such as Paxos and Raft have been widely used in distributed databases to offer higher availability and fault tolerance. However, it is non-trivial to replicate an IMDB because the high transaction rate brings new challenges. First, the leader node in quorum replication should adapt to various transaction arrival rates and the processing capability of follower nodes. Second, followers are required to replay logs to catch up with the state of the leader in a highly concurrent setting, to reduce the visibility gap. Third, modern databases are often built on a cluster of commodity machines connected by low-end networks, in which network anomalies often happen. In this case, performance would be significantly affected because the follower node falls into a long-duration exception-handling process (e.g., fetching lost logs from the leader). To this end, we build QuorumX, an efficient and stable quorum-based replication framework for IMDBs under heavy OLTP workloads. QuorumX combines critical-path-based batching and pipeline batching to provide an adaptive log propagation scheme that obtains stable, high performance in various settings. Further, we propose a safe and coordination-free log replay scheme to minimize the visibility gap between the leader and follower IMDBs. We also carefully design the process for the follower node to alleviate the influence of unreliable networks on replication performance. Our evaluation results with YCSB, TPC-C, and a realistic microbenchmark demonstrate that QuorumX achieves performance close to asynchronous primary-backup replication and can always provide a stable service with data consistency and a low-level visibility gap.
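The log propagation idea can be illustrated with a deliberately simplified size-bounded batcher; this is a stand-in for QuorumX's critical-path and pipeline batching, whose actual policies are not specified in the abstract:

```python
def batch_log_entries(entries, max_batch_bytes):
    """Group log entries into batches of at most max_batch_bytes each,
    so that one network send amortizes its cost over many entries.
    entries: list of bytes payloads."""
    batches, current, current_size = [], [], 0
    for e in entries:
        # Flush the current batch if adding this entry would exceed the cap.
        if current and current_size + len(e) > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(e)
        current_size += len(e)
    if current:
        batches.append(current)
    return batches
```

An adaptive scheme would additionally tune the batch cap (or a companion timeout) against the observed arrival rate and follower processing capability.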
Massive-scale transactions with critical requirements have become popular for emerging businesses, especially in E-commerce. One of the most representative applications is the promotional event run on Alibaba's platform on special dates, widely anticipated by global customers. Although significant progress has been achieved in improving the scalability of transactional (OLTP) database systems, the presence of contended operations in workloads is still one of the fundamental obstacles to performance improvement, because the overhead of managing conflicting transactions with concurrency control mechanisms is proportional to the amount of contention. As a consequence, generating contended workloads is essential for evaluating the performance of modern OLTP database systems. Although standard benchmarks provide some means of simulating contention, e.g., skew-distribution control of transactions, they cannot control the generation of contention quantitatively; even worse, the simulation effectiveness of these methods is affected by the scale of data. In this paper, we therefore design a scalable quantitative contention generation method with fine-grained contention control. We conduct a comprehensive set of experiments on popular open-source DBMSs, compared with the latest contention simulation method, to demonstrate the effectiveness of our generation method.
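The quantitative idea can be sketched as follows: a target contention ratio directly sets the probability that a transaction touches a small hot-key set, independent of the total data scale. The parameters below are hypothetical, not the paper's actual generator:

```python
import random

def generate_workload(n_txns, n_keys, n_hot, contention_ratio, seed=0):
    """Each transaction picks a hot key with probability contention_ratio,
    otherwise a cold key, so contention is controlled quantitatively
    regardless of n_keys (the data scale)."""
    rng = random.Random(seed)
    txns = []
    for _ in range(n_txns):
        if rng.random() < contention_ratio:
            txns.append(rng.randrange(n_hot))           # hot keys: [0, n_hot)
        else:
            txns.append(rng.randrange(n_hot, n_keys))   # cold keys
    return txns
```

Contrast this with skew-based control (e.g., a Zipfian distribution), where the fraction of accesses landing on hot keys shifts as the key space grows, which is the scale-dependence the abstract criticizes.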
In this paper, a simple direct space-time semi-analytical meshless scheme is proposed for the numerical approximation of the coupled Burgers' equations. Throughout the solution procedure, two different schemes are considered, based on radial and non-radial basis functions. In the first, radial scheme, the time-dependent variable is treated directly as a normal space variable to formulate an "isotropic" space-time radial basis function. The second, non-radial scheme considers the relationship between time-dependent and space-dependent variables. Under such circumstances, we obtain a one-step space-time meshless scheme. The numerical findings demonstrate that the proposed meshless schemes are precise, user-friendly, and effective in solving the coupled Burgers' equations.
Instruction fine-tuning is a key method for adapting large language models (LLMs) to domain-specific tasks, and instruction quality significantly impacts model performance after fine-tuning. Hence, evaluating instruction quality and selecting high-quality instructions are essential steps in LLM instruction fine-tuning. Although existing studies provide important theoretical foundations and techniques for this, there is still room for improvement in terms of generality, the relationships among methods, and experimental verification. Current methods for evaluating instruction quality can be classified into four main categories: human evaluation, statistics-based evaluation, model-based evaluation, and LLM-based evaluation. Human evaluation relies on the subjective judgment and domain expertise of the evaluators; it offers interpretability and is suitable for scenarios involving small-scale data and sufficient budgets. Statistics-based evaluation estimates the quality of instructions using indicators such as stopwords and lexical diversity, providing high efficiency and suitability for large-scale data. Model-based evaluation employs specific models to quantify indicators such as perplexity (PPL) and instruction-following difficulty (IFD); it is flexible and suitable for specific tasks. LLM-based evaluation rates the quality of instructions through prompt-based interaction with LLMs, focusing on aspects such as accuracy and coherence; it is highly automated and customizable, simplifying the evaluation process. Finally, considering the limitations of current quality evaluation methods, some future research directions are proposed: refining instruction categories, extending evaluation indicators, enhancing human-AI interaction evaluation methods, applying agents in instruction quality evaluation, and developing a comprehensive evaluation framework.
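Two of the statistics-based indicators mentioned above can be sketched in a few lines; the stopword list here is a tiny illustrative sample, not a standard resource:

```python
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}  # illustrative subset

def stopword_ratio(instruction):
    """Fraction of tokens that are stopwords; a very high ratio often
    signals a low-information instruction."""
    tokens = instruction.lower().split()
    return sum(t in STOPWORDS for t in tokens) / len(tokens)

def lexical_diversity(instruction):
    """Type-token ratio: distinct tokens over total tokens."""
    tokens = instruction.lower().split()
    return len(set(tokens)) / len(tokens)
```

Such indicators are cheap to compute over millions of instructions, which is why the abstract positions statistics-based evaluation as the high-efficiency option for large-scale data.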
This investigation is focused on conducting a thorough analysis of Municipal Solid Waste Management (MSWM). MSWM encompasses a range of interdisciplinary measures that govern the various stages involved in managing unwanted or non-utilizable solid materials, commonly known as rubbish, trash, junk, refuse, and garbage. These stages include generation, storage, collection, recycling, transportation, handling, disposal, and monitoring. The waste materials mentioned in this context cover a wide range of items, such as organic waste from food and vegetables, paper, plastic, polyethylene, iron, tin cans, deceased animals, byproducts from demolition activities, manure, and various other discarded materials. This study aims to provide insights into the possibilities of enhancing solid waste management in the Farmgate area of Dhaka North City Corporation (DNCC). To accomplish this objective, the research examines the conventional waste management methods employed in this area. It conducts extensive field surveys, collecting valuable data through interviews with local residents and key individuals involved in waste management, such as waste collectors, dealers, intermediate dealers, recyclers, and shopkeepers. The results indicate that significant amounts of distinct waste categories are produced daily: food and vegetable waste, amounting to 52.1 tons/day; polythene and plastic, totaling 4.5 tons/day; metal and tin-can waste, amounting to 1.4 tons/day; and paper waste, totaling 5.9 tons/day. This study highlights the significance of promoting environmental consciousness to effectively shape the attitudes of urban residents toward waste disposal and management, and it emphasizes the need for collaboration between authorities and researchers to improve the current waste management system.
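The reported tonnages translate directly into category shares (considering only the four quantified categories):

```python
waste_tons_per_day = {            # daily figures reported in the study
    "food and vegetable": 52.1,
    "polythene and plastic": 4.5,
    "metal and tin-can": 1.4,
    "paper": 5.9,
}
total = sum(waste_tons_per_day.values())            # 63.9 tons/day
shares = {k: round(100 * v / total, 1) for k, v in waste_tons_per_day.items()}
```

Food and vegetable waste thus accounts for over four-fifths of the quantified daily stream, which is why organic-waste handling dominates the study's recommendations.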
1 Introduction
With rapid development in computing power and breakthroughs in deep learning, the concept of "foundation models" has been introduced into the AI community. Generally, foundation models are large models trained on massive data that can be easily adapted to different domains for various tasks. With specific prompts, foundation models can generate texts and images, or even animate scenarios based on the given descriptions. Due to these powerful capabilities, there is a growing trend to build agents based on foundation models. In this paper, we conduct an investigation into agents empowered by foundation models.
Road damage detection is an important aspect of road maintenance. Traditional manual inspections are laborious and imprecise. With the rise of deep learning technology, pavement detection methods employing deep neural networks give an efficient and accurate solution. However, due to background diversity, limited resolution, and fracture similarity, it is difficult to detect road cracks with high accuracy. In this study, we offer a unique, efficient, and accurate road crack detection method, namely YOLOv8-ES. We present a novel dynamic convolutional layer (EDCM) that successfully increases the feature extraction capability for small fractures, and a new attention mechanism (SGAM) that effectively retains crucial information and increases the network's feature extraction capacity. The Wise-IoU technique contains a dynamic, non-monotonic focusing mechanism designed to regress the target bounding box more precisely, especially for low-quality samples. We validate our method on both the RDD2022 and VOC2007 datasets. The experimental results show that YOLOv8-ES performs well. This approach provides strong support for the development of intelligent road maintenance systems and is expected to achieve further advances in future applications.
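Wise-IoU builds on the standard intersection-over-union measure between predicted and ground-truth boxes; plain IoU is computed as below (the dynamic non-monotonic focusing weight itself is not reproduced here):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap extents clamp to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

IoU-family losses (1 - IoU plus various penalty terms) are what the Wise-IoU focusing mechanism re-weights per sample, down-weighting gradients from low-quality boxes.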
The log-structured merge tree (LSM-tree) has been adopted by many distributed storage systems. It decomposes a large database into multiple parts: an in-writing part and several read-only ones. Records are first written into a memory-optimized structure and then compacted into on-disk structures periodically. This achieves high write throughput, but brings the side effect that read requests have to go through multiple structures to find the required record. In a distributed database system, different parts of the LSM-tree are stored in a distributed fashion, so a server in the query layer has to issue multiple network communications to pull data items from the underlying storage layer. Coming to its rescue, this work proposes a precise data access strategy which includes: an efficient structure with low maintenance overhead, designed to test whether a record exists in the in-writing part of the LSM-tree; and a lease-based synchronization strategy to maintain consistent copies of the structure on remote query servers. We further prove that the technique works robustly when the LSM-tree is reorganizing multiple structures in the backend. It is also fault-tolerant, able to recover the structures used in data access after node failures. Experiments using the YCSB benchmark show that the solution achieves a 6x throughput improvement over existing methods.
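The abstract does not name the membership structure; a Bloom filter is one common low-overhead choice for testing whether a record might exist in the in-writing part (it admits false positives but never false negatives), so the sketch below is an assumption, not the paper's design:

```python
import hashlib

class BloomFilter:
    """Compact probabilistic membership test."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits)

    def _positions(self, key):
        # Derive n_hashes independent bit positions from salted SHA-256.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))
```

A "definitely absent" answer lets a query server skip the network round trip to the in-writing part entirely, which is the saving the paper's precise data access strategy targets.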
Entity alignment is the problem of identifying which entities in one data source refer to the same real-world entities in others. Identifying entities across heterogeneous data sources is paramount to many research fields, such as data cleaning, data integration, information retrieval, and machine learning. The aligning process is not only overwhelmingly expensive for large data sources, since it involves all tuples from two or more data sources, but also needs to handle heterogeneous entity attributes. In this paper, we propose an unsupervised approach, called EnAli, to match entities across two or more heterogeneous data sources. EnAli employs a generative probabilistic model to incorporate heterogeneous entity attributes via the exponential family, handles missing values, and utilizes a locality-sensitive hashing schema to reduce the candidate tuples and speed up the aligning process. EnAli is highly accurate and efficient even without any ground-truth tuples. We illustrate the performance of EnAli on re-identifying entities from the same data source, as well as on aligning entities across three real data sources. Our experimental results manifest that the proposed approach outperforms comparable baselines.
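EnAli's locality-sensitive hashing schema is only summarized above; a minimal MinHash-with-banding sketch of candidate reduction, with illustrative parameters rather than the paper's configuration:

```python
import hashlib
import itertools

def minhash_signature(tokens, n_hashes=8):
    """MinHash signature of a token set: one minimum per salted hash."""
    return tuple(
        min(int(hashlib.sha256(f"{i}:{t}".encode()).hexdigest(), 16)
            for t in tokens)
        for i in range(n_hashes))

def lsh_candidates(records, bands=4):
    """Band each signature; records sharing any band bucket become
    candidate pairs, so full pairwise comparison is avoided."""
    buckets = {}
    for rid, tokens in records.items():
        sig = minhash_signature(tokens)
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            buckets.setdefault(key, set()).add(rid)
    return {frozenset(pair)
            for bucket in buckets.values()
            for pair in itertools.combinations(sorted(bucket), 2)}
```

Only the candidate pairs then go through the expensive generative-model scoring, which is how the quadratic all-pairs cost is avoided.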
Temporal information is pervasive and crucial in medical records and other clinical text, as it formulates the development process of medical conditions and is vital for clinical decision making. However, providing a holistic knowledge representation and reasoning framework for the various time expressions in clinical text is challenging. To capture complex temporal semantics in clinical text, we propose a novel Clinical Time Ontology (CTO) as an extension of the OWL framework. More specifically, we identified eight time-related problems in clinical text and created 11 core temporal classes to conceptualize fuzzy time, cyclic time, irregular time, negations, and other complex aspects of clinical time. Then, we extended Allen's and TEO's temporal relations and defined the relational concept descriptions between complex and simple time. We also provide formulaic and graphical presentations of complex time and complex time relationships. We carried out an empirical study on the expressiveness and usability of CTO using real-world healthcare datasets. Experimental results demonstrate that CTO can faithfully represent and reason over 93% of the temporal expressions, and it covers a wide range of time-related classes in the clinical domain.
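The Allen-style interval relations that CTO extends can be sketched over simple (start, end) pairs; only five of the thirteen base relations are shown:

```python
# Each interval is a (start, end) pair with start < end.
def before(a, b):   return a[1] < b[0]           # a ends before b starts
def meets(a, b):    return a[1] == b[0]          # a ends exactly where b starts
def overlaps(a, b): return a[0] < b[0] < a[1] < b[1]
def during(a, b):   return b[0] < a[0] and a[1] < b[1]
def equal(a, b):    return a == b
```

CTO's contribution is extending such crisp interval relations to fuzzy, cyclic, and irregular times, where endpoints are not single well-defined numbers.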
Dropping fractions of users or items judiciously can reduce the computational cost of Collaborative Filtering (CF) algorithms. The effect of this subsampling on the computing time and accuracy of CF is not fully understood, and clear guidelines for selecting optimal or even appropriate subsampling levels are not available. In this paper, we present a Density-based Random Stratified Subsampling using Clustering (DRSC) algorithm, in which the desired Fraction of Users Dropped (FUD) and Fraction of Items Dropped (FID) are specified and the overall density during subsampling is maintained. Subsequently, we develop simple models of the Training Time Improvement (TTI) and the Accuracy Loss (AL) as functions of FUD and FID, based on extensive simulations of seven standard CF algorithms applied to various primary matrices from MovieLens, Yahoo Music Rating, and Amazon Automotive data. Simulations show that both TTI and a scaled AL are bilinear in FID and FUD for all seven methods. The TTI linear regression of a CF method appears to be the same for all datasets. Extensive simulations illustrate that TTI can be estimated reliably from FUD and FID alone, but AL requires considering additional dataset characteristics. The derived models are then used to optimize the levels of subsampling, addressing the trade-off between TTI and AL. A simple sub-optimal approximation was found, in which the optimal AL is proportional to the optimal Training Time Reduction Factor (TTRF) for higher values of TTRF, and the optimal subsampling levels, like optimal FID/(1-FID), are proportional to the square root of TTRF.
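The bilinear form found in the simulations can be written as TTI(FUD, FID) ≈ a·FUD + b·FID + c·FUD·FID; a tiny evaluator with hypothetical coefficients (the paper fits these per CF method, and the values below are purely illustrative):

```python
def bilinear_tti(fud, fid, a=0.4, b=0.5, c=0.1):
    """Bilinear Training Time Improvement model in FUD and FID.
    The coefficients a, b, c are hypothetical placeholders for the
    per-method regression coefficients fitted in the paper."""
    return a * fud + b * fid + c * fud * fid
```

Because the form is bilinear, fixing either fraction leaves a model that is linear in the other, which is what makes the trade-off optimization against AL tractable.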
Abstract: Many datasets in E-commerce have rich information about items and the users who purchase or rate them. This information can enable advanced machine learning algorithms to extract and assign user sentiments to various aspects of the items, thus leading to more sophisticated and justifiable recommendations. However, most Collaborative Filtering (CF) techniques rely mainly on the overall preferences of users toward items only, and there is a lack of a conceptual and computational framework that enables an understandable aspect-based AI approach to recommending items to users. In this paper, we propose concepts and computational tools that can sharpen the logic of recommendations and that rely on users' sentiments along various aspects of items. These concepts include: the sentiment of a user towards a specific aspect of a specific item, the emphasis that a given user places on a specific aspect in general, the popularity and controversy of an aspect among groups of users, clusters of users emphasizing a given aspect, clusters of items that are popular among a group of users, and so forth. The framework introduced in this study is developed in terms of user emphasis, aspect popularity, aspect controversy, and user and item similarity. Towards this end, we introduce the Aspect-Based Collaborative Filtering Toolbox (ABCFT), where the tools are all developed based on the three-index sentiment tensor with the indices being the user, item, and aspect. The toolbox computes solutions to the questions alluded to above. We illustrate the methodology using a hotel review dataset having around 6,000 users, 400 hotels, and 6 aspects.
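To make the three-index tensor concrete, here is a minimal sketch of user emphasis, aspect popularity, and aspect controversy over a tiny hypothetical (user, item, aspect) → sentiment dictionary. These are plausible readings of the concepts, not the paper's exact ABCFT definitions.

```python
# Hypothetical sentiment tensor: (user, item, aspect) -> sentiment in [-1, 1].
T = {
    ("u1", "h1", "clean"):  0.9, ("u1", "h1", "price"): -0.2,
    ("u1", "h2", "clean"):  0.8, ("u2", "h1", "clean"):  0.1,
    ("u2", "h1", "price"): -0.9, ("u2", "h2", "price"): -0.7,
}

def user_emphasis(T, user, aspect):
    # How strongly `user` comments on `aspect`, averaged over rated items.
    vals = [abs(s) for (u, i, a), s in T.items() if u == user and a == aspect]
    return sum(vals) / len(vals) if vals else 0.0

def aspect_popularity(T, aspect):
    # Mean sentiment toward `aspect` across all users and items.
    vals = [s for (u, i, a), s in T.items() if a == aspect]
    return sum(vals) / len(vals) if vals else 0.0

def aspect_controversy(T, aspect):
    # Variance of sentiment toward `aspect`: high spread = controversial.
    vals = [s for (u, i, a), s in T.items() if a == aspect]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)
```

Clustering users by their emphasis vectors, or items by their per-aspect popularity, follows directly from these primitives.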
Funding: Supported by the National Natural Science Foundation of China (61702385) and the Key Projects of the National Social Science Foundation of China (11&ZD189).
Abstract: The cover ratio of cloud is a very important factor affecting the quality of a satellite image; therefore, cloud detection from satellite images is a necessary step in assessing image quality. A study on cloud detection from the visual band of a satellite image is developed. Firstly, we consider the differences between cloud and ground, including high grey level, good continuity of grey level, area of the cloud region, and the variance of local fractal dimension (VLFD) of the cloud region, and propose a single-cloud-region detection method. Secondly, by introducing a reference satellite image and comparing the variance in the dimensions corresponding to the reference and tested images, we describe a method that detects multiple cloud regions and determines whether or not cloud exists in an image. The performance of the proposed method is demonstrated using several Ikonos images.
Funding: Supported by the National Key Research and Development Program of China (2017YFC0822204), the National Natural Science Foundation of China (NSFC) (51935010), the Beijing Municipal Natural Science Foundation (LI92001), and the Tsinghua University Initiative Scientific Research Program (20197010009).
Abstract: The lack of temperature sensation in a myoelectric prosthetic hand limits the daily activities of amputees. To this end, a noninvasive temperature sensation method is proposed to train amputees to sense temperature through psychophysical sensory substitution. In this study, 22 healthy participants took part, in addition to 5 amputee participants. The study lasted 31 days, with five test steps according to the Leitner technique. An adjustable-temperature mug and a Peltier element were used to change the temperature of the water/phantom digits to induce temperature sensations in participants. To isolate the surroundings and display colors, a Virtual Reality (VR) headset was employed. The statistical results are based on participants' questionnaire responses. Using Chi-square tests, it is concluded that participants answered the experiment significantly correctly using the Leitner technique (P value < 0.05). By applying repeated-measures ANOVA, it is noticed that the time of numbness felt by participants differed significantly (P value < 0.001). Participants could remember the lowest and highest temperatures significantly better than other temperatures (P value < 0.001); furthermore, the well-trained amputee participant practically using the prosthesis could identify an object's temperature with 72.58% accuracy after experiencing the color-temperature association only once.
Funding: Supported by the Horizontal Scientific Research Funds of Huaibei Normal University (No. 2024340603000006) and the Science and Technology General Project of the Jiangxi Provincial Department of Education (Nos. GJJ2203203, GJJ2203213).
Abstract: This study introduces a novel single-layer meshless method, the space-time collocation method based on multiquadric radial basis functions (MQ-RBF), for solving the Benjamin-Bona-Mahony-Burgers (BBMB) equation. By treating the time variable as an additional space variable, this method establishes a combined space-time structure that eliminates the two-step computational process required in traditional grid methods. By introducing shape-parameter-optimized MQ-RBFs, high-precision discretization of the nonlinear, dispersive, and dissipative terms in the BBMB equation is achieved. The numerical experiment section validates the effectiveness of the proposed method through three benchmark examples. The method shows significant advantages in computational efficiency, providing a new numerical tool for engineering applications in fields such as shallow-water wave dynamics.
Abstract: Spatial Data Intelligence (SDI) encompasses acquiring, storing, analyzing, mining, and visualizing spatial data to gain insights into the physical world and uncover valuable knowledge. These understandings and knowledge play a crucial role in connecting physical and virtual realms, such as in developing a City Metaverse (CM) aimed at enhancing and optimizing modern urban environments. The advancement of CM holds immense potential to benefit urban dwellers, making research on SDI an increasingly prominent area of focus. This paper contributes significantly by organizing the relevant research and technologies within a coherent framework. Firstly, we identify SDI technologies capable of collecting real-world information to construct a virtual CM. Subsequently, we delve into the technologies that can be compositely integrated with SDI to facilitate interaction with and management of actual cities from the virtual perspective. Additionally, we emphasize the effectiveness and potential of these methods in practical applications. Lastly, we conclude our survey by discussing emerging challenges associated with technological progress, the industrial chain, legal and regulatory aspects, and ethical and moral considerations.
Funding: Supported in part by the Joint Education Base Project for Postgraduates of Guangdong (866[2024]1-032), the Joint Education Project for Postgraduates of Foshan Base (2023FCXM004), Teaching Reformation Projects of South China Normal University ([2023]71, 027, 039, 099, 191), and Selected Projects of the "Challenge-Based Leadership" Action Plan ([2025]6, 19, 20, 21).
Abstract: Software engineering has been embraced by almost all industries to promote work efficiency, improve user experience, or cut costs. In line with this, education on software engineering should be made more adaptable to meet the needs of industries. Industry-university-research (IUR) collaboration projects, initially designed to reinforce the association between universities and enterprises, bring added value to this end. In this paper, an IUR collaboration project on tele-rehabilitation is presented as an example of education practice, where emphasis is laid on the ways of analyzing users' needs, converting users' needs into infrastructure design, decomposing a project into tasks, and so on. The project has been used for both student assignments and case studies in software engineering courses, where students were motivated to deal with real medical problems from an engineering perspective. It was shown that introducing the IUR collaboration project helped students build an engineering-oriented mindset while improving their R&D ability in software engineering.
Funding: Supported in part by the National Key R&D Program of China (2018YFB1003303) and the National Natural Science Foundation of China (Grant Nos. 61432006, 61732014 and 61972149).
Abstract: State machine replication has been widely used in modern cluster-based database systems. Most commonly deployed configurations adopt a Raft-like consensus protocol, which has a single strong leader that replicates the log to the other followers. Since followers can handle read requests and many real workloads are usually read-intensive, the recovery speed of a crashed follower may significantly impact throughput. Different from traditional database recovery, a recovering follower needs to repair its local log first. The original Raft protocol takes many network round trips to compare logs between the leader and the crashed follower. To reduce network round trips, one optimization is to truncate the follower's uncertain log entries behind the latest local commit point, and then directly fetch all committed log entries from the leader in one round trip. However, if the commit point is not persisted, the recovering follower has to fetch the whole log from the leader. In this paper, we propose an accurate and efficient log repair (AELR) algorithm for follower recovery. AELR is more robust and resilient to follower failure, and it needs only one network round trip to fetch the least number of log entries for follower recovery. This approach is implemented in the open-source database system OceanBase. We experimentally show that a system adopting AELR performs well in terms of recovery time.
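The truncate-then-bulk-fetch idea described above can be simulated in a few lines. This is an illustrative sketch of the one-round-trip repair pattern, not OceanBase's actual AELR implementation; the log is a plain list and commit points are list indices.

```python
def repair_follower_log(leader_log, leader_commit, follower_log, follower_commit):
    """One-round-trip follower log repair (sketch):
    keep the follower's certain prefix up to its persisted commit point,
    drop the uncertain (possibly diverged) suffix, then fetch every
    committed entry after the commit point from the leader in one bulk call."""
    certain_prefix = follower_log[:follower_commit]
    fetched = leader_log[follower_commit:leader_commit]  # the single round trip
    return certain_prefix + fetched, len(fetched)

# The follower diverged after its commit point (index 2).
leader   = ["a", "b", "c", "d", "e"]
follower = ["a", "b", "x", "y"]
repaired, n_fetched = repair_follower_log(leader, 5, follower, 2)
```

When the commit point is not persisted, `follower_commit` would have to be taken as 0, which degenerates to fetching the whole committed log, exactly the failure mode the paper addresses.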
Funding: Jointly supported by the National Natural Science Foundation of China (No. 61561035), Ukrainian government project No. 0117U007177, and the Slovak Research and Development Agency (project No. APVV-18-0214).
Abstract: Filling techniques are often used in the restoration of images, yet existing filling approaches either have high computational costs or present problems such as filling holes redundantly. This paper proposes a novel algorithm for filling holes and regions of images. The proposed algorithm combines the advantages of both the parity-check filling approach and the region-growing inpainting technique. Pairing points of the region's boundary are used to search and fill the region, and the scanning range of the filling method is confined to the target regions. The proposed method does not require additional working memory or assistant colors, and it can correctly fill any complex contour. Experimental results show that, compared to other approaches, the proposed algorithm fills regions faster and with lower computational cost.
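For readers unfamiliar with the task, the baseline formulation of hole filling on a binary image can be sketched as follows. This flood-fill-from-the-border approach is a common textbook stand-in, not the paper's pairing-point method (which avoids scanning outside the target regions).

```python
def fill_holes(grid):
    """Fill interior holes of a binary image (1 = object, 0 = background):
    flood-fill the background from the image border; any background cell
    never reached is an enclosed hole and becomes part of the object."""
    h, w = len(grid), len(grid[0])
    outside = [[False] * w for _ in range(h)]
    # Seed the flood fill with every background cell on the border.
    stack = [(r, c) for r in range(h) for c in range(w)
             if (r in (0, h - 1) or c in (0, w - 1)) and grid[r][c] == 0]
    while stack:
        r, c = stack.pop()
        if not (0 <= r < h and 0 <= c < w) or outside[r][c] or grid[r][c] == 1:
            continue
        outside[r][c] = True
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return [[1 if grid[r][c] == 1 or not outside[r][c] else 0
             for c in range(w)] for r in range(h)]
```

Note the cost: this scans the entire background, which is exactly the overhead that boundary-pair methods like the paper's aim to avoid.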
Funding: Partially supported by the National Key R&D Program of China (2018YFB1003404), NSFC (Grant Nos. 61972149, 61977026), and the ECNU Academic Innovation Promotion Program for Excellent Doctoral Students.
Abstract: A modern in-memory database (IMDB) can support highly concurrent on-line transaction processing (OLTP) workloads and generate massive transactional logs per second. Quorum-based replication protocols such as Paxos or Raft have been widely used in distributed databases to offer higher availability and fault tolerance. However, it is non-trivial to replicate an IMDB because the high transaction rate brings new challenges. First, the leader node in quorum replication should be adaptive to various transaction arrival rates and the processing capability of follower nodes. Second, followers are required to replay logs to catch up with the state of the leader in a highly concurrent setting to reduce the visibility gap. Third, modern databases are often built on a cluster of commodity machines connected by low-configuration networks, in which network anomalies often happen. In this case, performance would be significantly affected because the follower node falls into a long-duration exception-handling process (e.g., fetching lost logs from the leader). To this end, we build QuorumX, an efficient and stable quorum-based replication framework for IMDBs under heavy OLTP workloads. QuorumX combines critical-path-based batching and pipeline batching to provide an adaptive log propagation scheme that obtains stable and high performance across various settings. Further, we propose a safe and coordination-free log replay scheme to minimize the visibility gap between the leader and follower IMDBs. We also carefully design the process for the follower node in order to alleviate the influence of an unreliable network on replication performance. Our evaluation results with YCSB, TPC-C, and a realistic micro-benchmark demonstrate that QuorumX achieves performance close to asynchronous primary-backup replication and can always provide a stable service with data consistency and a low-level visibility gap.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62072179), the ECNU-OceanBase Joint Lab of Distributed Database Systems, and the 2020 Key Software Adaptation and Verification Project (Database).
Abstract: Massive-scale transactions with critical requirements have become popular for emerging businesses, especially in E-commerce. One of the most representative applications is the promotional event running on Alibaba's platform on special dates, widely anticipated by global customers. Although significant progress has been achieved in improving the scalability of transactional database systems (OLTP), the presence of contended operations in workloads is still one of the fundamental obstacles to performance improvement. The reason is that the overhead of managing conflicting transactions with concurrency control mechanisms is proportional to the amount of contention. As a consequence, generating contended workloads is essential for evaluating the performance of modern OLTP database systems. Though standard benchmarks provide some ways of simulating contention, e.g., skew distribution control of transactions, they cannot control the generation of contention quantitatively; even worse, the simulation effectiveness of these methods is affected by the scale of data. In this paper we therefore design a scalable quantitative contention generation method with fine contention granularity control. We conduct a comprehensive set of experiments on popular open-sourced DBMSs, compared with the latest contention simulation method, to demonstrate the effectiveness of our generation approach.
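The difference between skew-based and quantitative contention control can be illustrated with a toy generator. In the sketch below (an illustration of the general idea, not the paper's generator), the parameter `contention` directly sets the fraction of transactions that touch a small hot set, independent of how large the table is, which a Zipf-skew knob cannot guarantee.

```python
import random

def contended_workload(n_txns, n_records, contention, hot_keys=1, seed=7):
    """Generate one accessed key per transaction, where a quantitatively
    controlled fraction `contention` of transactions hits the hot set
    [0, hot_keys) and the rest access cold records uniformly."""
    rng = random.Random(seed)
    workload = []
    for _ in range(n_txns):
        if rng.random() < contention:
            workload.append(rng.randrange(hot_keys))             # contended access
        else:
            workload.append(rng.randrange(hot_keys, n_records))  # cold access
    return workload

w = contended_workload(10_000, 1_000_000, contention=0.3)
hot_rate = sum(1 for k in w if k < 1) / len(w)
```

Doubling `n_records` leaves `hot_rate` unchanged, which is the scale-independence property the paper argues skew-distribution controls lack.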
基金the Science and Technology Research Project of Henan Province (242102231052)the Key Scientific Research Plan of Colleges and Universities in Henan Province (23B140006)the Natural Science Foundation of Jiangxi Province (20224BAB201018)。
Abstract: In this paper, a simple direct space-time semi-analytical meshless scheme is proposed for the numerical approximation of the coupled Burgers' equations. During the solution procedure, two different schemes are considered, in terms of radial and non-radial basis functions. In the first, radial, scheme, the time-dependent variable is treated directly like the normal space variables to formulate an "isotropic" space-time radial basis function. The second, non-radial, scheme considers the relationship between time-dependent and space-dependent variables. Under such circumstances, we obtain a one-step space-time meshless scheme. The numerical findings demonstrate that the proposed meshless schemes are precise, user-friendly, and effective in solving the coupled Burgers' equations.
Funding: Supported by the National Natural Science Foundation of China (No. 62261023), the National Natural Science Foundation of China (No. U1836118), and Science and Technology Innovation 2030 "New Generation of Artificial Intelligence" (2020AAA0108501).
Abstract: Instruction fine-tuning is a key method for adapting large language models (LLMs) to domain-specific tasks, and instruction quality significantly impacts model performance after fine-tuning. Hence, evaluating the quality of instructions and selecting high-quality instructions are essential steps in LLM instruction fine-tuning. Although existing studies provide important theoretical foundations and techniques for this, there is still room for improvement in terms of generality, the relationship between methods, and experimental verification. Current methods for evaluating instruction quality can be classified into four main categories: human evaluation, statistics-based evaluation, model-based evaluation, and LLM-based evaluation. Among these, human evaluation relies on the subjective judgment and domain expertise of the evaluators, which offers interpretability and suits scenarios involving small-scale data and sufficient budgets. Statistics-based evaluation estimates the quality of instructions using indicators such as stopwords and lexical diversity, providing high efficiency and a suitable evaluation for large-scale data. Model-based evaluation employs specific models to quantify indicators such as perplexity (PPL) and instruction-following difficulty (IFD), which is flexible and suitable for specific tasks. LLM-based evaluation rates the quality of instructions through prompt-based interaction with LLMs, focusing on aspects such as accuracy and coherence; it is highly automated and customizable, simplifying the evaluation process. Finally, considering the limitations of current quality evaluation methods, some future research directions are proposed for improvement. These include refining instruction categories, extending evaluation indicators, enhancing human-AI interactive evaluation methods, applying agents in instruction quality evaluation, and developing a comprehensive evaluation framework.
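The statistics-based category mentioned above is the cheapest to reproduce. Below is a minimal sketch of two such indicators (lexical diversity as a type-token ratio and a stopword ratio) combined into a toy composite score; the indicator names match the survey, but the exact formulas and weights here are illustrative assumptions.

```python
def lexical_diversity(text):
    # Type-token ratio: distinct words / total words.
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

# A tiny illustrative stopword list; real evaluators use much larger ones.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it"}

def stopword_ratio(text):
    words = text.lower().split()
    return sum(1 for w in words if w in STOPWORDS) / len(words) if words else 0.0

def quality_score(instruction, w_div=0.7, w_stop=0.3):
    # Toy composite: reward diversity, penalize stopword-heavy instructions.
    return w_div * lexical_diversity(instruction) - w_stop * stopword_ratio(instruction)
```

Such scores run in linear time per instruction, which is why the survey positions statistics-based evaluation as the large-scale option.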
Abstract: This investigation is focused on conducting a thorough analysis of Municipal Solid Waste Management (MSWM). MSWM encompasses a range of interdisciplinary measures that govern the various stages involved in managing unwanted or non-utilizable solid materials, commonly known as rubbish, trash, junk, refuse, and garbage. These stages include generation, storage, collection, recycling, transportation, handling, disposal, and monitoring. The waste materials mentioned in this context exhibit a wide range of items, such as organic waste from food and vegetables, paper, plastic, polyethylene, iron, tin cans, deceased animals, byproducts from demolition activities, manure, and various other discarded materials. This study aims to provide insights into the possibilities of enhancing solid waste management in the Farmgate area of Dhaka North City Corporation (DNCC). To accomplish this objective, the research examines the conventional waste management methods employed in this area. It conducts extensive field surveys, collecting valuable data through interviews with local residents and key individuals involved in waste management, such as waste collectors, dealers, intermediate dealers, recyclers, and shopkeepers. The results indicate that significant amounts of distinct waste categories are produced daily: food and vegetable waste amounts to 52.1 tons/day; polythene and plastic, 4.5 tons/day; metal and tin-can waste, 1.4 tons/day; and paper waste, 5.9 tons/day. This study highlights the significance of promoting environmental consciousness to effectively shape the attitudes of urban residents toward waste disposal and management. It emphasizes the need for collaboration between authorities and researchers to improve the current waste management system.
Abstract: 1 Introduction. With rapid development in computing power and breakthroughs in deep learning, the concept of "foundation models" has been introduced into the AI community. Generally, foundation models are large models trained on massive data that can be easily adapted to different domains for various tasks. With specific prompts, foundation models can generate texts and images, or even animate scenarios based on the given descriptions. Due to these powerful capabilities, there is a growing trend to build agents based on foundation models. In this paper, we conduct an investigation into agents empowered by foundation models.
Abstract: Road damage detection is an important aspect of road maintenance. Traditional manual inspections are laborious and imprecise. With the rise of deep learning technology, pavement detection methods employing deep neural networks offer an efficient and accurate solution. However, due to background diversity, limited resolution, and fracture similarity, it is hard to detect road cracks with high accuracy. In this study, we offer a unique, efficient, and accurate road crack damage detector, namely YOLOv8-ES. We present a novel dynamic convolutional layer (EDCM) that successfully increases the feature extraction capability for small fractures. At the same time, we also present a new attention mechanism (SGAM), which can effectively retain crucial information and increase the network's feature extraction capacity. The Wise-IoU technique contains a dynamic, non-monotonic focusing mechanism designed to regress the target bounding box more precisely, especially for low-quality samples. We validate our method on both the RDD2022 and VOC2007 datasets. The experimental results suggest that YOLOv8-ES performs well. This unique approach provides great support for the development of intelligent road maintenance systems and is projected to achieve further advances in future applications.
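The IoU term at the heart of losses like Wise-IoU is simple to state. The sketch below computes plain IoU for axis-aligned boxes; Wise-IoU adds a dynamic, non-monotonic focusing weight on top of a term like this, and those weighting details are omitted here.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as
    (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Low-quality samples are those with small IoU against the ground truth; the "non-monotonic focusing" in Wise-IoU reduces the gradient contribution of both the easiest and the worst such boxes.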
Funding: Supported by the National High-tech R&D Program (2015AA015307), the National Natural Science Foundation of China (Grant Nos. 61702189, 61432006 and 61672232), and the Youth Science and Technology "Yang Fan" Program of Shanghai (17YF1427800).
Abstract: The log-structured merge tree (LSM-tree) has been adopted by many distributed storage systems. It decomposes a large database into multiple parts: an in-writing part and several read-only ones. Records are first written into a memory-optimized structure and then compacted into in-disk structures periodically. This achieves high write throughput, but brings the side effect that read requests have to go through multiple structures to find the required record. In a distributed database system, different parts of the LSM-tree are stored in a distributed fashion, so a server in the query layer has to issue multiple network communications to pull data items from the underlying storage layer. Coming to its rescue, this work proposes a precise data access strategy which includes: an efficient structure with low maintenance overhead, designed to test whether a record exists in the in-writing part of the LSM-tree; and a lease-based synchronization strategy to maintain consistent copies of the structure on remote query servers. We further prove the technique is capable of working robustly when the LSM-tree is reorganizing multiple structures in the backend. It is also fault-tolerant and able to recover the structures used in data access after node failures happen. Experiments using the YCSB benchmark show that the solution has 6x throughput improvement over existing methods.
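A classic structure for the "does this record possibly exist in the in-writing part?" test is a Bloom filter: compact, no false negatives, and cheap to ship to remote query servers. The sketch below is a generic illustration of that pattern; the paper's actual structure and its lease-based synchronization are not reproduced here.

```python
import hashlib

class BloomFilter:
    """Compact set-membership test with no false negatives: a False answer
    means the key is definitely absent, so the query server can skip the
    network round trip to the in-writing part."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits)

    def _positions(self, key):
        # Derive n_hashes independent bit positions from salted SHA-256.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # True means "possibly present"; only then is a remote read issued.
        return all(self.bits[pos] for pos in self._positions(key))
```

The hard part the paper addresses is keeping such copies consistent while the LSM-tree compacts in the background, which is where the lease-based synchronization comes in.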
基金the National Key Research and Development Program of China (2016YFB1000905)the National Natural Science Foundation of China (Grant Nos.U1401256, 61402177,61672234,61402180 and 61232002)NSF of Shanghai (14ZR1412600).
Abstract: Entity alignment is the problem of identifying which entities in one data source refer to the same real-world entity in the others. Identifying entities across heterogeneous data sources is paramount to many research fields, such as data cleaning, data integration, information retrieval, and machine learning. The aligning process is not only overwhelmingly expensive for large data sources, since it involves all tuples from two or more data sources, but also needs to handle heterogeneous entity attributes. In this paper, we propose an unsupervised approach, called EnAli, to match entities across two or more heterogeneous data sources. EnAli employs a generative probabilistic model to incorporate heterogeneous entity attributes via the exponential family, handles missing values, and utilizes a locality-sensitive hashing schema to reduce the candidate tuples and speed up the aligning process. EnAli is highly accurate and efficient even without any ground-truth tuples. We illustrate the performance of EnAli on re-identifying entities from the same data source, as well as on aligning entities across three real data sources. Our experimental results manifest that our proposed approach outperforms the comparable baselines.
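The candidate-reduction step rests on locality-sensitive hashing: similar tuples should collide, so only colliding pairs need a full probabilistic comparison. A common concrete instance is MinHash over tuple token sets, sketched below as a generic illustration (not EnAli's exact schema).

```python
import hashlib

def minhash_signature(tokens, n_hashes=32):
    """MinHash signature of a token set. The probability that two sets agree
    in one signature slot equals their Jaccard similarity, so similar tuples
    produce similar signatures and can be bucketed together cheaply."""
    sig = []
    for i in range(n_hashes):
        # Salting the hash with i simulates n_hashes independent hash functions.
        sig.append(min(int(hashlib.sha256(f"{i}:{t}".encode()).hexdigest(), 16)
                       for t in tokens))
    return sig

def estimated_jaccard(sig_a, sig_b):
    # Fraction of agreeing slots estimates the Jaccard similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Banding the signatures (grouping slots and bucketing on each band) then turns the all-pairs comparison into a lookup, which is the speed-up the abstract refers to.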
Funding: Supported by the National Natural Science Foundation of China (No. U1836118), the Open Fund of the Key Laboratory of Content Organization and Knowledge Services for Rich Media Digital Publishing (ZD2021-11/01), and the Natural Science Foundation of the Hubei Province Educational Committee (B2019009).
Abstract: Temporal information is pervasive and crucial in medical records and other clinical text, as it formulates the development process of medical conditions and is vital for clinical decision making. However, providing a holistic knowledge representation and reasoning framework for the various time expressions in clinical text is challenging. In order to capture complex temporal semantics in clinical text, we propose a novel Clinical Time Ontology (CTO) as an extension of the OWL framework. More specifically, we identified eight time-related problems in clinical text and created 11 core temporal classes to conceptualize fuzzy time, cyclic time, irregular time, negations, and other complex aspects of clinical time. Then, we extended Allen's and TEO's temporal relations and defined the relation concept description between complex and simple time. Simultaneously, we provided a formulaic and graphical presentation of complex time and complex time relationships. We carried out an empirical study on the expressiveness and usability of CTO using real-world healthcare datasets. Experiment results demonstrate that CTO can faithfully represent and reason over 93% of the temporal expressions and can cover a wide range of time-related classes in the clinical domain.
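The Allen relations that CTO extends are the thirteen possible orderings of two intervals. The sketch below classifies a pair of crisp intervals given as (start, end) numbers; it covers only the classical crisp case, whereas CTO's contribution is extending such relations to fuzzy, cyclic, and irregular time.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b,
    where each interval is a (start, end) pair with start < end."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1: return "before"
    if b2 < a1: return "after"
    if a2 == b1: return "meets"
    if b2 == a1: return "met-by"
    if (a1, a2) == (b1, b2): return "equals"
    if a1 == b1: return "starts" if a2 < b2 else "started-by"
    if a2 == b2: return "finishes" if b1 < a1 else "finished-by"
    if b1 < a1 and a2 < b2: return "during"
    if a1 < b1 and b2 < a2: return "contains"
    # Only the two overlap cases remain at this point.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

For example, "fever during antibiotic course" is the `during` relation between the symptom interval and the medication interval.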
Abstract: Dropping fractions of users or items judiciously can reduce the computational cost of Collaborative Filtering (CF) algorithms. The effect of this subsampling on the computing time and accuracy of CF is not fully understood, and clear guidelines for selecting optimal or even appropriate subsampling levels are not available. In this paper, we present a Density-based Random Stratified Subsampling using Clustering (DRSC) algorithm in which the desired Fraction of Users Dropped (FUD) and Fraction of Items Dropped (FID) are specified, and the overall density during subsampling is maintained. Subsequently, we develop simple models of the Training Time Improvement (TTI) and the Accuracy Loss (AL) as functions of FUD and FID, based on extensive simulations of seven standard CF algorithms applied to various primary matrices from MovieLens, Yahoo Music Rating, and Amazon Automotive data. Simulations show that both TTI and a scaled AL are bi-linear in FID and FUD for all seven methods. The TTI linear regression of a CF method appears to be the same for all datasets. Extensive simulations illustrate that TTI can be estimated reliably from FUD and FID alone, but AL requires considering additional dataset characteristics. The derived models are then used to optimize the levels of subsampling, addressing the tradeoff between TTI and AL. A simple sub-optimal approximation was found, in which the optimal AL is proportional to the optimal Training Time Reduction Factor (TTRF) for higher values of TTRF, and the optimal subsampling levels, like optimal FID/(1-FID), are proportional to the square root of TTRF.
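The density-preserving idea behind DRSC can be sketched in miniature: drop the requested fraction of users (or items) evenly across activity strata, so the surviving rating matrix keeps roughly its original fill ratio. The stratification below uses simple count-sorted groups; the paper forms strata by clustering, so this is only an illustrative stand-in.

```python
import random

def density(ratings, n_users, n_items):
    # Fill ratio of the user-item matrix: known ratings / all cells.
    return len(ratings) / (n_users * n_items)

def stratified_drop(ids, counts, fraction, n_strata=2, seed=3):
    """Drop `fraction` of ids, sampling the same fraction from each activity
    stratum (ids sorted by rating count, split into n_strata groups), so that
    light and heavy raters are removed in proportion."""
    rng = random.Random(seed)
    ordered = sorted(ids, key=lambda i: counts[i])
    size = len(ordered) // n_strata
    dropped = []
    for s in range(n_strata):
        hi = (s + 1) * size if s < n_strata - 1 else len(ordered)
        stratum = ordered[s * size:hi]
        dropped.extend(rng.sample(stratum, round(fraction * len(stratum))))
    return set(ids) - set(dropped)
```

Purely random dropping can skew the surviving density toward heavy or light raters; sampling per stratum keeps the drop rate uniform across activity levels, which is the property the TTI/AL models above rely on.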