Over the past two decades, software aging has been studied by both academia and industry. Many researchers have focused on analytical methods or time series to model the software aging process, while machine learning has been shown to be a very promising technique for forecasting the software state: normal or aging. In this paper, we propose a method that offers practical guidance for forecasting software aging using machine learning algorithms. First, we collected data from a running commercial web server and preprocessed them. Second, a feature selection algorithm was applied to find a subset of the model parameters. Third, a time series model was used to predict the values of the selected parameters in advance. Fourth, several machine learning algorithms were used to model the software aging process and to predict software aging. Fifth, we used sensitivity analysis to examine how strongly the outcomes change when the input variables change. Finally, we applied our method to an IIS web server. Analysis of the experimental results shows that the proposed method can predict software aging in the early stage of the system development life cycle.
Nowadays, software requirements are still mainly analyzed manually, which has many drawbacks (such as heavy labor consumption, inefficiency, and even inaccurate results). The problems are even worse in domain analysis scenarios, because a large number of requirements from many users need to be analyzed. Automatic analysis of software requirements can therefore benefit software companies. For this purpose, we propose an approach that automatically analyzes software requirement specifications (SRSs) and extracts their semantic information. The approach uses a semantic role labeling (SRL) method based on machine learning and ontology. First, common verbs were extracted from SRS documents in the e-commerce domain, and semantic frames were designed for those verbs. Based on the frames, sentences from SRSs were selected and labeled manually, and the labeled sentences were used as training examples in the machine learning stage. Besides the training examples labeled with semantic roles, external ontology knowledge was used to relieve the data sparsity problem and obtain reliable results. Based on the SemCor corpus and WordNet, the senses of nouns and verbs were identified sequentially with the K-nearest neighbor approach. The senses of the verbs were then used to identify the frame types. After that, we trained the SRL classifier with the maximum entropy method, adding new features based on word sense, such as the hypernyms and hyponyms of the word senses in the ontology. Experimental results show that this new approach to automatic functional requirements analysis is effective.
Amid the spread of Covid-19, it has been challenging for educational institutions to conduct online classes. This paper provides insight into different approaches to those challenges, including how to conduct a fair online class for students. It is hard for an instructor to keep track of all students at the same time, because it is difficult to see whether any students in the class are absent, inattentive, or drowsy. This paper discusses a possible solution that can support instructors: a facial analysis computer program that lets them know which students are attentive and which are not. Face detection draws a green box around attentive students and a red box around those who are not. This paper finds that the program can automatically record attendance by analyzing face detection data. It also lets the teacher know if a student leaves the class early. The model design, performance analysis, and online class assistant aspects of the program are discussed.
Software cost estimation is a crucial aspect of software project management, significantly impacting productivity and planning. This research investigates the impact of various feature selection techniques on software cost estimation accuracy using the CoCoMo NASA dataset, which comprises data from 93 unique software projects with 24 attributes. By applying multiple machine learning algorithms alongside three feature selection methods, this study aims to reduce data redundancy and enhance model accuracy. Our findings reveal that the principal component analysis (PCA)-based feature selection technique achieved the highest performance, underscoring the importance of optimal feature selection in improving software cost estimation accuracy. The proposed method outperforms the existing method, achieving the highest precision, accuracy, and recall rates.
With the development of fifth-generation (5G) mobile communication networks and artificial intelligence (AI) technologies, the use of the Internet of Things (IoT) has expanded throughout industry. Although IoT networks have improved industrial productivity and convenience, they depend heavily on nonstandard protocol stacks and open-source, poorly validated software, resulting in several security vulnerabilities. However, conventional AI-based software vulnerability discovery technologies cannot be applied to IoT because they require excessive memory and computing power. This study developed a technique for optimizing training data size to detect software vulnerabilities rapidly while maintaining learning accuracy. Experimental results using a software vulnerability classification dataset showed that different optimal data sizes did not affect the learning performance of the learning models. Moreover, the minimal data size required to train a model without performance degradation could be determined in advance. For example, the random forest model saved 85.18% of memory and improved latency by 97.82% while using only 1% of the data, maintaining a learning accuracy similar to that achieved with 100% of the data.
The software engineering field has long focused on creating high-quality software despite limited resources. Detecting defects before the testing stage of software development enables quality assurance engineers to concentrate on problematic modules rather than all modules. This approach can enhance the quality of the final product while lowering development costs. Identifying defective modules early allows for early corrections and ensures the timely delivery of a high-quality product that satisfies customers and instills greater confidence in the development team. This process is known as software defect prediction, and it can improve end-product quality while reducing the cost of testing and maintenance. This study proposes a software defect prediction system that uses data fusion, feature selection, and ensemble machine learning fusion techniques. A novel filter-based metric selection technique is proposed in the framework to select the optimum features. A three-step nested approach is presented for predicting defective modules with high accuracy. In the first step, three supervised machine learning techniques (Decision Tree, Support Vector Machines, and Naïve Bayes) are used to detect faulty modules. The second step integrates the predictive accuracy of these classification techniques through three ensemble machine learning methods: Bagging, Voting, and Stacking. In the third step, a fuzzy logic technique is employed to integrate the predictive accuracy of the ensemble machine learning techniques. The experiments are performed on a fused software defect dataset to ensure that the developed fused ensemble model performs effectively on diverse datasets. Five NASA datasets are integrated to create the fused dataset: MW1, PC1, PC3, PC4, and CM1. According to the results, the proposed system outperformed other advanced techniques for predicting software defects, achieving an accuracy of 92.08%.
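The Voting step named in the abstract above can be illustrated with a minimal sketch of hard (majority) voting over the labels produced by three base classifiers. The function name `majority_vote`, the tie-breaking rule, and the prediction values are assumptions made for illustration; they are not taken from the paper, which does not specify its voting scheme here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per module.

    predictions: list of equal-length lists, one list per base classifier.
    Ties are broken in favor of the label that appears first in classifier
    order (an arbitrary choice made for this sketch).
    """
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # Pick the first label, in classifier order, that reaches the top count.
        combined.append(next(label for label in labels if counts[label] == best))
    return combined


# Hypothetical outputs of three base classifiers for five modules
# (1 = defective, 0 = clean); the values are invented for illustration.
decision_tree = [1, 0, 1, 0, 0]
svm = [1, 1, 0, 0, 0]
naive_bayes = [0, 1, 1, 0, 1]

ensemble = majority_vote([decision_tree, svm, naive_bayes])
print(ensemble)  # [1, 1, 1, 0, 0]
```

With an odd number of binary classifiers no ties can occur, which is one reason three base learners are a convenient choice for hard voting.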
An essential objective of software development is to locate and fix, ahead of schedule, the defects that could be expected under diverse circumstances. Many software development activities are performed by people, which may introduce software bugs during development and cause failures later on. Thus, predicting software defects in the early stages has become a primary interest in the field of software engineering. Various software defect prediction (SDP) approaches that rely on software metrics have been proposed in the last two decades. Bagging, support vector machine (SVM), decision tree (DS), and random forest (RF) classifiers are known to perform well in predicting defects. This paper studies and compares these supervised machine learning and ensemble classifiers on 10 NASA datasets. The experimental results show that, in the majority of cases, RF was the best-performing classifier.
Machine learning (ML) has strong potential for soil settlement prediction, but determining hyperparameters for ML models is often intricate and laborious. Therefore, we apply Bayesian optimization to determine the optimal hyperparameter combinations, enhancing the effectiveness of ML models for soil parameter inversion. The ML models are trained using numerical simulation data generated with the modified Cam-Clay (MCC) model in ABAQUS software, and their performance is evaluated using ground settlement monitoring data from an airport runway. Five optimized ML models, namely decision tree (DT), random forest (RF), support vector regression (SVR), deep neural network (DNN), and one-dimensional convolutional neural network (1D-CNN), are compared in terms of their accuracy for soil parameter inversion and settlement prediction. The results indicate that Bayesian optimization efficiently utilizes prior knowledge to identify the optimal hyperparameters, significantly improving model performance. Among the evaluated models, the 1D-CNN achieves the highest accuracy in soil parameter inversion, generating settlement predictions that closely match real monitoring data. These findings demonstrate the effectiveness of the proposed approach for soil parameter inversion and settlement prediction, and reveal how Bayesian optimization can refine the model selection process.
With the rising demand for data access, network service providers face the challenge of growing capital and operating costs while at the same time enhancing network capacity and meeting the increased demand for access. To increase the efficacy of the Software Defined Network (SDN) and Network Function Virtualization (NFV) framework, we need to eradicate network security configuration errors that may create vulnerabilities, affect overall efficiency, reduce network performance, and increase maintenance cost. The existing frameworks are lacking in security, and computer systems face a few abnormalities, which prompts the need for different recognition and mitigation methods to proactively keep the system in an operational state. The fundamental concept behind SDN-NFV is the shift from dedicated resource execution to a programming-based structure. This research concerns the combination of SDN and NFV for rational decision making to control and monitor traffic in the virtualized environment. The combination is often seen as an extra burden in terms of resource usage in a heterogeneous network environment, but it also provides a solution for critical problems, especially those involving massive network traffic. Attacks have been expanding step by step and are therefore hard to recognize and defend against with conventional methods. To overcome these issues, there must be an autonomous system that recognizes and characterizes any abnormal behavior in the network traffic. Only four types of attacks (HTTP Flood, UDP Flood, Smurf Flood, and SiDDoS Flood) are considered in the identified dataset, in order to optimize the stability and security management of the SDN-NFV environment through several machine learning based classification techniques: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Isolation Forest (IF). Python is used for simulation purposes, including several valuable utilities such as the mine package and the open-source Python ML libraries Scikit-learn, NumPy, SciPy, and Matplotlib. A few flood attacks and Structured Query Language (SQL) injection anomalies are validated and effectively identified by the proposed procedure. The classification results are promising and show that overall accuracy lies between 87% and 95% for the SVM, LR, KNN, and IF classifiers when determining whether network traffic in the SDN-NFV environment is normal or anomalous.
Software Defined Network (SDN) and Network Function Virtualization (NFV) technology offer several benefits to network operators, including reduced maintenance costs, increased network operational performance, and simplified network lifecycle and policy management. Network vulnerabilities can be exploited to modify services provided by Network Function Virtualization MANagement and Orchestration (NFV MANO), and malicious attacks in different scenarios disrupt the NFV Orchestrator (NFVO) and Virtualized Infrastructure Manager (VIM) lifecycle management related to network services or individual Virtualized Network Functions (VNFs). This paper proposes an anomaly detection mechanism that monitors threats in NFV MANO and promptly and adaptively implements and handles security functions in order to enhance the quality of experience for end users. An anomaly detector investigates the identified risks and provides secure network services. It enables virtual network security functions and identifies anomalies in Kubernetes (a cloud-based platform). To train and test the proposed approach, an intrusion dataset is used that holds multiple malicious activities such as Smurf, Neptune, Teardrop, Pod, Land, and IPsweep, categorized as Probing (Prob), Denial of Service (DoS), User to Root (U2R), and Remote to User (R2L) attacks. The anomaly detector is built on Machine Learning (ML), using supervised learning techniques such as Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), and Extreme Gradient Boosting (XGBoost). The proposed framework has been evaluated by deploying the identified ML algorithms in a Jupyter notebook in Kubeflow to simulate Kubernetes for validation purposes. The RF classifier showed better outcomes (99.90% accuracy) than the other classifiers in detecting anomalies and intrusions in the containerized environment.
Forest habitats are critical for biodiversity, ecosystem services, human livelihoods, and well-being. Capacity to conduct theoretical and applied forest ecology research addressing direct (e.g., deforestation) and indirect (e.g., climate change) anthropogenic pressures has benefited considerably from new field and statistical techniques. We used machine learning and bibliometric structural topic modelling to identify 20 latent topics comprising four principal fields from a corpus of 16,952 forest ecology/forestry articles published in eight ecology and five forestry journals between 2010 and 2022. Articles published per year increased from 820 in 2010 to 2,354 in 2021, shifting toward more applied topics. Publications from China and some countries in North America and Europe dominated, with relatively fewer articles from some countries in West and Central Africa and West Asia, despite globally important forest resources. Most study sites were in some countries in North America, Central Asia, and South America, and in Australia. Articles utilizing R statistical software predominated, increasing from 29.5% in 2010 to 71.4% in 2022. The most frequently used packages included lme4, vegan, nlme, MuMIn, ggplot2, car, MASS, mgcv, multcomp, and raster. R was more often used in forest ecology than in applied forestry articles. R software offers advantages in script and workflow sharing compared to other statistical packages. Our findings demonstrate that the disciplines of forest ecology/forestry are expanding both in number and scope, aided by more sophisticated statistical tools, to tackle the challenges of redressing forest habitat loss and the socio-economic impacts of deforestation.
Detecting well-known design patterns in object-oriented program source code can help maintainers understand the design of a program. Through the detection, the understandability, maintainability, and reusability of object-oriented programs can be improved. There are automated detection techniques; however, many existing techniques are based on static analysis and use strict conditions composed of class structure data. Hence, it is difficult for them to detect and distinguish design patterns in which the class structures are similar. Moreover, it is difficult for them to deal with diversity in design pattern applications. To solve these problems, we propose a design pattern detection technique using source code metrics and machine learning. Our technique judges candidates for the roles that compose design patterns by using machine learning and measurements of several metrics, and it detects design patterns by analyzing the relations between candidates. It suppresses false negatives and distinguishes patterns in which the class structures are similar. In experimental evaluations with a set of programs, we confirmed that our technique is more accurate than two conventional techniques.
Software maintenance is the process of fixing, modifying, and improving software deliverables after they are delivered to the client. Clients can benefit from offshore software maintenance outsourcing (OSMO) in different ways, including time savings, cost savings, and improved software quality and value. One of the hardest challenges for an OSMO vendor is choosing a suitable project among several clients' projects. The goal of the current study is to recommend a machine learning-based decision support system that OSMO vendors can use to forecast or assess the projects of OSMO clients. The projects belong to OSMO vendors that have offices in developing countries while providing services to developed countries. The current study uses a variant of the Extreme Learning Machine (ELM) called Deep Extreme Learning Machines (DELMs). A novel dataset of 195 projects is proposed to train the model and to evaluate its overall efficiency. The proposed DELM-based model achieved 90.017% training accuracy with a Root Mean Square Error (RMSE) of 1.412×10^(-3) and 85.772% testing accuracy with an RMSE of 1.569×10^(-3), using five DELM hidden layers. The results indicate that the suggested model achieves a notable recognition rate in comparison to previous studies. The current study also concludes that DELMs are the most applicable and useful technique for assessing OSMO clients' projects.
Demands on software reliability and availability have increased tremendously due to the nature of present-day applications. We focus on the software aspect of high availability for application servers, since server unavailability more often originates from software faults than from hardware faults. The software rejuvenation technique has been widely used to avoid unplanned failures, which are mainly due to software aging or caused by transient failures. In this paper, we first present a new way of using virtual machine based software rejuvenation, named VMSR, to offer high availability for application server systems. Second, we model a single physical server hosting multiple virtual machines (VMs) with the VMSR framework using stochastic modeling, and we evaluate it through both numerical analysis and simulation with the SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) tool. This VMSR model is very general and can capture application server characteristics, failure behavior, and performability measures. Our results demonstrate that the VMSR approach is a practical way to ensure uninterrupted availability and to optimize performance for aging applications.
Purpose: This research addresses the challenge of concept drift in AI-enabled software, particularly within autonomous vehicle systems where concept drift in object recognition (like pedestrian detection) can lead to misclassifications and safety risks. This study introduces a proactive framework to detect early signs of domain-specific concept drift by leveraging domain analysis and natural language processing techniques. This method is designed to help maintain the relevance of domain knowledge and prevent potential failures in AI systems due to evolving concept definitions. Design/methodology/approach: The proposed framework integrates natural language processing and image analysis to continuously update and monitor key domain concepts against evolving external data sources, such as social media and news. By identifying terms and features closely associated with core concepts, the system anticipates and flags significant changes. This was tested in the automotive domain on the pedestrian concept, where the framework was evaluated for its capacity to detect shifts in the recognition of pedestrians, particularly during events like Halloween and specific car accidents. Findings: The framework demonstrated an ability to detect shifts in the domain concept of pedestrians, as evidenced by contextual changes around major events. While it successfully identified pedestrian-related drift, the system's accuracy varied when overlapping with larger social events. The results indicate the model's potential to foresee relevant shifts before they impact autonomous systems, although further refinement is needed to handle high-impact concurrent events. Research limitations: This study focused on detecting concept drift in the pedestrian domain within autonomous vehicles, with results varying across domains. To assess generalizability, we tested the framework for airplane-related incidents and demonstrated adaptability. However, unpredictable events and data biases from social media and news may obscure domain-specific drifts. Further evaluation across diverse applications is needed to enhance robustness in evolving AI environments. Practical implications: The proactive detection of concept drift has significant implications for AI-driven domains, especially in safety-critical applications like autonomous driving. By identifying early signs of drift, this framework provides actionable insights for AI system updates, potentially reducing misclassification risks and enhancing public safety. Moreover, it enables timely interventions, reducing costly and labor-intensive retraining requirements by focusing only on the relevant aspects of evolving concepts. This method offers a streamlined approach for maintaining AI system performance in environments where domain knowledge rapidly changes. Originality/value: This study contributes a novel domain-agnostic framework that combines natural language processing with image analysis to predict concept drift early. This unique approach, which is focused on real-time data sources, offers an effective and scalable solution for addressing the evolving nature of domain-specific concepts in AI applications.
When data privacy is imposed as a necessity, federated learning (FL) emerges as a relevant artificial intelligence field for developing machine learning (ML) models in a distributed and decentralized environment. FL allows ML models to be trained on local devices without any need for centralized data transfer, thereby reducing both the exposure of sensitive data and the possibility of data interception by malicious third parties. This paradigm has gained momentum in the last few years, spurred by the plethora of real-world applications that have leveraged its ability to improve the efficiency of distributed learning and to accommodate numerous participants with their data sources. By virtue of FL, models can be learned from all such distributed data sources while preserving data privacy. The aim of this paper is to provide a practical tutorial on FL, including a short methodology and a systematic analysis of existing software frameworks. Furthermore, our tutorial provides exemplary case studies from three complementary perspectives: i) Foundations of FL, describing the main components of FL, from key elements to FL categories; ii) Implementation guidelines and exemplary case studies, by systematically examining the functionalities provided by existing software frameworks for FL deployment, devising a methodology to design an FL scenario, and providing exemplary case studies with source code for different ML approaches; and iii) Trends, briefly reviewing a non-exhaustive list of research directions that are under active investigation in the current FL landscape. The ultimate purpose of this work is to serve as a reference for researchers, developers, and data scientists willing to explore the capabilities of FL in practical applications.
A new preventive software rejuvenation policy is proposed in this paper. Rejuvenation is carried out when the cumulative consumption of physical memory has reached a given level. Using the theory of cumulative damage processes, two models are given for two kinds of bugs. In the first model, only aging-related bugs are considered, and the consumption of physical memory is learned through tests made at periodic times; the optimal preventive rejuvenation policy is derived analytically and a numerical example is given. The second model extends the preventive rejuvenation policy by considering Heisenbugs and aging-related bugs together.
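The threshold rule described in the abstract above (inspect memory at periodic times, rejuvenate once cumulative consumption reaches a level) can be sketched as a small simulation. This is only an illustration: the function name, the exponential distribution of the per-period leak, and all numeric values are assumptions, and the sketch does not reproduce the paper's analytical derivation of the optimal policy.

```python
import random


def periods_until_rejuvenation(threshold_mb, mean_leak_mb, max_periods=10_000, seed=0):
    """Return (period, consumed): the periodic inspection at which cumulative
    memory consumption first reaches threshold_mb, and the amount consumed.

    The leak per period is drawn from an exponential distribution with mean
    mean_leak_mb (an assumption made for this sketch, not taken from the paper).
    Returns (None, consumed) if the threshold is not reached within max_periods.
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    consumed = 0.0
    for period in range(1, max_periods + 1):
        consumed += rng.expovariate(1.0 / mean_leak_mb)  # damage added this period
        if consumed >= threshold_mb:
            return period, consumed  # rejuvenate at this inspection
    return None, consumed


# Hypothetical values: rejuvenate at 500 MB, with 20 MB leaked per period on average.
period, consumed = periods_until_rejuvenation(threshold_mb=500.0, mean_leak_mb=20.0)
print(f"rejuvenate at inspection {period}, after {consumed:.1f} MB consumed")
```

Lowering the threshold triggers rejuvenation more often (more planned downtime), while raising it increases the risk of failure before the next inspection; an optimal policy balances that trade-off.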
Recent studies have shown that software is one of the main reasons for computer system unavailability. A growing accumulation of software errors over time causes a phenomenon called software aging. This phenomenon can result in system performance degradation and eventually in a system hang or crash. To cope with software aging, software rejuvenation has been proposed. Software rejuvenation is a proactive technique that removes the accumulated software errors by stopping the system, cleaning up its internal state, and resuming its normal operation. One of the main challenges of software rejuvenation is accurately predicting the time to crash due to aging factors such as memory leaks. In this paper, different machine learning techniques are compared for accurately predicting the software time to crash under different aging scenarios. Comparing the accuracy of the techniques leads to the conclusion that the multilayer perceptron neural network has the highest prediction accuracy among all techniques studied.
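To make the time-to-crash prediction task in the abstract above concrete, here is a minimal sketch that fits a least-squares line to memory usage samples and extrapolates to the point where memory is exhausted. Linear regression is used here only as a simple stand-in; the paper compares other machine learning techniques and finds a multilayer perceptron most accurate. The function names, the sample values, and the capacity are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_time_to_crash(times, used_mb, capacity_mb):
    """Estimate the time remaining after the last sample until memory use
    reaches capacity_mb. Returns None if usage shows no upward trend."""
    slope, intercept = fit_line(times, used_mb)
    if slope <= 0:
        return None  # no leak trend, so no exhaustion is predicted
    exhaustion_time = (capacity_mb - intercept) / slope
    return max(0.0, exhaustion_time - times[-1])


# Hypothetical samples: memory use (MB) measured once per hour.
times = [0, 1, 2, 3, 4]
used_mb = [100, 110, 120, 130, 140]

remaining = predict_time_to_crash(times, used_mb, capacity_mb=200)
print(remaining)  # 6.0 hours until the 200 MB capacity is reached
```

A linear fit only captures a constant leak rate; workloads with varying leak rates are one reason to prefer the nonlinear models the abstract mentions.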
Aiming at the difficulty of detecting unknown Trojans in a situation flooded with advanced persistent threat (APT) attacks, an improved detection method is proposed. The basic idea of this method originates from the intent of APT attacks: besides damaging or destroying facilities, the more essential purpose of APT attacks is to gather confidential data from target hosts by planting Trojans. Inspired by this idea and by in-depth analyses of recent APT attacks, five typical communication characteristics are adopted to describe an application's network behavior, and a fine-grained classifier based on Decision Tree and Naïve Bayes is modeled on them. Finally, the classification detection method is implemented by training with supervised machine learning approaches. Compared with general methods, this method enhances the detection and awareness of unknown Trojans with less resource consumption.
In recent years, a large number of relatively advanced and often ready-to-use robotic hardware components and systems have been developed for small-scale use. As these tools are mature, there is now a shift towards ad...In recent years, a large number of relatively advanced and often ready-to-use robotic hardware components and systems have been developed for small-scale use. As these tools are mature, there is now a shift towards advanced applications. These often require automation and demand reliability, efficiency and decisional autonomy. New software tools and algorithms for artificial intelligence(AI) and machine learning(ML) can help here. However, since there are many software-based control approaches for small-scale robotics, it is rather unclear how these can be integrated and which approach may be used as a starting point. Therefore, this paper attempts to shed light on existing approaches with their advantages and disadvantages compared to established requirements. For this purpose, a survey was conducted in the target group. The software categories presented include vendor-provided software, robotic software frameworks(RSF), scientific software and in-house developed software(IHDS). Typical representatives for each category are described in detail, including Smar Act precision tool commander, Math Works Matlab and national instruments Lab VIEW, as well as the robot operating system(ROS). The identified software categories and their representatives are rated for end user satisfaction based on functional and non-functional requirements, recommendations and learning curves. The paper concludes with a recommendation of ROS as a basis for future work.展开更多
Funding: Supported by grants from the Natural Science Foundation of China (Project No. 61375045), the Joint Astronomical Fund of the National Natural Science Foundation of China and the Chinese Academy of Sciences (Project No. U1531242), and the Beijing Natural Science Foundation (4142030).
Abstract: Over the past two decades, software aging has been studied by both the academic and industrial communities. Many researchers have used analytical methods or time series to model the software aging process, while machine learning has shown promise for forecasting whether software is in a normal or an aging state. In this paper, we propose a method that offers practical guidance for forecasting software aging with machine learning algorithms. First, we collected data from a running commercial web server and preprocessed them. Second, a feature selection algorithm was applied to find a subset of the model parameters. Third, a time series model was used to predict the values of the selected parameters in advance. Fourth, several machine learning algorithms were used to model the software aging process and to predict software aging. Fifth, we used sensitivity analysis to examine how strongly the outcomes change when the input variables change. Finally, we applied our method to an IIS web server. Analysis of the experimental results shows that the proposed method can predict software aging at an early stage of the system development life cycle.
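The select, forecast, and classify steps described in this abstract can be sketched in a few lines. This is only an illustrative sketch, not the authors' method: the abstract does not name its algorithms, so correlation ranking, a least-squares trend line, and a midpoint threshold are stand-ins chosen for brevity, and the metric names `memory_mb` and `cpu_pct` and all values are invented toy data.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def select_features(samples, labels, k):
    """Feature selection stand-in: keep the k metrics most correlated with the label."""
    scores = {name: abs(pearson([s[name] for s in samples], labels))
              for name in samples[0]}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def linear_forecast(series, steps_ahead):
    """Time-series stand-in: fit a least-squares line and extrapolate it."""
    n = len(series)
    ts = list(range(n))
    mt, my = mean(ts), mean(series)
    slope = (sum((t - mt) * (y - my) for t, y in zip(ts, series))
             / sum((t - mt) ** 2 for t in ts))
    return my + slope * (n - 1 + steps_ahead - mt)


def train_threshold(values, labels):
    """Classifier stand-in: midpoint between the class means of one metric."""
    normal = [v for v, lab in zip(values, labels) if lab == 0]
    aging = [v for v, lab in zip(values, labels) if lab == 1]
    return (mean(normal) + mean(aging)) / 2


# Hypothetical monitoring data: memory grows steadily, CPU only oscillates.
samples = [{"memory_mb": 100 + 10 * t, "cpu_pct": 50 + (t % 2)} for t in range(10)]
labels = [0] * 5 + [1] * 5  # 0 = normal, 1 = aging

chosen = select_features(samples, labels, k=1)[0]
history = [s[chosen] for s in samples]
threshold = train_threshold(history, labels)
predicted_value = linear_forecast(history, steps_ahead=3)
predicted_state = "aging" if predicted_value > threshold else "normal"
```

In a real study each stand-in would be replaced by the actual feature selector, time series model, and classifier under evaluation; the sketch only shows how the outputs of one step feed the next.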
Funding: Supported by the National Natural Science Foundation of China (No. 61375053).
Abstract: Software requirements are still mainly analyzed manually, which has many drawbacks (such as heavy labor consumption, inefficiency, and even inaccurate results). The problems are even worse in domain analysis scenarios, because a large number of requirements from many users need to be analyzed. Automatic analysis of software requirements can therefore benefit software companies. For this purpose, we propose an approach that automatically analyzes software requirement specifications (SRSs) and extracts their semantic information. The approach uses a semantic role labeling (SRL) method based on machine learning and ontology. First, common verbs were extracted from SRS documents in the e-commerce domain, and semantic frames were designed for those verbs. Based on the frames, sentences from the SRSs were selected and labeled manually, and the labeled sentences were used as training examples in the machine learning stage. Besides the training examples labeled with semantic roles, external ontology knowledge was used to relieve the data sparsity problem and obtain reliable results. Based on the SemCor and WordNet corpora, the senses of nouns and verbs were identified sequentially with the K-nearest neighbor approach. The senses of the verbs were then used to identify the frame types. After that, we trained the SRL classifier with the maximum entropy method, adding new features based on word sense, such as the hypernyms and hyponyms of the word senses in the ontology. Experimental results show that this new approach to automatic functional requirements analysis is effective.
Abstract: Amid the spread of Covid-19, educational institutions have faced multiple challenges in conducting online classes. This paper provides insight into different approaches to those challenges, including how to conduct a fair online class for students. It is hard for an instructor to keep track of all students at the same time, because it is difficult to see whether any student in the class is absent, inattentive, or drowsy. This paper discusses a possible solution that can support instructors by giving them a broader view of the class. The solution is a facial analysis computer program that lets instructors know which students are attentive and which are not. Face detection draws a green or red square box around each face: instructors see a green box on attentive students and a red box on those who are not attentive. This paper finds that the program can automatically record attendance by analyzing the face detection data. It also has features that let the teacher know if a student leaves the class early. The paper discusses the model design, the performance analysis, and the online class assistant aspects of the program.
Abstract: Software cost estimation is a crucial aspect of software project management, significantly impacting productivity and planning. This research investigates the impact of various feature selection techniques on software cost estimation accuracy using the CoCoMo NASA dataset, which comprises data from 93 unique software projects with 24 attributes. By applying multiple machine learning algorithms alongside three feature selection methods, this study aims to reduce data redundancy and enhance model accuracy. Our findings reveal that the principal component analysis (PCA)-based feature selection technique achieved the highest performance, underscoring the importance of optimal feature selection in improving software cost estimation accuracy. The proposed method outperforms the existing method, achieving the highest precision, accuracy, and recall rates.
Funding: Supported by a National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (MSIT) (No. 2020R1F1A1061107), the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean Government (MOTIE) (P0008703, The Competency Development Program for Industry Specialists), and the MSIT under the ICAN (ICT Challenge and Advanced Network of HRD) program (No. IITP-2022-RS-2022-00156310) supervised by the Institute of Information & Communication Technology Planning and Evaluation (IITP).
Abstract: With the development of the 5th generation of mobile communication (5G) networks and artificial intelligence (AI) technologies, the use of the Internet of Things (IoT) has expanded throughout industry. Although IoT networks have improved industrial productivity and convenience, they are highly dependent on nonstandard protocol stacks and open-source-based, poorly validated software, resulting in several security vulnerabilities. However, conventional AI-based software vulnerability discovery technologies cannot be applied to IoT because they require excessive memory and computing power. This study developed a technique for optimizing training data size to detect software vulnerabilities rapidly while maintaining learning accuracy. Experimental results using a software vulnerability classification dataset showed that different optimal data sizes did not affect the learning performance of the learning models. Moreover, the minimal data size required to train a model without performance degradation could be determined in advance. For example, the random forest model saved 85.18% of memory and improved latency by 97.82% while maintaining a learning accuracy similar to that achieved with 100% of the data, despite using only 1% of the data.
Funding: Supported by the Center for Cyber-Physical Systems, Khalifa University, under Grant 8474000137-RC1-C2PS-T5.
Abstract: The software engineering field has long focused on creating high-quality software despite limited resources. Detecting defects before the testing stage of software development can enable quality assurance engineers to concentrate on problematic modules rather than all the modules. This approach can enhance the quality of the final product while lowering development costs. Identifying defective modules early on can allow for early corrections and ensure the timely delivery of a high-quality product that satisfies customers and instills greater confidence in the development team. This process is known as software defect prediction, and it can improve end-product quality while reducing the cost of testing and maintenance. This study proposes a software defect prediction system that utilizes data fusion, feature selection, and ensemble machine learning fusion techniques. A novel filter-based metric selection technique is proposed in the framework to select the optimum features. A three-step nested approach is presented for predicting defective modules to achieve high accuracy. In the first step, three supervised machine learning techniques (Decision Tree, Support Vector Machines, and Naïve Bayes) are used to detect faulty modules. The second step integrates the predictive accuracy of these classification techniques through three ensemble machine learning methods: Bagging, Voting, and Stacking. Finally, in the third step, a fuzzy logic technique is employed to integrate the predictive accuracy of the ensemble machine learning techniques. The experiments are performed on a fused software defect dataset to ensure that the developed fused ensemble model can perform effectively on diverse datasets. Five NASA datasets are integrated to create the fused dataset: MW1, PC1, PC3, PC4, and CM1. According to the results, the proposed system outperformed other advanced techniques for predicting software defects, achieving an accuracy rate of 92.08%.
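Of the three ensemble methods named in this abstract, voting is the simplest to illustrate. The sketch below shows hard (majority) voting only, under loud assumptions: the base learners are hypothetical one-metric threshold rules standing in for the Decision Tree, SVM, and Naïve Bayes models, and the metric names (`loc`, `cyclomatic`, `churn`) and thresholds are invented. It does not reproduce the paper's bagging, stacking, or fuzzy logic steps.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting: return the label predicted by most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def make_stump(metric, threshold):
    """Hypothetical base learner: flags a module when one metric exceeds a threshold."""
    return lambda module: int(module[metric] > threshold)


# Three stand-in base classifiers (1 = defective, 0 = clean).
base_classifiers = [
    make_stump("loc", 300),         # lines of code
    make_stump("cyclomatic", 10),   # cyclomatic complexity
    make_stump("churn", 20),        # recent changes
]


def predict_defective(module):
    """Combine the base predictions for one module by majority vote."""
    return majority_vote([clf(module) for clf in base_classifiers])


large_complex = {"loc": 500, "cyclomatic": 15, "churn": 5}
small_module = {"loc": 100, "cyclomatic": 12, "churn": 3}
```

With an odd number of binary voters there are no ties, which is one reason three base learners are a convenient choice for hard voting.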
Abstract: An essential objective of software development is to locate and fix defects early, before they surface under diverse circumstances. Many software development activities are performed by individuals, which may introduce software bugs during development and cause failures later. Thus, predicting software defects in the early stages has become a primary interest in the field of software engineering. Various software defect prediction (SDP) approaches that rely on software metrics have been proposed in the last two decades. Bagging, support vector machines (SVM), decision tree (DS), and random forest (RF) classifiers are known to perform well in predicting defects. This paper studies and compares these supervised machine learning and ensemble classifiers on 10 NASA datasets. The experimental results showed that, in the majority of cases, RF was the best-performing classifier.
Funding: Supported by the National Natural Science Foundation of China (Nos. 52378419 and 52478368).
Abstract: Machine learning (ML) has strong potential for soil settlement prediction, but determining hyperparameters for ML models is often intricate and laborious. Therefore, we apply Bayesian optimization to determine the optimal hyperparameter combinations, enhancing the effectiveness of ML models for soil parameter inversion. The ML models are trained using numerical simulation data generated with the modified Cam-Clay (MCC) model in ABAQUS software, and their performance is evaluated using ground settlement monitoring data from an airport runway. Five optimized ML models, namely decision tree (DT), random forest (RF), support vector regression (SVR), deep neural network (DNN), and one-dimensional convolutional neural network (1D-CNN), are compared in terms of their accuracy for soil parameter inversion and settlement prediction. The results indicate that Bayesian optimization efficiently utilizes prior knowledge to identify the optimal hyperparameters, significantly improving model performance. Among the evaluated models, the 1D-CNN achieves the highest accuracy in soil parameter inversion, generating settlement predictions that closely match real monitoring data. These findings demonstrate the effectiveness of the proposed approach for soil parameter inversion and settlement prediction, and reveal how Bayesian optimization can refine the model selection process.
Abstract: With the rising demand for data access, network service providers face growing capital and operating costs while they must also enhance network capacity and meet the increased demand for access. To increase the efficacy of the Software Defined Network (SDN) and Network Function Virtualization (NFV) framework, we need to eradicate network security configuration errors that may create vulnerabilities, reduce network performance, and increase maintenance cost. The existing frameworks lack security, and computer systems face abnormalities, which prompts the need for different recognition and mitigation methods to keep the system proactively in an operational state. The fundamental concept behind SDN-NFV is the shift from specific resource execution to a programming-based structure. This research concerns the combination of SDN and NFV for rational decision making to control and monitor traffic in the virtualized environment. The combination is often seen as an extra burden in terms of resource usage in a heterogeneous network environment, but it also provides a solution to critical problems, especially those involving massive network traffic. Attacks have been expanding step by step, so it is hard to recognize them and protect against them with conventional methods. To overcome these issues, there must be an autonomous system that recognizes and characterizes any abnormal behavior in the network traffic. Only four types of attack (HTTP Flood, UDP Flood, Smurf Flood, and SiDDoS Flood) are considered in the identified dataset, to optimize the stability and security management of the SDN-NFV environment through several machine learning based characterization techniques: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Isolation Forest (IF). Python is used for simulation purposes, including several valuable utilities like the mine package and the open-source Python ML libraries Scikit-learn, NumPy, SciPy, and Matplotlib. A few flood attacks and Structured Query Language (SQL) injection anomalies are validated and effectively identified by the proposed procedure. The classification results are promising and show that the overall accuracy of the SVM, LR, KNN, and IF classifiers lies between 87% and 95% in determining whether network traffic in the SDN-NFV environment is normal or anomalous.
Funding: This work was funded by the Deanship of Scientific Research at Jouf University under Grant Number (DSR2022-RG-0102).
Abstract: Software Defined Network (SDN) and Network Function Virtualization (NFV) technology offer several benefits to network operators, including reduced maintenance costs, increased network operational performance, and simplified network lifecycle and policy management. Network vulnerabilities try to modify services provided by Network Function Virtualization MANagement and Orchestration (NFV MANO), and malicious attacks in different scenarios disrupt the NFV Orchestrator (NFVO) and Virtualized Infrastructure Manager (VIM) lifecycle management related to network services or an individual Virtualized Network Function (VNF). This paper proposes an anomaly detection mechanism that monitors threats in NFV MANO and responds promptly and adaptively to implement and handle security functions, in order to enhance the quality of experience for end users. An anomaly detector investigates these identified risks and provides secure network services. It enables virtual network security functions and identifies anomalies in Kubernetes (a cloud-based platform). To train and test the proposed approach, an intrusion-containing dataset is used that holds multiple malicious activities such as Smurf, Neptune, Teardrop, Pod, Land, and IPsweep, categorized as Probing (Prob), Denial of Service (DoS), User to Root (U2R), and Remote to User (R2L) attacks. The anomaly detector is built on Machine Learning (ML), using supervised learning techniques such as Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), and Extreme Gradient Boosting (XGBoost). The proposed framework has been evaluated by deploying the identified ML algorithms on a Jupyter notebook in Kubeflow to simulate Kubernetes for validation purposes. The RF classifier showed better outcomes (99.90% accuracy) than the other classifiers in detecting anomalies and intrusions in the containerized environment.
Funding: Financially supported by the National Natural Science Foundation of China (31971541).
Abstract: Forest habitats are critical for biodiversity, ecosystem services, human livelihoods, and well-being. Capacity to conduct theoretical and applied forest ecology research addressing direct (e.g., deforestation) and indirect (e.g., climate change) anthropogenic pressures has benefited considerably from new field and statistical techniques. We used machine learning and bibliometric structural topic modelling to identify 20 latent topics comprising four principal fields from a corpus of 16,952 forest ecology/forestry articles published in eight ecology and five forestry journals between 2010 and 2022. Articles published per year increased from 820 in 2010 to 2,354 in 2021, shifting toward more applied topics. Publications from China and some countries in North America and Europe dominated, with relatively fewer articles from some countries in West and Central Africa and West Asia, despite globally important forest resources. Most study sites were in some countries in North America, Central Asia, and South America, and in Australia. Articles utilizing R statistical software predominated, increasing from 29.5% in 2010 to 71.4% in 2022. The most frequently used packages included lme4, vegan, nlme, MuMIn, ggplot2, car, MASS, mgcv, multcomp, and raster. R was more often used in forest ecology than in applied forestry articles. R software offers advantages in script and workflow sharing compared to other statistical packages. Our findings demonstrate that the disciplines of forest ecology/forestry are expanding both in number and scope, aided by more sophisticated statistical tools, to tackle the challenges of redressing forest habitat loss and the socio-economic impacts of deforestation.
Abstract: Detecting well-known design patterns in object-oriented program source code can help maintainers understand the design of a program. Through the detection, the understandability, maintainability, and reusability of object-oriented programs can be improved. Automated detection techniques exist; however, many of them are based on static analysis and use strict conditions built on class structure data. Hence, it is difficult for them to detect and distinguish design patterns in which the class structures are similar. Moreover, it is difficult for them to deal with diversity in design pattern applications. To solve these problems, we propose a design pattern detection technique using source code metrics and machine learning. Our technique judges candidates for the roles that compose design patterns by using machine learning and measurements of several metrics, and it detects design patterns by analyzing the relations between candidates. It suppresses false negatives and distinguishes patterns in which the class structures are similar. In experimental evaluations with a set of programs, we confirmed that our technique is more accurate than two conventional techniques.
Funding: Fully funded by Universiti Teknologi Malaysia under the UTM Fundamental Research Grant (UTMFR) with Cost Center No. Q.K130000.2556.21H14.
Abstract: Software maintenance is the process of fixing, modifying, and improving software deliverables after they are delivered to the client. Clients can benefit from offshore software maintenance outsourcing (OSMO) in different ways, including time savings, cost savings, and improved software quality and value. One of the hardest challenges for an OSMO vendor is to choose a suitable project among several clients' projects. The goal of the current study is to recommend a machine learning-based decision support system that OSMO vendors can utilize to forecast or assess the projects of OSMO clients. The projects belong to OSMO vendors that have offices in developing countries while providing services to developed countries. The current study uses a variant of the Extreme Learning Machine (ELM) called the Deep Extreme Learning Machine (DELM). A novel dataset consisting of data from 195 projects is proposed to train the model and to evaluate its overall efficiency. With five DELM hidden layers, the proposed DELM-based model achieved 90.017% training accuracy with a Root Mean Square Error (RMSE) of 1.412×10^(-3), and 85.772% testing accuracy with an RMSE of 1.569×10^(-3). The results indicate that the suggested model achieves a notable recognition rate in comparison to previous studies. The current study also concludes that DELMs are the most applicable and useful technique for OSMO client project assessment.
Funding: Supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) under Grant No. KRF2007-210-D00006.
Abstract: Demands on software reliability and availability have increased tremendously due to the nature of present-day applications. We focus on software aspects of the high availability of application servers, since server unavailability originates more often from software faults than from hardware faults. The software rejuvenation technique has been widely used to avoid unplanned failures, which are mainly due to software aging or caused by transient failures. In this paper, we first present a new way of using virtual machine based software rejuvenation, named VMSR, to offer high availability for application server systems. Second, we model a single physical server that hosts multiple virtual machines (VMs) with the VMSR framework using stochastic modeling, and evaluate it through both numerical analysis and SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) tool simulation. This VMSR model is very general and can capture application server characteristics, failure behavior, and performability measures. Our results demonstrate that the VMSR approach is a practical way to ensure uninterrupted availability and to optimize performance for aging applications.
Funding: Supported by the U.S. Office of Naval Research (ONR), Grant number G2A62826.
Abstract: Purpose: This research addresses the challenge of concept drift in AI-enabled software, particularly within autonomous vehicle systems where concept drift in object recognition (like pedestrian detection) can lead to misclassifications and safety risks. This study introduces a proactive framework to detect early signs of domain-specific concept drift by leveraging domain analysis and natural language processing techniques. This method is designed to help maintain the relevance of domain knowledge and prevent potential failures in AI systems due to evolving concept definitions. Design/methodology/approach: The proposed framework integrates natural language processing and image analysis to continuously update and monitor key domain concepts against evolving external data sources, such as social media and news. By identifying terms and features closely associated with core concepts, the system anticipates and flags significant changes. This was tested in the automotive domain on the pedestrian concept, where the framework was evaluated for its capacity to detect shifts in the recognition of pedestrians, particularly during events like Halloween and specific car accidents. Findings: The framework demonstrated an ability to detect shifts in the domain concept of pedestrians, as evidenced by contextual changes around major events. While it successfully identified pedestrian-related drift, the system's accuracy varied when overlapping with larger social events. The results indicate the model's potential to foresee relevant shifts before they impact autonomous systems, although further refinement is needed to handle high-impact concurrent events. Research limitations: This study focused on detecting concept drift in the pedestrian domain within autonomous vehicles, with results varying across domains. To assess generalizability, we tested the framework for airplane-related incidents and demonstrated adaptability. However, unpredictable events and data biases from social media and news may obscure domain-specific drifts. Further evaluation across diverse applications is needed to enhance robustness in evolving AI environments. Practical implications: The proactive detection of concept drift has significant implications for AI-driven domains, especially in safety-critical applications like autonomous driving. By identifying early signs of drift, this framework provides actionable insights for AI system updates, potentially reducing misclassification risks and enhancing public safety. Moreover, it enables timely interventions, reducing costly and labor-intensive retraining requirements by focusing only on the relevant aspects of evolving concepts. This method offers a streamlined approach for maintaining AI system performance in environments where domain knowledge rapidly changes. Originality/value: This study contributes a novel domain-agnostic framework that combines natural language processing with image analysis to predict concept drift early. This approach, which is focused on real-time data sources, offers an effective and scalable solution for addressing the evolving nature of domain-specific concepts in AI applications.
Funding: Supported by the R&D&I, Spain grants PID2020-119478GB-I00 and PID2020-115832GB-I00 funded by MCIN/AEI/10.13039/501100011033. N. Rodríguez-Barroso was supported by the grant FPU18/04475 funded by MCIN/AEI/10.13039/501100011033 and by "ESF Investing in your future" Spain. J. Moyano was supported by a postdoctoral Juan de la Cierva Formación grant FJC2020-043823-I funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGenerationEU/PRTR. J. Del Ser acknowledges funding support from the Spanish Centro para el Desarrollo Tecnológico Industrial (CDTI) through the AI4ES project, and from the Department of Education of the Basque Government (consolidated research group MATHMODE, IT1456-22).
Abstract: When data privacy is imposed as a necessity, federated learning (FL) emerges as a relevant artificial intelligence field for developing machine learning (ML) models in a distributed and decentralized environment. FL allows ML models to be trained on local devices without any need for centralized data transfer, thereby reducing both the exposure of sensitive data and the possibility of data interception by malicious third parties. This paradigm has gained momentum in the last few years, spurred by the plethora of real-world applications that have leveraged its ability to improve the efficiency of distributed learning and to accommodate numerous participants with their data sources. By virtue of FL, models can be learned from all such distributed data sources while preserving data privacy. The aim of this paper is to provide a practical tutorial on FL, including a short methodology and a systematic analysis of existing software frameworks. Furthermore, our tutorial provides exemplary case studies from three complementary perspectives: i) Foundations of FL, describing the main components of FL, from key elements to FL categories; ii) Implementation guidelines and exemplary case studies, by systematically examining the functionalities provided by existing software frameworks for FL deployment, devising a methodology to design an FL scenario, and providing exemplary case studies with source code for different ML approaches; and iii) Trends, briefly reviewing a non-exhaustive list of research directions that are under active investigation in the current FL landscape. The ultimate purpose of this work is to serve as a reference for researchers, developers, and data scientists willing to explore the capabilities of FL in practical applications.
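The core idea of training on local devices without centralizing data can be illustrated with federated averaging. The sketch below is a toy example under stated assumptions and is not taken from the tutorial or from any FL framework: the model is a single weight `w` in `y ≈ w * x`, the client datasets are invented, and the function names are hypothetical. Only model weights travel between clients and server; the raw `(x, y)` pairs never leave a client.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Client side: a few gradient-descent steps on mean squared error, local data only."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(client_weights, client_sizes):
    """Server side: average client weights, weighted by local sample count."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


def train(clients, rounds=50):
    """Run several communication rounds starting from w = 0."""
    w = 0.0
    for _ in range(rounds):
        updates = [local_update(w, data) for data in clients]
        w = federated_average(updates, [len(data) for data in clients])
    return w


# Hypothetical private datasets, all generated from y = 3 * x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(1.0, 3.0), (4.0, 12.0), (2.0, 6.0)],
]
global_w = train(clients)
```

Weighting by sample count means a client with more data has more influence on the global model, which is the usual choice when client datasets differ in size.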
Funding: This project is supported by the National Natural Science Foundation of China (70471017, 70801036) and the Humanities and Social Science Research Foundation of China (05JA630027).
Abstract: A new preventive software rejuvenation policy is proposed in this paper. Rejuvenation is performed when the cumulative consumption of physical memory has reached a certain level. Using the theory of cumulative damage processes, two models are given for two kinds of bugs. The first model considers only aging-related bugs, and the consumption of physical memory is known from tests made at periodic times; the optimal preventive rejuvenation policy is derived analytically and a numerical example is given. The second model extends the preventive software rejuvenation policy by considering Heisenbugs and aging-related bugs together.
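The trigger described in this abstract (rejuvenate once cumulative memory consumption, observed at periodic tests, reaches a level) can be sketched as a small simulation. This is an illustration only: it does not reproduce the paper's cumulative damage model or its optimal policy, and the function name, the deterministic increments, the inspection period, and the threshold are all assumptions made for the example.

```python
def rejuvenation_times(increments, inspect_every, threshold):
    """Return the time steps at which a periodic inspection triggers rejuvenation.

    increments    -- memory consumed in each time step (hypothetical units)
    inspect_every -- inspection period; consumption is only observed at these times
    threshold     -- cumulative consumption level that triggers rejuvenation
    """
    consumed = 0.0
    fired = []
    for t, inc in enumerate(increments, start=1):
        consumed += inc
        # Consumption is checked only at inspection times, not continuously.
        if t % inspect_every == 0 and consumed >= threshold:
            fired.append(t)
            consumed = 0.0  # rejuvenation restores the clean state
    return fired


# Hypothetical run: 10 units leak per step, inspection every 2 steps, threshold 35.
schedule = rejuvenation_times([10] * 12, inspect_every=2, threshold=35)
```

Because consumption is observed only at inspections, the level can overshoot the threshold before rejuvenation fires (here 40 against a threshold of 35), which is one of the trade-offs a choice of inspection period has to balance.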
Abstract: Recent studies have shown that software is one of the main causes of computer system unavailability. A growing accumulation of software errors over time causes a phenomenon called software aging. This phenomenon can result in system performance degradation and eventually a system hang or crash. To cope with software aging, software rejuvenation has been proposed. Software rejuvenation is a proactive technique that removes the accumulated software errors by stopping the system, cleaning up its internal state, and resuming its normal operation. One of the main challenges of software rejuvenation is accurately predicting the time to crash due to aging factors such as memory leaks. In this paper, different machine learning techniques are compared for accurately predicting the software time to crash under different aging scenarios. Comparing the accuracy of the different techniques shows that the multilayer perceptron neural network has the highest prediction accuracy among all techniques studied.
Funding: Supported by the National Natural Science Foundation of China (61202387, 61103220), Major Projects of National Science and Technology of China (2010ZX03006-001-01), Doctoral Fund of Ministry of Education of China (2012014110002), China Postdoctoral Science Foundation (2012M510641), Hubei Province Natural Science Foundation (2011CDB456), and Wuhan Chenguang Plan Project (2012710367).
Abstract: To address the difficulty of detecting unknown Trojans amid the flood of APT attacks, an improved detection method is proposed. The basic idea of this method originates from the intent of advanced persistent threat (APT) attacks: besides damaging or destroying facilities, the more essential purpose of APT attacks is to gather confidential data from target hosts by planting Trojans. Inspired by this idea and by in-depth analyses of recent APT attacks, five typical communication characteristics are adopted to describe an application's network behavior, with which a fine-grained classifier based on Decision Tree and Naïve Bayes is modeled. Finally, the classification detection method is implemented by training supervised machine learning approaches. Compared with general methods, this method enhances the detection and awareness of unknown Trojans with less resource consumption.
Abstract: In recent years, a large number of relatively advanced and often ready-to-use robotic hardware components and systems have been developed for small-scale use. As these tools are mature, there is now a shift towards advanced applications. These often require automation and demand reliability, efficiency and decisional autonomy. New software tools and algorithms for artificial intelligence (AI) and machine learning (ML) can help here. However, since there are many software-based control approaches for small-scale robotics, it is rather unclear how these can be integrated and which approach may be used as a starting point. Therefore, this paper attempts to shed light on existing approaches with their advantages and disadvantages compared to established requirements. For this purpose, a survey was conducted in the target group. The software categories presented include vendor-provided software, robotic software frameworks (RSF), scientific software and in-house developed software (IHDS). Typical representatives for each category are described in detail, including SmarAct Precision Tool Commander, MathWorks MATLAB and National Instruments LabVIEW, as well as the Robot Operating System (ROS). The identified software categories and their representatives are rated for end user satisfaction based on functional and non-functional requirements, recommendations and learning curves. The paper concludes with a recommendation of ROS as a basis for future work.