Continual learning (CL) studies the problem of learning to accumulate knowledge over time from a stream of data. A crucial challenge is that neural networks suffer from performance degradation on previously seen data, known as catastrophic forgetting, due to parameter sharing. In this work, we consider a more practical online class-incremental CL setting, where the model learns new samples in an online manner and may continuously encounter new classes. Moreover, prior knowledge is unavailable during training and evaluation. Existing works usually exploit samples along a single dimension, which ignores a lot of valuable supervisory information. To better tackle this setting, we propose a novel replay-based CL method, which leverages multi-level representations produced in the intermediate processing of training samples for replay and strengthens supervision to consolidate previous knowledge. Specifically, besides the previous raw samples, we store the corresponding logits and features in the memory. Furthermore, to imitate the predictions of the past model, we construct extra constraints by leveraging the multi-level information stored in the memory. With the same number of samples for replay, our method can use more past knowledge to prevent interference. We conduct extensive evaluations on several popular CL datasets, and experiments show that our method consistently outperforms state-of-the-art methods with various sizes of episodic memory. We further provide a detailed analysis of these results and demonstrate that our method is more viable in practical scenarios.
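The abstract above describes storing logits and features next to each raw replay sample and using them as extra constraints. The following is a minimal sketch of what such a replay loss could look like. The abstract does not give the exact form of the constraints, so the mean-squared-error terms, the weights `alpha` and `beta`, and all function names are assumptions made for illustration only.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def replay_loss(cur_logits, cur_features, entry, alpha=0.5, beta=0.5):
    """Loss for one replayed sample.

    entry holds the label plus the logits and features recorded when the
    sample was first stored (hypothetical memory layout).
    """
    ce = cross_entropy(cur_logits, entry["label"])          # usual replay term
    logit_term = mse(cur_logits, entry["logits"])           # imitate past predictions
    feature_term = mse(cur_features, entry["features"])     # keep the representation close
    return ce + alpha * logit_term + beta * feature_term

entry = {"label": 0, "logits": [2.0, 0.0], "features": [1.0, -1.0]}
unchanged = replay_loss([2.0, 0.0], [1.0, -1.0], entry)  # model still matches memory
drifted = replay_loss([1.0, 1.0], [0.0, 0.0], entry)     # model has drifted
```

When the current outputs match the stored ones, only the cross-entropy term remains; any drift in logits or features adds to the loss, which is how one stored sample can carry several levels of supervision.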
Modern intelligent systems, such as autonomous vehicles and face recognition, must continuously adapt to new scenarios while preserving their ability to handle previously encountered situations. However, when neural networks learn new classes sequentially, they suffer from catastrophic forgetting, the tendency to lose knowledge of earlier classes. This challenge, which lies at the core of class-incremental learning, severely limits the deployment of continual learning systems in real-world applications with streaming data. Existing approaches, including rehearsal-based methods and knowledge distillation techniques, have attempted to address this issue but often struggle to effectively preserve decision boundaries and discriminative features under limited memory constraints. To overcome these limitations, we propose a support vector-guided framework for class-incremental learning. The framework integrates an enhanced feature extractor with a Support Vector Machine classifier, which generates boundary-critical support vectors to guide both replay and distillation. Building on this architecture, we design a joint feature retention strategy that combines boundary proximity with feature diversity, and a Support Vector Distillation Loss that enforces dual alignment in decision and semantic spaces. In addition, triple attention modules are incorporated into the feature extractor to enhance representation power. Extensive experiments on CIFAR-100 and Tiny-ImageNet demonstrate effective improvements. On CIFAR-100 and Tiny-ImageNet with 5 tasks, our method achieves 71.68% and 58.61% average accuracy, outperforming strong baselines by 3.34% and 2.05%, respectively. These advantages are consistently observed across different task splits, highlighting the robustness and generalization of the proposed approach. Beyond benchmark evaluations, the framework also shows potential in few-shot and resource-constrained applications such as edge computing and mobile robotics.
Mental-health risk detection seeks early signs of distress from social media posts and clinical transcripts to enable timely intervention before crises. When such risks go undetected, consequences can escalate to self-harm, long-term disability, reduced productivity, and significant societal and economic burden. Despite recent advances, detecting risk from online text remains challenging due to heterogeneous language, evolving semantics, and the sequential emergence of new datasets. Effective solutions must encode clinically meaningful cues, reason about causal relations, and adapt to new domains without forgetting prior knowledge. To address these challenges, this paper presents a Continual Neuro-Symbolic Graph Learning (CNSGL) framework that unifies symbolic reasoning, causal inference, and continual learning within a single architecture. Each post is represented as a symbolic graph linking clinically relevant tags to textual content, enriched with causal edges derived from directional Point-wise Mutual Information (PMI). A two-layer Graph Convolutional Network (GCN) encodes these graphs, and a Transformer-based attention pooler aggregates node embeddings while providing interpretable tag-level importances. Continual adaptation across datasets is achieved through the Multi-Head Freeze (MH-Freeze) strategy, which freezes a shared encoder and incrementally trains lightweight task-specific heads (small classifiers attached to the shared embedding). Experimental evaluations across six diverse mental-health datasets, ranging from Reddit discourse to clinical interviews, demonstrate that MH-Freeze consistently outperforms existing continual-learning baselines in both discriminative accuracy and calibration reliability. Across the six datasets, MH-Freeze achieves up to 0.925 accuracy and 0.923 F1-Score, with AUPRC ≥ 0.934 and AUROC ≥ 0.942. The results confirm the framework's ability to preserve prior knowledge, adapt to domain shifts, and maintain causal interpretability, establishing CNSGL as a promising step toward robust, explainable, and lifelong mental-health risk assessment.
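The abstract above mentions causal edges derived from directional PMI but does not define the quantity. As a rough sketch, one plausible reading is an order-sensitive PMI in which a pair (a, b) is counted only when tag a appears before tag b in the same post. That reading, the function name `directional_pmi`, and the toy tags are all assumptions and should not be taken as the CNSGL definition.

```python
import math
from collections import Counter
from itertools import combinations

def directional_pmi(posts):
    """Return {(a, b): PMI} where a precedes b in a post (assumed definition)."""
    n = len(posts)
    tag_count = Counter()
    pair_count = Counter()
    for tags in posts:
        seen = list(dict.fromkeys(tags))      # unique tags, first-occurrence order
        tag_count.update(seen)
        for a, b in combinations(seen, 2):    # combinations keeps a before b
            pair_count[(a, b)] += 1
    return {
        (a, b): math.log((c / n) / ((tag_count[a] / n) * (tag_count[b] / n)))
        for (a, b), c in pair_count.items()
    }

# Hypothetical tag sequences, one list per post.
posts = [["sleep", "mood"], ["sleep", "mood"], ["work"], ["work"]]
scores = directional_pmi(posts)
```

Because the count is order-sensitive, the score for ("sleep", "mood") exists while ("mood", "sleep") does not, which is what would let an edge carry a direction.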
Continual learning fault diagnosis (CLFD) has gained growing interest in mechanical systems for its ability to accumulate and transfer knowledge in dynamic fault diagnosis scenarios. However, existing CLFD methods typically assume balanced task distributions, neglecting the long-tailed nature of real-world fault occurrences, where certain faults dominate while others are rare. Because of this long-tailed distribution among different mechanical conditions, excessive attention is paid to the dominant types, leading to performance degradation on rarer types. In this paper, decoupling incremental classifier and representation learning (DICRL) is proposed to address the dual challenges of catastrophic forgetting introduced by incremental tasks and the bias in long-tailed CLFD (LT-CLFD). The core innovation lies in the structural decoupling of incremental classifier learning and representation learning. An instance-balanced sampling strategy is employed to learn more discriminative deep representations from the exemplars selected by the herding algorithm and new data. Then, the previous classifiers are frozen to prevent damage to representation learning during backward propagation. A cosine normalization classifier with learnable weight scaling is trained using a class-balanced sampling strategy to enhance classification accuracy. Experimental results show that DICRL outperforms existing continual learning methods across multiple benchmarks, with superior performance and robustness in both LT-CLFD and conventional CLFD. DICRL effectively tackles both catastrophic forgetting and long-tailed distribution in CLFD, enabling more reliable fault diagnosis in industrial applications.
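A cosine normalization classifier, as mentioned in the abstract above, is commonly written as a scaled cosine similarity between the feature vector and each class weight vector. The sketch below follows that common form; the function names and the example values are illustrative assumptions, and in a real model `scale` would be a trainable parameter rather than a constant.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]

def cosine_logits(feature, class_weights, scale):
    """Logit for class k is scale * cos(feature, w_k)."""
    f = l2_normalize(feature)
    return [
        scale * sum(a * b for a, b in zip(f, l2_normalize(w)))
        for w in class_weights
    ]

# The two class vectors differ in magnitude by a factor of 100.
weights = [[10.0, 0.0], [0.0, 0.1]]
logits = cosine_logits([3.0, 3.0], weights, scale=5.0)
```

Both logits come out equal even though the weight norms differ greatly, which illustrates why this kind of classifier is often used when frequent classes would otherwise end up with larger weight norms than rare ones.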
The iterative continuation task (ICT) requires English as a foreign language (EFL) learners to read a segment and write a continuation that aligns with the preceding segment of an English novel with successive turns, offering exposure to diverse grammatical structures and opportunities for contextualized usage. Given the importance of integrating technology into second language (L2) writing and the critical role that grammar plays in L2 writing development, automated written corrective feedback provided by Grammarly has gained significant attention. This study investigates the impact of Grammarly on grammar learning strategies, grammar grit, and grammar competence among EFL college students engaged in ICT. This study employed a mixed-methods sequential exploratory design; 56 participants were divided into an experimental group (n=28), receiving Grammarly feedback for ICT, and a control group (n=28), completing ICT without Grammarly feedback. Quantitative results revealed that both groups showed improvements in L2 grammar learning strategies, grit, and competence. For the experimental group, significant differences were observed across all variables of L2 grammar learning strategies, grit, and competence between pre- and post-tests. For the control group, significant differences were only observed in the affective dimension of grammar learning strategies, Consistency of Interest (COI) of grammar grit, and grammar competence. However, the control group presented a significantly higher improvement in grammar competence. Qualitative analysis showed both positive and negative perceptions of Grammarly. The pedagogical implications of integrating Grammarly and ICT for L2 grammar development are discussed.
In wireless sensor networks, ensuring communication security via specific emitter identification (SEI) is crucial. However, existing SEI methods are limited to closed-set scenarios and lack the ability to detect unknown devices and perform class-incremental training. This study proposes a class-incremental open-set SEI approach. The open-set SEI model calculates radio-frequency fingerprint (RFF) prototypes for known signals and employs a self-attention mechanism to enhance their discriminability. Detection thresholds are set through Gaussian fitting for each class. For class-incremental learning, the algorithm freezes the parameters of the previously trained model and uses them to initialize the new model. It designs two specific losses, the RFF extraction distribution difference loss and the prototype transformation distribution difference loss, which force the new model to retain old knowledge while learning new knowledge. The training loss enables learning of new-class RFFs. Experimental results demonstrate that the open-set SEI model achieves state-of-the-art performance and strong noise robustness. Moreover, the class-incremental learning algorithm effectively enables the model to retain knowledge of old device RFFs, acquire knowledge of new device RFFs, and detect unknown devices simultaneously.
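To make the per-class Gaussian-fitted threshold in the abstract above concrete, here is a small sketch. It assumes that the fitted quantity is the Euclidean distance from known training samples to their class prototype and that the threshold is the mean plus `k` standard deviations. Both choices, along with the device names and numbers, are assumptions for illustration, since the abstract does not specify them.

```python
import math
import statistics

def fit_threshold(distances, k=3.0):
    """Fit a Gaussian to prototype distances and return mean + k * std."""
    mu = statistics.fmean(distances)
    sigma = statistics.pstdev(distances)
    return mu + k * sigma

def classify_open_set(x, prototypes, thresholds):
    """Assign x to the nearest prototype, or 'unknown' if it is too far away."""
    label, dist = min(
        ((c, math.dist(x, p)) for c, p in prototypes.items()),
        key=lambda item: item[1],
    )
    return label if dist <= thresholds[label] else "unknown"

# Hypothetical 2-D fingerprint prototypes and training distances.
prototypes = {"dev_a": [0.0, 0.0], "dev_b": [10.0, 0.0]}
thresholds = {
    "dev_a": fit_threshold([0.5, 1.0, 1.5]),
    "dev_b": fit_threshold([1.0, 1.0, 1.0]),
}
near_a = classify_open_set([1.0, 1.0], prototypes, thresholds)
far_away = classify_open_set([4.0, 6.0], prototypes, thresholds)
near_b = classify_open_set([10.0, 0.5], prototypes, thresholds)
```

A class whose training samples are spread out gets a looser threshold than a tightly clustered class, so the rejection rule adapts to each known device.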
Ensuring the consistent mechanical performance of three-dimensional (3D)-printed continuous fiber-reinforced composites is a significant challenge in additive manufacturing. The current reliance on manual monitoring exacerbates this challenge by rendering the process vulnerable to environmental changes and unexpected factors, resulting in defects and inconsistent product quality, particularly in unmanned long-term operations or printing in extreme environments. To address these issues, we developed a process monitoring and closed-loop feedback control strategy for the 3D printing process. Real-time printing image data were captured and analyzed using a well-trained neural network model, and closed-loop feedback control of the flow rate was implemented through a real-time control module. The neural network model, which was based on image processing and artificial intelligence, recognized flow rate values with an accuracy of 94.70%. The experimental results showed significant improvements in both the surface performance and mechanical properties of printed composites, with three to six times improvement in tensile strength and elastic modulus, demonstrating the effectiveness of the strategy. This study provides a generalized process monitoring and feedback control method for the 3D printing of continuous fiber-reinforced composites, and offers a potential solution for remote online monitoring and closed-loop adjustment in unmanned or extreme space environments.
Autonomous legged robots, capable of navigating uneven terrain, can perform a diverse array of tasks. However, designing locomotion controllers remains challenging. In particular, designing a controller based on durable and reliable proprioceptive sensors is essential for achieving adaptability. Presently, the controller must either be manually designed for specific robots and tasks, or developed using machine-learning techniques, which require extensive training time and result in complex controllers. Inspired by animal locomotion, we propose a simple yet comprehensive closed-loop modular framework that utilizes minimal proprioceptive feedback (i.e., the Coxa-Femur (CF) joint angle), enabling a quadruped robot to efficiently navigate unpredictable and uneven terrains, including steps and slopes. The framework comprises a basic neural control network capable of rapidly learning optimized motor patterns, and a straightforward module for sensory feedback sharing and integration. In a series of experiments, we show that integrating sensory feedback into the base neural control network aids the robot in continually learning robust motor patterns on flat, step, and slope terrain, compared with the open-loop base framework. Sharing sensory feedback information across the four legs enables a quadruped robot to proactively navigate unpredictable steps with minimal interaction. Furthermore, the controller remains functional even in the absence of sensor signals. This control configuration was successfully transferred to a physical robot without any modifications.
Analytic continuation serves as a crucial bridge between quantum Monte Carlo calculations in the imaginary-time formalism, specifically the Green's functions, and physical measurements (the spectral functions) in real time. Various approaches have been developed to enhance the accuracy of analytic continuation, including the Padé approximation, the maximum entropy method, and stochastic analytic continuation. In this study, we employ different deep learning techniques to investigate analytic continuation for the quantum impurity model. A significant challenge in this context is that the sharp Abrikosov-Suhl resonance peak may be either underestimated or overestimated. We fit both the imaginary-time Green's function and the spectral function using Chebyshev polynomials in logarithmic coordinates. We utilize Fully Connected Networks (FCN), Convolutional Neural Networks (CNNs), and Residual Networks (ResNet) to address this issue. Our findings indicate that introducing noise during the training phase significantly improves the accuracy of the learning process. The typical absolute error achieved is less than 10^-4. These investigations pave the way for machine learning to optimize the analytic continuation problem in many-body systems, thereby reducing the need for prior expertise in physics.
In natural language processing (NLP), managing multiple downstream tasks through fine-tuning pre-trained models often requires maintaining separate task-specific models, leading to practical inefficiencies. To address this challenge, we introduce AdaptForever, a novel approach that enables continuous mastery of NLP tasks through the integration of elastic and mutual learning strategies with a stochastic expert mechanism. Our method freezes the pre-trained model weights while incorporating adapters enhanced with mutual learning capabilities, facilitating effective knowledge transfer from previous tasks to new ones. By combining Elastic Weight Consolidation (EWC) for knowledge preservation with specialized regularization terms, AdaptForever successfully maintains performance on earlier tasks while acquiring new capabilities. Experimental results demonstrate that AdaptForever achieves superior performance across a continuous sequence of NLP tasks compared to existing parameter-efficient methods, while effectively preventing catastrophic forgetting and enabling positive knowledge transfer between tasks.
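Elastic Weight Consolidation, which the abstract above uses for knowledge preservation, is usually stated as a quadratic penalty (λ/2) Σ F_i (θ_i − θ*_i)² added to the new-task loss, where θ* are the parameters after the previous task and F_i is a diagonal Fisher information estimate. The sketch below shows only this standard penalty. How AdaptForever combines it with adapters and its other regularization terms is not described in the abstract, and the numbers are made up.

```python
def ewc_penalty(params, anchor_params, fisher, lam):
    """Standard EWC term: (lam / 2) * sum_i F_i * (theta_i - theta_star_i) ** 2."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )

anchor = [0.0, 2.0]   # parameters after the previous task
fisher = [4.0, 0.1]   # first parameter mattered much more for the old task

# Moving the important parameter by 1.0 is penalized far more than
# moving the unimportant one by the same amount.
move_important = ewc_penalty([1.0, 2.0], anchor, fisher, lam=2.0)
move_unimportant = ewc_penalty([0.0, 3.0], anchor, fisher, lam=2.0)
```

The asymmetry between the two results is the point of the method: parameters that carried little information for earlier tasks stay free to change.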
As a data-driven approach, Deep Learning (DL)-based fault diagnosis methods need relatively comprehensive data on machine fault types to achieve satisfactory performance. A real-world mechanical system may include multiple submachines. During condition monitoring of a mechanical system, fault data arrive in a continuous flow of constantly generated information, and new faults will inevitably occur in previously unconsidered submachines, which are called machine increments. Therefore, adequately collecting fault data in advance is difficult. Limited by the characteristics of DL, training existing models directly with new fault data from new submachines leads to catastrophic forgetting of old tasks, while the cost of collecting all known data to retrain the models is excessively high. As a result, DL-based fault diagnosis methods cannot learn continually and adaptively in dynamic environments. A new Continual Learning Fault Diagnosis method (CLFD) is proposed in this paper to solve a series of fault diagnosis tasks with machine increments. The stability–plasticity dilemma is an intrinsic issue in continual learning. The core of CLFD is the proposed Dual-branch Adaptive Aggregation Residual Network (DAARN). Two types of residual blocks are created in each block layer of DAARN: steady and dynamic blocks. The stability–plasticity dilemma is addressed by assigning these blocks adaptive aggregation weights that balance stability and plasticity, and a bi-level optimization program is used to optimize the adaptive aggregation weights and model parameters. In addition, a feature-level knowledge distillation loss function is proposed to further overcome catastrophic forgetting. CLFD is then applied to a fault diagnosis case with machine increments. Results demonstrate that CLFD outperforms other continual learning methods and has satisfactory robustness.
1 | OVERVIEW. Machine learning (ML) has been increasingly used for tackling various diagnostic, therapeutic, and prognostic tasks owing to its capability to learn and reason without explicit programming [1]. Most developed ML models have had their accuracy proven through internal validation using retrospective data. However, external validation using retrospective data, continual monitoring using prospective data, and randomized controlled trials (RCTs) using prospective data are important for the translation of ML models into real-world clinical practice [2].
Climate change poses significant challenges to agricultural management, particularly in adapting to extreme weather conditions that impact agricultural production. Existing works with traditional Reinforcement Learning (RL) methods often falter under such extreme conditions. To address this challenge, our study introduces a novel approach by integrating Continual Learning (CL) with RL to form Continual Reinforcement Learning (CRL), enhancing the adaptability of agricultural management strategies. Leveraging the Gym-DSSAT simulation environment, our research enables RL agents to learn optimal fertilization strategies based on variable weather conditions. By incorporating CL algorithms, such as Elastic Weight Consolidation (EWC), with established RL techniques like Deep Q-Networks (DQN), we developed a framework in which agents can learn and retain knowledge across diverse weather scenarios. The CRL approach was tested under climate variability to assess the robustness and adaptability of the induced policies, particularly under extreme weather events like severe droughts. Our results showed that continually learned policies exhibited superior adaptability and performance compared to optimal policies learned through conventional RL methods, especially in challenging conditions of reduced rainfall and increased temperatures. This pioneering work, which combines CL with RL to generate adaptive policies for agricultural management, is expected to make significant advancements in precision agriculture in the era of climate change.
An iterative learning model predictive control (ILMPC) technique is applied to a class of continuous/batch processes. In such processes, the operation of the batch units generates strong periodic disturbances to the continuous units, and traditional regulatory controllers are unable to eliminate these periodic disturbances. ILMPC combines the ability of iterative learning control (ILC) to handle repetitive signals with the flexibility of model predictive control (MPC). By monitoring the operating status of the batch processes online, an event-driven iterative learning algorithm for repetitive batch disturbances is initiated, and the soft constraints are adjusted promptly when the feasible region moves away from the desired operating zone. The results of an industrial application show that the proposed ILMPC method is effective for a class of continuous/batch processes.
Continuous cooling transformation diagrams for the synthetic weld heat-affected zone (SH-CCT diagrams) show the phase transition temperature and hardness at different cooling rates, and are an important basis for formulating the welding process or predicting the performance of the welding heat-affected zone. However, the experimental determination of SH-CCT diagrams is a time-consuming and costly process, which does not match the pace of new materials development. In addition, the prediction of SH-CCT diagrams using metallurgical models remains a challenge due to the complexity of alloying elements and welding processes. In this study, a hybrid machine learning model consisting of a multilayer perceptron classifier, k-Nearest Neighbors, and random forest is established to predict the phase transformation temperature and hardness of low alloy steel from chemical composition and cooling rate. The SH-CCT diagrams of 6 kinds of steels are then calculated by the hybrid machine learning model. The results show that the accuracy of the classification model is up to 100%, and the predicted values of the regression models are in good agreement with the experimental results, with a high correlation coefficient and low error. Moreover, mathematical expressions for hardness in the welding heat-affected zone of low alloy steel are obtained by symbolic regression, which quantitatively express the relationship between alloy composition, cooling time, and hardness. This study demonstrates the great potential of materials informatics in the field of welding technology.
Reinforcement Learning is a commonly used technique for learning tasks in robotics. However, traditional algorithms are unable to handle large amounts of data coming from the robot's sensors, require long training times, and use discrete actions. This work introduces TS-RRLCA, a two-stage method to tackle these problems. In the first stage, low-level data coming from the robot's sensors is transformed into a more natural, relational representation based on rooms, walls, corners, doors, and obstacles, significantly reducing the state space. We use this representation along with Behavioural Cloning, i.e., traces provided by the user, to learn, in few iterations, a relational control policy with discrete actions which can be re-used in different environments. In the second stage, we use Locally Weighted Regression to transform the initial policy into a continuous-action policy. We tested our approach in simulation and with a real service robot in different environments for different navigation and following tasks. Results show that the policies can be used in different domains and produce smoother, faster, and shorter paths than the original discrete-action policies.
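The second stage described in the abstract above relies on Locally Weighted Regression. As a generic illustration of that technique, the sketch below fits a weighted straight line around each query point using a Gaussian kernel. How TS-RRLCA applies the regression to its relational policy is not detailed in the abstract, so the one-dimensional setup, the `bandwidth` parameter, and the sample data are assumptions.

```python
import math

def lwr_predict(x_query, xs, ys, bandwidth=1.0):
    """Locally weighted linear regression in one dimension.

    Training points near x_query get larger Gaussian weights, and a
    weighted least-squares line is evaluated at x_query.
    """
    w = [math.exp(-((x - x_query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    var = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    if var == 0.0:
        return my  # all the weight sits on one x value
    cov = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    return my + (cov / var) * (x_query - mx)

# Hypothetical samples: a sensor reading mapped to a steering command.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # lies exactly on y = 2x + 1
steer = lwr_predict(1.5, xs, ys, bandwidth=1.0)
```

Querying between the stored samples returns an interpolated value rather than the nearest stored one, which is the property that turns a table of discrete actions into a smooth continuous command.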
The predominant method for smartphone access is point-of-entry authentication, which depends heavily on physiological biometrics such as fingerprint or face. Implicit continuous authentication is emerging as superior to conventional authentication mechanisms because it confirms users' identities on a continuing basis and marks the instant at which an illegitimate user takes control of the session. However, various issues remain unaddressed. This research investigates the use of a Deep Reinforcement Learning technique for implicit continuous authentication on mobile devices through a method called Gaussian Weighted Cauchy Kriging-based Continuous Czekanowski's (GWCK-CC). First, a Gaussian Weighted Non-local Mean Filter Preprocessing model is applied to reduce the noise present in the raw input face images. A Cauchy Kriging Regression function is then employed to reduce the dimensionality. Finally, Continuous Czekanowski's Classification is used to distinguish the genuine user from an attacker. In this way, the proposed GWCK-CC method achieves accurate authentication with minimum error rate and time. The proposed GWCK-CC method and existing methods are evaluated experimentally under different factors using the UMDAA-02 Face Dataset. The results confirm that the proposed GWCK-CC method improves authentication accuracy by 9% and reduces authentication time and error rate by 44% and 43%, respectively, compared with existing methods.
This study proposed a measurement platform for continuous blood pressure estimation based on dual photoplethysmography (PPG) sensors and a deep learning (DL) algorithm, which can be used for continuous and rapid measurement of blood pressure and analysis of cardiovascular-related indicators. The proposed platform measured the signal changes in PPG and converted them into physiological indicators, such as pulse transit time (PTT), pulse wave velocity (PWV), perfusion index (PI), and heart rate (HR); these indicators were then fed into the DL algorithm to calculate blood pressure. The hardware of the experiment comprised 2 PPG components (i.e., Raspberry Pi 3 Model B and analog-to-digital converter [MCP3008]), which were connected using a serial peripheral interface. The DL algorithm converted the stable dual PPG signals acquired from the strictly standardized experimental process into various physiological indicators as input parameters and finally obtained the systolic blood pressure (SBP), diastolic blood pressure (DBP), and mean arterial pressure (MAP). To increase the robustness of the DL model, this study input data of 100 Asian participants into the training database, including those with and without cardiovascular disease, each with a proportion of approximately 50%. The experimental results revealed that the mean absolute error and standard deviation of SBP was 0.17±0.46 mmHg, that of DBP was 0.27±0.52 mmHg, and that of MAP was 0.16±0.40 mmHg.
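Several of the indicators named in the abstract above follow from simple timing relations between the two PPG signals. The sketch below shows textbook-style definitions: PTT as the delay between matching pulse peaks at the two sensors, PWV as sensor spacing divided by PTT, and HR from the mean peak-to-peak interval. The abstract does not describe the platform's actual signal processing, so the function names, the peak times, and the 0.5 m sensor spacing are all hypothetical.

```python
def pulse_transit_time(proximal_peak_s, distal_peak_s):
    """Delay (s) between the same pulse arriving at the two PPG sensors."""
    return distal_peak_s - proximal_peak_s

def pulse_wave_velocity(sensor_distance_m, ptt_s):
    """Velocity (m/s) = distance between sensors / transit time."""
    if ptt_s <= 0:
        raise ValueError("transit time must be positive")
    return sensor_distance_m / ptt_s

def heart_rate_bpm(peak_times_s):
    """Heart rate from the mean interval between consecutive peaks."""
    intervals = [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]
    return 60.0 / (sum(intervals) / len(intervals))

# Hypothetical readings: peaks 50 ms apart, sensors 0.5 m apart.
ptt = pulse_transit_time(0.10, 0.15)
pwv = pulse_wave_velocity(0.5, ptt)
hr = heart_rate_bpm([0.0, 0.8, 1.6, 2.4])
```

Features of this kind would form the input vector to the regression model; the mapping from them to SBP, DBP, and MAP is what the DL algorithm learns.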
Class-incremental learning studies the problem of continually learning new classes from data streams. However, networks suffer from catastrophic forgetting, losing past knowledge when acquiring new knowledge. Among different approaches, replay methods have shown exceptional promise for this challenge, but their performance is still limited in two respects: (i) imbalanced data distribution and (ii) semantic inconsistency between networks. First, due to the limited memory buffer, there is an imbalance between old and new classes. Direct optimisation would skew the feature space towards new classes, resulting in performance degradation on old classes. Second, existing methods normally use the previous network to regularise the present network. However, the previous network is not trained on new classes, which means that the two networks are semantically inconsistent, leading to misleading guidance information. To address these two problems, we propose BCSD (BiaMix contrastive learning and memory similarity distillation). For the imbalanced distribution, we design Biased MixUp, in which mixed samples take a high weight from old classes and a low weight from new classes. Thus, the network learns to push decision boundaries towards new classes. We further leverage label information to construct contrastive learning in order to ensure discriminability. Meanwhile, for semantic inconsistency, we distill knowledge from the previous network by capturing the similarity of new classes in the current task to old classes from the memory buffer and transfer that knowledge to the present network. Empirical results on various datasets demonstrate its effectiveness and efficiency.
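The Biased MixUp idea in the abstract above, a high mixing weight for the old-class sample and a low weight for the new-class sample, can be sketched as ordinary MixUp with the coefficient restricted to the upper part of its range. The abstract does not state how the coefficient is sampled, so drawing it uniformly from `[lam_min, 1]`, the default `lam_min=0.6`, and the function name are assumptions.

```python
import random

def biased_mixup(x_old, y_old, x_new, y_new, lam_min=0.6, rng=random):
    """Mix one old-class and one new-class sample, favouring the old one.

    lam is the weight on the old-class sample and is always >= lam_min.
    Labels are one-hot lists and are mixed with the same coefficient.
    """
    lam = rng.uniform(lam_min, 1.0)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_old, x_new)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_old, y_new)]
    return x, y, lam

rng = random.Random(0)  # fixed seed so the example is repeatable
x_mix, y_mix, lam = biased_mixup(
    x_old=[1.0, 1.0], y_old=[1.0, 0.0],
    x_new=[0.0, 0.0], y_new=[0.0, 1.0],
    rng=rng,
)
```

Every mixed sample stays closer to the old class in both input and label space, so the region assigned to old classes is reinforced even though the memory buffer holds few old samples.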
The overall research in Reinforcement Learning (RL) concentrates on discrete sets of actions, but for certain real-world problems it is important to have methods which are able to find good strategies using actions dr...The overall research in Reinforcement Learning (RL) concentrates on discrete sets of actions, but for certain real-world problems it is important to have methods which are able to find good strategies using actions drawn from continuous sets. This paper describes a simple control task called direction finder and its known optimal solution for both discrete and continuous actions. It allows for comparison of RL solution methods based on their value functions. In order to solve the control task for continuous actions, a simple idea for generalising them by means of feature vectors is presented. The resulting algorithm is applied using different choices of feature calculations. For comparing their performance a simple measure is展开更多
Funding: Supported in part by the National Natural Science Foundation of China (U2013602, 61876181, 51521003), the National Key R&D Program of China (2020YFB13134), the Shenzhen Science and Technology Research and Development Foundation (JCYJ20190813171009236), the Beijing Nova Program of Science and Technology (Z191100001119043), and the Youth Innovation Promotion Association, Chinese Academy of Sciences.
Abstract: Continual learning (CL) studies the problem of learning to accumulate knowledge over time from a stream of data. A crucial challenge is that neural networks suffer from performance degradation on previously seen data, known as catastrophic forgetting, due to parameter sharing. In this work, we consider a more practical online class-incremental CL setting, where the model learns new samples in an online manner and may continuously experience new classes. Moreover, prior knowledge is unavailable during training and evaluation. Existing works usually explore sample usage from a single dimension, which ignores much valuable supervisory information. To better tackle this setting, we propose a novel replay-based CL method, which leverages multi-level representations produced during the intermediate processing of training samples for replay and strengthens supervision to consolidate previous knowledge. Specifically, besides the previous raw samples, we store the corresponding logits and features in the memory. Furthermore, to imitate the predictions of the past model, we construct extra constraints by leveraging the multi-level information stored in the memory. With the same number of samples for replay, our method can use more past knowledge to prevent interference. We conduct extensive evaluations on several popular CL datasets, and experiments show that our method consistently outperforms state-of-the-art methods with various sizes of episodic memory. We further provide a detailed analysis of these results and demonstrate that our method is more viable in practical scenarios.
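The idea of replaying stored logits and features alongside raw samples can be illustrated with a small sketch. The abstract does not give the exact form of the extra constraints, so this sketch assumes mean-squared-error terms; the function name `replay_loss` and the weights `alpha` and `beta` are hypothetical.

```python
import numpy as np

def replay_loss(logits, feats, labels, stored_logits, stored_feats,
                alpha=0.5, beta=0.5):
    """Cross-entropy on replayed labels plus two assumed MSE constraints.

    logits, feats: current model outputs on the replayed samples.
    stored_logits, stored_feats: values saved in memory by the past model.
    alpha, beta: hypothetical weights for the two extra constraints.
    """
    # Numerically stable log-softmax, then cross-entropy on the true labels.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    # Assumed constraints: stay close to the past model's logits and features.
    logit_term = np.mean((logits - stored_logits) ** 2)
    feat_term = np.mean((feats - stored_feats) ** 2)
    return ce + alpha * logit_term + beta * feat_term
```

When the current outputs match the stored ones, both extra terms vanish and only the cross-entropy on the replayed labels remains.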
Funding: Supported by the Gansu Provincial Natural Science Foundation (grant number 25JRRA074), the Gansu Provincial Key R&D Science and Technology Program (grant number 24YFGA060), and the National Natural Science Foundation of China (grant number 62161019).
Abstract: Modern intelligent systems, such as autonomous vehicles and face recognition, must continuously adapt to new scenarios while preserving their ability to handle previously encountered situations. However, when neural networks learn new classes sequentially, they suffer from catastrophic forgetting, the tendency to lose knowledge of earlier classes. This challenge, which lies at the core of class-incremental learning, severely limits the deployment of continual learning systems in real-world applications with streaming data. Existing approaches, including rehearsal-based methods and knowledge distillation techniques, have attempted to address this issue but often struggle to effectively preserve decision boundaries and discriminative features under limited memory constraints. To overcome these limitations, we propose a support vector-guided framework for class-incremental learning. The framework integrates an enhanced feature extractor with a Support Vector Machine classifier, which generates boundary-critical support vectors to guide both replay and distillation. Building on this architecture, we design a joint feature retention strategy that combines boundary proximity with feature diversity, and a Support Vector Distillation Loss that enforces dual alignment in decision and semantic spaces. In addition, triple attention modules are incorporated into the feature extractor to enhance representation power. Extensive experiments on CIFAR-100 and Tiny-ImageNet demonstrate effective improvements. On CIFAR-100 and Tiny-ImageNet with 5 tasks, our method achieves 71.68% and 58.61% average accuracy, outperforming strong baselines by 3.34% and 2.05%. These advantages are consistently observed across different task splits, highlighting the robustness and generalization of the proposed approach. Beyond benchmark evaluations, the framework also shows potential in few-shot and resource-constrained applications such as edge computing and mobile robotics.
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2025-00518960) and in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2025-00563192).
Abstract: Mental-health risk detection seeks early signs of distress from social media posts and clinical transcripts to enable timely intervention before crises. When such risks go undetected, consequences can escalate to self-harm, long-term disability, reduced productivity, and significant societal and economic burden. Despite recent advances, detecting risk from online text remains challenging due to heterogeneous language, evolving semantics, and the sequential emergence of new datasets. Effective solutions must encode clinically meaningful cues, reason about causal relations, and adapt to new domains without forgetting prior knowledge. To address these challenges, this paper presents a Continual Neuro-Symbolic Graph Learning (CNSGL) framework that unifies symbolic reasoning, causal inference, and continual learning within a single architecture. Each post is represented as a symbolic graph linking clinically relevant tags to textual content, enriched with causal edges derived from directional Point-wise Mutual Information (PMI). A two-layer Graph Convolutional Network (GCN) encodes these graphs, and a Transformer-based attention pooler aggregates node embeddings while providing interpretable tag-level importances. Continual adaptation across datasets is achieved through the Multi-Head Freeze (MH-Freeze) strategy, which freezes a shared encoder and incrementally trains lightweight task-specific heads (small classifiers attached to the shared embedding). Experimental evaluations across six diverse mental-health datasets, ranging from Reddit discourse to clinical interviews, demonstrate that MH-Freeze consistently outperforms existing continual-learning baselines in both discriminative accuracy and calibration reliability. Across the six datasets, MH-Freeze achieves up to 0.925 accuracy and 0.923 F1-Score, with AUPRC ≥ 0.934 and AUROC ≥ 0.942. The results confirm the framework's ability to preserve prior knowledge, adapt to domain shifts, and maintain causal interpretability, establishing CNSGL as a promising step toward robust, explainable, and lifelong mental-health risk assessment.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52272440) and the Suzhou Science Foundation (Grant Nos. SYG202323, ZXL2022027).
Abstract: Continual learning fault diagnosis (CLFD) has gained growing interest in mechanical systems for its ability to accumulate and transfer knowledge in dynamic fault diagnosis scenarios. However, existing CLFD methods typically assume balanced task distributions, neglecting the long-tailed nature of real-world fault occurrences, where certain faults dominate while others are rare. Due to the long-tailed distribution among different mechanical conditions, excessive attention is paid to the dominant types, leading to performance degradation on rarer types. In this paper, decoupling incremental classifier and representation learning (DICRL) is proposed to address the dual challenges of catastrophic forgetting introduced by incremental tasks and the bias in long-tailed CLFD (LT-CLFD). The core innovation lies in the structural decoupling of incremental classifier learning and representation learning. An instance-balanced sampling strategy is employed to learn more discriminative deep representations from the exemplars selected by the herding algorithm and new data. Then, the previous classifiers are frozen to prevent damage to representation learning during backward propagation. A cosine normalization classifier with learnable weight scaling is trained using a class-balanced sampling strategy to enhance classification accuracy. Experimental results show that DICRL outperforms existing continual learning methods across multiple benchmarks, with superior performance and robustness in both LT-CLFD and conventional CLFD. DICRL effectively tackles both catastrophic forgetting and long-tailed distribution in CLFD, enabling more reliable fault diagnosis in industrial applications.
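A cosine normalization classifier of the kind mentioned above can be sketched in a few lines. This is a generic illustration rather than the DICRL implementation: the name `cosine_logits` is hypothetical, and `scale` is a fixed number here, whereas the abstract describes a learnable scaling.

```python
import numpy as np

def cosine_logits(features, weights, scale=10.0):
    """Logits as scaled cosine similarity between features and class weights.

    features: (batch, dim) array; weights: (classes, dim) array.
    scale: stands in for the learnable scaling factor (fixed in this sketch).
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return scale * f @ w.T
```

Because both sides are L2-normalized, the logits do not depend on the magnitude of the class weights, which is one common motivation for this classifier when class sizes are imbalanced.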
Abstract: The iterative continuation task (ICT) requires English as a foreign language (EFL) learners to read a segment of an English novel and write a continuation that aligns with the preceding segment over successive turns, offering exposure to diverse grammatical structures and opportunities for contextualized usage. Given the importance of integrating technology into second language (L2) writing and the critical role that grammar plays in L2 writing development, automated written corrective feedback provided by Grammarly has gained significant attention. This study investigates the impact of Grammarly on grammar learning strategies, grammar grit, and grammar competence among EFL college students engaged in ICT. This study employed a mixed-methods sequential exploratory design; 56 participants were divided into an experimental group (n=28), receiving Grammarly feedback for ICT, and a control group (n=28), completing ICT without Grammarly feedback. Quantitative results revealed that both groups showed improvements in L2 grammar learning strategies, grit, and competence. For the experimental group, significant differences were observed across all variables of L2 grammar learning strategies, grit, and competence between pre- and post-tests. For the control group, significant differences were only observed in the affective dimension of grammar learning strategies, Consistency of Interest (COI) of grammar grit, and grammar competence. However, the control group presented a significantly higher improvement in grammar competence. Qualitative analysis showed both positive and negative perceptions of Grammarly. The pedagogical implications of integrating Grammarly and ICT for L2 grammar development are discussed.
Funding: Supported by the National Natural Science Foundation of China (62371465) and the Taishan Scholar Project of Shandong Province (ts201511020).
Abstract: In wireless sensor networks, ensuring communication security via specific emitter identification (SEI) is crucial. However, existing SEI methods are limited to closed-set scenarios and lack the ability to detect unknown devices and perform class-incremental training. This study proposes a class-incremental open-set SEI approach. The open-set SEI model calculates radio-frequency fingerprint (RFF) prototypes for known signals and employs a self-attention mechanism to enhance their discriminability. Detection thresholds are set through Gaussian fitting for each class. For class-incremental learning, the algorithm freezes the parameters of the previously trained model to initialize the new model. It designs specific losses: the RFF extraction distribution difference loss and the prototype transformation distribution difference loss, which force the new model to retain old knowledge while learning new knowledge. The training loss enables learning of new-class RFFs. Experimental results demonstrate that the open-set SEI model achieves state-of-the-art performance and strong noise robustness. Moreover, the class-incremental learning algorithm effectively enables the model to retain knowledge of old device RFFs, acquire knowledge of new device RFFs, and detect unknown devices simultaneously.
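Per-class detection thresholds from a Gaussian fit can be sketched as follows. The abstract does not state which quantity is fitted or how the threshold is derived, so this sketch assumes the fitted quantity is each known sample's distance to its class prototype and that the threshold is the mean plus `k` standard deviations; the function names and `k` are hypothetical.

```python
import numpy as np

def fit_thresholds(distances_by_class, k=3.0):
    """Fit a Gaussian per class and return mean + k * std as its threshold.

    distances_by_class: dict mapping class label -> array of distances
    between known samples of that class and the class prototype.
    """
    return {c: float(np.mean(d) + k * np.std(d))
            for c, d in distances_by_class.items()}

def classify_open_set(dist_to_prototypes, thresholds):
    """Return the nearest class, or None if the sample looks unknown.

    dist_to_prototypes: dict mapping class label -> distance of one sample
    to that class prototype.
    """
    nearest = min(dist_to_prototypes, key=dist_to_prototypes.get)
    if dist_to_prototypes[nearest] > thresholds[nearest]:
        return None  # farther than the class threshold: treat as unknown
    return nearest
```

A larger `k` accepts more samples as known, while a smaller `k` rejects more of them as unknown devices.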
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2023YFB4604100), the National Key Research and Development Program of China (Grant No. 2022YFB3806104), the Key Research and Development Program in Shaanxi Province (Grant No. 2021LLRH-08-17), the Young Elite Scientists Sponsorship Program by CAST (No. 2023QNRC001), the K C Wong Education Foundation of China, the Youth Innovation Team of Shaanxi Universities of China, and the Key Research and Development Program of Shaanxi Province (Grant 2021LLRH-08-3.1).
Abstract: Ensuring the consistent mechanical performance of three-dimensional (3D)-printed continuous fiber-reinforced composites is a significant challenge in additive manufacturing. The current reliance on manual monitoring exacerbates this challenge by rendering the process vulnerable to environmental changes and unexpected factors, resulting in defects and inconsistent product quality, particularly in unmanned long-term operations or printing in extreme environments. To address these issues, we developed a process monitoring and closed-loop feedback control strategy for the 3D printing process. Real-time printing image data were captured and analyzed using a well-trained neural network model, and a real-time control module enabling closed-loop feedback control of the flow rate was developed. The neural network model, which was based on image processing and artificial intelligence, recognized flow rate values with an accuracy of 94.70%. The experimental results showed significant improvements in both the surface performance and mechanical properties of printed composites, with a three- to six-fold improvement in tensile strength and elastic modulus, demonstrating the effectiveness of the strategy. This study provides a generalized process monitoring and feedback control method for the 3D printing of continuous fiber-reinforced composites, and offers a potential solution for remote online monitoring and closed-loop adjustment in unmanned or extreme space environments.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62233008 and 51705247) and the State Key Laboratory of Mechanics and Control for Aerospace Structures of Nanjing University of Aeronautics and Astronautics.
Abstract: Autonomous legged robots, capable of navigating uneven terrain, can perform a diverse array of tasks. However, designing locomotion controllers remains challenging. In particular, designing a controller based on durable and reliable proprioceptive sensors is essential for achieving adaptability. Presently, the controller must either be manually designed for specific robots and tasks, or developed using machine-learning techniques, which require extensive training time and result in complex controllers. Inspired by animal locomotion, we propose a simple yet comprehensive closed-loop modular framework that utilizes minimal proprioceptive feedback (i.e., the Coxa-Femur (CF) joint angle), enabling a quadruped robot to efficiently navigate unpredictable and uneven terrains, including steps and slopes. The framework comprises a basic neural control network capable of rapidly learning optimized motor patterns, and a straightforward module for sensory feedback sharing and integration. In a series of experiments, we show that integrating sensory feedback into the base neural control network aids the robot in continually learning robust motor patterns on flat, step, and slope terrain, compared with the open-loop base framework. Sharing sensory feedback information across the four legs enables a quadruped robot to proactively navigate unpredictable steps with minimal interaction. Furthermore, the controller remains functional even in the absence of sensor signals. This control configuration was successfully transferred to a physical robot without any modifications.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 12174101) and the Fundamental Research Funds for the Central Universities (Grant No. 2022MS051).
Abstract: Analytic continuation serves as a crucial bridge between quantum Monte Carlo calculations in the imaginary-time formalism, specifically the Green's functions, and physical measurements (the spectral functions) in real time. Various approaches have been developed to enhance the accuracy of analytic continuation, including the Padé approximation, the maximum entropy method, and stochastic analytic continuation. In this study, we employ different deep learning techniques to investigate the analytic continuation for the quantum impurity model. A significant challenge in this context is that the sharp Abrikosov-Suhl resonance peak may be either underestimated or overestimated. We fit both the imaginary-time Green's function and the spectral function using Chebyshev polynomials in logarithmic coordinates. We utilize Fully Connected Networks (FCN), Convolutional Neural Networks (CNNs), and Residual Networks (ResNet) to address this issue. Our findings indicate that introducing noise during the training phase significantly improves the accuracy of the learning process. The typical absolute error achieved is less than 10⁻⁴. These investigations pave the way for machine learning to optimize the analytic continuation problem in many-body systems, thereby reducing the need for prior expertise in physics.
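Fitting a function with Chebyshev polynomials in logarithmic coordinates can be sketched with NumPy's `numpy.polynomial.chebyshev` module. The abstract does not say which variable is taken on a logarithmic scale or what polynomial degree is used, so this sketch assumes the independent variable is mapped through `log10` and rescaled to [-1, 1]; the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_fit_log(x, y, degree):
    """Least-squares Chebyshev fit of y against log10(x) rescaled to [-1, 1]."""
    u = np.log10(x)
    lo, hi = u.min(), u.max()
    t = 2.0 * (u - lo) / (hi - lo) - 1.0  # Chebyshev polynomials live on [-1, 1]
    coeffs = C.chebfit(t, y, degree)
    return coeffs, (lo, hi)

def cheb_eval_log(x, coeffs, bounds):
    """Evaluate the fitted series at new x values using the stored bounds."""
    lo, hi = bounds
    t = 2.0 * (np.log10(x) - lo) / (hi - lo) - 1.0
    return C.chebval(t, coeffs)
```

The coefficient vector gives a compact, fixed-length description of a curve, which is one plausible reason to use such a fit as the input or output representation of a network.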
Funding: Supported by the National Key R&D Program of China (No. 2023YFB3308601), the Sichuan Science and Technology Program (2024NSFJQ0035, 2024NSFSC0004), and the Talents by Sichuan Provincial Party Committee Organization Department.
Abstract: In natural language processing (NLP), managing multiple downstream tasks through fine-tuning pre-trained models often requires maintaining separate task-specific models, leading to practical inefficiencies. To address this challenge, we introduce AdaptForever, a novel approach that enables continuous mastery of NLP tasks through the integration of elastic and mutual learning strategies with a stochastic expert mechanism. Our method freezes the pre-trained model weights while incorporating adapters enhanced with mutual learning capabilities, facilitating effective knowledge transfer from previous tasks to new ones. By combining Elastic Weight Consolidation (EWC) for knowledge preservation with specialized regularization terms, AdaptForever maintains performance on earlier tasks while acquiring new capabilities. Experimental results demonstrate that AdaptForever achieves superior performance across a continuous sequence of NLP tasks compared to existing parameter-efficient methods, while effectively preventing catastrophic forgetting and enabling positive knowledge transfer between tasks.
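The Elastic Weight Consolidation term named above has a standard quadratic form, (λ/2) Σᵢ Fᵢ (θᵢ − θᵢ*)², where θ* are the parameters after the previous task and F is a diagonal Fisher information estimate. The sketch below shows only that generic penalty, not AdaptForever's additional regularization terms; the function name and the dictionary layout are illustrative assumptions.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    params, old_params, fisher: dicts mapping parameter name -> array,
    all with matching shapes. fisher holds diagonal Fisher estimates.
    """
    total = 0.0
    for name in params:
        diff = params[name] - old_params[name]
        total += float(np.sum(fisher[name] * diff ** 2))
    return 0.5 * lam * total
```

The penalty is added to the new task's loss, so parameters with large Fisher values (important for earlier tasks) are discouraged from moving.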
Funding: Supported by the National Natural Science Foundation of China (Nos. 52272440, 51875375) and the China Postdoctoral Science Foundation Funded Project (No. 2021M701503).
Abstract: As a data-driven approach, Deep Learning (DL)-based fault diagnosis methods need relatively comprehensive data on machine fault types to achieve satisfactory performance. A mechanical system may include multiple submachines in the real world. During condition monitoring of a mechanical system, fault data arrive as a continuous flow of constantly generated information, and new faults will inevitably occur in unconsidered submachines, which are also called machine increments. Therefore, adequately collecting fault data in advance is difficult. Limited by the characteristics of DL, training existing models directly with new fault data of new submachines leads to catastrophic forgetting of old tasks, while the cost of collecting all known data to retrain the models is excessively high. DL-based fault diagnosis methods cannot learn continually and adaptively in dynamic environments. A new Continual Learning Fault Diagnosis method (CLFD) is proposed in this paper to solve a series of fault diagnosis tasks with machine increments. The stability–plasticity dilemma is an intrinsic issue in continual learning. The core of CLFD is the proposed Dual-branch Adaptive Aggregation Residual Network (DAARN). Two types of residual blocks are created in each block layer of DAARN: steady and dynamic blocks. The stability–plasticity dilemma is addressed by assigning them adaptive aggregation weights to balance stability and plasticity, and a bi-level optimization program is used to optimize the adaptive aggregation weights and model parameters. In addition, a feature-level knowledge distillation loss function is proposed to further overcome catastrophic forgetting. CLFD is then applied to a fault diagnosis case with machine increments. Results demonstrate that CLFD outperforms other continual learning methods and has satisfactory robustness.
Abstract: 1 | OVERVIEW. Machine learning (ML) has been increasingly used for tackling various diagnostic, therapeutic, and prognostic tasks owing to its capability to learn and reason without explicit programming [1]. Most developed ML models have had their accuracy proven through internal validation using retrospective data. However, external validation using retrospective data, continual monitoring using prospective data, and randomized controlled trials (RCTs) using prospective data are important for the translation of ML models into real-world clinical practice [2].
Funding: Support from the University of Iowa OVPR Interdisciplinary Scholars Program and the US Department of Education (ED#P116S210005) for this study. Kishlay Jha's work is supported in part by the US National Institute of Health (NIH) and National Science Foundation (NSF) under grants R01LM014012-01A1 and ITE-2333740.
Abstract: Climate change poses significant challenges to agricultural management, particularly in adapting to extreme weather conditions that impact agricultural production. Existing works with traditional Reinforcement Learning (RL) methods often falter under such extreme conditions. To address this challenge, our study introduces a novel approach that integrates Continual Learning (CL) with RL to form Continual Reinforcement Learning (CRL), enhancing the adaptability of agricultural management strategies. Leveraging the Gym-DSSAT simulation environment, our research enables RL agents to learn optimal fertilization strategies based on variable weather conditions. By incorporating CL algorithms, such as Elastic Weight Consolidation (EWC), with established RL techniques like Deep Q-Networks (DQN), we developed a framework in which agents can learn and retain knowledge across diverse weather scenarios. The CRL approach was tested under climate variability to assess the robustness and adaptability of the induced policies, particularly under extreme weather events like severe droughts. Our results showed that continually learned policies exhibited superior adaptability and performance compared to optimal policies learned through conventional RL methods, especially in challenging conditions of reduced rainfall and increased temperatures. This pioneering work, which combines CL with RL to generate adaptive policies for agricultural management, is expected to make significant advancements in precision agriculture in the era of climate change.
Funding: Supported by the National Creative Research Groups Science Foundation of China (60721062) and the National High Technology Research and Development Program of China (2007AA04Z162).
Abstract: An iterative learning model predictive control (ILMPC) technique is applied to a class of continuous/batch processes. In such processes, the operation of the batch processes generates strong periodic disturbances to the continuous processes, and traditional regulatory controllers are unable to eliminate these periodic disturbances. ILMPC combines the ability of iterative learning control (ILC) to handle repetitive signals with the flexibility of model predictive control (MPC). By monitoring the operation status of the batch processes on-line, an event-driven iterative learning algorithm for batch repetitive disturbances is initiated, and the soft constraints are adjusted in a timely manner when the feasible region moves away from the desired operating zone. The results of an industrial application show that the proposed ILMPC method is effective for a class of continuous/batch processes.
Funding: Financial support from the National Key Research and Development Program of China (No. 2016YFB0700501) and the National Natural Science Foundation of China (No. 51571020).
Abstract: Continuous cooling transformation diagrams in synthetic weld heat-affected zone (SH-CCT diagrams) show the phase transition temperature and hardness at different cooling rates, which is an important basis for formulating the welding process or predicting the performance of the welding heat-affected zone. However, the experimental determination of SH-CCT diagrams is a time-consuming and costly process, which does not conform to the development trend of new materials. In addition, the prediction of SH-CCT diagrams using metallurgical models remains a challenge due to the complexity of alloying elements and welding processes. In this study, a hybrid machine learning model consisting of a multilayer perceptron classifier, k-Nearest Neighbors, and random forest is established to predict the phase transformation temperature and hardness of low alloy steel from chemical composition and cooling rate. The SH-CCT diagrams of 6 kinds of steels are then calculated by the hybrid machine learning model. The results show that the accuracy of the classification model is up to 100%, and the predicted values of the regression models are in good agreement with the experimental results, with high correlation coefficients and low error values. Moreover, the mathematical expressions of hardness in the welding heat-affected zone of low alloy steel are calculated by symbolic regression, which can quantitatively express the relationship between alloy composition, cooling time, and hardness. This study demonstrates the great potential of material informatics in the field of welding technology.
Abstract: Reinforcement Learning is a commonly used technique for learning tasks in robotics; however, traditional algorithms are unable to handle large amounts of data coming from the robot's sensors, require long training times, and use discrete actions. This work introduces TS-RRLCA, a two-stage method to tackle these problems. In the first stage, low-level data coming from the robot's sensors is transformed into a more natural, relational representation based on rooms, walls, corners, doors and obstacles, significantly reducing the state space. We use this representation along with Behavioural Cloning, i.e., traces provided by the user, to learn, in few iterations, a relational control policy with discrete actions which can be re-used in different environments. In the second stage, we use Locally Weighted Regression to transform the initial policy into a continuous actions policy. We tested our approach in simulation and with a real service robot in different environments for different navigation and following tasks. Results show how the policies can be used on different domains and produce smoother, faster and shorter paths than the original discrete actions policies.
Abstract: The predominant method for accessing smartphones is point-of-entry authentication, which depends heavily on physiological biometrics such as fingerprint or face. Implicit continuous authentication is beginning to surpass conventional authentication mechanisms by confirming users' identities on an ongoing basis and marking the instant at which an illegitimate attacker takes control of the session. However, several issues remain unaddressed. This research investigates the use of a Deep Reinforcement Learning technique for implicit continuous authentication on mobile devices using a method called Gaussian Weighted Cauchy Kriging-based Continuous Czekanowski's (GWCK-CC). First, a Gaussian Weighted Non-local Mean Filter Preprocessing model is applied to reduce the noise present in the raw input face images. A Cauchy Kriging Regression function is then employed to reduce the dimensionality. Finally, Continuous Czekanowski's Classification is used to distinguish between the genuine user and the attacker. In this way, the proposed GWCK-CC method achieves accurate authentication with minimal error rate and time. The proposed GWCK-CC method and existing methods are experimentally assessed under different factors using the UMDAA-02 Face Dataset. The results confirm that the proposed GWCK-CC method improves authentication accuracy by 9% and reduces authentication time and error rate by 44% and 43%, respectively, compared with the existing methods.
Funding: This study was supported in part by the Ministry of Science and Technology MOST 108-2221-E-150-022-MY3 and Taiwan Ocean University.
Abstract: This study proposed a measurement platform for continuous blood pressure estimation based on dual photoplethysmography (PPG) sensors and a deep learning (DL) model that can be used for continuous and rapid measurement of blood pressure and analysis of cardiovascular-related indicators. The proposed platform measured the signal changes in PPG and converted them into physiological indicators, such as pulse transit time (PTT), pulse wave velocity (PWV), perfusion index (PI) and heart rate (HR); these indicators were then fed into the DL model to calculate blood pressure. The hardware of the experiment comprised 2 PPG components (i.e., Raspberry Pi 3 Model B and analog-to-digital converter [MCP3008]), which were connected using a serial peripheral interface. The DL algorithm converted the stable dual PPG signals acquired from the strictly standardized experimental process into various physiological indicators as input parameters and finally obtained the systolic blood pressure (SBP), diastolic blood pressure (DBP) and mean arterial pressure (MAP). To increase the robustness of the DL model, this study input data of 100 Asian participants into the training database, including those with and without cardiovascular disease, each with a proportion of approximately 50%. The experimental results revealed that the mean absolute error and standard deviation of SBP was 0.17±0.46 mmHg. The mean absolute error and standard deviation of DBP was 0.27±0.52 mmHg. The mean absolute error and standard deviation of MAP was 0.16±0.40 mmHg.
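Two of the indicators named above, PTT and PWV, can be illustrated with a short sketch. The abstract does not describe how the platform extracts them, so this sketch assumes PTT is estimated as the cross-correlation lag between the two PPG signals and uses the usual relation PWV = distance / PTT; the function names, the sampling rate `fs`, and the sensor spacing are hypothetical.

```python
import numpy as np

def pulse_transit_time(ppg_proximal, ppg_distal, fs):
    """Estimate PTT in seconds as the lag that maximizes cross-correlation.

    ppg_proximal, ppg_distal: equal-length 1-D arrays sampled at fs Hz.
    A positive result means the distal signal lags the proximal one.
    """
    a = ppg_proximal - np.mean(ppg_proximal)  # remove DC offset
    b = ppg_distal - np.mean(ppg_distal)
    corr = np.correlate(b, a, mode="full")
    lag = np.argmax(corr) - (len(a) - 1)      # index 0 is lag -(len(a) - 1)
    return lag / fs

def pulse_wave_velocity(distance_m, ptt_s):
    """PWV in m/s from the sensor spacing (m) and the transit time (s)."""
    return distance_m / ptt_s
```

For example, a 0.10 s transit time over an assumed 0.5 m path corresponds to a PWV of 5 m/s.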
Funding: Supported by the National Natural Science Foundation of China (62176188).
Abstract: Class-incremental learning studies the problem of continually learning new classes from data streams. However, networks suffer from catastrophic forgetting, losing past knowledge when acquiring new knowledge. Among different approaches, replay methods have shown exceptional promise for this challenge, but their performance still suffers in two respects: (i) data with imbalanced distribution and (ii) networks with semantic inconsistency. First, due to the limited memory buffer, there is an imbalance between old and new classes. Direct optimisation would skew the feature space towards new classes, resulting in performance degradation on old classes. Second, existing methods normally leverage the previous network to regularise the present network. However, the previous network is not trained on new classes, which means that these two networks are semantically inconsistent, leading to misleading guidance information. To address these two problems, we propose BCSD (BiaMix contrastive learning and memory similarity distillation). For imbalanced distribution, we design Biased MixUp, where mixed samples take a high weight from old classes and a low weight from new classes. Thus, the network learns to push decision boundaries towards new classes. We further leverage label information to construct contrastive learning in order to ensure discriminability. Meanwhile, for semantic inconsistency, we distill knowledge from the previous network by capturing the similarity of new classes in current tasks to old classes from the memory buffer and transfer that knowledge to the present network. Empirical results on various datasets demonstrate its effectiveness and efficiency.
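The Biased MixUp idea, a high mixing weight for old-class samples and a low one for new-class samples, can be sketched as a small variation of ordinary MixUp. The abstract does not specify how the bias is imposed, so this sketch assumes the coefficient is drawn from a Beta distribution and then folded so that the old-class weight is at least 0.5; the function name and `alpha` are hypothetical.

```python
import numpy as np

def biased_mixup(x_old, y_old, x_new, y_new, alpha=1.0, rng=None):
    """Mix one old-class and one new-class sample, favouring the old one.

    y_old, y_new: one-hot (or soft) label vectors of equal length.
    Returns the mixed input, the mixed label, and the old-class weight.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # assumed bias rule: old-class weight >= 0.5
    x = lam * x_old + (1.0 - lam) * x_new
    y = lam * y_old + (1.0 - lam) * y_new
    return x, y, lam
```

Under this assumption every mixed sample sits closer to the old class than to the new one, which matches the stated aim of pushing decision boundaries towards new classes.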
Abstract: The overall research in Reinforcement Learning (RL) concentrates on discrete sets of actions, but for certain real-world problems it is important to have methods which are able to find good strategies using actions drawn from continuous sets. This paper describes a simple control task called direction finder and its known optimal solution for both discrete and continuous actions. It allows for comparison of RL solution methods based on their value functions. In order to solve the control task for continuous actions, a simple idea for generalising them by means of feature vectors is presented. The resulting algorithm is applied using different choices of feature calculations. For comparing their performance a simple measure is