The digital health landscape in Uganda is plagued by problems with interoperability and sustainability, due to fragmentation and a lack of integrated digital health solutions. This can be partly attributed to the absence of policies on the interoperability of data, as well as the fact that there is no common goal to make digital data and data infrastructure interoperable across the data ecosystem. The promulgation of the FAIR Guidelines in 2016 brought together various data stewards and stakeholders to adopt a common vision on data management and enable greater interoperability. This article explores the potential of enhancing digital health interoperability through FAIR by analysing the digital solutions piloted in Uganda and their sustainability. It looks at the factors that are currently hindering interoperability by examining existing digital health solutions in Uganda, such as the Digital Health Atlas Uganda (DHA-U) and Uganda Digital Health Dashboard (UDHD). The level of FAIRness of the two dashboards was determined using the FAIR Evaluation Services tool. Analysis was also carried out to discover the level of FAIRness of the digital health solutions within the dashboards and the software applications and data standards most frequently used by the different digital health interventions in Uganda.
Rapid and effective data sharing is necessary to control disease outbreaks, such as the current coronavirus pandemic. Despite the existence of data sharing agreements, data silos, lack of interoperable data infrastructures, and different institutional jurisdictions hinder data sharing and accessibility. To overcome these challenges, the Virus Outbreak Data Network (VODAN)-Africa initiative is championing an approach in which data never leaves the institution where it was generated, but, instead, algorithms can visit the data and query multiple datasets in an automated way. To make this possible, FAIR Data Points, distributed data repositories that host machine-actionable data and metadata adhering to the FAIR Guidelines (that data should be Findable, Accessible, Interoperable and Reusable), have been deployed in participating institutions using a dockerised bundle of tools called VODAN in a Box (ViB). ViB is a set of multiple FAIR-enabling and open-source services with a single goal: to support the gathering of World Health Organization (WHO) electronic case report forms (eCRFs) as FAIR data in a machine-actionable way, but without exposing or transferring the data outside the facility. Following the execution of a proof of concept, ViB was deployed in Uganda and at Leiden University. The proof of concept generated a first query, which was implemented across two continents. A SWOT (strengths, weaknesses, opportunities and threats) analysis of the architecture was carried out and established the changes needed in the specifications and requirements for the future development of the solution.
This article explores the global implementation of the FAIR Guiding Principles for scientific management and data stewardship, which provide that data should be findable, accessible, interoperable and reusable. The implementation of these principles is designed to lead to the stewardship of data as FAIR digital objects and the establishment of the Internet of FAIR Data and Services (IFDS). If implementation reaches a tipping point, IFDS has the potential to revolutionize how data is managed by making machine and human readable data discoverable for reuse. Accordingly, this article examines the expansion of the implementation of FAIR Guiding Principles, especially how and in which geographies (locations) and areas (topic domains) implementation is taking place. A literature review of academic articles published between 2016 and 2019 on the use of FAIR Guiding Principles is presented. The investigation also includes an analysis of the domains in the IFDS Implementation Networks (INs). Its uptake has been mainly in the Western hemisphere. The investigation found that implementation of FAIR Guiding Principles has taken firm hold in the domain of bio and natural sciences. To achieve a tipping point for FAIR implementation, it is now time to ensure the inclusion of non-European ascendants and of other scientific domains. Apart from equal opportunity and genuine global partnership issues, a permanent European bias poses challenges with regard to the representativeness and validity of data and could limit the potential of IFDS to reach across continental boundaries. The article concludes that, despite efforts to be inclusive, acceptance of the FAIR Guiding Principles and IFDS in different scientific communities is limited and there is a need to act now to prevent dampening of the momentum in the development and implementation of the IFDS. It is further concluded that policy entrepreneurs and the GO FAIR INs may contribute to making the FAIR Guiding Principles more flexible in including different research epistemologies, especially through its GO CHANGE pillar.
This article investigates the expansion of the Internet of FAIR Data and Services (IFDS) to Africa through the three GO FAIR pillars: GO CHANGE, GO BUILD and GO TRAIN. The introduction of the IFDS in Africa has a focus on digital health. Two examples of introducing FAIR are compared: a regional initiative for digital health by governments in the East Africa Community (EAC) and an initiative by a local health provider (Solidarmed) in collaboration with Great Zimbabwe University in Zimbabwe. The obstacles to introducing FAIR are identified as the current underrepresentation of data from Africa in IFDS, the lack of explicit recognition of the situational context of research in FAIR at present, and the perception of FAIR as a foreign and European invention, which affects acceptance. It is envisaged that FAIR can make an important contribution to solving fragmentation in digital health in Africa, and that any obstacles concerning African participation, context relevance and acceptance of IFDS need to be removed. This will require the involvement of African researchers and ICT developers so that it is driven by local ownership. Assessment of ecological validity in the FAIR principles would ensure that the context specificity of research is reflected in them. This will help enhance the acceptance of the FAIR Guidelines in Africa and will help strengthen digital health research and services.
Extensive studies on selecting recombination operators adaptively during the search process of an evolutionary algorithm (EA), known as adaptive operator selection (AOS), have shown that AOS is promising for improving an EA's performance. A variety of heuristic mechanisms for AOS have been proposed in recent decades, which usually contain two main components: feature extraction and policy setting. Feature extraction refers to extracting relevant features from the information collected during the search process. Policy setting means setting a strategy (or policy) for selecting an operator from a pool of operators based on the extracted features. Both components are designed by hand in existing studies, which may not adapt efficiently to different optimization problems. In this paper, a generalized framework is proposed for learning the components of AOS for one of the main streams of EAs, namely, differential evolution (DE). In the framework, the feature extraction is parameterized as a deep neural network (DNN), while a Dirichlet distribution is considered to be the policy. A reinforcement learning method, named policy gradient, is used to train the DNN. As case studies, the proposed framework is applied to two DEs, the classic DE and a recently proposed DE, which results in two new algorithms named PG-DE and PG-MPEDE, respectively. Experiments on the Congress of Evolutionary Computation (CEC) 2018 test suite show that the proposed new algorithms perform significantly better than their counterparts. Finally, we prove theoretically that the considered classic methods are special cases of the proposed framework.
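As a rough illustration of a Dirichlet distribution acting as an operator-selection policy, the sketch below samples a probability vector over a pool of operators and then picks one. It is not the PG-DE or PG-MPEDE implementation: the function names, the three-operator pool, and the fixed concentration parameters are hypothetical, and the two-step "sample probabilities, then choose" scheme is an assumption about how such a policy could be used. In the framework described above, the concentration parameters would presumably come from the trained DNN.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def select_operator(alphas, rng):
    """Sample selection probabilities, then choose one operator index from them."""
    probs = sample_dirichlet(alphas, rng)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


rng = random.Random(0)
# Hypothetical concentration parameters for a pool of three operators (assumption).
alphas = [2.0, 1.0, 0.5]
index, probs = select_operator(alphas, rng)
```

Larger concentration parameters make the corresponding operator more likely on average, while the sampling step keeps some exploration across the pool.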
The objective of this study was to assess the regulatory framework for health data in Indonesia in order to understand the policy context and explore the possibility of expanding the adoption and implementation of the FAIR Guidelines, which state that data should be Findable, Accessible, Interoperable and Reusable (FAIR), in Indonesia. Although the FAIR Guidelines were not explicitly mentioned in any of the policy documents relevant to the Indonesian digital health sector, six out of the eight documents analysed contained FAIR Equivalent principles. In particular, Indonesia's Population Identification Number (NIK) has the potential, as a unique identifier, to support the integration and interoperability (findability) of data, which is crucial to all other aspects of the FAIR Guidelines. There is also a plan to build standards and protocols into the implementation of information systems in each ministry and government agency to improve data accessibility (accessibility); the integration of the various information systems is planned or ongoing (interoperability); and the need for a standardised arrangement for health information systems related to health data following the community standard is recognised (reusability). The documents at the core of Indonesia's digital health/eHealth policy have the highest FAIR Equivalency Score (FE-Score), showing some degree of alignment between the Indonesian digital health implementation vision and the FAIR Guidelines. This indicates that Indonesia's digital health sector is open to using the FAIR Guidelines.
The FAIR principles have been widely cited, endorsed and adopted by a broad range of stakeholders since their publication in 2016. By intention, the 15 FAIR guiding principles do not dictate specific technological implementations, but provide guidance for improving Findability, Accessibility, Interoperability and Reusability of digital resources. This has likely contributed to the broad adoption of the FAIR principles, because individual stakeholder communities can implement their own FAIR solutions. However, it has also resulted in inconsistent interpretations that carry the risk of leading to incompatible implementations. Thus, while the FAIR principles are formulated on a high level and may be interpreted and implemented in different ways, for true interoperability we need to support convergence in implementation choices that are widely accessible and (re)-usable. We introduce the concept of FAIR implementation considerations to assist accelerated global participation and convergence towards accessible, robust, widespread and consistent FAIR implementations. Any self-identified stakeholder community may either choose to reuse solutions from existing implementations, or, when they spot a gap, accept the challenge to create the needed solution, which, ideally, can be used again by other communities in the future. Here, we provide interpretations and implementation considerations (choices and challenges) for each FAIR principle.
Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging through a commonly used model-match strategy. As a result, there has been limited progress for CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs, and proposed Mako, utilizing a bottom-up guided model-free strategy, to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections, and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data based on experimental and computational validations as well as manual inspections are around 70%, where the medians of experimental and computational breakpoint shift are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types of adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
Law enforcement agencies have a restricted area in which their powers apply, which is called their jurisdiction. These restrictions also apply to the Internet. However, on the Internet, the physical borders of the jurisdiction, typically country borders, are hard to discover. In our case, it is hard to establish whether someone involved in criminal online behavior is indeed a Dutch citizen. We propose a way to overcome the arduous task of manually investigating whether a user on an Internet forum is Dutch or not. More precisely, we aim to detect whether a given English text was written by a native Dutch author. To develop a detector, we follow a machine learning approach, for which we need to prepare a specific training corpus. To obtain a corpus that is representative of online forums, we collected a large number of English forum posts from Dutch and non-Dutch authors on Reddit. To learn a detection model, we used a bag-of-words representation to capture potential misspellings, grammatical errors or unusual turns of phrase that are characteristic of the mother tongue of the authors. For this learning task, we compare the linear support vector machine and regularized logistic regression using the appropriate performance metrics: F1 score, precision, and average precision. Our results show that logistic regression with frequency-based feature selection performs best at predicting Dutch natives. Further study should be directed at the general applicability of the results, that is, at finding out whether the developed models are applicable to other forums with comparably high performance.
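The three metrics named in this abstract can be written out in a few lines. The sketch below is only a reminder of their standard definitions for a binary task, not the authors' evaluation code; the function names and the toy labels and scores are hypothetical, with label 1 standing for "Dutch native" by assumption.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive, ranked by score."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data (made up for illustration only).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
ap = average_precision(y_true, scores)
```

Precision and F1 depend on a fixed decision threshold, whereas average precision summarizes ranking quality across all thresholds, which is why the two kinds of metric are often reported together.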
Funding: supported in part by the National Natural Science Foundation of China (62441605).
Abstract: Large language models (LLMs) have significantly advanced artificial intelligence (AI) by excelling in tasks such as understanding, generation, and reasoning across multiple modalities. Despite these achievements, LLMs have inherent limitations, including outdated information, hallucinations, inefficiency, lack of interpretability, and challenges in domain-specific accuracy. To address these issues, this survey explores three promising directions in the post-LLM era: knowledge empowerment, model collaboration, and model co-evolution. First, we examine methods of integrating external knowledge into LLMs to enhance factual accuracy, reasoning capabilities, and interpretability, including incorporating knowledge into training objectives, instruction tuning, retrieval-augmented inference, and knowledge prompting. Second, we discuss model collaboration strategies that leverage the complementary strengths of LLMs and smaller models to improve efficiency and domain-specific performance through techniques such as model merging, functional model collaboration, and knowledge injection. Third, we delve into model co-evolution, in which multiple models collaboratively evolve by sharing knowledge, parameters, and learning strategies to adapt to dynamic environments and tasks, thereby enhancing their adaptability and continual learning. We illustrate how the integration of these techniques advances AI capabilities in science, engineering, and society, particularly in hypothesis development, problem formulation, problem-solving, and interpretability across various domains. We conclude by outlining future pathways for further advancement and applications.
Funding: supported by the National Natural Science Foundation of China (61773120), the National Natural Science Fund for Distinguished Young Scholars of China (61525304), the Foundation for the Author of National Excellent Doctoral Dissertation of China (2014-92), the Hunan Postgraduate Research Innovation Project (CX2018B022), and the China Scholarship Council-Leiden University Scholarship.
Abstract: Inter-satellite link (ISL) scheduling is required by the BeiDou Navigation Satellite System (BDS) to guarantee the system's ranging and communication performance. In the BDS, a great number of ISL scheduling instances must be addressed every day, which takes considerable time with ordinary metaheuristics and can hardly meet the quick-response requirements that often occur in real-world applications. To address the dual requirements of normal and quick-response ISL scheduling, a data-driven heuristic assisted memetic algorithm (DHMA) is proposed in this paper, which includes a high-performance memetic algorithm (MA) and a data-driven heuristic. In normal situations, the high-performance MA, which hybridizes parallelism, competition, and evolution strategies, is performed to obtain high-quality ISL scheduling solutions over time. In quick-response situations, the data-driven heuristic is performed to quickly schedule high-probability ISLs according to a prediction model, which is trained on the high-quality MA solutions. The main idea of the DHMA is to address normal and quick-response scheduling separately, while the high-quality normal scheduling data are used to train the model for quick-response use. In addition, this paper presents an easy-to-understand ISL scheduling model and its NP-completeness. A seven-day experimental study with 10,080 one-minute ISL scheduling instances shows the efficient performance of the DHMA in addressing ISL scheduling in normal (in 84 hours) and quick-response (in 0.62 hours) situations, which can meet the dual scheduling requirements in real-world BDS applications.
Funding: supported by the National Key R&D Program of China (Grant Nos. 2018YFC0910400 and 2017YFC0907500), the National Natural Science Foundation of China (Grant Nos. 31671372, 61702406, 31701739, and 31970317), the National Science and Technology Major Project of China (Grant No. 2018ZX10302205), the "World-Class Universities and the Characteristic Development Guidance Funds for the Central Universities", and the General Financial Grant from the China Postdoctoral Science Foundation (Grant Nos. 2017M623178 and 2017M623188).
Abstract: Microsatellite instability (MSI) is a key biomarker for cancer therapy and prognosis. Traditional experimental assays are laborious and time-consuming, and next-generation sequencing-based computational methods do not work on leukemia samples, paraffin-embedded samples, or patient-derived xenografts/organoids, due to the requirement of matched normal samples. Herein, we developed MSIsensor-pro, an open-source single sample MSI scoring method for research and clinical applications. MSIsensor-pro introduces a multinomial distribution model to quantify polymerase slippages for each tumor sample and a discriminative site selection method to enable MSI detection without matched normal samples. We demonstrate that MSIsensor-pro is an ultrafast, accurate, and robust MSI calling method. Using samples with various sequencing depths and tumor purities, MSIsensor-pro significantly outperformed the current leading methods in both accuracy and computational cost. MSIsensor-pro is available at https://github.com/xjtu-omics/msisensor-pro and free for non-commercial use, while a commercial license is provided upon request.
Funding: This research was jointly funded by the National Key R&D Plan (No. 2021YFC3200900) and the National Natural Science Foundation of China (No. 31971477).
Abstract: Accurate and credible identification of the drivers of algal growth is essential for sustainable utilization and scientific management of freshwater. In this study, we developed a deep learning-based Transformer model, named Bloomformer-1, for end-to-end identification of the drivers of algal growth without needing extensive a priori knowledge or prior experiments. The Middle Route of the South-to-North Water Diversion Project (MRP) was used as the study site to demonstrate that Bloomformer-1 exhibited more robust performance (with the highest R^2, 0.80–0.94, and the lowest RMSE, 0.22–0.43 μg/L) compared to four widely used traditional machine learning models, namely extra trees regression (ETR), gradient boosting regression tree (GBRT), support vector regression (SVR), and multiple linear regression (MLR). In addition, Bloomformer-1 had higher interpretability (including higher transferability and understandability) than the four traditional machine learning models, which meant that it was trustworthy and the results could be directly applied to real scenarios. Finally, it was determined that total phosphorus (TP) was the most important driver for the MRP, especially in the Henan section of the canal, although total nitrogen (TN) had the highest effect on algal growth in the Hebei section. Based on these results, controlling phosphorus loading across the whole MRP was proposed as an algal control strategy.
Funding: VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID, and the GO FAIR Foundation supported this research.
Abstract: This study provides an analysis of the implementation of FAIR Guidelines in selected non-Western geographies. The analysis was based on a systematic literature review to determine if the findability, accessibility, interoperability, and reusability of data is seen as an issue, if the adoption of the FAIR Guidelines is seen as a solution, and if the climate is conducive to the implementation of the FAIR Guidelines. The results show that the FAIR Guidelines have been discussed in most of the countries studied, which have identified data sharing and the reusability of research data as an issue (e.g., Kazakhstan, Russia, countries in the Middle East), and partially introduced in others (e.g., Indonesia). In Indonesia, a FAIR equivalent system has been introduced, although certain functions need to be added for data to be entirely FAIR. In Japan, both FAIR equivalent systems and FAIR-based systems have been adopted and created, and the acceptance of FAIR-based systems is recommended by the Government of Japan. In a number of African countries, the FAIR Guidelines are in the process of being implemented and the implementation of FAIR is well supported. In conclusion, a window of opportunity for implementing the FAIR Guidelines is open in most of the countries studied; however, more awareness needs to be raised about the benefits of FAIR in Russia and Kazakhstan to place it firmly on the policy agenda.
Funding: VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID, and the GO FAIR Foundation supported this research.
Abstract: With the prevailing COVID-19 pandemic, the lack of digitally recorded and connected health data poses a challenge for analysing the situation. Virus outbreaks, such as the current pandemic, allow for the optimisation and reuse of data, which can be beneficial in managing future outbreaks. However, there is a general lack of knowledge about the actual flow of information in health facilities, which is also the case in Uganda. In Uganda, where this case study was conducted, there is no comprehensive knowledge about what type of data is collected or how it is collected along the journey of a patient through a health facility. This study investigates information flows of clinical patient data in health facilities in Uganda. The study found that almost all health facilities in Uganda store patient information in paper files on shelves. Hospitals in Uganda are provided with paper tools, such as reporting forms, registers and manuals, in which district data is collected as aggregate data and submitted in the form of digital reports to the Ministry of Health Resource Center. These reporting forms are not digitised and, thus, not machine-actionable. Hence, it is not easy for health facilities, researchers, and others to find and access patient and research data. It is also not easy to reuse and connect this data with other digital health data worldwide, leading to the incorrect conclusion that there is less health data in Uganda. A FAIR architecture has the potential to solve such problems and facilitate the transition from paper to digital records in the Uganda health system.
Funding: VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID, and the GO FAIR Foundation supported this research.
Abstract: The digital health landscape in Uganda is plagued by problems with interoperability and sustainability, due to fragmentation and a lack of integrated digital health solutions. This can be partly attributed to the absence of policies on the interoperability of data, as well as the fact that there is no common goal to make digital data and data infrastructure interoperable across the data ecosystem. The promulgation of the FAIR Guidelines in 2016 brought together various data stewards and stakeholders to adopt a common vision on data management and enable greater interoperability. This article explores the potential of enhancing digital health interoperability through FAIR by analysing the digital solutions piloted in Uganda and their sustainability. It looks at the factors that are currently hindering interoperability by examining existing digital health solutions in Uganda, such as the Digital Health Atlas Uganda (DHA-U) and Uganda Digital Health Dashboard (UDHD). The level of FAIRness of the two dashboards was determined using the FAIR Evaluation Services tool. Analysis was also carried out to discover the level of FAIRness of the digital health solutions within the dashboards and the most frequently used software applications and data standards by the different digital health interventions in Uganda.
Funding: VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID, and the GO FAIR Foundation supported this research.
Abstract: Rapid and effective data sharing is necessary to control disease outbreaks, such as the current coronavirus pandemic. Despite the existence of data sharing agreements, data silos, lack of interoperable data infrastructures, and different institutional jurisdictions hinder data sharing and accessibility. To overcome these challenges, the Virus Outbreak Data Network (VODAN)-Africa initiative is championing an approach in which data never leaves the institution where it was generated, but, instead, algorithms can visit the data and query multiple datasets in an automated way. To make this possible, FAIR Data Points, distributed data repositories that host machine-actionable data and metadata that adhere to the FAIR Guidelines (that data should be Findable, Accessible, Interoperable and Reusable), have been deployed in participating institutions using a dockerised bundle of tools called VODAN in a Box (ViB). ViB is a set of multiple FAIR-enabling and open-source services with a single goal: to support the gathering of World Health Organization (WHO) electronic case report forms (eCRFs) as FAIR data in a machine-actionable way, but without exposing or transferring the data outside the facility. Following the execution of a proof of concept, ViB was deployed in Uganda and at Leiden University. The proof of concept generated a first query which was implemented across two continents. A SWOT (strengths, weaknesses, opportunities and threats) analysis of the architecture was carried out and established the changes needed for specifications and requirements for the future development of the solution.
Abstract: This article explores the global implementation of the FAIR Guiding Principles for scientific management and data stewardship, which provide that data should be findable, accessible, interoperable and reusable. The implementation of these principles is designed to lead to the stewardship of data as FAIR digital objects and the establishment of the Internet of FAIR Data and Services (IFDS). If implementation reaches a tipping point, IFDS has the potential to revolutionize how data is managed by making machine and human readable data discoverable for reuse. Accordingly, this article examines the expansion of the implementation of FAIR Guiding Principles, especially how and in which geographies (locations) and areas (topic domains) implementation is taking place. A literature review of academic articles published between 2016 and 2019 on the use of FAIR Guiding Principles is presented. The investigation also includes an analysis of the domains in the IFDS Implementation Networks (INs). Its uptake has been mainly in the Western hemisphere. The investigation found that implementation of FAIR Guiding Principles has taken firm hold in the domain of bio and natural sciences. To achieve a tipping point for FAIR implementation, it is now time to ensure the inclusion of non-European ascendants and of other scientific domains. Apart from equal opportunity and genuine global partnership issues, a permanent European bias poses challenges with regard to the representativeness and validity of data and could limit the potential of IFDS to reach across continental boundaries. The article concludes that, despite efforts to be inclusive, acceptance of the FAIR Guiding Principles and IFDS in different scientific communities is limited and there is a need to act now to prevent dampening of the momentum in the development and implementation of the IFDS. It is further concluded that policy entrepreneurs and the GO FAIR INs may contribute to making the FAIR Guiding Principles more flexible in including different research epistemologies, especially through its GO CHANGE pillar.
Abstract: This article investigates expansion of the Internet of FAIR Data and Services (IFDS) to Africa, through the three GO FAIR pillars: GO CHANGE, GO BUILD and GO TRAIN. Introduction of the IFDS in Africa has a focus on digital health. Two examples of introducing FAIR are compared: a regional initiative for digital health by governments in the East Africa Community (EAC) and an initiative by a local health provider (Solidarmed) in collaboration with Great Zimbabwe University in Zimbabwe. The obstacles to introducing FAIR are identified as underrepresentation of data from Africa in IFDS at this moment, the lack of explicit recognition of situational context of research in FAIR at present and the lack of acceptability of FAIR as a foreign and European invention which affects acceptance. It is envisaged that FAIR has an important contribution to solve fragmentation in digital health in Africa, and that any obstacles concerning African participation, context relevance and acceptance of IFDS need to be removed. This will require involvement of African researchers and ICT-developers so that it is driven by local ownership. Assessment of ecological validity in FAIR principles would ensure that the context specificity of research is reflected in the FAIR principles. This will help enhance the acceptance of the FAIR Guidelines in Africa and will help strengthen digital health research and services.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62076197) and the Key Research and Development Project of Shaanxi Province (Grant No. 2022GXLH-01-15).
Abstract: Extensive studies on selecting recombination operators adaptively, namely, adaptive operator selection (AOS), during the search process of an evolutionary algorithm (EA), have shown that AOS is promising for improving an EA's performance. A variety of heuristic mechanisms for AOS have been proposed in recent decades, which usually contain two main components: the feature extraction and the policy setting. The feature extraction refers to extracting relevant features from the information collected during the search process. The policy setting means setting a strategy (or policy) on how to select an operator from a pool of operators based on the extracted feature. Both components are designed by hand in existing studies, which may not be efficient for adapting to optimization problems. In this paper, a generalized framework is proposed for learning the components of AOS for one of the main streams of EAs, namely, differential evolution (DE). In the framework, the feature extraction is parameterized as a deep neural network (DNN), while a Dirichlet distribution is considered to be the policy. A reinforcement learning method, named policy gradient, is used to train the DNN. As case studies, the proposed framework is applied to two DEs including the classic DE and a recently proposed DE, which result in two new algorithms named PG-DE and PG-MPEDE, respectively. Experiments on the Congress on Evolutionary Computation (CEC) 2018 test suite show that the proposed new algorithms perform significantly better than their counterparts. Finally, we prove theoretically that the considered classic methods are special cases of the proposed framework.
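One way to read "a Dirichlet distribution is considered to be the policy" is sketched below: concentration parameters yield a random probability vector over the operator pool, and an operator is then drawn from that vector. This is an illustrative assumption, not the PG-DE or PG-MPEDE implementation. The operator names are common DE mutation strategies chosen for illustration, and the hard-coded `alphas` stand in for what a trained network would output.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalised gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def select_operator(operators, alphas, rng):
    """Sample selection probabilities from the policy, then pick one operator."""
    probabilities = sample_dirichlet(alphas, rng)
    chosen = rng.choices(operators, weights=probabilities, k=1)[0]
    return chosen, probabilities


# Hypothetical operator pool and concentration parameters (assumptions).
OPERATORS = ["DE/rand/1", "DE/best/1", "DE/current-to-best/1"]
alphas = [2.0, 1.0, 0.5]  # placeholder for the output of a trained DNN

rng = random.Random(0)
operator, probabilities = select_operator(OPERATORS, alphas, rng)
print(operator, [round(p, 3) for p in probabilities])
```

In a learned setting the concentration parameters would change with the search state, so operators that have recently paid off would tend to receive larger selection probabilities.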
Funding: VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID, and the GO FAIR Foundation supported this research.
Abstract: The objective of this study was to assess the regulatory framework for health data in Indonesia in order to understand the policy context and explore the possibility of expanding the adoption and implementation of the FAIR Guidelines, which state that data should be Findable, Accessible, Interoperable and Reusable (FAIR), in Indonesia. Although the FAIR Guidelines were not explicitly mentioned in any of the policy documents relevant to the Indonesian digital health sector, six out of the eight documents analysed contained FAIR Equivalent principles. In particular, Indonesia’s Population Identification Number (NIK) has the potential, as a unique identifier, to support the integration and interoperability (findability) of data, which is crucial to all other aspects of the FAIR Guidelines. There is also a plan to build standards and protocols into the implementation of information systems in each ministry and government agency to improve data accessibility (accessibility), the integration of the various information systems is planned/ongoing (interoperability), and the need for a standardised arrangement for health information systems related to health data following the community standard is recognised (reusability). The documents at the core of Indonesia’s digital health/eHealth policy have the highest FAIR Equivalency Score (FE-Score), showing some degree of alignment between the Indonesian digital health implementation vision and the FAIR Guidelines. This indicates that Indonesia’s digital health sector is open to using the FAIR Guidelines.
Funding: The work of A. Jacobsen, C. Evelo, M. Thompson, R. Cornet, R. Kaliyaperuma and M. Roos is supported by funding from the European Union’s Horizon 2020 research and innovation program under the EJP RD COFUND-EJP N° 825575. The work of A. Jacobsen, C. Evelo, C. Goble, M. Thompson, N. Juty, R. Hooft, M. Roos, S-A. Sansone, P. McQuilton, P. Rocca-Serra and D. Batista is supported by funding from ELIXIR EXCELERATE, H2020 grant agreement number 676559. R. Hooft was further funded by NL NWO NRGWI.obrug.2018.009. N. Juty and C. Goble were funded by CORBEL (H2020 grant agreement 654248). N. Juty, C. Goble, S-A. Sansone, P. McQuilton, P. Rocca-Serra and D. Batista were funded by FAIRplus (IMI grant agreement 802750). N. Juty, C. Goble, M. Thompson, M. Roos, S-A. Sansone, P. McQuilton, P. Rocca-Serra and D. Batista were funded by EOSClife H2020-EU (grant agreement number 824087). C. Goble was funded by DMMCore (BBSRC BB/M013189/). M. Thompson and M. Roos received funding from NWO (VWData 400.17.605). S-A. Sansone, P. McQuilton, P. Rocca-Serra and D. Batista have been funded by grants awarded to S-A. Sansone from the UK BBSRC and Research Councils (BB/L024101/1, BB/L005069/1), EU (H2020-EU 634107, H2020-EU 654241), IMI (IMPRiND 116060), NIH Data Common Fund, and from the Wellcome Trust (ISA-InterMine 212930/Z/18/Z, FAIRsharing 208381/A/17/Z). The work of A. Waagmeester has been funded by grant award number GM089820 from the National Institutes of Health. M. Kersloot was funded by the European Regional Development Fund (KVW-00163). The work of N. Meyers was funded by the National Science Foundation (OAC 1839030). The work of M.D. Wilkinson is funded by Isaac Peral/Marie Curie cofund with the Universidad Politecnica de Madrid and the Ministerio de Economia y Competitividad grant number TIN2014-55993-RM. The work of B. Magagna, E. Schultes, L. da Silva Santos and K. Jeffery is funded by the H2020-EU 824068. The work of B. Magagna, E. Schultes and L. da Silva Santos is funded by the GO FAIR ISCO grant of the Dutch Ministry of Science and Culture. The work of G. Guizzardi is supported by the OCEAN Project (FUB). M. Courtot received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No. 802750. R. Cornet was further funded by FAIR4Health (H2020-EU grant agreement number 824666). K. Jeffery received funding from EPOS-IP H2020-EU agreement 676564 and ENVRIplus H2020-EU agreement 654182.
Abstract: The FAIR principles have been widely cited, endorsed and adopted by a broad range of stakeholders since their publication in 2016. By intention, the 15 FAIR guiding principles do not dictate specific technological implementations, but provide guidance for improving Findability, Accessibility, Interoperability and Reusability of digital resources. This has likely contributed to the broad adoption of the FAIR principles, because individual stakeholder communities can implement their own FAIR solutions. However, it has also resulted in inconsistent interpretations that carry the risk of leading to incompatible implementations. Thus, while the FAIR principles are formulated on a high level and may be interpreted and implemented in different ways, for true interoperability we need to support convergence in implementation choices that are widely accessible and (re)-usable. We introduce the concept of FAIR implementation considerations to assist accelerated global participation and convergence towards accessible, robust, widespread and consistent FAIR implementations. Any self-identified stakeholder community may either choose to reuse solutions from existing implementations, or when they spot a gap, accept the challenge to create the needed solution, which, ideally, can be used again by other communities in the future. Here, we provide interpretations and implementation considerations (choices and challenges) for each FAIR principle.
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2018YFC0910400 and 2017YFC0907500), the National Science Foundation of China (Grant Nos. 31671372, 61702406, and 31701739), the Fundamental Research Funds for the Central Universities, the World-Class Universities (Disciplines), the Characteristic Development Guidance Funds for the Central Universities, and the Shanghai Municipal Science and Technology Major Project (Grant No. 2017SHZDZX01).
Abstract: Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging through a commonly used model-match strategy. As a result, there has been limited progress for CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs, and proposed Mako, utilizing a bottom-up guided model-free strategy, to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections, and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data based on experimental and computational validations as well as manual inspections are around 70%, where the medians of experimental and computational breakpoint shift are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types of adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
Abstract: Law enforcement agencies have a restricted area in which their powers apply, which is called their jurisdiction. These restrictions also apply to the Internet. However, on the Internet, the physical borders of the jurisdiction, typically country borders, are hard to discover. In our case, it is hard to establish whether someone involved in criminal online behavior is indeed a Dutch citizen. We propose a way to overcome the arduous task of manually investigating whether a user on an Internet forum is Dutch or not. More precisely, we aim to detect whether a given English text is written by a Dutch native author. To develop a detector, we follow a machine learning approach. Therefore, we need to prepare a specific training corpus. To obtain a corpus that is representative of online forums, we collected a large number of English forum posts from Dutch and non-Dutch authors on Reddit. To learn a detection model, we used a bag-of-words representation to capture potential misspellings, grammatical errors or unusual turns of phrase that are characteristic of the mother tongue of the authors. For this learning task, we compare the linear support vector machine and regularized logistic regression using the appropriate performance metrics: F1 score, precision, and average precision. Our results show that logistic regression with frequency-based feature selection performs best at predicting Dutch natives. Further study should be directed to the general applicability of the results, that is, to find out whether the developed models are applicable to other forums with comparably high performance.
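The bag-of-words representation and frequency-based feature selection mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the whitespace tokenizer, the document-frequency threshold `min_doc_freq`, and the three example posts are all hypothetical, and the classifier itself is left out.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def select_vocabulary(documents, min_doc_freq):
    """Keep only words that occur in at least `min_doc_freq` documents."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    return sorted(word for word, freq in doc_freq.items() if freq >= min_doc_freq)


def bag_of_words(text, vocabulary):
    """Represent a text as word counts over the selected vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Made-up example posts (not from the Reddit corpus described above).
posts = [
    "i have the possibility to go",
    "i have to go now",
    "the possibility is small",
]

vocabulary = select_vocabulary(posts, min_doc_freq=2)
vectors = [bag_of_words(post, vocabulary) for post in posts]
print(vocabulary)
print(vectors)
```

The resulting count vectors are the kind of input that a linear support vector machine or a regularized logistic regression would be trained on.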