Funding: Supported by the National Natural Science Foundation of China (61702385) and the Key Projects of the National Social Science Foundation of China (11&ZD189).
Abstract: The cloud cover ratio strongly affects the quality of a satellite image, so cloud detection is a necessary step in assessing image quality. This study develops cloud detection from the visible band of a satellite image. First, we consider features that distinguish cloud from ground: high grey level, good grey-level continuity, the area of the cloud region, and the variance of local fractal dimension (VLFD) of the cloud region. From these, a method for detecting a single cloud region is proposed. Second, by introducing a reference satellite image and comparing the variance in the dimensions of the reference and test images, we describe a method that detects multiple cloud regions and determines whether cloud is present in an image. The performance of the proposed method is demonstrated on several Ikonos images.
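As a rough illustration of two of the cues named in this abstract (high grey level and region area), the following minimal Python sketch keeps connected bright regions above a size limit. It is not the authors' method: the function name `candidate_cloud_regions`, the threshold of 200, and the minimum area of 4 pixels are all assumptions chosen for the example, and the grey-level continuity and VLFD tests are omitted.

```python
from collections import deque


def candidate_cloud_regions(image, grey_threshold=200, min_area=4):
    """Return 4-connected regions of bright pixels with at least min_area pixels.

    image is a list of rows of grey levels (0-255). The threshold and the
    minimum area are illustrative values, not taken from the paper.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < grey_threshold:
                continue
            # Flood-fill one connected region of bright pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= grey_threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Area cue: discard bright specks that are too small.
            if len(region) >= min_area:
                regions.append(region)
    return regions


# Toy image: one 2x2 bright block and one isolated bright pixel.
toy_image = [
    [10, 10, 10, 10, 10],
    [10, 250, 240, 10, 10],
    [10, 245, 255, 10, 230],
    [10, 10, 10, 10, 10],
]
toy_regions = candidate_cloud_regions(toy_image)
```

On the toy image, only the 2x2 block survives the area filter; the isolated bright pixel is dropped. A fuller pipeline would go on to test each surviving region for grey-level continuity and VLFD as the abstract describes.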
Funding: Supported by the National Natural Science Foundation of China (No. 62261023), the National Natural Science Foundation of China (No. U1836118), and Science and Technology Innovation 2030 "New Generation of Artificial Intelligence" (2020AAA0108501).
Abstract: Instruction fine-tuning is a key method for adapting large language models (LLMs) to domain-specific tasks, and instruction quality significantly affects model performance after fine-tuning. Evaluating instruction quality and selecting high-quality instructions are therefore essential steps in LLM instruction fine-tuning. Although existing studies provide important theoretical foundations and techniques, there is still room for improvement in generality, in the relationships among methods, and in experimental verification. Current methods for evaluating instruction quality fall into four main categories: human evaluation, statistics-based evaluation, model-based evaluation, and LLM-based evaluation. Human evaluation relies on the subjective judgment and domain expertise of the evaluators; it offers interpretability and suits scenarios with small-scale data and sufficient budgets. Statistics-based evaluation estimates instruction quality using indicators such as stopwords and lexical diversity; it is highly efficient and suits large-scale data. Model-based evaluation employs specific models to quantify indicators such as perplexity (PPL) and instruction following difficulty (IFD); it is flexible and suits specific tasks. LLM-based evaluation rates instruction quality through prompt-based interaction with LLMs, focusing on aspects such as accuracy and coherence; it is highly automated and customizable, which simplifies the evaluation process. Finally, considering the limitations of current quality evaluation methods, some future research directions are proposed. These include refining instruction categories, extending evaluation indicators, enhancing human-AI interactive evaluation methods, applying agents to instruction quality evaluation, and developing a comprehensive evaluation framework.
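The statistics-based and model-based indicators mentioned in this abstract can be made concrete with a small Python sketch. Everything here is an assumption made for illustration rather than the survey's own definitions: the tiny `STOPWORDS` set, the regex tokenizer, and the helper names are hypothetical, perplexity is computed as the exponential of the mean negative log-likelihood, and IFD is taken, as it is commonly described, to be the ratio of the answer's mean loss given the instruction to its mean loss without the instruction. Token log-probabilities are supplied by hand, not produced by a real model.

```python
import math
import re

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for"}


def instruction_stats(text):
    """Statistics-based indicators: stopword ratio and type-token ratio."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"n_tokens": 0, "stopword_ratio": 0.0, "type_token_ratio": 0.0}
    n_stop = sum(1 for t in tokens if t in STOPWORDS)
    return {
        "n_tokens": len(tokens),
        "stopword_ratio": n_stop / len(tokens),
        # Lexical diversity as distinct tokens over total tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }


def mean_nll(token_logprobs):
    """Mean negative log-likelihood of a token sequence (natural log)."""
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs):
    """Perplexity as exp of the mean negative log-likelihood."""
    return math.exp(mean_nll(token_logprobs))


def ifd(answer_logprobs_given_instruction, answer_logprobs_alone):
    """Assumed IFD form: conditioned answer loss over unconditioned answer loss."""
    return mean_nll(answer_logprobs_given_instruction) / mean_nll(answer_logprobs_alone)


stats = instruction_stats("Summarize the report and list the key risks")
ppl = perplexity([math.log(0.5)] * 4)
score = ifd([math.log(0.5)] * 3, [math.log(0.25)] * 3)
```

In practice the log-probabilities would come from whichever language model is used for scoring; under the assumed definition, a lower IFD means the instruction makes the answer easier for the model to predict.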
Funding: Supported by the National Natural Science Foundation of China (No. U1836118), the Open Fund of the Key Laboratory of Content Organization and Knowledge Services for Rich Media Digital Publishing (ZD2021-11/01), and the Natural Science Foundation of the Hubei Province Educational Committee (B2019009).
Abstract: Temporal information is pervasive and crucial in medical records and other clinical text, as it describes how medical conditions develop and is vital for clinical decision making. However, providing a holistic knowledge representation and reasoning framework for the various time expressions in clinical text is challenging. To capture complex temporal semantics in clinical text, we propose a novel Clinical Time Ontology (CTO) as an extension of the OWL framework. More specifically, we identified eight time-related problems in clinical text and created 11 core temporal classes to conceptualize fuzzy time, cyclic time, irregular time, negations, and other complex aspects of clinical time. We then extended Allen's and TEO's temporal relations and defined the relation concept description between complex and simple time. We also provided a formulaic and graphical presentation of complex time and complex time relationships. We carried out an empirical study on the expressiveness and usability of CTO using real-world healthcare datasets. The experimental results demonstrate that CTO could faithfully represent and reason over 93% of the temporal expressions, and that it can cover a wider range of time-related classes in the clinical domain.
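For readers unfamiliar with the Allen relations that CTO extends, the following minimal Python sketch classifies two crisp intervals into one of Allen's 13 basic relations. It is only background illustration: the function name `allen_relation` and the tuple representation are assumptions, and the sketch does not model CTO's fuzzy, cyclic, or irregular time, nor TEO's relations.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching intervals.
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    # Intervals sharing one or both endpoints.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    # Strict containment in either direction.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Remaining case: partial overlap.
    return "overlaps" if a_start < b_start else "overlapped-by"


# Hypothetical example: a three-day fever that ends as a five-day course begins.
fever = (1, 4)
antibiotics = (4, 9)
relation = allen_relation(fever, antibiotics)
```

With the hypothetical day numbers above, the fever interval "meets" the antibiotics interval. Expressions such as "every morning" or "about two weeks ago" are the kind of cyclic or fuzzy time that this crisp model cannot express and that the abstract says CTO is designed to cover.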