Funding: supported by a research grant from City University of Hong Kong under Grant No. 7008178, and the National Natural Science Foundation of China under Grant Nos. 61228205, 61303175, and 61172153.
Abstract: Associating faces appearing in Web videos with names presented in the surrounding context is an important task in many applications. However, the problem has not been well investigated, particularly under large-scale realistic scenarios, mainly due to the scarcity of datasets constructed under such circumstances. In this paper, we introduce a Web video dataset of celebrities, named WebV-Cele, for name-face association. The dataset consists of 75 073 Internet videos totaling over 4 000 hours, covering 2 427 celebrities and 649 001 faces. This is, to our knowledge, the most comprehensive dataset for this problem. We describe the details of dataset construction, discuss several interesting findings from analyzing this dataset, such as celebrity community discovery, and provide experimental results of name-face association using five existing techniques. We also outline important and challenging research problems that could be investigated in the future.
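The abstract does not detail the five association techniques, so the following is a hedged sketch of the simplest conceivable baseline: assign each face the name that co-occurs most often in the surrounding text of the videos in which that face appears. All identifiers and data below are hypothetical, not drawn from WebV-Cele.

```python
from collections import Counter

def associate(face_occurrences):
    """Co-occurrence baseline for name-face association.

    face_occurrences maps a face ID to a list of name lists,
    one list per video in which the face appears (names taken
    from that video's surrounding context).
    """
    assignment = {}
    for face_id, name_lists in face_occurrences.items():
        # Count how often each name co-occurs with this face.
        counts = Counter(name for names in name_lists for name in names)
        # Pick the most frequent co-occurring name.
        assignment[face_id] = counts.most_common(1)[0][0]
    return assignment

# Hypothetical example: face_1 appears in three videos.
faces = {
    "face_1": [["Alice", "Bob"], ["Alice"], ["Alice", "Carol"]],
    "face_2": [["Bob"], ["Bob", "Carol"]],
}
print(associate(faces))  # {'face_1': 'Alice', 'face_2': 'Bob'}
```

Real systems would weight names by proximity to the face's video and handle ties, but this co-occurrence counting is the usual starting point such techniques are measured against.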
Funding: supported by the National Natural Science Foundation of China under Grant Nos. U2243222, 42071413, and 41971397.
Abstract: Deep learning algorithms show good prospects for remote sensing flood monitoring, but they mostly rely on huge amounts of labeled data, and labeled data for practical applications are scarce. In this paper, we propose a high-resolution multi-source remote sensing dataset for flood area extraction: GF-FloodNet. GF-FloodNet contains 13 388 samples from Gaofen-3 (GF-3) and Gaofen-2 (GF-2) images, constructed using a multi-level sample selection and interactive annotation strategy based on active learning. Compared with other flood-related datasets, GF-FloodNet not only has a spatial resolution of up to 1.5 m and provides pixel-level labels, but also consists of multi-source remote sensing data. We thoroughly validate and evaluate the dataset using several deep learning models, including quantitative analysis, qualitative analysis, and validation on large-scale remote sensing data in real scenes. Experimental results reveal that GF-FloodNet gains significant advantages from its multi-source data and can support the training of different deep learning models for flood area extraction. Our results also suggest that any deep learning dataset has a potential optimal boundary for model training; in GF-FloodNet, this boundary appears to be close to 4 824 samples. We provide GF-FloodNet at https://www.kaggle.com/datasets/pengliuair/gf-floodnet and https://pan.baidu.com/s/1vdUCGNAfFwG5UjZ9RLLFMQ?pwd=8v6o.
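The abstract does not spell out GF-FloodNet's evaluation protocol, so the following is a minimal sketch of pixel-level intersection-over-union (IoU), a standard metric for binary flood-area extraction; the tiny masks below are illustrative assumptions, not dataset samples.

```python
def flood_iou(pred, label):
    """Pixel-level IoU between two binary masks (1 = flood pixel).

    pred and label are 2-D nested lists of 0/1 values with the
    same shape; returns intersection / union over flood pixels.
    """
    inter = union = 0
    for pred_row, label_row in zip(pred, label):
        for p, l in zip(pred_row, label_row):
            p, l = bool(p), bool(l)
            inter += p and l   # pixel flooded in both masks
            union += p or l    # pixel flooded in either mask
    # Convention: two all-dry masks agree perfectly.
    return inter / union if union else 1.0

# Toy 2x3 masks: intersection = 2 pixels, union = 4 pixels.
pred = [[1, 1, 0], [0, 1, 0]]
label = [[1, 0, 0], [0, 1, 1]]
print(flood_iou(pred, label))  # 0.5
```

In practice one would compute this per image over the test split and average, alongside precision and recall, to compare models trained on different sample counts (e.g., around the 4 824-sample boundary the abstract reports).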
Abstract: The development of large language models (LLMs) has created transformative opportunities for the financial industry, especially in the area of financial trading. However, how to integrate LLMs with trading systems has become a challenge. To address this problem, we propose an intelligent trade order recognition pipeline that converts trade orders into a standard format for trade execution. The system improves the ability of human traders to interact with trading platforms while addressing the problem of misinformation acquisition in trade execution. In addition, we create a trade order dataset of 500 entries to simulate real-world trading scenarios. Moreover, we design several metrics to provide a comprehensive assessment of dataset reliability and the generative capability of large models in finance, evaluating five state-of-the-art LLMs on our dataset. The results show that most models generate syntactically valid JavaScript Object Notation (JSON) at high rates (about 80%–99%) and initiate clarifying questions in nearly all incomplete cases (about 90%–100%). However, end-to-end accuracy remains low (about 6%–14%), and missing information is substantial (about 12%–66%). Models also tend to over-interrogate (roughly 70%–80% of follow-ups are unnecessary), raising interaction costs and potential information-exposure risks. The research also demonstrates the feasibility of integrating our pipeline with real-world trading systems, paving the way for practical deployment of LLM-based trade automation solutions.
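The paper's exact order schema is not public, so the following is a hedged sketch of how the abstract's first two metrics (syntactic JSON validity and missing-information rate) could be scored; the field names ("symbol", "side", "quantity", "price") and the sample LLM outputs are assumptions for illustration only.

```python
import json

# Assumed minimal schema for a standardized trade order.
REQUIRED_FIELDS = {"symbol", "side", "quantity", "price"}

def evaluate_outputs(raw_outputs):
    """Score raw LLM outputs on two of the abstract's metrics:
    the fraction that parse as JSON, and the fraction of parsed
    orders that lack required fields."""
    valid = missing = 0
    for raw in raw_outputs:
        try:
            order = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not syntactically valid JSON
        valid += 1
        if not isinstance(order, dict) or not REQUIRED_FIELDS <= order.keys():
            missing += 1  # parsed, but information is missing
    n = len(raw_outputs)
    return {"json_valid_rate": valid / n, "missing_info_rate": missing / n}

# Hypothetical model outputs: one complete, one incomplete, one malformed.
outputs = [
    '{"symbol": "AAPL", "side": "buy", "quantity": 100, "price": 189.5}',
    '{"symbol": "TSLA", "side": "sell"}',
    'buy 100 AAPL at market',
]
print(evaluate_outputs(outputs))
```

End-to-end accuracy and follow-up necessity, the abstract's other metrics, would additionally require gold-standard orders and human judgments, which this sketch does not model.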
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 62172318, 62372349, and 62132015).
Abstract: 1 Introduction. Advances in human genetics have facilitated the investigation of the genetic architecture underlying numerous complex traits. In particular, studies of expression quantitative trait loci (eQTL) have played a pivotal role in elucidating how genetic variants influence gene expression across cell types. This progress has enabled the quantification of effect sizes for disease genetic risk loci and the construction of binary relational datasets linking diseases to their respective disease genes, such as Fantom5 [1] and DisGeNET [2].