Funding: This work is supported by the NSFC (Grant Nos. 61671087, 61962009, 61003287); the Fok Ying Tong Education Foundation (Grant No. 131067); the Major Scientific and Technological Special Project of Guizhou Province (Grant No. 20183001); the Foundation of State Key Laboratory of Public Big Data (Grant No. 2018BDKFJJ018); the CCF-Tencent Open Fund WeBank Special Funding (CCF-WebankRAGR20180104); the High-quality and Cutting-edge Disciplines Construction Project for Universities in Beijing (Internet Information, Communication University of China); and the Fundamental Research Funds for the Central Universities (No. 2019XD-A02).
Abstract: Most existing blockchain schemes are based on the design concept of “openness and transparency” to realize data security, which usually requires transaction data to be presented in plaintext. However, this inevitably raises issues of data privacy and operating performance. In this paper, we propose a novel blockchain scheme called Cipherchain, which can process and maintain transaction data in ciphertext form while guaranteeing immutability and auditability. Specifically, in our scheme, transactions can be encrypted locally using a searchable encryption scheme called multi-user public key encryption with conjunctive keyword search (mPECK), and can be accessed by multiple specific participants after being appended to the globally consistent distributed ledger. By introducing an execution-consensus-update paradigm of transaction flow, Cipherchain not only makes it possible for transaction data to exist in ciphertext form, but also ensures that overall system performance is not greatly affected by cryptographic operations and other local execution work. In addition, Cipherchain is a promising scheme for realizing the technology combinations “blockchain + cloud computing” and “permissioned blockchain + public blockchain”.
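The claim that a ledger of ciphertexts can remain immutable and auditable can be illustrated with a toy hash chain. This is only a sketch under stated assumptions: it does not implement mPECK, consensus, or any part of Cipherchain's actual protocol, the names `block_hash`, `append`, and `audit` are hypothetical, and random bytes stand in for encrypted transactions.

```python
import hashlib
import os

GENESIS = b"\x00" * 32  # placeholder "previous hash" for the first block


def block_hash(prev_hash: bytes, ciphertext: bytes) -> bytes:
    """Hash the previous block hash together with the opaque ciphertext."""
    return hashlib.sha256(prev_hash + ciphertext).digest()


def append(ledger: list, ciphertext: bytes) -> None:
    """Append a ciphertext, chaining it to the current tip of the ledger."""
    prev = ledger[-1][1] if ledger else GENESIS
    ledger.append((ciphertext, block_hash(prev, ciphertext)))


def audit(ledger: list) -> bool:
    """Recompute every link; an edited ciphertext breaks the chain."""
    prev = GENESIS
    for ciphertext, stored in ledger:
        if block_hash(prev, ciphertext) != stored:
            return False
        prev = stored
    return True


ledger = []
for _ in range(3):
    # Random bytes stand in for locally encrypted transactions (assumption).
    append(ledger, os.urandom(32))

ok_before = audit(ledger)
# Modify one payload without updating its stored hash.
ledger[1] = (b"tampered", ledger[1][1])
ok_after = audit(ledger)
```

The auditor never needs to decrypt anything: integrity is checked over the ciphertext bytes alone, which is the property the abstract relies on.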
Abstract: While the Harris-Todaro model is a traditional approach to researching the urban-rural dichotomy, it fails to explain families’ goal of maximizing their current utility under intertemporal decision-making conditions. To fill this gap, this paper establishes an urban-rural dichotomy model involving labor migration and education, in which family utility is assumed to derive from consumption and children’s educational achievement. The steady-state path derived through the Bellman equation suggests that increasing educational investment and family education intensity leads to a significant urban-rural difference in children’s educational achievement. Compared with the traditional Harris-Todaro model, the transversality condition is loosened in this model, while the unavailability of loans constrains migrant families. Four hypotheses are proposed and tested in an empirical study. Ordinary least squares regression was used first, but because of endogeneity caused by omitted variables, the instrumental variable method with two-stage least squares regression was then applied. The results demonstrate that the household registration system can explain 44.5% of the educational achievement difference, and that the initial difference is inflated 4.73 times after nine years of compulsory education. This divergence could increase the differences caused by household registration status, resulting in larger income gaps and the intergenerational inheritance of identity.
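Why an omitted variable pushes an analysis from ordinary least squares to two-stage least squares can be seen on synthetic data. The sketch below is purely illustrative: the variables `z`, `u`, `x`, `y` and all coefficients are assumptions made for the example and have no connection to the paper's data or estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (illustrative only).
z = rng.normal(size=n)                       # instrument: moves x, not y directly
u = rng.normal(size=n)                       # omitted variable: affects x and y
x = 0.8 * z + u + rng.normal(size=n)         # endogenous regressor
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # true effect of x on y is 2.0


def ols(design, target):
    """Least-squares coefficients for target ~ design."""
    return np.linalg.lstsq(design, target, rcond=None)[0]


ones = np.ones(n)

# Naive OLS of y on x is biased because u is left out of the regression.
beta_ols = ols(np.column_stack([ones, x]), y)[1]

# Stage 1: regress x on the instrument and keep the fitted values.
Z = np.column_stack([ones, z])
x_hat = Z @ ols(Z, x)

# Stage 2: regress y on the fitted values from stage 1.
beta_2sls = ols(np.column_stack([ones, x_hat]), y)[1]

print(f"OLS estimate:  {beta_ols:.3f}")
print(f"2SLS estimate: {beta_2sls:.3f}")
```

In this setup the OLS slope overstates the effect because `u` raises both `x` and `y`, while the 2SLS slope uses only the variation in `x` that comes from `z`. Standard errors from a manual second stage are not valid, so a dedicated IV routine would be needed for inference.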
Abstract: How the recent progress of reasoning large language models (LLMs), especially the new open-source model DeepSeek-R1, can benefit financial services is an underexplored problem. LLMs have ignited numerous applications within the financial sector, including financial news analysis and general customer interactions.
Abstract: The emergence of large language models (LLMs) has shifted the community from single-task-oriented natural language processing (NLP) research to a holistic end-to-end multi-task learning paradigm. Along this line of research, LLM-based prompting methods have attracted much attention, partly because of the technological advantages brought by prompt engineering (PE) and partly because of the underlying NLP principles disclosed by various prompting methods. Traditional supervised learning usually requires training a model on labeled data and then making predictions. In contrast, PE methods directly use the powerful capabilities of existing LLMs (e.g., GPT-3 and GPT-4) by composing appropriate prompts, especially in few-shot or zero-shot scenarios. Given the abundance of studies on prompting and the ever-evolving nature of this field, this article aims to 1) present a novel perspective for reviewing existing PE methods within the well-established communication theory framework, 2) facilitate a deeper understanding of the developing trends of existing PE methods used in three typical tasks, and 3) shed light on promising research directions for future PE methods.
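The difference between zero-shot and few-shot prompting comes down to whether labeled demonstrations are placed in the prompt text. A minimal sketch follows, assuming a simple "Input:/Output:" template; the function name `build_prompt` and the template are hypothetical and not taken from the article, and no model is called.

```python
def build_prompt(instruction, examples, query):
    """Compose a zero-shot (no examples) or few-shot prompt as plain text."""
    parts = [instruction.strip()]
    # Each demonstration is an (input, output) pair shown before the query.
    for text, label in examples:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The query is left open so the model completes the final output.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


instruction = "Classify the sentiment as positive or negative."
demos = [("I loved it.", "positive"), ("Waste of time.", "negative")]

zero_shot = build_prompt(instruction, [], "The plot was dull.")
few_shot = build_prompt(instruction, demos, "The plot was dull.")

print(few_shot)
```

No model weights change in either case, which is the contrast with supervised training that the abstract draws.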
Funding: This work was supported by the National Key Research and Development Program of China (No. 2018AAA0101100).
Abstract: In the past decades, artificial intelligence (AI) has achieved unprecedented success, with statistical models becoming the central entity in AI. However, the centralized training and inference paradigm for building and using these models faces growing privacy and legal challenges. To bridge the gap between data privacy and the need for data fusion, federated learning (FL) has emerged as an AI paradigm for solving data silo and data privacy problems. Based on secure distributed AI, federated learning emphasizes data security throughout the lifecycle, which includes the following steps: data preprocessing, training, evaluation, and deployment. FL maintains data security by using methods such as secure multi-party computation (MPC), differential privacy, and hardware solutions to build and use distributed multi-party machine-learning systems and statistical models over different data sources. Besides data privacy concerns, we argue that the concept of the “model” matters: when federated models are developed and deployed, they are easily exposed to various risks, including plagiarism, illegal copying, and misuse. To address these issues, we introduce FedIPR, a novel ownership verification scheme that embeds watermarks into FL models to verify their ownership and protect model intellectual property rights (IPR, or IP-right for short). While security is at the core of FL, many articles still refer to distributed machine learning with no security guarantee as “federated learning”, which does not satisfy the intended definition of FL. To this end, in this paper, we reiterate the concept of federated learning and propose secure federated learning (SFL), whose ultimate goal is to build trustworthy and safe AI with strong privacy preservation and IP-right preservation. We provide a comprehensive overview of existing works, including threats, attacks, and defenses in each phase of SFL from the lifecycle perspective.
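One idea behind masking-based secure aggregation, a simple relative of the MPC methods mentioned above, is that clients add pairwise random masks that cancel in the sum, so the server learns the average update without seeing any single client's update. The sketch below is a toy under stated assumptions: it uses floating-point masks on synthetic vectors, omits key agreement and dropout handling, and is not FedIPR or the paper's SFL design.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim = 3, 4

# Synthetic local model updates, one vector per client (assumption).
updates = [rng.normal(size=dim) for _ in range(num_clients)]

# Each pair (i, j) shares a random mask: client i adds it, client j subtracts it.
masked = [u.copy() for u in updates]
for i in range(num_clients):
    for j in range(i + 1, num_clients):
        mask = rng.normal(scale=100.0, size=dim)
        masked[i] += mask
        masked[j] -= mask

# The server only ever sees the masked vectors; the masks cancel in the sum.
server_average = sum(masked) / num_clients
true_average = sum(updates) / num_clients

print(np.allclose(server_average, true_average))
```

Practical protocols work over finite fields with cryptographically derived masks rather than floating-point noise.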
Abstract: Robustness is a long-standing challenge for automatic speech recognition (ASR), as the deployment environment of any ASR system involves much noisier speech samples than clean training corpora. However, it is impractical to annotate every type of noisy environment. In this work, we propose a novel phonetic-semantic pre-training (PSP) framework that allows a model to effectively improve ASR performance in practical noisy environments by seamlessly integrating pre-training, self-supervised learning, and fine-tuning. In particular, there are three fundamental stages in PSP. First, pre-train the phone-to-word transducer (PWT) to map the generated phone sequence to the target text using only unpaired text data; second, continue training the PWT on more complex data generated from an empirical phone-perturbation heuristic, in addition to self-supervised signals obtained by recovering the tainted phones; and third, fine-tune the resultant PWT with real-world speech data. We perform experiments on two real-life datasets collected from industrial scenarios and on synthetic noisy datasets. The results show that PSP effectively improves the traditional ASR pipeline, with relative character error rate (CER) reductions of 28.63% and 26.38%, respectively, on the two real-life datasets. It also demonstrates robustness against highly noisy synthetic speech datasets.
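The reported metric, relative CER reduction, can be made concrete with a short sketch. It assumes the common definition of CER as character-level edit distance divided by reference length; the example strings are invented, and the paper's exact scoring setup is not given in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error rate that the new system removes."""
    return (baseline - improved) / baseline


reference = "speech recognition"
baseline_cer = cer(reference, "speach wreck ignition")  # invented baseline output
improved_cer = cer(reference, "speech recognitian")     # invented improved output

print(f"relative CER reduction: {relative_reduction(baseline_cer, improved_cer):.2%}")
```

A relative reduction of 28.63% therefore means the new CER is 71.37% of the baseline CER, not that the CER fell by 28.63 percentage points.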