Abstract: Since OpenAI opened access to ChatGPT, large language models (LLMs) have become an increasingly popular topic, attracting researchers’ attention from many domains. However, public researchers face problems when developing LLMs, given that most LLMs are produced by industry and their training details are typically not disclosed. Since datasets are an important component of LLMs, this paper presents a holistic survey of the training datasets used in both the pre-training and fine-tuning processes. The paper first summarizes 16 pre-training datasets and 16 fine-tuning datasets used in state-of-the-art LLMs. Second, based on the properties of the pre-training and fine-tuning processes, it comments on pre-training datasets in terms of quality, quantity, and relation to models, and on fine-tuning datasets in terms of quality, quantity, and concerns. The study then critically identifies the problems and research trends in current LLM datasets. It helps public researchers train and investigate LLMs through visual cases and provides useful comments to the research community regarding data development. To the best of our knowledge, this paper is the first to summarize and discuss datasets used in both autoregressive and chat LLMs. The survey offers insights and suggestions to researchers and LLM developers as they build their models, and contributes to LLM research by pointing out existing problems from the perspective of data.
Abstract: The December 2022 paper in the cancer journal Oncoscience appeared to be a conventional discussion of the pros and cons of treating patients with the drug rapamycin [1]. But the article was written using artificial intelligence (AI) and listed the AI chatbot ChatGPT as its lead author. The large language model (LLM) built by OpenAI (San Francisco, CA, USA) had made its sensational public debut less than a month before [2], and the paper was one of the first scientific publications to credit it as an author [3].
Abstract: Prompt engineering, the art of crafting effective prompts for artificial intelligence (AI) models, has emerged as a pivotal factor in determining the quality and usefulness of AI-generated outputs. This practice involves strategically designing and structuring prompts to guide AI models toward desired outcomes, ensuring that they generate relevant, informative, and accurate responses. The significance of prompt engineering cannot be overstated. Well-crafted prompts can significantly enhance the capabilities of AI models, enabling them to perform tasks that were once thought to be an exclusively human domain. By providing clear and concise instructions, prompts can guide AI models to generate creative text, translate languages, write different kinds of creative content, and answer questions in an informative way. Moreover, prompt engineering can help mitigate biases and ensure that AI models produce outputs that are fair, equitable, and inclusive. However, prompt engineering is not without its challenges. Crafting effective prompts requires a deep understanding of both the AI model’s capabilities and the specific task at hand. Additionally, the quality of the prompts can be influenced by factors such as the model’s training data [1] and the complexity of the task. As AI models continue to evolve, prompt engineering will likely become even more critical in unlocking their full potential.
Abstract: Artificial Intelligence (AI) has experienced significant advancements in recent years, and its potential power is already recognized across various industries. Yet the rise of AI has led to growing concern about its impact on meeting the Sustainable Development Goals (SDGs). The aim of this paper was to evaluate the contributions and potential influence of AI on sustainable development in the society domain. Furthermore, the study descriptively analyzed responses from GPT-3, one of the largest language models developed by OpenAI. We conducted a set of queries on the SDGs to gather information on GPT-3’s perceptions of the impact of AI on sustainable development. Analysis of GPT-3’s contribution potential showcased its broad range of capabilities for contributing to the SDGs in areas such as education, health, and communication. The study findings provide valuable insights into the contributions of AI to sustainable development in the society domain and highlight the importance of proper regulations to promote the responsible use of AI for sustainable development. We highlighted the potential for improvement in the neural language processing skills of GPT-3 by avoiding imitating weak human writing styles with more mistakes in longer texts.
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 72301034 and 72272016, and by the Fundamental Research Funds for the Central Universities under Grant No. 2025ZZ048.
Abstract: The application of artificial intelligence (AI) in customer service has become ubiquitous. In response to the advocacy in the “2021 Coordinated Plan on Artificial Intelligence”, it is crucial to understand how to leverage AI customer service chatbots for societal welfare. Across two scenario studies and one lab experiment, this research investigates the impact of AI chatbots’ communication styles on consumers’ subsequent prosocial intentions that are unrelated to the content of the AI-human interaction. The combined evidence suggests that consumers exhibit higher prosocial intentions after interacting with social-oriented (vs. task-oriented) AI chatbots. The findings reveal the chain-mediating roles of social presence and empathy. Moreover, the research investigates the boundary effect of consumers’ goal focus (process focus vs. outcome focus) and shows that AI chatbots’ communication styles have a stronger impact on prosocial intentions for customers with an outcome focus. These results reveal an important externality of AI applications in the marketplace and provide a novel perspective for companies implementing corporate social responsibility (CSR) strategies.