This study investigates the integration of Artificial Intelligence (AI) technologies, particularly natural language processing and machine learning, into qualitative research (QR) workflows. Our research demonstrates that AI can streamline data collection, coding, theme identification, and visualization, significantly improving both speed and accuracy compared to traditional manual methods. Notably, our experimental and numerical results provide a comprehensive analysis of AI's effect on efficiency, accuracy, and usability across various QR tasks. By presenting and discussing studies on several AI and generative AI models, we contribute to the ongoing scholarly discussion on the role of AI in QR, exploring its potential benefits, challenges, and limitations. We highlight the growing use of AI-powered qualitative data analysis tools such as ATLAS.ti, Quirkos, and NVivo for automating coding and data interpretation. Our analysis indicates that while AI tools from leading companies (e.g., OpenAI's GPT-4, Google's T5, Meta's RoBERTa) can enhance efficiency and depth in QR, code-focused models and general-purpose proprietary language models often do not align with qualitative needs. Additionally, certain proprietary and open-source models (e.g., DeepSeek, OLMo) are less prevalent in QR due to specialization gaps or adoption lags, whereas task-specific, transparent models, such as BERT for classification, T5 for text generation and summarization, and BLOOM for multilingual analysis, remain preferable for coding and thematic analysis due to their reproducibility and adaptability. We discuss key stages where AI has made a significant impact, including data collection and pre-processing, advanced text and sentiment analysis, simulation and modeling, and improved objectivity and consistency. The benefits of integrating AI into QR, along with corresponding adaptations in research methodologies, are also presented. Noteworthy applications and techniques, including The AI Scientist, Carl, AI co-scientist, augmented physics, and explainable AI (XAI), further illustrate the diverse potential of AI in research and the challenges to academic norms. Despite AI advancements, challenges persist. AI struggles with contextually nuanced data such as sarcasm, tone, and cultural context, and its reliance on training datasets raises ethical concerns regarding privacy, consent, and bias. Ultimately, we advocate for a hybrid approach where AI augments rather than replaces traditional qualitative methods, anticipating that ongoing AI advancements will enable more sophisticated, collaborative research practices that effectively combine machine capabilities with human expertise. This trend is underpinned and exemplified by applications such as AI co-scientist and augmented physics.
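Facing the exponential growth of interdisciplinary scientific literature and the limitations of existing retrieval systems, this study develops an intelligent system that combines vector-based semantic retrieval with Large Language Model (LLM) analysis, built on a dataset of 2.66 million papers from the arXiv platform. A vector database of papers provides an initial screening by semantic similarity, and LLM contextual reasoning then refines the ranking, which effectively addresses both the semantic gap of traditional keyword search and the hallucination problem of LLMs. An application in nuclear physics shows that the system can accurately locate interdisciplinary solutions: compared with keyword retrieval and vector-similarity retrieval on a specific task, recall over the top 10 papers rises from 10% to 60% and precision from 20% to 90%. The open-source project provides three core modules: 1) a vector database of the full paper collection; 2) an intelligent retrieval optimization framework (including agents for query generation, relevance analysis, and other tasks); 3) a toolchain for in-depth PDF parsing. By combining semantic retrieval with LLM reasoning, this study offers a scalable solution to the research challenges of the knowledge-explosion era (open-source repository: https://gitee.com/lgpang/arxiv_vectordb).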
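The two-stage retrieval idea described in this abstract (similarity pre-screening followed by a relevance re-ranking step, evaluated with top-k precision and recall) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration and is not taken from the project's repository: `embed` is a toy bag-of-words stand-in for a neural embedding model, `toy_judge` is a hypothetical placeholder for an LLM relevance judgment, and the four-document corpus and relevance labels are invented.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy stand-in for a neural embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_prefilter(query: str, corpus: dict[str, str], k: int) -> list[str]:
    """Stage 1: ids of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(corpus[d])), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: list[str], corpus: dict[str, str],
           judge: Callable[[str, str], float]) -> list[str]:
    """Stage 2: reorder candidates by a relevance judge (an LLM in a real system)."""
    return sorted(candidates, key=lambda d: judge(query, corpus[d]), reverse=True)


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(d in relevant for d in ranked[:k]) / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    return sum(d in relevant for d in ranked[:k]) / len(relevant)


def toy_judge(query: str, text: str) -> float:
    """Hypothetical judge: treats policy papers as off-topic for this query."""
    return 0.0 if "policy" in text else 1.0


# Invented example data, for illustration only.
corpus = {
    "a": "neural network emulator for nuclear equation of state",
    "b": "bayesian inference of nuclear matter properties",
    "c": "nuclear equation of state policy review",
    "d": "image classification with convolutional networks",
}
query = "machine learning for nuclear equation of state"
relevant = {"a", "b"}

candidates = vector_prefilter(query, corpus, k=3)
reranked = rerank(query, candidates, corpus, toy_judge)
print("prefilter:", candidates, precision_at_k(candidates, relevant, 2))
print("reranked: ", reranked, precision_at_k(reranked, relevant, 2))
```

In this toy setup the similarity stage ranks a lexically close but off-topic document second, and the judging stage demotes it, which is the kind of correction the abstract attributes to LLM contextual reasoning. A real system would replace `embed` with an embedding model and `toy_judge` with an actual LLM call.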