3,531 articles found
Audiovisual Art Event Classification and Outreach Based on Web Extracted Data
1
Authors: Andreas Giannakoulopoulos, Minas Pergantis, Aristeidis Lamprogeorgos, Stella Lampoura. 《Journal of Software Engineering and Applications》, 2025, No. 1, pp. 24-43 (20 pages)
The World Wide Web provides a wealth of information about everything, including contemporary audio and visual art events, which are discussed on media outlets, blogs, and specialized websites alike. This information may become a robust source of real-world data, which may form the basis of an objective data-driven analysis. In this study, a methodology for collecting information about audio and visual art events in an automated manner from a large array of websites is presented in detail. This process uses cutting-edge Semantic Web, Web Search and Generative AI technologies to convert website documents into a collection of structured data. The value of the methodology is demonstrated by creating a large dataset concerning audiovisual events in Greece. The collected information includes event characteristics, estimated metrics based on their text descriptions, outreach metrics based on the media that reported them, and a multi-layered classification of these events based on their type, subjects and methods used. This dataset is openly provided to the general and academic public through a Web application. Moreover, each event's outreach is evaluated using these quantitative metrics; the results are analyzed with an emphasis on classification popularity, and useful conclusions are drawn concerning the importance of artistic subjects, methods, and media.
Keywords: Web data extraction; Art Events Classification; Artistic Outreach; Online Media
Structured AJAX Data Extraction Based on Agricultural Ontology (cited by: 6)
2
Authors: LI Chuan-xi, SU Ya-ru, WANG Ru-jing, WEI Yuan-yuan, HUANG He. 《Journal of Integrative Agriculture》 SCIE CAS CSCD, 2012, No. 5, pp. 784-791 (8 pages)
More and more web pages apply AJAX (Asynchronous JavaScript and XML) because of its rich interactivity and incremental communication. Observation shows that AJAX contents, which cannot be seen by a traditional crawler, are generally well structured and belong to one specific domain. Extracting the structured data from AJAX contents and annotating their semantics is important for further applications. In this paper, a structured AJAX data extraction method for the agricultural domain based on an agricultural ontology is proposed. First, Crawljax, an open AJAX crawling tool, was overridden to explore and retrieve the AJAX contents. Second, the retrieved contents were partitioned into items and then classified with the help of the agricultural ontology; HTML tags and punctuation were used to segment the retrieved contents into entity items. Finally, the entity items were clustered, and semantic annotations were assigned to the clustering results according to the agricultural ontology. Experimental evaluation showed the proposed approach to be effective in resource exploration, entity extraction, and semantic annotation.
Keywords: information extraction; structured data; AJAX; agricultural ontology; semantic annotation
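The segmentation step described in this abstract (HTML tags and punctuation as item boundaries) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `segment_items`, the chosen punctuation set, and the sample fragment are all assumptions made for illustration.

```python
import re

# Hypothetical illustration only: the tag pattern, the punctuation set,
# and the sample fragment are assumptions, not taken from the paper.
TAG_RE = re.compile(r"<[^>]+>")
PUNCT_RE = re.compile(r"[,;:|，；：、]")


def segment_items(fragment: str) -> list[str]:
    """Split an HTML fragment into candidate entity items."""
    # Every tag becomes a newline, so tag boundaries act as item boundaries.
    text = TAG_RE.sub("\n", fragment)
    # Selected punctuation marks also separate items.
    text = PUNCT_RE.sub("\n", text)
    # Normalize inner whitespace and drop empty pieces.
    return [" ".join(piece.split()) for piece in text.split("\n") if piece.strip()]


# Made-up sample data, used only to exercise the function.
sample = (
    "<li><b>Rice</b>: 2.40 yuan/kg, Hefei</li>"
    "<li><b>Wheat</b>: 2.10 yuan/kg, Wuhan</li>"
)
items = segment_items(sample)
print(items)
```

In a pipeline like the one the abstract outlines, items such as these would then be matched against ontology concepts and clustered; that part is not sketched here.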
Recommendation Algorithm Integrating CNN and Attention System in Data Extraction (cited by: 1)
3
Authors: Yang Li, Fei Yin, Xianghui Hui. 《Computers, Materials & Continua》 SCIE EI, 2023, No. 5, pp. 4047-4063 (17 pages)
With the rapid development of the Internet globally since the 21st century, the amount of data information has increased exponentially. Data helps improve people's livelihood and working conditions, as well as learning efficiency. Therefore, data extraction, analysis, and processing have become a hot issue for people from all walks of life. Traditional recommendation algorithms still have some problems, such as inaccuracy, limited diversity, and low performance. To solve these problems and improve the accuracy and variety of recommendation algorithms, the research combines convolutional neural networks (CNN) and the attention model to design a recommendation algorithm based on the neural network framework. Through the text convolutional network, the input layer in the CNN is transformed into two channels: static and non-static. Meanwhile, the self-attention system focuses on the system so that data can be better processed and the accuracy of feature extraction becomes higher. The recommendation algorithm combines the CNN and the attention system and divides the embedding layer into user information feature embedding and data name feature extraction embedding. It obtains data name features through a convolution kernel. Finally, the top pooling layer obtains the length vector, and the attention system layer obtains the characteristics of the data type. Experimental results show that the proposed recommendation algorithm, which combines the CNN and the attention system, performs better in data extraction than the traditional CNN algorithm and other currently popular recommendation algorithms. The proposed algorithm shows excellent accuracy and robustness.
Keywords: data extraction; recommendation algorithm; CNN algorithm; attention model
Research of Extracting Data from HTML Web Pages Automatically (cited by: 1)
4
Authors: 王茹, 宋瀚涛, 陆玉昌. 《Journal of Beijing Institute of Technology》 EI CAS, 2003, No. S1, pp. 104-108 (5 pages)
In order to use data on the Internet, it is necessary to extract data from web pages. An HTT tree model representing HTML pages is presented. Based on the HTT model, a wrapper generation algorithm, AGW, is proposed. The AGW algorithm uses a comparing and correcting technique to generate the wrapper from the native characteristics of the HTT tree structure. The AGW algorithm can not only generate the wrapper automatically, but also rebuild the data schema easily and reduce the complexity of the computation.
Keywords: information extraction; data transformation; wrapper; HTML page
Algorithmic Foundation and Software Tools for Extracting Shoreline Features from Remote Sensing Imagery and LiDAR Data (cited by: 9)
5
Authors: Hongxing Liu, Lei Wang, Douglas J. Sherman, Qiusheng Wu, Haibin Su. 《Journal of Geographic Information System》, 2011, No. 2, pp. 99-119 (21 pages)
This paper presents algorithmic components and corresponding software routines for extracting shoreline features from remote sensing imagery and LiDAR data. Conceptually, shoreline features are treated as boundary lines between land objects and water objects. Numerical algorithms have been identified and devised to segment and classify remote sensing imagery and LiDAR data into land and water pixels, to form and enhance land and water objects, and to trace and vectorize the boundaries between land and water objects as shoreline features. A contouring routine is developed as an alternative method for extracting shoreline features from LiDAR data. While most of the numerical algorithms are implemented in the C++ programming language, some algorithms use available functions of ArcObjects in ArcGIS. Based on VB .NET and ArcObjects programming, a graphical user interface has been developed to integrate and organize the shoreline extraction routines into a software package. This product represents the first comprehensive software tool dedicated to extracting shorelines from remotely sensed data. A Radarsat SAR image, a QuickBird multispectral image, and airborne LiDAR data have been used to demonstrate how these software routines can be utilized and combined to extract shoreline features from different types of input data sources: panchromatic or single-band imagery, color or multispectral imagery, and LiDAR elevation data. Our software package is freely available to the public through the internet.
Keywords: shoreline extraction; remote sensing imagery; LiDAR data; ArcGIS; ArcObjects; VB.NET
Semi-structured Data Extraction and Schema Knowledge Mining
6
Authors: 陈恩红, WANG Xufa. 《High Technology Letters》 EI CAS, 2001, No. 1, pp. 1-5 (5 pages)
A semi-structured data extraction method is proposed that obtains the useful information embedded in a group of relevant web pages and stores it with OEM (Object Exchange Model). A data mining method is then adopted to discover the schema knowledge implicit in the semi-structured data. This knowledge helps users understand the information structure on the web more deeply and thoroughly. At the same time, it provides an effective schema for querying web information.
Keywords: semi-structured data; schema; data extraction
An Efficient Mechanism for Product Data Extraction from E-Commerce Websites
7
Authors: Malik Javed Akhtar, Zahur Ahmad, Rashid Amin, Sultan H. Almotiri, Mohammed A. Al Ghamdi, Hamza Aldabbas. 《Computers, Materials & Continua》 SCIE EI, 2020, No. 12, pp. 2639-2663 (25 pages)
A large amount of data is present on the web that can be used for purposes such as product recommendation, price comparison, and demand forecasting for a particular product. Websites are designed for human understanding and not for machines. Therefore, making data machine-readable requires techniques to grab data from web pages. Researchers have addressed the problem using two approaches, i.e., knowledge engineering and machine learning. State-of-the-art knowledge engineering approaches use the structure of documents, visual cues, clustering of attributes of data records, and text processing techniques to identify data records on a web page. Machine learning approaches use annotated pages to learn rules, which are then used to extract data from unseen web pages. The structure of web documents is continuously evolving, so new techniques are needed to handle the emerging requirements of web data extraction. In this paper, we present a novel, simple, and efficient technique to extract data from web pages using visual styles and the structure of documents. The proposed technique detects the Rich Data Region (RDR) using the query and correlative words of the query. The RDR is then divided into data records using style similarity. Noisy elements are removed using a Common Tag Sequence (CTS) and formatting entropy. The system is implemented in Java and runs on a dataset of real-world working websites. The effectiveness of the results is evaluated using precision, recall, and F-measure and compared with five existing systems. The comparison of the proposed technique with existing systems has shown encouraging results.
Keywords: Document object model; rich data region; common tag sequence; web data extraction; deep web mining
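The evaluation measures named in this abstract (precision, recall, and F-measure) follow standard definitions, sketched below in Python for a set of extracted records compared against a hand-labeled reference set. The function name and the record identifiers are hypothetical, and the balanced F1 variant is assumed because the abstract does not say which weighting was used.

```python
def precision_recall_f1(extracted: set[str], expected: set[str]) -> tuple[float, float, float]:
    """Standard precision, recall, and balanced F-measure (F1) over record sets."""
    true_positives = len(extracted & expected)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up record identifiers: 4 records extracted, 5 expected, 3 in common.
p, r, f = precision_recall_f1(
    {"r1", "r2", "r3", "r4"},
    {"r1", "r2", "r3", "r5", "r6"},
)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.3f}")
```

With these made-up sets, 3 of 4 extracted records are correct (precision 0.75) and 3 of 5 expected records are found (recall 0.60).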
Extraction of Mineral Alteration Zone from ETM+ Data in Northwestern Yunnan, China
8
Authors: 赵志芳, 张玉君, 成秋明, 陈建平. 《Journal of China University of Geosciences》 SCIE CSCD, 2008, No. 4, pp. 416-420 (5 pages)
Alteration is regarded as significant information for mineral exploration. In this study, ETM+ remote sensing data are used for recognizing and extracting alteration zones in northwestern Yunnan (云南), China. Principal component analysis (PCA) of ETM+ bands 1, 4, 5, and 7 was employed for OH^- alteration extraction. PCA of ETM+ bands 1, 3, 4, and 5 was used for extracting Fe^2+ (Fe^3+) alterations. Interfering factors, such as vegetation, snow, and shadows, were masked. Alteration components were defined in the principal components (PCs) by the contributions of their diagnostic spectral bands. The alteration zones identified from remote sensing were analyzed in detail along with geological surveys and field verification. The results show that the OH^- alteration is a main indicator of K-feldspar, phyllic, and propylitized alterations. These alterations are closely related to porphyry copper deposits. The Fe^2+ (Fe^3+) alteration indicates pyritization, which is mainly related to hydrothermal or skarn-type polymetallic deposits.
Keywords: mineral alteration; extraction from ETM+ data; PCA; OH^- alteration; Fe^2+ (Fe^3+) alteration; northwestern Yunnan; China
Automatic Data Extraction from Websites for Generating Aquatic Product Market Information
9
Authors: 袁红春, 陈莹, 孙越夫. 《Journal of Donghua University(English Edition)》 EI CAS, 2006, No. 6, pp. 15-19 (5 pages)
The massive volume of web-based information resources has led to an increasing demand for effective automatic retrieval of target information for web applications. This paper introduces a web-based data extraction tool that deploys various algorithms to locate, extract, and filter tabular data from HTML pages and to transform them into new web-based representations. The tool has been applied in an aquaculture web application platform for extracting and generating aquatic product market information. Results show that the tool is very effective in extracting the required data from web pages.
Keywords: web data; table localization algorithm; distance algorithm; data filtering algorithm; data extraction tool
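The locate, extract, and filter steps that this abstract describes for tabular data can be sketched with Python's standard `html.parser` module. This is an illustrative sketch, not the tool from the paper: the class `TableExtractor`, the helpers `extract_rows` and `filter_rows`, the minimum-cell-count filter rule, and the sample table are all assumptions.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the chunks and normalize whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard a cell left open by malformed markup


def extract_rows(html: str) -> list[list[str]]:
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def filter_rows(rows: list[list[str]], min_cells: int) -> list[list[str]]:
    # Assumed filter rule: keep only rows with at least `min_cells` cells.
    return [row for row in rows if len(row) >= min_cells]


# Made-up sample page fragment.
sample = (
    "<table><tr><th>Product</th><th>Price</th></tr>"
    "<tr><td> Carp </td><td>12.5</td></tr>"
    "<tr><td>note only</td></tr></table>"
)
rows = extract_rows(sample)
data_rows = filter_rows(rows, min_cells=2)
print(data_rows)
```

The abstract's table localization and distance algorithms are not reproduced here. The sketch treats every `<tr>` on the page as a candidate row and uses the cell count as a stand-in filter.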
On Structure-based Web Data Extraction: The Model, Method and Application
10
Authors: 俞方桦, 戴玮, 陈家训. 《Journal of China Textile University(English Edition)》 EI CAS, 2000, No. 4, pp. 103-106 (4 pages)
Web data extraction obtains valuable data from the tremendous information resources of the World Wide Web according to a predefined pattern. It processes and classifies the data on the Web. A formalization of the Web data extraction procedure is presented, together with a description of the crawling and extraction algorithm. Based on the formalization, an XML-based page structure description language, TIDL, is introduced, including the object model, the HTML object reference model, and the definition of tags. Finally, a Web data gathering and querying application based on Internet agent technology, named Web Integration Services Kit (WISK), is described.
Keywords: World Wide Web; Web mining; data extraction; HTML; XML
Rural Habitation Multistage Nature Boundary Extraction Based on Geographic Name Database
11
Authors: Binbin Hu, Hong Wang, Wei Zhang. 《Journal of Geoscience and Environment Protection》, 2016, No. 7, pp. 37-43 (7 pages)
To extract the boundaries of rural habitation, an extraction method that uses polygon aggregation is proposed, based on geographic name data and basic geographic information data. It can extract the boundaries of three levels of rural habitation: town, administrative village, and natural village. The method first extracts the boundary of each natural village by aggregating the resident polygons, then extracts the boundary of each administrative village by aggregating the natural village boundaries, and finally extracts the boundary of each town by aggregating the administrative village boundaries. The methods for extracting the boundaries of these three levels of rural habitation are described in detail through an experiment with basic geographic information data and geographic name data. Experimental results show that the method can serve as a reference for boundary extraction of rural habitation.
Keywords: Rural Habitation; Geographic Name Data; Basic Geographic Information Data; Boundary Extraction; Polygon Aggregation
Alteration Information Extraction by Applying Synthesis Processing Techniques to Landsat ETM+ Data: Case Study of Zhaoyuan Gold Mines, Shandong Province, China
12
Authors: 刘福江, 吴信才, 孙华山, 郭艳. 《Journal of China University of Geosciences》 SCIE CSCD, 2007, No. 1, pp. 72-76 (5 pages)
Satellite remote sensing data are usually used to analyze the spatial distribution pattern of geological structures and generally serve as a significant means for the identification of alteration zones. Based on Landsat Enhanced Thematic Mapper (ETM+) data, which have better spectral resolution (8 bands) and spatial resolution (15 m in the PAN band), synthesis processing techniques are presented for alteration information extraction: data preparation, vegetation indices and band ratios, and expert classifier-based classification. These techniques have been implemented in the MapGIS-RSP software (version 1.0), developed by the Wuhan Zondy Cyber Technology Co., Ltd, China. In the study area application of extracting alteration information in the Zhaoyuan (招远) gold mines, Shandong (山东) Province, China, several hydrothermally altered zones (including two new sites) were found after satellite imagery interpretation coupled with field surveys. It is concluded that these synthesis processing techniques are useful approaches and are applicable to a wide range of gold-mineralized alteration information extraction.
Keywords: alteration information extraction; Zhaoyuan gold mines; Landsat-7 ETM+ data
The development of data acquisition and control system for extraction power supply of prototype RF ion source (cited by: 1)
13
Authors: Meichu HUANG, Chundong HU, Yuanzhe ZHAO, Caichao JIANG, Yahong XIE, Shiyong CHEN, Qinglong CUI. 《Plasma Science and Technology》 SCIE EI CAS CSCD, 2018, No. 8, pp. 104-111 (8 pages)
A 16 kV/20 A power supply was developed for the extraction grid of the prototype radio frequency (RF) ion source of a neutral beam injector. To acquire the state signals of the extraction grid power supply (EGPS) and control its operation, a data acquisition and control system has been developed. The system mainly consists of an interlock protection circuit board, a photoelectric conversion circuit, optical fibers, an industrial compact peripheral component interconnect (CPCI) computer, and a host computer. The human-machine interface of the host computer delivers commands and data to the program on the CPCI computer and offers a convenient client for setting parameters and displaying the EGPS status. The CPCI computer acquires the status of the power supply. The system can turn off the EGPS quickly when EGPS faults occur. The system has been applied to the EGPS of the prototype RF ion source. Test results show that the data acquisition and control system for the EGPS meets the operating requirements of the prototype RF ion source.
Keywords: RF ion source; data acquisition; control system; TCP/IP protocol; beam extraction
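Construction and Application of a Comprehensive Diagnosis and Treatment Platform Based on Multi-Source Heterogeneous Data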
14
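Authors: 孙晓玮, 冷金昌, 彭坤, 刘敏超. 《中国医学物理学杂志》, 2026, No. 1, pp. 110-115 (6 pages)
To address the management and use of multi-source heterogeneous data in hospital clinical practice, this study builds a comprehensive diagnosis and treatment platform based on such data. Distributed ETL (Extract-Transform-Load) data acquisition and source-aligned (ODS) layer data storage are used to turn clinical diagnosis and treatment data into a research database that is easy to extract from and suitable for sharing. In addition, artificial intelligence retrieval techniques and a range of data analysis methods provide real-time querying and statistical analysis of medical data. The platform builds a channel from the raw medical data of clinical operations to high-quality data that can be analyzed for research and management, giving medical research, clinical decision-making, and healthcare management more comprehensive and accurate information support.
Keywords: multi-source heterogeneous data; comprehensive diagnosis and treatment platform; distributed ETL data acquisition; artificial intelligence retrieval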
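Intelligent Question Answering with Large Models in the Chemical Engineering Domain Based on Retrieval-Augmented Generation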
15
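Authors: 宋凯, 陈泽华, 娄娟, 陈建, 董宇轩, 魏啸然. 《天津大学学报(自然科学与工程技术版)》 PKU Core, 2026, No. 2, pp. 212-220 (9 pages)
Chemical equipment design must strictly follow standards and codes. However, the standards are numerous and cross-reference one another, so designers facing unconventional design requirements or design problems find it hard to locate every relevant clause accurately and completely. Combining retrieval-augmented generation (RAG) with a large language model (LLM) makes it possible to answer design requirements or design questions accurately while analyzing and supplying the corresponding standard content, which avoids missing relevant standards. However, because knowledge bases in the chemical equipment design domain contain large amounts of complex data such as formulas, figures, and tables, it remains unclear how to build a structured RAG database that enables LLM-based intelligent question answering in this domain. To address these problems, this paper proposes an integrated framework for building intelligent question answering systems over complex data in vertical domains. The framework combines prompt engineering with multiple vision-language models to build the RAG database, uses semantic retrieval and reranking, and selects an embedding model and a large language model as the retriever and the generator, respectively, to achieve RAG-based intelligent question answering. Based on this framework, the paper builds an intelligent question answering system for chemical equipment design and runs experiments with the Qwen2.5-72b and Qwen2.5-7b models on a pressure vessel design question answering dataset based mainly on the GB/T 150-2011 code. The results show that the proposed framework outperforms existing techniques in the accuracy of complex data extraction and that RAG significantly improves the performance of the question answering system. Compared with the same models without RAG, the accuracy of Qwen2.5-72b and Qwen2.5-7b rises by 19.3% and 17.7%, respectively. The paper also studies how the number of document chunks received by the generator affects question answering accuracy, as well as generalization performance on equipment design domain data.
Keywords: large language model; retrieval-augmented generation; chemical equipment design; intelligent question answering; complex data information extraction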
Seismic Trace Data Compression with a Wavelet-Optimized Convolutional Autoencoder
16
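Authors: 刘培刚, 余刚, 李正, 李宗民. 《计算机工程与设计》 PKU Core, 2026, No. 1, pp. 260-269 (10 pages)
To address the loss of some high-frequency and peak information during the compression and reconstruction of seismic data, a seismic trace data compression method based on a WT-improved CAE is proposed. It combines the strengths of the wavelet transform (WT) in multi-resolution analysis with the efficiency of the convolutional autoencoder (CAE) in feature extraction and data reconstruction. The method builds two improved CAE models, WTCAE-L for low compression ratios and WTCAE-H for high compression ratios, achieving efficient compression of seismic data while maintaining high reconstruction quality. Experimental results show that each model performs best within its own compression-ratio range.
Keywords: compression and reconstruction; high-frequency and peak information; multi-resolution analysis; convolutional autoencoder; feature extraction; seismic trace data compression; low compression ratio; high compression ratio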
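A High-Dimensional Data Compression Method for Greenhouse Environmental Factors Based on High-Order Feature Extraction with Autoencoder Neural Networks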
17
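Authors: 冷令, 王琳, 吕金洪, 李浩欣, 吴伟斌, 高婷. 《中国农机化学报》 PKU Core, 2026, No. 1, pp. 252-257 (6 pages)
Greenhouse environmental data are high-dimensional and highly redundant, which leads to a low compression ratio and a relatively high peak signal-to-noise ratio in data processing. To address this, a high-dimensional data compression method for greenhouse environmental factors based on high-order feature extraction with autoencoder neural networks is proposed. An improved regression equation is applied to fill in missing values in the greenhouse environmental factor data. To counter internal covariate shift in the deep autoencoder neural network, an adaptive balance layer is added and combined with mini-batch gradient descent to build a deep adaptive-balance autoencoder neural network, which extracts high-order features of the greenhouse environmental factors. Following the idea of vector quantization, the relative error is evaluated, new codebooks are computed to obtain the centroid of each partition, and the high-dimensional data compression method is designed from the codebook training results. The results show that when the data volume exceeds 50 GB, the compression ratio of the proposed method drops by 0.7 percentage points, a decline of 3.8%, so overall compression performance is excellent. The peak signal-to-noise ratio does not fall sharply as the sampling rate increases, dropping by only 4 dB, a decline of 7.5%, which indicates better reconstruction fidelity. The method achieves a higher compression ratio and effectively lowers the signal-to-noise ratio, and it offers a reference for making greenhouse management more intelligent.
Keywords: improved regression equation; autoencoder neural network; high-order feature extraction; greenhouse environmental factors; high-dimensional data compression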
Application of EpiData Software in Data Extraction for Quantitative Systematic Reviews (cited by: 5)
18
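Authors: 周建国, 周权, 马虎. 《循证医学》 CSCD, 2014, No. 6, pp. 361-363, 375 (4 pages)
Objective: To introduce a method for data extraction in quantitative systematic reviews using EpiData software. Methods: The use of the software is introduced by demonstrating how to build a literature extraction database with EpiData. Results: EpiData can carry out data extraction for quantitative systematic reviews; the method is easy to learn and supports double-entry verification and comparison. Conclusion: EpiData can serve as one method of data extraction for quantitative systematic reviews and can make up for the shortcomings of traditional approaches.
Keywords: EpiData software; systematic review; data extraction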
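Research on General Entity-Relation Extraction from Question-Answer Pairs in Weakly Structured Data Based on Large Language Models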
19
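Authors: 张天舒, 申姝婧, 张子成, 杨建林. 《情报理论与实践》 PKU Core, 2026, No. 2, pp. 179-188 (10 pages)
[Purpose/Significance] Weakly structured data have high potential value because of their implicit semantic features, but their non-standard and heterogeneous representations mean that traditional methods suffer from poor extraction quality and high annotation cost when processing them. This paper proposes a general framework for entity-relation extraction based on large language models, using their strong generalization, generation, and reasoning abilities to provide an efficient and scalable solution for processing weakly structured data when annotation resources are limited and requirements change quickly. [Method/Process] Using prompt engineering, the framework divides the extraction task into four stages: question-answer pair reconstruction, entity recognition, relation extraction, and triple enhancement, which effectively improves the accuracy and robustness of the extraction task. Government mailbox question-answer data from the 13 prefecture-level cities of Jiangsu Province are used as a case study to validate the framework. [Result/Conclusion] Experiments show that the framework performs well in precision, recall, and F1 score, with the Qwen model performing best among several mainstream Chinese large language models. Further empirical analysis finds that the proposed method accurately identifies core entities such as "request item / response content", which both confirms its value for question-answer data and offers an important reference for the intelligent management of government information.
Keywords: large language model; entity recognition; relation extraction; weakly structured data; knowledge extraction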
Intelligent ETL for Enterprise Software Applications Using Unstructured Data
20
Authors: Manthan Joshi, Vijay K. Madisetti. 《Journal of Software Engineering and Applications》, 2025, No. 1, pp. 44-65 (22 pages)
Enterprise applications utilize relational databases and structured business processes, requiring slow and expensive conversion of inputs and outputs, from business documents such as invoices, purchase orders, and receipts, into known templates and schemas before processing. We propose a new LLM agent-based intelligent data extraction, transformation, and load (IntelligentETL) pipeline that not only ingests PDFs and detects the inputs within them but also addresses the extraction of structured and unstructured data by developing tools that deal with the respective data types efficiently and securely. We study the efficiency of our proposed pipeline and compare it with enterprise solutions that also utilize LLMs. We establish the superiority of our approach in timely and accurate data extraction and transformation when analyzing data from varied sources based on nested and/or interlinked input constraints.
Keywords: Structured data; Relational Model; LLM-Powered Agents; Field-Level extraction; Knowledge Graph