The inherent qualitative nature of textual data poses significant challenges for direct integration into statistical models.This paper presents a two-stage process for analyzing longitudinal textual data,offering a so...The inherent qualitative nature of textual data poses significant challenges for direct integration into statistical models.This paper presents a two-stage process for analyzing longitudinal textual data,offering a solution to this inherent challenge.The proposed model comprises(1)initial data preprocessing and sentiment extraction,followed by(2)applying a growth curve model to analyze the extracted sentiments directly.The paper also explores four distinct approaches for extracting sentiment scores in the dialogue,providing versatility to the proposed framework.The practical application of the proposed model is demonstrated through the analysis of an empirical longitudinal textual dataset.This framework offers a valuable contribution to the field by addressing the challenges associated with modeling qualitative textual data,providing a robust methodology for extracting and analyzing sentiments longitudinally.展开更多
Entity matching (EM) identifies records referring to the same entity within or across databases. Existing methods using structured attribute values (such as digital, date or short string values) may fail when the stru...Entity matching (EM) identifies records referring to the same entity within or across databases. Existing methods using structured attribute values (such as digital, date or short string values) may fail when the structured information is not enough to reflect the matching relationships between records. Nowadays more and more databases may have some unstructured textual attribute containing extra consolidated textual information (CText) of the record, but seldom work has been done on using the CText for EM. Conventional string similarity metrics such as edit distance or bag-of-words are unsuitable for measuring the similarities between CText since there are hundreds or thousands of words with each piece of CText, while existing topic models either cannot work well since there are no obvious gaps between topics in CText. In this paper, we propose a novel cooccurrence-based topic model to identify various sub-topics from each piece of CText, and then measure the similarity between CText on the multiple sub-topic dimensions. To avoid ignoring some hidden important sub-topics, we let the crowd help us decide weights of different sub-topics in doing EM. Our empirical study on two real-world datasets based on Amzon Mechanical Turk Crowdsourcing Platform shows that our method outperforms the state-of-the-art EM methods and Text Understanding models.展开更多
Sentiment Analysis,a significant domain within Natural Language Processing(NLP),focuses on extracting and interpreting subjective information-such as emotions,opinions,and attitudes-from textual data.With the increasi...Sentiment Analysis,a significant domain within Natural Language Processing(NLP),focuses on extracting and interpreting subjective information-such as emotions,opinions,and attitudes-from textual data.With the increasing volume of user-generated content on social media and digital platforms,sentiment analysis has become essential for deriving actionable insights across various sectors.This study presents a systematic literature review of sentiment analysis methodologies,encompassing traditional machine learning algorithms,lexicon-based approaches,and recent advancements in deep learning techniques.The review follows a structured protocol comprising three phases:planning,execution,and analysis/reporting.During the execution phase,67 peer-reviewed articles were initially retrieved,with 25 meeting predefined inclusion and exclusion criteria.The analysis phase involved a detailed examination of each study’s methodology,experimental setup,and key contributions.Among the deep learning models evaluated,Long Short-Term Memory(LSTM)networks were identified as the most frequently adopted architecture for sentiment classification tasks.This review highlights current trends,technical challenges,and emerging opportunities in the field,providing valuable guidance for future research and development in applications such as market analysis,public health monitoring,financial forecasting,and crisis management.展开更多
基金supported by the National Science Foundation under Grant No.:SES-1951038.
文摘The inherent qualitative nature of textual data poses significant challenges for direct integration into statistical models.This paper presents a two-stage process for analyzing longitudinal textual data,offering a solution to this inherent challenge.The proposed model comprises(1)initial data preprocessing and sentiment extraction,followed by(2)applying a growth curve model to analyze the extracted sentiments directly.The paper also explores four distinct approaches for extracting sentiment scores in the dialogue,providing versatility to the proposed framework.The practical application of the proposed model is demonstrated through the analysis of an empirical longitudinal textual dataset.This framework offers a valuable contribution to the field by addressing the challenges associated with modeling qualitative textual data,providing a robust methodology for extracting and analyzing sentiments longitudinally.
文摘Entity matching (EM) identifies records referring to the same entity within or across databases. Existing methods using structured attribute values (such as digital, date or short string values) may fail when the structured information is not enough to reflect the matching relationships between records. Nowadays more and more databases may have some unstructured textual attribute containing extra consolidated textual information (CText) of the record, but seldom work has been done on using the CText for EM. Conventional string similarity metrics such as edit distance or bag-of-words are unsuitable for measuring the similarities between CText since there are hundreds or thousands of words with each piece of CText, while existing topic models either cannot work well since there are no obvious gaps between topics in CText. In this paper, we propose a novel cooccurrence-based topic model to identify various sub-topics from each piece of CText, and then measure the similarity between CText on the multiple sub-topic dimensions. To avoid ignoring some hidden important sub-topics, we let the crowd help us decide weights of different sub-topics in doing EM. Our empirical study on two real-world datasets based on Amzon Mechanical Turk Crowdsourcing Platform shows that our method outperforms the state-of-the-art EM methods and Text Understanding models.
基金supported by the“Technology Commercialization Collaboration Platform Construction”project of the Innopolis Foundation(Project Number:2710033536)the Competitive Research Fund of The University of Aizu,Japan.
文摘Sentiment Analysis,a significant domain within Natural Language Processing(NLP),focuses on extracting and interpreting subjective information-such as emotions,opinions,and attitudes-from textual data.With the increasing volume of user-generated content on social media and digital platforms,sentiment analysis has become essential for deriving actionable insights across various sectors.This study presents a systematic literature review of sentiment analysis methodologies,encompassing traditional machine learning algorithms,lexicon-based approaches,and recent advancements in deep learning techniques.The review follows a structured protocol comprising three phases:planning,execution,and analysis/reporting.During the execution phase,67 peer-reviewed articles were initially retrieved,with 25 meeting predefined inclusion and exclusion criteria.The analysis phase involved a detailed examination of each study’s methodology,experimental setup,and key contributions.Among the deep learning models evaluated,Long Short-Term Memory(LSTM)networks were identified as the most frequently adopted architecture for sentiment classification tasks.This review highlights current trends,technical challenges,and emerging opportunities in the field,providing valuable guidance for future research and development in applications such as market analysis,public health monitoring,financial forecasting,and crisis management.