Funding: Supported by SungKyunKwan University and the BK21 FOUR (Graduate School Innovation) funded by the Ministry of Education (MOE, Korea) and the National Research Foundation of Korea (NRF).
Abstract: With the recent increase in data volume and diversity, traditional text representation techniques are struggling to capture context, particularly in environments with sparse data. To address these challenges, this study proposes a new model, the Masked Joint Representation Model (MJRM). MJRM approximates the original hypothesis by leveraging multiple elements in a limited context. It dynamically adapts to changes in characteristics based on data distribution through three main components. First, masking-based representation learning, termed selective dynamic masking, integrates topic modeling and sentiment clustering to generate and train multiple instances across different data subsets, whose predictions are then aggregated with optimized weights. This design alleviates sparsity, suppresses noise, and preserves contextual structures. Second, regularization-based improvements are applied. Third, techniques for addressing sparse data are used to perform final inference. As a result, MJRM improves performance by up to 4% compared to existing AI techniques. In our experiments, we analyzed the contribution of each factor, demonstrating that masking, dynamic learning, and aggregating multiple instances complement each other to improve performance. This demonstrates that a masking-based multi-learning strategy is effective for context-aware sparse text classification and can be useful even in challenging situations such as data shortage or data distribution variations. We expect that the approach can be extended to diverse fields such as sentiment analysis, spam filtering, and domain-specific document classification.
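The aggregation step mentioned in this abstract (predictions from multiple instances combined with weights) can be illustrated with a minimal sketch. The abstract does not say how the weights are optimized, so the weights below are hypothetical inputs, and the function names `aggregate_predictions` and `predict_label` are illustrative rather than taken from the paper.

```python
def aggregate_predictions(instance_probs, weights):
    """Weighted average of per-instance class-probability vectors.

    instance_probs: one probability vector per trained instance.
    weights: one non-negative weight per instance (assumed given here;
             the source does not describe how they are optimized).
    """
    if len(instance_probs) != len(weights):
        raise ValueError("need exactly one weight per instance")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(instance_probs[0])
    combined = [0.0] * n_classes
    for probs, w in zip(instance_probs, weights):
        for k, p in enumerate(probs):
            combined[k] += (w / total) * p  # normalize weights to sum to 1
    return combined


def predict_label(combined):
    """Index of the class with the highest aggregated probability."""
    return max(range(len(combined)), key=combined.__getitem__)


# Three hypothetical instances voting on a two-class problem.
probs = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
combined = aggregate_predictions(probs, weights=[1.0, 1.0, 2.0])
label = predict_label(combined)
```

Normalizing the weights keeps the aggregated vector a valid probability distribution whenever each input vector is one.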
Funding: Supported by the Deanship of Scientific Research at Prince Sattam Bin Abdulaziz University, Alkharj, Saudi Arabia, through Research Proposal No. 2020/01/17215.
Abstract: The era of the Internet of Things (IoT) has marked a continued exploration of applications and services that can make people's lives more convenient than ever before. However, the exploration of IoT services also means that people face unprecedented difficulties in spontaneously selecting the most appropriate services. Thus, there is a paramount need for a recommendation system that can help improve the experience of the users of IoT services to ensure the best quality of service. Most existing techniques, including collaborative filtering (CF), the most widely adopted approach for building recommendation systems, suffer from rating sparsity and cold-start problems, preventing them from providing high-quality recommendations. Inspired by the great success of deep learning in a wide range of fields, this work introduces a deep-learning-enabled autoencoder architecture to overcome the setbacks of CF recommendations. The proposed deep learning model is designed as a hybrid architecture with three key networks, namely autoencoder (AE), multilayered perceptron (MLP), and generalized matrix factorization (GMF). The model employs two AE networks to learn deep latent feature representations of users and items, respectively and in parallel. Next, GMF and MLP networks are employed to model the linear and non-linear user-item interactions, respectively, with the extracted latent user and item features. Finally, the rating prediction is performed based on the idea of ensemble learning by fusing the output of the GMF and MLP networks. We conducted extensive experiments on two benchmark datasets, MovieLens100K and MovieLens1M, using four standard evaluation metrics. Ablation experiments were conducted to confirm the validity of the proposed model and the contribution of each of its components in achieving better recommendation performance. Comparative analyses were also carried out to demonstrate the potential of the proposed model in gaining better accuracy than the existing CF methods with resistance to rating sparsity and cold-start problems.
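The GMF and MLP branches and their fusion can be sketched as follows. This is only an illustration under stated assumptions: the latent vectors stand in for the autoencoder outputs, all weights are hand-picked rather than learned, and the fusion is a simple weighted average controlled by a hypothetical `alpha`, since the abstract does not specify how the two outputs are fused.

```python
def gmf_score(user_vec, item_vec, out_weights):
    """Linear interaction: weighted sum of the element-wise product."""
    return sum(w * u * v for w, u, v in zip(out_weights, user_vec, item_vec))


def mlp_score(user_vec, item_vec, hidden_weights, hidden_bias, out_weights):
    """Non-linear interaction: one ReLU hidden layer on the concatenation."""
    x = user_vec + item_vec  # list concatenation of the two latent vectors
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU unit
        for row, b in zip(hidden_weights, hidden_bias)
    ]
    return sum(w * h for w, h in zip(out_weights, hidden))


def fuse(gmf_out, mlp_out, alpha=0.5):
    """Assumed fusion rule: convex combination of the two branch outputs."""
    return alpha * gmf_out + (1.0 - alpha) * mlp_out


# Hypothetical 2-dimensional latent features for one user and one item.
user, item = [1.0, 2.0], [0.5, -1.0]
g = gmf_score(user, item, out_weights=[1.0, 1.0])
m = mlp_score(
    user, item,
    hidden_weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    hidden_bias=[0.0, 0.0],
    out_weights=[2.0, 3.0],
)
rating = fuse(g, m)
```

The element-wise product keeps the GMF branch bilinear in the latent features, while the ReLU layer lets the MLP branch capture interactions that a product alone cannot.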
Abstract: The telecom industry relies on churn prediction models to retain its customers. These prediction models help in the precise and timely recognition of future switching by a group of customers to other service providers. Retention not only contributes to the profit of an organization, but is also important for upholding a position in the competitive market. In the past, numerous churn prediction models have been proposed, but the current models have a number of flaws that prevent them from being used on real-world large-scale telecom datasets. These schemes fail to incorporate frequently changing requirements. Data sparsity, noisy data, and the imbalanced nature of the dataset are the other main challenges for accurate prediction. In this paper, we propose a hybrid model, named “A Hybrid System for Customer Churn Prediction and Retention Analysis via Supervised Learning (HCPRs)”, that uses the Synthetic Minority Over-Sampling Technique (SMOTE) and Particle Swarm Optimization (PSO) to address the issues of class imbalance and feature selection. Data cleaning and normalization were performed on the large Orange dataset, which contains 15,000 features and 50,000 entities. Substantial experiments are performed to test and validate the model on Random Forest (RF), Linear Regression (LR), Naïve Bayes (NB), and XGBoost. Results show that the proposed model, when used with the XGBoost classifier, achieves a higher area under the curve (AUC), 98%, than the other methods.
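The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration, not the paper's pipeline: base points are drawn at random, and the function name `smote_sketch` and the parameters `k` and `seed` are assumptions made for the example.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority points, sorted by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Four hypothetical minority-class samples on the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=5)
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class.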
Funding: This work was partly supported by the National Natural Science Foundation of China (Grant No. 61772460) and the Ten Thousand Talent Program of Zhejiang Province (2018R52039).
Abstract: Crime risk prediction is helpful for urban safety and citizens' quality of life. However, existing crime studies have focused on coarse-grained prediction and usually fail to capture the dynamics of urban crimes. The key challenge is data sparsity, because 1) not all crimes have been recorded, and 2) crimes usually occur with low frequency. In this paper, we propose an effective framework to predict fine-grained and dynamic crime risks on each road using heterogeneous urban data. First, to address the issue of unreported crimes, we propose a cross-aggregation soft-impute (CASI) method. Then, we use a novel crime risk measurement to capture the crime dynamics from the perspective of influence propagation, taking into consideration both time-varying and location-varying risk propagation. Based on the dynamically calculated crime risks, we design contextual features (i.e., POI distributions, taxi mobility, demographic features) from various urban data sources, and propose a zero-inflated negative binomial regression (ZINBR) model to predict future crime risks on roads. Experiments using real-world data from New York City show that our framework can accurately predict road crime risks and outperforms other baseline methods.
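A zero-inflated negative binomial distribution suits counts that are mostly zero, such as crimes per road segment. The sketch below evaluates only the standard ZINB probability mass function, using a mean/dispersion parameterization; it is not the paper's regression model, and the parameter names `mu`, `r`, and `pi` are assumptions chosen for the example.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu > 0 and dispersion r > 0."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, r, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    base = (1.0 - pi) * nb_pmf(y, mu, r)
    return pi + base if y == 0 else base


# Hypothetical parameters: 30% structural zeros, NB mean of 2 events.
mu, r, pi = 2.0, 1.5, 0.3
total = sum(zinb_pmf(y, mu, r, pi) for y in range(500))
mean = sum(y * zinb_pmf(y, mu, r, pi) for y in range(500))
```

In a regression setting, `mu` and `pi` would each be tied to the contextual features through link functions; here they are fixed numbers so the distribution itself is easy to inspect.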
Funding: Supported by the National Social Science Foundation of China [No. 16FJY008], the National Planning Office of Philosophy and Social Science [No. 11801060], and the Natural Science Foundation of Shandong Province [No. ZR2016FM26].
Abstract: We propose a novel filter for sparse big data, called an integrated autoencoder (IAE), which utilises auxiliary information to mitigate data sparsity. The proposed model achieves an appropriate balance between prediction accuracy, convergence speed, and complexity. We conduct experiments on a GPS trajectory dataset, and the results demonstrate that the IAE is more accurate and robust than some state-of-the-art methods.
Funding: Supported by the National Key Project of Scientific and Technical Supporting Programs of China (2013BAH10F01, 2013BAH07F02, 2014BAH26F02); the Research Fund for the Doctoral Program of Higher Education (20110005120007); the Beijing Higher Education Young Elite Teacher Project (YETP0445); the Co-construction Program with Beijing Municipal Commission of Education; and the Engineering Research Center of Information Networks, Ministry of Education.
Abstract: Quality-of-Service (QoS) describes the non-functional characteristics of Web services. As such, QoS is a critical parameter in service selection, composition, and fault tolerance, and must be accurately determined by some type of QoS prediction method. However, with the dramatic increase in the number of Web services, prediction failure caused by data sparseness has become a critical challenge. A new method, 'hybrid user-location-aware prediction based on weighted Adamic-Adar (WAA)' (HUWAA), is proposed. The implicit neighbor search is optimized by incorporating location factors. The ability of the improved algorithms to solve the data sparsity problem is validated in experiments on public real-world datasets. The new algorithm outperforms the existing item-based Pearson correlation coefficient (IPCC), user-based Pearson correlation coefficient (UPCC), and Web service recommender system (WSRec) algorithms.
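For orientation, the classic unweighted Adamic-Adar index that the WAA name builds on can be sketched as below. The abstract does not describe the weighting scheme or how location is incorporated, so neither appears here; the user-service graph and the function name `adamic_adar` are hypothetical.

```python
import math


def adamic_adar(graph, a, b):
    """Unweighted Adamic-Adar similarity between nodes a and b.

    graph maps each node to the set of its neighbours. Every common
    neighbour z contributes 1 / log(degree(z)), so rarely shared
    neighbours count for more than popular ones.
    """
    common = graph[a] & graph[b]
    return sum(
        1.0 / math.log(len(graph[z]))
        for z in common
        if len(graph[z]) > 1  # log(1) = 0 would divide by zero
    )


# Hypothetical bipartite graph: users u1..u3 and the services they invoked.
graph = {
    "u1": {"s1", "s2"},
    "u2": {"s1", "s2"},
    "u3": {"s2"},
    "s1": {"u1", "u2"},
    "s2": {"u1", "u2", "u3"},
}
sim_12 = adamic_adar(graph, "u1", "u2")
sim_13 = adamic_adar(graph, "u1", "u3")
```

Here `u1` and `u2` share both services and score higher than `u1` and `u3`, which share only the more widely used `s2`.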