Multi-view clustering is a critical research area in computer science aimed at extracting meaningful patterns from complex, high-dimensional data that single-view methods cannot capture. Traditional fuzzy clustering techniques, such as Fuzzy C-Means (FCM), face significant challenges in handling uncertainty and the dependencies between different views. To overcome these limitations, we introduce a new multi-view fuzzy clustering approach, termed Multi-view Picture Fuzzy Clustering (MPFC), which integrates picture fuzzy sets with a dual-anchor graph method for multi-view data to enhance clustering accuracy and robustness. In particular, picture fuzzy set theory extends the representation of uncertainty by modeling three membership levels: membership degrees, neutral degrees, and refusal degrees. This allows a more flexible representation of uncertain and conflicting data than traditional fuzzy models. Meanwhile, dual-anchor graphs exploit the similarity relationships between data points and integrate information across views. This combination improves stability, scalability, and robustness when handling noisy and heterogeneous data. Experimental results on several benchmark datasets demonstrate significant improvements in clustering accuracy and efficiency over traditional methods. Specifically, MPFC attains a Purity (PUR) score of 0.6440 and an Accuracy (ACC) score of 0.6213 on the 3 Sources dataset, underscoring its robustness and efficiency. The proposed approach contributes to fields such as pattern recognition, multi-view relational data analysis, and large-scale clustering problems. Future work will focus on extending the method to semi-supervised multi-view clustering, aiming to enhance adaptability, scalability, and performance in real-world applications.
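The picture-fuzzy representation described above can be illustrated with a minimal sketch (illustrative only, not the authors' MPFC implementation): in the standard picture fuzzy set formulation, each point carries a positive, a neutral, and a negative membership degree summing to at most 1, and the remainder is the refusal degree.

```python
# Illustrative sketch only, not the MPFC implementation: a picture fuzzy
# value (mu, eta, nu) holds positive, neutral, and negative membership
# degrees with mu + eta + nu <= 1; the remainder is the refusal degree.

def refusal_degree(mu: float, eta: float, nu: float) -> float:
    """Refusal degree of the picture fuzzy value (mu, eta, nu)."""
    if min(mu, eta, nu) < 0 or mu + eta + nu > 1:
        raise ValueError("degrees must be non-negative and sum to at most 1")
    return 1.0 - (mu + eta + nu)

# A point that is 0.6 in-cluster, 0.2 neutral, 0.1 against:
print(round(refusal_degree(0.6, 0.2, 0.1), 3))  # -> 0.1
```

The extra neutral and refusal components are what give picture fuzzy models more room than ordinary fuzzy memberships for expressing conflicting or abstaining evidence.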
To solve the query processing correctness problem for semantic-based relational data integration, the semantics of SPARQL (SPARQL Protocol and RDF Query Language) queries is defined. In the course of query rewriting, all relevant tables are found and decomposed into minimal connectable units, which are then joined according to the semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transforming are presented, and their computational complexity is discussed: in the worst case, the query decomposing algorithm finishes in O(n²) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments, whose results show that when the length of a query is less than 8, the query processing algorithms provide satisfactory performance.
Identification of security risk factors for small reservoirs is the basis for implementing early warning systems, and the manner of identifying these factors is of practical significance when data are incomplete. Existing grey relational models have disadvantages in measuring the correlation between categorical data sequences. To this end, this paper introduces a new grey relational model to analyze heterogeneous data. In this study, a set of security risk factors for small reservoirs was first constructed based on theoretical analysis, and heterogeneous data on these factors were recorded as sequences. The sequences were regarded as random variables, and the information entropy and conditional entropy between sequences were measured to analyze the relational degree between risk factors. A new grey relational analysis model for heterogeneous data was then constructed, and a comprehensive security risk factor identification method was developed. A case study of small reservoirs in the Guangxi Zhuang Autonomous Region in China shows that the model constructed in this study is applicable to security risk factor identification for small reservoirs with heterogeneous and sparse data.
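The entropy-based measurement between paired categorical sequences can be sketched as follows. This is a hedged illustration of the general idea (information entropy, conditional entropy, and a normalized degree of association), not the paper's exact grey relational model.

```python
# Generic sketch: relational degree between two categorical sequences,
# defined here as 1 - H(X|Y)/H(X). Not the paper's grey relational model.
import math
from collections import Counter

def entropy(seq):
    """Shannon entropy (bits) of a categorical sequence."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def conditional_entropy(x, y):
    """H(X|Y) for paired categorical sequences x and y."""
    n = len(x)
    h = 0.0
    for yv, cy in Counter(y).items():
        sub = [xv for xv, yy in zip(x, y) if yy == yv]
        h += (cy / n) * entropy(sub)
    return h

def relational_degree(x, y):
    """1.0 when y fully determines x, 0.0 when y carries no information about x."""
    hx = entropy(x)
    return 1.0 if hx == 0 else 1.0 - conditional_entropy(x, y) / hx

x = ["high", "high", "low", "low"]
print(relational_degree(x, x))          # identical sequences -> 1.0
print(relational_degree(x, ["a"] * 4))  # constant sequence   -> 0.0
```

The same quantities work for heterogeneous data because only category counts are used, never numeric distances.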
Radiometric normalization, an essential step in multi-source and multi-temporal data processing, has received critical attention. Relative Radiometric Normalization (RRN) has been the primary method for eliminating radiometric inconsistency, and the radiometric transforming relation between the subject image and the reference image is an essential aspect of RRN. Aimed at accurate modeling of this relation, a learning-based nonlinear regression method, Support Vector Regression (SVR), is used to fit the complicated radiometric transforming relation for coarse-resolution-data-referenced RRN. To evaluate the effectiveness of the proposed method, a series of experiments is performed, including two synthetic-data experiments and one real-data experiment, and the proposed method is compared with methods that use linear regression, an Artificial Neural Network (ANN), or a Random Forest (RF) for radiometric transforming relation modeling. The results show that the proposed method fits the radiometric transforming relation well and can enhance RRN performance.
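As a hedged sketch of the regression step, the following fits an SVR model to a synthetic subject/reference band pair. It assumes scikit-learn and NumPy are available; the data, kernel, and hyperparameters are illustrative, not those used in the paper.

```python
# Sketch (synthetic data, illustrative hyperparameters): fit a nonlinear
# radiometric transfer function from subject-image values to reference-image
# values with support vector regression.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
subject = rng.uniform(0, 255, size=(500, 1))                     # subject-band values
reference = 0.8 * subject[:, 0] ** 0.9 + rng.normal(0, 2, 500)   # nonlinear relation + noise

model = SVR(kernel="rbf", C=100.0, epsilon=1.0).fit(subject, reference)
normalized = model.predict(subject)                              # radiometrically adjusted values
print(float(np.corrcoef(normalized, reference)[0, 1]))           # close to 1 for a good fit
```

An RBF kernel is one common choice here because it can track a smooth monotone transfer curve that a straight-line fit would miss.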
Missing data handling is vital for multi-sensor information fusion fault diagnosis of motors, as it prevents accuracy decay or even model failure, and some promising results have been gained in several current studies. These studies, however, have two limitations: 1) effective supervision is neglected for missing data across different fault types, and 2) imbalance in missing rates among fault types results in inadequate learning during model training. To overcome these limitations, this paper proposes a dynamic relative advantage-driven multi-fault synergistic diagnosis method for accurate fault diagnosis of motors under imbalanced missing data rates. First, a cross-fault-type generalized synergistic diagnostic strategy is established based on variational information bottleneck theory, which ensures sufficient supervision when handling missing data. Then, a dynamic relative advantage assessment technique is designed to reduce the diagnostic accuracy decay caused by imbalanced missing data rates. The proposed method is validated using multi-sensor data from motor fault simulation experiments, and the experimental results demonstrate its effectiveness and superiority in improving diagnostic accuracy and generalization under imbalanced missing data rates.
For a transaction processing system to operate effectively and efficiently in cloud environments, it is important to distribute huge amounts of data while guaranteeing the ACID (atomic, consistent, isolated, and durable) properties. Moreover, database partition and migration tools can help transplant conventional relational database systems to the cloud environment rather than rebuilding a new system. This paper proposes a database distribution management (DBDM) system, which partitions or replicates the data according to the transaction behaviors of the application system. The principal strategy of DBDM is to keep together the data used in a single transaction, thus avoiding massive transmission of records in join operations. The proposed system has been implemented successfully, and preliminary experiments show that DBDM performs database partition and migration effectively. The DBDM system is also modularly designed to adapt to different database management systems (DBMSs) or different partition algorithms.
We developed a parallel object-relational DBMS named PORLES. It uses the BSP model as its parallel computing model and monoid calculus as the basis of its data model. In this paper, we introduce its data model, parallel query optimization, transaction processing system, and parallel access method in detail.
City regions often have great diversity in form and function. To better understand the role of each region, the relations between city regions need to be carefully studied. In this work, the human mobility relations between regions of Shanghai are explored based on mobile phone data. By formulating the regions as nodes in a network and the commuting between each pair of regions as link weights, the distribution of node degrees and the spatial structures of communities in this relation network are studied. Statistics show that regions located in urban centers and at traffic hubs have significantly larger degrees. Moreover, two kinds of spatial community structures are found: in most communities, the nodes are spatially neighboring, whereas in communities that cover traffic hubs, the nodes are often located along transport corridors.
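The network formulation can be sketched in a few lines; the region names and trip counts below are invented for illustration, not drawn from the Shanghai data.

```python
# Toy sketch of the formulation: regions are nodes, commuting volume between
# a pair of regions is the link weight, and a region's prominence is its
# weighted degree (strength). All values here are invented.
from collections import defaultdict

commutes = {                       # (region_a, region_b): trips per day
    ("center", "hub"): 900,
    ("center", "suburb_a"): 400,
    ("hub", "suburb_b"): 350,
    ("suburb_a", "suburb_b"): 50,
}

strength = defaultdict(int)        # weighted degree of each region
for (a, b), w in commutes.items():
    strength[a] += w
    strength[b] += w

ranked = sorted(strength, key=strength.get, reverse=True)
print(ranked[0])  # the central region has the largest weighted degree -> "center"
```

Ranking regions by weighted degree is what makes the "urban centers and traffic hubs have larger degrees" observation directly measurable.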
Data transformation is the core process in migrating a relational database to a NoSQL database such as a column-oriented database. However, there is no standard guideline for data transformation from a relational database to a NoSQL database. A number of schema transformation techniques have been proposed that improve the data transformation process and achieve better query processing time than the relational database. However, these approaches produce redundant tables in the resulting schema, which in turn consume large, unnecessary storage and yield high query processing time due to the redundant column families in the transformed column-oriented database. In this paper, an efficient data transformation technique from a relational database to a column-oriented database is proposed. The proposed schema transformation technique is based on the combination of a denormalization approach, data access patterns, and a multiple-nested schema. To validate the proposed work, the technique is implemented by transforming data from a MySQL database to a MongoDB database. A benchmark transformation technique is also performed, and the query processing time and storage size are compared. Based on the experimental results, the proposed transformation technique shows significant improvement in query processing time and storage space usage, owing to the reduced number of column families in the column-oriented database.
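The denormalization idea at the heart of such transformations can be sketched without any database driver: child rows are nested inside their parent record so that a query reads one document instead of performing a join. The table and field names here are invented for illustration, not taken from the paper.

```python
# Generic sketch of relational-to-document denormalization: nest child rows
# (items) under their parent row (orders) on a shared key, so that one
# document lookup replaces a join. Names and data are invented.
orders = [{"order_id": 1, "customer": "Ann"}, {"order_id": 2, "customer": "Bo"}]
items = [
    {"order_id": 1, "sku": "A1", "qty": 2},
    {"order_id": 1, "sku": "B4", "qty": 1},
    {"order_id": 2, "sku": "A1", "qty": 5},
]

def denormalize(parents, children, key):
    """Nest child rows under their parent, keyed on a shared column."""
    docs = {p[key]: {**p, "items": []} for p in parents}
    for c in children:
        child = {k: v for k, v in c.items() if k != key}   # drop the join key
        docs[c[key]]["items"].append(child)
    return list(docs.values())

docs = denormalize(orders, items, "order_id")
print(docs[0])  # order 1 carries its two items inline, no join needed
```

The trade-off shown here is exactly the one the abstract weighs: embedding removes joins but can duplicate data, which is why the paper combines denormalization with access patterns to limit redundancy.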
In this paper, an entity-relation data model for integrating spatio-temporal data is designed. With this design, spatio-temporal data can be effectively stored and spatio-temporal analysis can be easily realized.
Effective data communication is a crucial aspect of the Social Internet of Things (SIoT) and continues to be a significant research focus. This paper proposes a data forwarding algorithm based on Multidimensional Social Relations (MSRR) in the SIoT to solve this problem. The proposed algorithm separates message forwarding into intra- and cross-community forwarding by analyzing interest traits and social connections among nodes. Three new metrics are defined: the intensity of node social relationships, node activity, and community connectivity. Within a community, messages are sent by determining which node is most similar to the sender, weighing the strength of social connections and node activity. When a node performs cross-community forwarding, the message is forwarded to the most suitable relay community by measuring node activity and the connections between communities. The proposed algorithm was compared with three existing routing algorithms in simulation experiments. The results indicate that the proposed algorithm substantially improves message delivery efficiency while lessening network overhead and enhancing connectivity and coordination in the SIoT context.
Developing and optimizing fuzzy relation equations is of great relevance in system modeling, which involves the analysis of numerous fuzzy rules. As each rule varies in its level of influence, the performance of a fuzzy relation equation is strongly related to a subset of fuzzy rules obtained by removing those without significant relevance. In this study, we establish a novel framework for developing granular fuzzy relation equations that concerns the determination of an optimal subset of fuzzy rules, selected by maximizing the performance of the obtained solutions. The originality of this study lies in the following. Starting from granular fuzzy relation equations, an interval-valued fuzzy relation is determined from the selected subset of fuzzy rules (the rules are transformed into interval-valued fuzzy sets, which are subsequently used to form interval-valued fuzzy relations); this relation can represent the fuzzy relation of the entire rule base with high performance and efficiency. Then, particle swarm optimization (PSO) is implemented to solve a multi-objective optimization problem in which not only an optimal subset of rules is selected but also a parameter ε specifying the level of information granularity is determined. A series of experimental studies is performed to verify the feasibility of this framework and quantify its performance. A visible improvement over the method conducted without PSO is gained (about 78.56% for the encoding mechanism of PSO, or 90.42% for PSO with an exploration operator).
This paper concentrates on the problem of data redundancy under the extended-possibility-based model. Based on the information gain used in data classification, a measure, relation redundancy, is proposed to evaluate the degree to which a given relation is redundant as a whole. The properties of relation redundancy are also investigated. This new measure is useful in dealing with data redundancy.
Data redundancy exists in both traditional and temporal databases, and the volume of temporal data is increasing rapidly. We therefore put forward a compressed storage strategy for temporal data that combines existing compression technology to address redundancy in temporal data storage; temporal data in slowly changing domains and momentarily changing domains are accessed separately, using an independent-clock method and a mutual-clock method, respectively. We also propose a grid storage strategy to handle the rapid growth of temporal data.
In this paper, the authors present the development of a data modelling tool that visualizes the process of transforming an "Entity-Relationship" Diagram (ERD) into a relational database schema. The authors' focus is the design of a tool for educational purposes and its implementation in an e-learning database course. The tool presents two stages of database design. In the first stage, the learner draws the ERD graphically and the tool validates it. In the second stage, the system automatically transforms the ERD into a relational database schema using common rules, so that the learner can more easily understand how to apply the theoretical material. A detailed description of the system's functionality and the conversion algorithm is proposed. Finally, the user interface and usage aspects are presented.
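One of the "common rules" such tools apply can be sketched as follows. The rule itself (each entity becomes a table; a 1:N relationship becomes a foreign key on the N side) is a standard textbook mapping; the function and all names are illustrative, not taken from the paper's tool.

```python
# Minimal sketch of a textbook ERD-to-relational rule: entity -> table,
# 1:N relationship -> foreign key on the N side. Names are invented.
def entity_to_ddl(entity, attributes, pk, fk=None):
    """Emit a CREATE TABLE statement for one entity; fk = (ref_table, ref_col)."""
    cols = [f"{a} TEXT" for a in attributes]
    cols[attributes.index(pk)] = f"{pk} TEXT PRIMARY KEY"
    if fk:
        ref_table, ref_col = fk
        cols.append(f"{ref_col} TEXT REFERENCES {ref_table}({ref_col})")
    return f"CREATE TABLE {entity} ({', '.join(cols)});"

print(entity_to_ddl("student", ["student_id", "name"], pk="student_id"))
print(entity_to_ddl("enrolment", ["enrol_id", "grade"], pk="enrol_id",
                    fk=("course", "course_id")))
```

A real tool would add type inference, M:N junction tables, and constraint checks; the point here is only the shape of the mapping a learner sees animated.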
Association rule mining is an important issue in data mining. This paper proposes a binary-system-based method to generate candidate frequent itemsets and their corresponding support counts efficiently, requiring only operations such as "and", "or", and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM, an improved algorithm, BFDM, is proposed. Theoretical analysis and experiments testify that BFDM is effective and efficient.
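The bitwise support-counting idea can be sketched as follows. This is a generic illustration of bitmap-based counting, not the BFDM algorithm itself: each item gets a bitmap with one bit per transaction, and the support of an itemset is the population count of the AND of its members' bitmaps.

```python
# Generic bitmap support counting (not BFDM itself): one bit per transaction
# per item; itemset support = popcount of the AND of the item bitmaps.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]

bitmaps = {}
for i, t in enumerate(transactions):
    for item in t:
        bitmaps[item] = bitmaps.get(item, 0) | (1 << i)   # set bit i for item

def support(itemset):
    """Number of transactions containing every item in `itemset`."""
    mask = (1 << len(transactions)) - 1                   # start with all bits set
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")                           # population count

print(support({"a", "b"}))  # {a, b} appears in transactions 0 and 2 -> 2
```

Because AND and popcount are single machine-word operations for small transaction sets, this style of counting avoids repeated set scans, which is the efficiency the abstract points to.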
The abundant entities and entity-attribute relations in medical websites are important data resources for medical research. However, medical websites typically store entities and attribute values on different pages. To extract those data records efficiently, we propose an automatic extraction system for entity and attribute relations (attributes and values) held in separate storage. Our system includes the following modules: (1) rendering of rich-information interactive annotation pages; (2) annotation of separately stored attribute relations; and (3) use of the annotated relations to generate patterns and extract data records. This paper presents the relations between attributes stored across many pages through effective annotation, then generates rules for data record extraction. The experiments show that the system not only handles the extraction of separately stored attribute relations but is also compatible with regular relation extraction, while maintaining high accuracy.
An exhaustive study has been conducted on span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during training, which are essential but result in grossly imbalanced data distributions and, in turn, suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which classifies the entities and relations in the first phase and predicts the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models on the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
As data mining becomes more and more widely applied in computer systems, quality assurance testing of its software is receiving increasing attention. However, because of the "oracle" problem, traditional testing methods are not easily applicable to programs in the data mining field. In this paper, a software testing method for the data mining field is proposed based on metamorphic testing; an association rule algorithm is taken as the specific case, and metamorphic relations are constructed for the algorithm. Experience shows that the method can achieve the testing target and is feasible to apply to other domains.
Funding: Research Project THTETN.05/24-25, Vietnam Academy of Science and Technology.
Funding: Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11).
Funding: National Natural Science Foundation of China (Grant No. 71401052); National Social Science Foundation of China (Grant No. 17BGL156); Key Project of the National Social Science Foundation of China (Grant No. 14AZD024).
Funding: National Natural Science Foundation of China [grant number 41701415]; Science Fund Project of Wuhan Institute of Technology [grant number K201724]; Science and Technology Development Funds Project of the Department of Transportation of Hubei Province [grant number 201900001].
Funding: Taiwan Ministry of Economic Affairs and Institute for Information Industry, under the project "Fundamental Industrial Technology Development Program (1/4)".
Funding: Project 71303269, National Natural Science Foundation of China; Project 14ZZD006, Economics Major Research Task of Fostering, China.
Fund: Supported by the Universiti Putra Malaysia Grant Scheme (Putra Grant) (GP/2020/9692500).
Abstract: Data transformation is the core process in migrating a database from a relational database to a NoSQL database such as a column-oriented database. However, there is no standard guideline for data transformation from relational databases to NoSQL databases. A number of schema transformation techniques have been proposed to improve the data transformation process, achieving better query processing time when compared to relational database query processing time. However, these approaches produce redundant tables in the resulting schema, which in turn consume large amounts of unnecessary storage and yield high query processing time due to the redundant column families in the transformed column-oriented database. In this paper, an efficient data transformation technique from relational databases to column-oriented databases is proposed. The proposed schema transformation technique is based on the combination of a denormalization approach, data access patterns, and multiple-nested schemas. In order to validate the proposed work, the technique is implemented by transforming data from a MySQL database to a MongoDB database. A benchmark transformation technique is also performed, against which the query processing time and storage size are compared. Based on the experimental results, the proposed transformation technique shows significant improvement in terms of query processing time and storage space usage due to the reduced number of column families in the column-oriented database.
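The denormalization step at the heart of such a migration, embedding related rows as nested structures so reads need no join, can be sketched as below. The table layouts and field names are assumptions of this sketch, not the paper's actual schema rules:

```python
def denormalize(customers, orders):
    """Embed each customer's orders as a nested list, trading redundancy
    for single-document reads (no join at query time)."""
    by_customer = {}
    for c in customers:
        doc = dict(c)
        doc["orders"] = []
        by_customer[c["id"]] = doc
    for o in orders:
        parent = by_customer.get(o["customer_id"])
        if parent is not None:
            # Drop the join key; nesting already encodes the relation.
            order = {k: v for k, v in o.items() if k != "customer_id"}
            parent["orders"].append(order)
    return list(by_customer.values())

customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}]
orders = [{"id": 10, "customer_id": 1, "total": 99},
          {"id": 11, "customer_id": 1, "total": 15}]
docs = denormalize(customers, orders)
```

Each resulting document can be inserted directly into a document or column-family store and answered with a single key lookup.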
Fund: Project supported by the National Surveying Technical Fund (No.200_07)
Abstract: In this paper, an entity-relation data model for integrating spatio-temporal data is designed. With this design, spatio-temporal data can be effectively stored and spatio-temporal analysis can be easily realized.
Fund: Supported by the National Natural Science Foundation of China (61972136); the Hubei Provincial Department of Education Outstanding Youth Scientific Innovation Team Support Foundation (T201410, T2020017); the Natural Science Foundation of Xiaogan City (XGKJ2022010095, XGKJ2022010094); and the Science and Technology Research Project of the Education Department of Hubei Province (No.Q20222704).
Abstract: Effective data communication is a crucial aspect of the Social Internet of Things (SIoT) and continues to be a significant research focus. This paper proposes a data forwarding algorithm based on Multidimensional Social Relations (MSRR) in SIoT to solve this problem. The proposed algorithm separates message forwarding into intra- and cross-community forwarding by analyzing interest traits and social connections among nodes. Three new metrics are defined: the intensity of node social relationships, node activity, and community connectivity. Within a community, messages are sent by determining which node is most similar to the sender, weighing the strength of social connections and node activity. When a node performs cross-community forwarding, the message is forwarded to the most reasonable relay community by measuring node activity and the connections between communities. The proposed algorithm was compared to three existing routing algorithms in simulation experiments. Results indicate that the proposed algorithm substantially improves message delivery efficiency while lessening network overhead and enhancing connectivity and coordination in the SIoT context.
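A minimal sketch of the intra-community relay choice: score each candidate by a weighted mix of social-tie strength and node activity and forward to the highest scorer. The weighting `alpha` and the node values are invented for illustration; they are not the MSRR formulas:

```python
def choose_relay(neighbors, alpha=0.6):
    """Pick the relay maximizing a weighted mix of social-tie strength
    and node activity; `neighbors` maps node id -> (tie, activity),
    both assumed normalized to [0, 1]."""
    def score(node):
        tie, activity = neighbors[node]
        return alpha * tie + (1 - alpha) * activity
    return max(neighbors, key=score)

# n1 has a strong tie but is rarely active; n3 balances both.
candidates = {"n1": (0.9, 0.2), "n2": (0.4, 0.9), "n3": (0.7, 0.7)}
relay = choose_relay(candidates)
```

With these numbers the balanced node wins over the strongly tied but inactive one, which is the trade-off the abstract's metrics aim to capture.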
Fund: Supported by the National Natural Science Foundation of China (62006184, 62076189, 61873277).
Abstract: Developing and optimizing fuzzy relation equations is of great relevance in system modeling, which involves the analysis of numerous fuzzy rules. As each rule varies with respect to its level of influence, it is advocated that the performance of a fuzzy relation equation is strongly related to a subset of fuzzy rules obtained by removing those without significant relevance. In this study, we establish a novel framework for developing granular fuzzy relation equations that concerns the determination of an optimal subset of fuzzy rules. The subset of rules is selected by maximizing the performance of the obtained solutions. The originality of this study lies in the following aspects. Starting with the development of granular fuzzy relation equations, an interval-valued fuzzy relation is determined based on the selected subset of fuzzy rules (the subset of rules is transformed to interval-valued fuzzy sets, and subsequently the interval-valued fuzzy sets are utilized to form interval-valued fuzzy relations), which can be used to represent the fuzzy relation of the entire rule base with high performance and efficiency. Then, particle swarm optimization (PSO) is implemented to solve a multi-objective optimization problem, in which not only an optimal subset of rules is selected but also a parameter ε specifying the level of information granularity is determined. A series of experimental studies is performed to verify the feasibility of this framework and quantify its performance. A visible improvement (about 78.56% over the encoding mechanism of particle swarm optimization, or 90.42% over particle swarm optimization with an exploration operator) is gained over the method conducted without using the particle swarm optimization algorithm.
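The rule-subset selection via PSO can be illustrated with a minimal binary PSO over 0/1 masks. The toy fitness function below (reward "useful" rules, penalize subset size) merely stands in for the paper's solution-performance objective; all constants and the notion of which rules are useful are assumptions of this sketch:

```python
import math
import random

def binary_pso(fitness, n_bits, n_particles=20, iters=60, seed=0):
    """Minimal binary PSO: a particle is a 0/1 mask over fuzzy rules;
    velocities pass through a sigmoid to give per-bit probabilities."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Toy objective: keep the three "useful" rules, penalize subset size.
useful = {0, 2, 5}
def subset_fitness(mask):
    kept = {i for i, b in enumerate(mask) if b}
    return 2 * len(kept & useful) - len(kept)

best, score = binary_pso(subset_fitness, n_bits=8)
```

In the paper's setting the mask would additionally be co-optimized with the granularity parameter ε; here only the subset-selection half is shown.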
Fund: Supported by the National Natural Science Foundation of China (No.70231010/70321001) and the Bilateral Scientific and Technological Cooperation between China and Flanders (No.174B0201)
Abstract: This paper concentrates on the problem of data redundancy under the extended-possibility-based model. Based on the information gain used in data classification, a measure, termed relation redundancy, is proposed to evaluate the degree to which a given relation is redundant as a whole. The properties of relation redundancy are also investigated. This new measure is useful in dealing with data redundancy.
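The information-gain quantity the measure builds on can be computed as follows. This is only the classical entropy-based gain from data classification, not the paper's full relation-redundancy measure, and the toy relation is invented:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index, class_index):
    """Reduction in class entropy after splitting rows on one attribute."""
    base = entropy([r[class_index] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Attribute 0 perfectly predicts the class; attribute 1 carries nothing.
rows = [("a", "x", "yes"), ("a", "y", "yes"), ("b", "x", "no"), ("b", "y", "no")]
gain0 = information_gain(rows, 0, 2)
gain1 = information_gain(rows, 1, 2)
```

An attribute with zero gain contributes nothing to distinguishing tuples, which is the intuition a whole-relation redundancy measure aggregates.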
Abstract: Traditional databases and existing temporal databases both suffer from data redundancy, and the volume of temporal data is growing rapidly. We put forward a compressed-storage strategy for temporal data that combines existing compression techniques in order to address redundancy in the course of temporal data storage; temporal data in slowly changing domains and in rapidly changing domains are accessed via independent-clock and mutual-clock methods, respectively. We also bring forward a grid storage strategy to resolve the problem of rapidly growing temporal data.
Abstract: In this paper, the authors present the development of a data modelling tool that visualizes the transformation of an "Entity-Relationship" Diagram (ERD) into a relational database schema. The authors' focus is the design of a tool for educational purposes and its implementation in an e-learning database course. The tool presents two stages of database design. The first stage is to draw the ERD graphically and validate it; the drawing is done by a learner. In the second stage, the system performs automatic transformation of the ERD to a relational database schema using common rules. Thus, the learner can more easily understand how to apply the theoretical material. A detailed description of the system's functionalities and of the conversion algorithm is provided. Finally, the user interface and usage aspects are presented.
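The "common rules" such a tool applies can be sketched as follows. The rule set is deliberately reduced to two cases (1:N adds a foreign key on the N side; M:N becomes a junction table), and the naming conventions are assumptions of this sketch, not the tool's actual algorithm:

```python
def erd_to_schema(entities, relationships):
    """Map an ERD to relational tables: every entity becomes a table;
    a 1:N relationship adds a foreign key on the N side; an M:N
    relationship becomes a junction table of the two keys."""
    tables = {name: list(attrs) for name, attrs in entities.items()}
    for left, right, card in relationships:
        if card == "1:N":
            tables[right].append(left + "_id")  # FK on the N side
        elif card == "M:N":
            tables[left + "_" + right] = [left + "_id", right + "_id"]
    return tables

entities = {"author": ["id", "name"], "book": ["id", "title"]}
schema = erd_to_schema(entities, [("author", "book", "M:N")])

# The 1:N rule, exercised separately: each employee references a dept.
schema2 = erd_to_schema({"dept": ["id"], "emp": ["id"]},
                        [("dept", "emp", "1:N")])
```

A teaching tool would animate exactly these rewrites step by step so the learner sees which rule fired for each relationship.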
Fund: Supported by the National Natural Science Foundation of China (70371015)
Abstract: Association rule mining is an important issue in data mining. This paper proposes a binary-system-based method to generate candidate frequent itemsets and the corresponding support counts efficiently, which needs only operations such as "and", "or", and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM, the improved algorithm BFDM is proposed. Theoretical analysis and experiments show that BFDM is effective and efficient.
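The bitwise idea, representing each item's occurrences as a bitmap over transactions so support counting reduces to "and" plus a popcount, can be sketched as below. The transactions are toy data and this is not the actual BFDM encoding:

```python
def item_bitmaps(transactions, items):
    """Bit i of an item's bitmap is set when the item occurs in
    transaction i."""
    bitmaps = {item: 0 for item in items}
    for i, txn in enumerate(transactions):
        for item in txn:
            if item in bitmaps:
                bitmaps[item] |= 1 << i
    return bitmaps

def support(bitmaps, itemset):
    """Support count of a (non-empty) itemset: 'and' the item bitmaps
    together, then count set bits (one bit per supporting transaction)."""
    items = list(itemset)
    acc = bitmaps[items[0]]
    for item in items[1:]:
        acc &= bitmaps[item]
    return bin(acc).count("1")

txns = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bm = item_bitmaps(txns, ["a", "b", "c"])
```

Because each support query is a few word-level operations, the same bitmaps can be shared cheaply among distributed sites, which is the appeal in an FDM-style setting.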
Fund: Supported by the Natural Science Foundation of Hubei Province (2013CFB334)
Abstract: The abundant entities and entity-attribute relations in medical websites are important data resources for medical research. However, medical websites are usually characterized by storing entity and attribute values in different pages. To extract those data records efficiently, we propose an automatic extraction system for entity-attribute relations (attributes and values) stored separately. Our system includes the following modules: (1) rich-information interactive annotation page rendering; (2) annotation of separately stored attribute relations; (3) pattern generation from annotated relations and data record extraction. This paper shows how attribute relations spread across many pages are captured through effective annotation, and rules are then generated for data record extraction. The experiments show that the system can not only extract separately stored attribute relations but is also compatible with regular relation extraction, while maintaining high accuracy.
Fund: Supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and, in turn, cause suboptimal model performance. In order to address the above issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase, and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
Abstract: As data mining is more and more widely applied in computer systems, quality assurance testing of its software is receiving more and more attention. However, because of the 'oracle' problem, traditional testing methods are not easily applicable to application programs in the field of data mining. In this paper, a software testing method based on metamorphic testing is proposed for the data mining field; it takes an association rule algorithm as a specific case and constructs metamorphic relations for the algorithm. Experiments show that the method can achieve the testing target and is feasible to apply to other domains.
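One metamorphic relation for an association rule miner, of the kind the abstract alludes to but does not spell out, is that duplicating the whole dataset must leave the mined itemsets unchanged, because every relative support is preserved even though no oracle says what the "correct" itemsets are. A self-contained sketch with a naive miner (the miner itself is an assumption of this example, not the paper's algorithm):

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support_ratio):
    """Naive miner: count every 1- and 2-itemset and keep those whose
    relative support reaches the threshold."""
    n = len(transactions)
    counts = Counter()
    for txn in transactions:
        items = sorted(set(txn))
        for item in items:
            counts[(item,)] += 1
        for pair in combinations(items, 2):
            counts[pair] += 1
    return {s for s, c in counts.items() if c / n >= min_support_ratio}

# Metamorphic relation: doubling the dataset doubles every absolute
# count and the transaction total, so relative supports are unchanged
# and the two runs must return identical itemsets.
data = [("a", "b"), ("a", "c"), ("a", "b", "c"), ("b",)]
original = frequent_itemsets(data, 0.5)
followup = frequent_itemsets(data * 2, 0.5)
```

A violation of `original == followup` would reveal a bug without ever needing a ground-truth answer, which is exactly how metamorphic testing sidesteps the oracle problem.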