Funding: Funded by the Research Project THTETN.05/24-25, Vietnam Academy of Science and Technology.
Abstract: Multi-view clustering is a critical research area in computer science aimed at extracting meaningful patterns from complex, high-dimensional data that single-view methods cannot capture. Traditional fuzzy clustering techniques, such as Fuzzy C-Means (FCM), face significant challenges in handling uncertainty and the dependencies between different views. To overcome these limitations, we introduce a new multi-view fuzzy clustering approach, termed Multi-view Picture Fuzzy Clustering (MPFC), that integrates picture fuzzy sets with a dual-anchor graph method for multi-view data, aiming to enhance clustering accuracy and robustness. In particular, picture fuzzy set theory extends the capability to represent uncertainty by modeling three membership levels: membership degrees, neutral degrees, and refusal degrees. This allows for a more flexible representation of uncertain and conflicting data than traditional fuzzy models. Meanwhile, dual-anchor graphs exploit the similarity relationships between data points and integrate information across views. This combination improves stability, scalability, and robustness when handling noisy and heterogeneous data. Experimental results on several benchmark datasets demonstrate significant improvements in clustering accuracy and efficiency over traditional methods. Specifically, the MPFC algorithm attains a Purity (PUR) score of 0.6440 and an Accuracy (ACC) score of 0.6213 on the 3 Sources dataset. The proposed approach contributes to fields such as pattern recognition, multi-view relational data analysis, and large-scale clustering problems. Future work will focus on extending the method to semi-supervised multi-view clustering, aiming to enhance adaptability, scalability, and performance in real-world applications.
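As a rough illustration of the picture fuzzy representation mentioned in this abstract, the sketch below follows the commonly used definition of a picture fuzzy degree, in which positive, neutral, and negative degrees each lie in [0, 1], sum to at most 1, and the refusal degree is the remainder. The class name `PictureFuzzyDegree` and its field names are hypothetical, and this is not the MPFC objective function, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PictureFuzzyDegree:
    """One element's degrees in a picture fuzzy set (illustrative names)."""

    positive: float  # degree of membership
    neutral: float   # degree of neutrality
    negative: float  # degree of non-membership

    def __post_init__(self):
        # Each degree must lie in [0, 1].
        for name in ("positive", "neutral", "negative"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} degree {value} is outside [0, 1]")
        # The three degrees together may not exceed 1.
        if self.positive + self.neutral + self.negative > 1.0 + 1e-12:
            raise ValueError("positive + neutral + negative must not exceed 1")

    @property
    def refusal(self) -> float:
        # Refusal is whatever is left after the three explicit degrees.
        return 1.0 - (self.positive + self.neutral + self.negative)


degree = PictureFuzzyDegree(positive=0.5, neutral=0.25, negative=0.125)
print(degree.refusal)
```

An ordinary fuzzy membership would keep only the first field; the extra neutral and refusal information is what lets a point be described as "undecided" or "abstaining" rather than simply "not a member".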
Funding: Supported by the National Natural Science Foundation of China (No. 70231010/70321001) and the Bilateral Scientific and Technological Cooperation between China and Flanders (No. 174B0201).
Abstract: This paper concentrates on the problem of data redundancy under the extended-possibility-based model. Based on the information gain in data classification, a measure called relation redundancy is proposed to evaluate the degree to which a given relation is redundant as a whole. The properties of relation redundancy are also investigated. This new measure is useful in dealing with data redundancy.
Abstract: This paper proposes a security policy model for mandatory access control in a class B1 database management system with tuple-level labeling. The relation hierarchical data model is extended to a multilevel relation hierarchical data model. Based on this multilevel model, the concept of upper-lower layer relational integrity is presented after the covert channels caused by database integrity are analyzed and eliminated. Two SQL statements are extended to process polyinstantiation in the multilevel secure environment. The system is based on the multilevel relation hierarchical data model and is capable of storing and manipulating, in an integrated way, multilevel complicated objects (e.g., multilevel spatial data) and multilevel conventional data (e.g., integers, real numbers and character strings).
Abstract: Data redundancy exists in both traditional and temporal databases, and the volume of temporal data is increasing rapidly. We put forward a compressed storage strategy for temporal data that builds on existing compression technology in order to resolve data redundancy during temporal data storage. Temporal data of the slow acting domain and the momentary acting domain are accessed using the independent clock method and the mutual clock method, respectively. We also propose a grid storage strategy to address the rapid growth of temporal data.
Abstract: Analysis results of the average annual sea levels in the Caspian Sea obtained from ground and satellite observations, together with the corresponding solar activity characteristics, magnetic field data, and length of day, are presented. Spectra of these processes were investigated and their approximation models were built. Previously assumed statistical relationships between space-geophysical processes and Caspian Sea level (CSL) changes were confirmed. A close connection was revealed between the low-frequency models of the solar and geomagnetic activity parameters and the CSL changes. Predictions extending into the next decades showed a high probability of an increase in the CSL and a decrease in the compared space-geophysical parameters.
Abstract: Within the new model of integrated medical and elderly care services, elderly-related data manifest a composite rights structure that integrates both public and private law dimensions. The granular and multi-dimensional nature and heightened sensitivity of such data, combined with the inherent vulnerability and dependency of elderly data subjects, render the regulatory landscape particularly complex. Existing mechanisms for data circulation reveal deficiencies, including fragmented legal norms, indeterminate allocation of data ownership, and supervisory inadequacy. This paper conducts a doctrinal inquiry into the legal relationships among multiple stakeholders across three principal dimensions: data service authorisation, data transmission and operation, and data supervision and safeguard. It proposes a regulatory framework based on a dual-track mechanism, combining top-down harmonisation of existing legal provisions with bottom-up implementation of data trusts, supported by a comprehensive oversight architecture involving government agencies, public interest organisations, and industry associations. This framework is intended to ensure the effective protection of the rights and interests of digitally vulnerable elderly individuals.
Abstract: This paper addresses the challenge of efficiently querying multimodal related data in data lakes, large-scale storage and management systems that support heterogeneous data formats, including structured, semi-structured, and unstructured data. Multimodal data queries are crucial because they enable seamless retrieval of related data across modalities, such as tables, images, and text, with applications in fields like e-commerce, healthcare, and education. However, existing methods primarily focus on single-modality queries, such as joinable or unionable table discovery, and struggle to handle the heterogeneity and lack of metadata in data lakes while balancing accuracy and efficiency. To tackle these challenges, we propose a Multimodal data Query mechanism for Data Lakes (MQDL), which employs a modality-adaptive indexing mechanism and contrastive-learning-based embeddings to unify representations across modalities. Additionally, we introduce product quantization to optimize candidate verification during queries, reducing computational overhead while maintaining precision. We evaluate MQDL using a table-image dataset across multiple business scenarios, measuring metrics such as precision, recall, and F1-score. Results show that MQDL achieves an accuracy rate of approximately 90%, while demonstrating strong scalability and reduced query response time compared to traditional methods. These findings highlight MQDL's potential to enhance multimodal data retrieval in complex data lake environments.
Abstract: This paper presents recent progress in our project of estimating near real-time electric fields and currents in the ionosphere through our computer system, the Geospace Environment Data Analysis System (GEDAS). We show a new technique in which data from ground magnetometers are collected by the system and used as input for the KRM and AMIE programs to calculate the distribution of ionospheric electric fields and currents, as well as of other ionospheric parameters, such as electric potential patterns. One of the goals of this project is to specify ionospheric processes. Examples of the near real-time calculation and the data flow of our scheme are presented.
Funding: Supported by the National Natural Science Foundation of China (70371015).
Abstract: Association rule mining is an important issue in data mining. This paper proposes a binary-system-based method to efficiently generate candidate frequent itemsets and their corresponding support counts, which needs only operations such as "and", "or" and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM, the improved algorithm BFDM is proposed. Theoretical analysis and experiments show that BFDM is effective and efficient.
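The idea of counting support with bitwise operations can be sketched as follows. This is a minimal single-machine illustration under stated assumptions, not the BFDM algorithm itself: each transaction is encoded as an integer bitmask over items, "and" tests containment, "xor" detects two itemsets that differ in exactly one item each, and "or" merges them into a larger candidate. The function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def encode(transaction, item_index):
    """Encode a transaction (iterable of items) as an integer bitmask."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask


def support(itemset_mask, transaction_masks):
    """Count transactions containing every item of the itemset (bitwise AND)."""
    return sum(1 for t in transaction_masks if (t & itemset_mask) == itemset_mask)


def join_candidates(frequent_masks):
    """Join equal-size itemsets that share all but one item.

    XOR of two such itemsets has exactly two bits set; OR gives their union.
    """
    candidates = set()
    for a, b in combinations(frequent_masks, 2):
        if bin(a ^ b).count("1") == 2:
            candidates.add(a | b)
    return candidates


# Toy data (illustrative only).
item_index = {"a": 0, "b": 1, "c": 2, "d": 3}
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c", "d"}, {"a", "b", "c", "d"}]
masks = [encode(t, item_index) for t in transactions]

min_support = 3
frequent_1 = {1 << i for i in item_index.values() if support(1 << i, masks) >= min_support}
frequent_2 = {c for c in join_candidates(frequent_1) if support(c, masks) >= min_support}
print(sorted(frequent_1), sorted(frequent_2))
```

Because every itemset is a single integer, candidate generation and support counting reduce to a handful of machine-level operations per comparison, which is the kind of saving the abstract attributes to the binary representation.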
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 2012AA012609).
Abstract: The rapid growth of structured data has presented new technological challenges in the research fields of big data and relational databases. In this paper, we present Banian, an efficient system for managing and analyzing petabyte (PB)-level structured data. Banian overcomes the storage structure limitations of relational databases and effectively integrates interactive query with large-scale storage management. It provides a uniform query interface for cross-platform datasets and thus shows favorable compatibility and scalability. Banian's system architecture mainly includes three layers: (1) a storage layer using HDFS for the distributed storage of massive data; (2) a scheduling and execution layer employing the splitting and scheduling technology of parallel databases; and (3) an application layer providing a cross-platform query interface and supporting standard SQL. We evaluate Banian using PB-level Internet data and the TPC-H benchmark. The results show that, compared with Hive, Banian improves query performance by up to 30 times and achieves better scalability and concurrency.
Funding: Supported by the General Program of the National Natural Science Foundation of China (72474012) and the Major Statistical Projects of the National Bureau of Statistics (2022ZX21).
Abstract: Introduction: Research on the well-being of persons with disabilities (PWDs) has predominantly focused on objective living conditions and physical improvements, with insufficient attention to subjective experiences. This study addresses this gap by examining how rehabilitation service utilization enhances economic participation, thereby alleviating subjective relative deprivation (SRD). Methods: Data from 5,288 certified PWDs, drawn from the National Sample Survey on Subjective Perceptions and Evaluation of Persons with Disabilities' Protection and Development (2023) in China, were analyzed. Linear regression and the Karlson, Holm, and Breen (KHB) method were employed. A heterogeneity analysis was conducted to evaluate subgroup variations. Results: Rehabilitation service utilization is negatively associated with SRD [β = −0.532, 95% confidence interval (CI): −0.832, −0.231, P < 0.001], with economic participation serving as a mediator in this relationship (KHB: β = −0.044, 95% CI: −0.087, −0.001, P < 0.05). The SRD-reducing effect of rehabilitation services is stronger among individuals with mild to moderate disabilities (β = −0.634, 95% CI: −1.070, −0.197, P < 0.01), those with at least a middle school education (β = −0.850, 95% CI: −1.250, −0.450, P < 0.001), and urban residents (β = −0.803, 95% CI: −1.370, −0.236, P < 0.01). The mediating effects are also more pronounced within these subgroups. Conclusions: Policies should prioritize enhancing rehabilitation services and employment support for PWDs, with particular focus on groups with mild to moderate disabilities, higher education backgrounds, and urban residents. Psychological interventions should also be implemented to mitigate SRD-related mental health risks.
Abstract: In this paper, an attempt has been made to determine the vertical distribution of relative humidity (RH) at the 850, 700 and 500 hPa levels using satellite-derived radiation parameters (i.e., albedo, outgoing longwave fluxes, absorbed solar radiation and net radiation). For this purpose, multiple regression equations are derived from MONEX-79 upsonde and dropsonde data over the Arabian Sea for the period 11-20 June 1979. Satellite-estimated RH fields have been compared with ECMWF RH fields obtained from FGGE level ⅢB data. The RMS error and error variance for satellite-estimated RH fields have been found to be less than those of ECMWF. Satellite-estimated isohygric patterns show good agreement with the cloudiness patterns of the GOES satellite, whereas ECMWF isohygric patterns do not show much resemblance to the cloudiness patterns. The results of the study suggest that satellite-estimated RH fields could be more useful than ECMWF RH fields and that they can be used with some confidence in NWP models.
Abstract: Background: The "National" Health Insurance (NHI) in Taiwan, China is a single-payer system that was introduced in 1995 to provide universal health care. Three stakeholders are involved in Taiwan's NHI, which can be seen as a triangular governance regime between the Bureau of "National" Health Insurance (BNHI), the insured and providers. Accordingly, this study assesses the efficiency of the different production processes that occur among these stakeholders in Taiwan's NHI system. Methods: A two-stage relational Data Envelopment Analysis (DEA) model is adopted to investigate the sub-process efficiencies of the health care resources held by 23 cities and counties through stages I or II, where the outputs of the first stage serve as the inputs of the second. The dataset was collected from the annual reports published by the Department of Health, Taiwan, China. Results: Under the proposed framework, the efficiency of the whole process can be obtained from the product of productivity and allocative efficiency. Ten decision-making units (DMUs) are efficient in either stage I or stage II, with only two DMUs being efficient in both sub-processes. Conclusion: The relational DEA model not only demonstrates the physical relationship between the whole process and the sub-process components, but also produces reliable outcomes in efficiency measurement among different stakeholders in Taiwan's NHI system.
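The multiplicative decomposition described in this abstract can be illustrated with a short sketch. It assumes the two stage efficiencies have already been obtained (solving the DEA linear programs is outside its scope) and only shows how the whole-process efficiency follows as their product and how units efficient in one or both stages can be identified. The DMU names and scores are made-up illustrative values, not results from the study.

```python
def overall_efficiency(stage1, stage2):
    """Whole-process efficiency as the product of the two stage efficiencies."""
    for score in (stage1, stage2):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"efficiency score {score} is outside [0, 1]")
    return stage1 * stage2


# Hypothetical (stage I, stage II) efficiency scores for four DMUs.
scores = {
    "DMU-A": (1.0, 1.0),
    "DMU-B": (1.0, 0.5),
    "DMU-C": (0.75, 1.0),
    "DMU-D": (0.5, 0.5),
}

overall = {name: overall_efficiency(e1, e2) for name, (e1, e2) in scores.items()}
efficient_in_either = [n for n, (e1, e2) in scores.items() if e1 == 1.0 or e2 == 1.0]
efficient_in_both = [n for n, (e1, e2) in scores.items() if e1 == 1.0 and e2 == 1.0]

print(overall)
print(efficient_in_either, efficient_in_both)
```

Under this decomposition a unit is efficient overall only when both stage scores equal 1, which is consistent with far fewer units being efficient in both sub-processes than in just one.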