Funding: Supported by the Second Tibetan Plateau Scientific Expedition and Research (STEP) program (2019QZKK0502), the National Natural Science Foundation of China (32322006), the Major Program for Basic Research Project of Yunnan Province (202103AF140005 and 202101BC070002), and the Practice Innovation Fund for Professional Degree Graduates of Yunnan University (ZC-22222401).
Abstract: The study of plant diversity is often hindered by the challenge of integrating data from different sources and of different types. A standardized data system would facilitate detailed exploration of plant distribution patterns and dynamics for botanists, ecologists, conservation biologists, and biogeographers. This study proposes a gridded vector data integration method that combines grid-based techniques with vectorization to integrate diverse data types from multiple sources into grids of the same scale. We demonstrate the method by creating a comprehensive 1° × 1° database of western China that includes plant distribution information and environmental factor data. This approach addresses the need for a standardized data system to facilitate exploration of plant distribution patterns and their dynamic changes in the region.
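As a rough illustration of the gridding step, the sketch below assigns point occurrence records to 1° × 1° cells and joins each cell's species richness with a per-cell environmental value. It is a minimal sketch, not the authors' implementation: the record layout `(species, lon, lat)`, the cell indexing anchored at (-180°, -90°), the helper names `cell_id`, `grid_species`, and `attach_env`, and the sample records are all assumptions. Polygon (range-map) inputs, which would have to be intersected with cell boundaries, are not handled.

```python
import math
from collections import defaultdict


def cell_id(lon: float, lat: float, size: float = 1.0) -> tuple[int, int]:
    """Return the (column, row) index of the grid cell containing a point.

    Cells are anchored at (-180, -90). Using floor() makes each cell
    half-open, so a point on a boundary belongs to the cell east/north of it.
    """
    return math.floor((lon + 180.0) / size), math.floor((lat + 90.0) / size)


def grid_species(records, size: float = 1.0) -> dict:
    """Aggregate (species, lon, lat) records into a set of species per cell."""
    cells = defaultdict(set)
    for species, lon, lat in records:
        cells[cell_id(lon, lat, size)].add(species)
    return dict(cells)


def attach_env(cells: dict, env: dict) -> dict:
    """Join per-cell species richness with a per-cell environmental value."""
    return {c: {"richness": len(sp), "env": env.get(c)} for c, sp in cells.items()}


# Hypothetical records; species names are placeholders, not real data.
records = [("sp_A", 99.7, 27.5), ("sp_B", 99.2, 27.9), ("sp_A", 100.3, 27.1)]
grid = grid_species(records)
# Hypothetical environmental layer, e.g. a mean temperature per cell.
print(attach_env(grid, {(279, 117): 12.4}))
```

Keying every data type by the same integer cell index is what lets point records and gridded environmental layers be joined with a plain dictionary lookup.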
Funding: Supported by the Agence Nationale de la Recherche (ANR) (contract "ANR-17-EURE-0002"), by the Region of Bourgogne Franche-Comté CADRAN Project, and by the European Research Council (ERC) project HYPATIA under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 835294).
Abstract: This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under Local Differential Privacy (LDP) guarantees. Unlike frequency estimation for a single attribute, the multidimensional aspect demands particular attention to the privacy budget. Moreover, when user statistics are collected longitudinally, privacy progressively degrades. Indeed, the "multiple" settings in combination (i.e., many attributes and several collections throughout time) impose several challenges, for which this paper proposes the first solution for frequency estimates under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols, Generalized Randomized Response (GRR), Optimized Unary Encoding (OUE), and Symmetric Unary Encoding (SUE), to both longitudinal and multidimensional data collections. While the known literature uses OUE and SUE for two rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we show analytically and experimentally that starting with OUE and then applying SUE (L-OSUE) provides higher data utility. Also, for attributes with small domain sizes, we propose Longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Finally, we propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates (ALLOMFREE), which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. Our results show that ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
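For readers unfamiliar with the building blocks, here is a sketch of single-round GRR, one of the three base protocols the abstract extends. It is not L-GRR or ALLOMFREE: there is no memoization, and it covers one attribute only. In the multidimensional setting described above, each user would first sample one attribute and apply the protocol to it with the full budget ε. The function names are illustrative; the probabilities p = e^ε / (e^ε + k - 1) and q = 1 / (e^ε + k - 1) and the estimator (C(v)/n - q) / (p - q) are the standard GRR formulas.

```python
import math
import random
from collections import Counter


def grr_probs(k: int, eps: float) -> tuple[float, float]:
    """Probability of reporting the true value (p) and any specific other value (q)."""
    e = math.exp(eps)
    return e / (e + k - 1), 1.0 / (e + k - 1)


def grr_perturb(value: int, k: int, eps: float, rng: random.Random) -> int:
    """Client side: sanitize a value from the domain {0, ..., k-1}."""
    p, _ = grr_probs(k, eps)
    if rng.random() < p:
        return value
    # Otherwise report one of the k-1 other values uniformly at random.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def grr_estimate(reports, k: int, eps: float) -> list[float]:
    """Server side: unbiased frequency estimate for every value in the domain."""
    p, q = grr_probs(k, eps)
    n = len(reports)
    counts = Counter(reports)
    return [(counts[v] / n - q) / (p - q) for v in range(k)]


# Simulated population: 50% value 0, 30% value 1, 20% value 2.
rng = random.Random(7)
true_values = [0] * 50_000 + [1] * 30_000 + [2] * 20_000
reports = [grr_perturb(v, k=3, eps=1.0, rng=rng) for v in true_values]
estimates = grr_estimate(reports, k=3, eps=1.0)
print([round(f, 3) for f in estimates])
```

Because each reported value appears with probability p if it is the true value and q otherwise, subtracting q and dividing by (p - q) removes the bias, and the estimates always sum to 1.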
Funding: Supported by Projects of International Cooperation and Exchange of the National Natural Science Foundation of China (51561135003), the Key Project of the National Natural Science Foundation of China (51338003), and the Scientific Research Foundation of the Graduate School of Southeast University (YBJJ1842).
Abstract: To explore the travel characteristics and space-time distribution of different groups of bikeshare users, an online analytical processing (OLAP) tool called a data cube was used to process and display multidimensional data. We extended and modified the traditional three-dimensional data cube into four dimensions (space, date, time, and user), each with a user-specified hierarchy, and used transaction counts and travel time as the two quantitative measures. The results show two clear transaction peaks during the morning and afternoon rush hours on weekdays, while weekend volume is distributed approximately evenly. Bad weather significantly restricts bikeshare usage. In addition, seamless smartcard users generally take longer trips than exclusive smartcard users, and non-native users ride faster than native users. These findings not only support the applicability and efficiency of the data cube for visualizing massive smartcard data but also raise equity concerns among bikeshare users with different demographic backgrounds.
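A four-dimensional cube like the one described can be approximated in plain Python by grouping trip records on any combination of dimension levels. The sketch below is an illustration under assumptions, not the authors' OLAP tool: the `Trip` fields, the station/district and date/day-type levels, and the sample trips are hypothetical, and the two measures are trip count and mean travel time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    station: str    # space dimension, finest level
    district: str   # space dimension, coarser level
    date: str       # date dimension, day level
    day_type: str   # date dimension, "weekday" or "weekend"
    hour: int       # time dimension
    user_type: str  # user dimension, e.g. "native" or "non-native"
    minutes: float  # travel time measure


def roll_up(trips, *levels: str) -> dict:
    """Group trips by the named levels; return (trip count, mean travel time) per cell."""
    cells = defaultdict(lambda: [0, 0.0])
    for t in trips:
        key = tuple(getattr(t, level) for level in levels)
        cells[key][0] += 1
        cells[key][1] += t.minutes
    return {key: (n, total / n) for key, (n, total) in cells.items()}


# Hypothetical trips for illustration only.
trips = [
    Trip("S1", "D1", "2016-05-03", "weekday", 8, "native", 10.0),
    Trip("S2", "D1", "2016-05-03", "weekday", 8, "non-native", 20.0),
    Trip("S1", "D1", "2016-05-07", "weekend", 14, "native", 30.0),
]
print(roll_up(trips, "day_type", "hour"))  # aggregate by day type and hour
print(roll_up(trips, "district"))          # roll up the space dimension
```

Choosing coarser attributes (district instead of station, day type instead of date) is the roll-up operation; choosing finer ones drills down.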
Funding: This project was supported by the China Postdoctoral Science Foundation (2005037506) and the National Natural Science Foundation of China (70472029).
Abstract: In e-commerce, multidimensional analysis of Web data requires integrating various data sources, such as XML data and relational data, at the conceptual level. A conceptual data description approach for multidimensional data models, the UML galaxy diagram, is presented to support multidimensional data analysis across multiple subjects. The approach is illustrated with a 2_roots UML galaxy diagram for the marketing analysis of TV products involving one retailer and several suppliers.
Abstract: Surface quality has been one of the key factors influencing the ongoing improvement of steel quality. Therefore, methods for the efficient supervision of surface defects are urgently needed. This paper first describes the main problems in defect management and then focuses on constructing a surface defect management data platform using a multidimensional database. Finally, some online applications of the platform at Baosteel are demonstrated. The results show that the constructed multidimensional database provides more structured defect data and is therefore suitable for rapid, multi-angle analysis of the defect data.
Funding: The authors thank the following sponsors for their support: research on the dual half-edge data structure was funded by the EPSRC and Ordnance Survey, UK (New CASE Award, 2006–2010), and by the Technical University of Malaysia and the Ministry of Science, Technology and Innovation, Malaysia (eScience 01-01-06-SF1046, Vot No. 4S049) (2011–2014).
Abstract: 3D city models are widely used in many disciplines and applications, such as urban planning, disaster management, and environmental simulation. Usually, the terrain and embedded objects such as buildings are taken into consideration. A consistent model integrating these elements is vital for GIS analysis, especially if the geometry is accompanied by the topological relations between neighboring objects. Such a model allows more efficient and error-free analysis. Memory consumption is another crucial aspect when a wide city area is considered, so lightweight models are highly desirable. This article proposes three methods for representing terrain with a geometrical-topological data structure, the dual half-edge. The integration of buildings and other structures, such as bridges, with the terrain is also presented.
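To make the idea of storing topology alongside geometry concrete, the sketch below builds an ordinary (primal) half-edge structure for a triangulated terrain, where each half-edge knows its face, the next half-edge around that face, and its twin in the neighboring face. This is a simplification: the dual half-edge described in the article additionally maintains a dual graph, which is not shown, and the class and function names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class HalfEdge:
    origin: int  # index of the vertex this half-edge starts from
    face: int    # index of the triangle on its left
    # Cyclic references are excluded from repr to avoid infinite recursion.
    next: Optional["HalfEdge"] = field(default=None, repr=False)
    twin: Optional["HalfEdge"] = field(default=None, repr=False)


def build_half_edges(triangles) -> dict:
    """Build half-edges for counter-clockwise triangles given as vertex-index triples."""
    edges = {}
    for f, tri in enumerate(triangles):
        hes = [HalfEdge(origin=tri[i], face=f) for i in range(3)]
        for i in range(3):
            hes[i].next = hes[(i + 1) % 3]
            edges[(tri[i], tri[(i + 1) % 3])] = hes[i]
    # Pair each half-edge with its opposite; boundary edges keep twin=None.
    for (a, b), he in edges.items():
        he.twin = edges.get((b, a))
    return edges


def face_neighbors(edges: dict, face: int) -> list[int]:
    """Faces sharing an edge with `face`, found through twin links."""
    return sorted(
        {he.twin.face for he in edges.values() if he.face == face and he.twin is not None}
    )


# Two triangles forming a square terrain patch: vertices 0..3, shared edge 0-2.
edges = build_half_edges([(0, 1, 2), (0, 2, 3)])
print(face_neighbors(edges, 0))
```

Adjacency queries then follow pointers instead of comparing coordinates, which is the kind of topological support the abstract argues makes terrain analysis more efficient and less error-prone.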
Abstract: Recently, the expertise accumulated in the field of geovisualization has been applied to the visualization of abstract multidimensional data through methods called spatialization methods. Spatialization methods aim to visualize multidimensional data in low-dimensional representational spaces by using spatial metaphors and applying dimension reduction techniques. Spatial metaphors can provide a metaphoric framework for visualizing information at different levels of granularity. This paper investigates how granularity is handled in representative examples of spatialization methods. Furthermore, it introduces the prototyping tool Geo-Scape, which provides an interactive spatialization environment for representing and exploring multidimensional data at different levels of granularity by using a kernel density estimation technique and the landscape "smoothness" metaphor. A demonstration scenario then shows how Geo-Scape helps discover knowledge in a large data set by grouping the data into meaningful clusters based on a similarity measure and organizing them at different levels of granularity.
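The role of kernel density estimation in controlling granularity can be illustrated with a plain 2-D Gaussian KDE over points that are assumed to be already projected into a low-dimensional space: a small bandwidth keeps separate peaks for separate clusters, while a large bandwidth merges them into one smoother hill. This is a generic textbook KDE, not Geo-Scape's implementation, and the function name and sample points are illustrative.

```python
import math


def gaussian_kde_2d(points, bandwidth: float):
    """Return f(x, y), a density estimate from 2-D points with an isotropic Gaussian kernel."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x: float, y: float) -> float:
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Two hypothetical items projected to (0, 0) and (4, 0).
points = [(0.0, 0.0), (4.0, 0.0)]
fine = gaussian_kde_2d(points, bandwidth=0.5)    # fine granularity: two peaks
coarse = gaussian_kde_2d(points, bandwidth=3.0)  # coarse granularity: one merged hill
print(fine(2.0, 0.0) < fine(0.0, 0.0), coarse(2.0, 0.0) > coarse(0.0, 0.0))
```

Reading the density as terrain height gives the landscape metaphor: with the small bandwidth there is a valley between the two items, and with the large one the valley disappears.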
Abstract: Data warehouses (DWs) must integrate information from the different areas and sources of an organization in order to extract knowledge relevant to decision-making. DW development is not an easy task, which is why various design approaches have been put forward. These approaches can be classified into three paradigms according to the origin of the information requirements: supply-driven, demand-driven, and hybrids of the two. This article compares the methodologies for the multidimensional design of DWs, using systematic mapping as the research methodology. For each paradigm, the study presents the main characteristics of the methodologies, their notations, and the problem areas each one exhibits. The results indicate that the complete process of implementing a DW is not followed up in either an academic or an industrial environment; moreover, there is no evidence of attempts to address DW design and development by applying and comparing the different methodologies existing in the field.
Abstract: The data structures and semantics of traditional data models cannot effectively represent a data warehouse, which makes it difficult to support online analytical processing (OLAP) effectively. This paper proposes a new multidimensional data model based on partial ordering and mapping. The model can fully express the complex data structures and semantics of a data warehouse and provides an operation algebra with OLAP operations at its core, supporting complex sequences of aggregation operations across hierarchy levels and thereby effectively supporting OLAP applications. The model also supports the concept of aggregation function constraints and provides a constraint mechanism for hierarchical aggregation functions.
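A small sketch can show what a hierarchy defined as a partial order, combined with aggregation function constraints, might look like in practice. Everything here is hypothetical: the level names, the `ALLOWED` table of valid aggregation functions per measure, and the helper names are illustrative assumptions rather than the paper's formal model or algebra.

```python
# Hypothetical time dimension: "day" rolls up to both "week" and "month",
# but "week" and "month" are incomparable, so the levels form a partial order.
PARENTS = {"day": {"week", "month"}, "week": set(), "month": {"year"}, "year": set()}

# Hypothetical constraints: which aggregation functions are valid for each measure.
ALLOWED = {"sales": {"sum", "min", "max"}, "stock_level": {"min", "max", "avg"}}
FUNCS = {"sum": sum, "min": min, "max": max, "avg": lambda vals: sum(vals) / len(vals)}


def rolls_up_to(low: str, high: str) -> bool:
    """True if low <= high in the (reflexive, transitive) level ordering."""
    return low == high or any(rolls_up_to(p, high) for p in PARENTS[low])


def aggregate(rows, mapping: dict, measure: str, func: str) -> dict:
    """Roll (member, value) rows up to parent members, enforcing the constraints."""
    if func not in ALLOWED[measure]:
        raise ValueError(f"{func!r} is not a valid aggregation for {measure!r}")
    groups: dict = {}
    for member, value in rows:
        groups.setdefault(mapping[member], []).append(value)
    return {parent: FUNCS[func](vals) for parent, vals in groups.items()}


day_to_month = {"d1": "m1", "d2": "m1", "d3": "m2"}
print(rolls_up_to("day", "year"), rolls_up_to("week", "month"))
print(aggregate([("d1", 5), ("d2", 7), ("d3", 1)], day_to_month, "sales", "sum"))
```

The constraint table captures the usual warehouse rule that some measures, such as stock levels, are not meaningfully summed across time, so the sketch rejects that request instead of returning a misleading total.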
Abstract: To exchange and share information among the conceptual models of data warehouses, and to build a solid basis for integrating and sharing metadata, a new XML-based multidimensional conceptual model is presented and its DTD is defined; the model can fully describe the various semantic characteristics of multidimensional conceptual models. Building on the UML-based multidimensional conceptual modeling technique, a mapping algorithm between the XML-based multidimensional conceptual model and the UML class diagram is described, providing an application basis for the wide use of this technique.
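As a hedged illustration of serializing a multidimensional conceptual model to XML for exchange, the sketch below uses Python's standard `xml.etree.ElementTree`. The element and attribute names (`Fact`, `Measure`, `Dimension`, `Level`) and the example model are hypothetical and are not the DTD defined in the paper; the UML mapping algorithm is not shown.

```python
import xml.etree.ElementTree as ET


def model_to_xml(fact: str, measures, dimensions: dict) -> str:
    """Serialize a fact, its measures, and its dimension hierarchies to an XML string."""
    root = ET.Element("Fact", name=fact)
    for measure in measures:
        ET.SubElement(root, "Measure", name=measure)
    for dim, levels in dimensions.items():
        dim_el = ET.SubElement(root, "Dimension", name=dim)
        for level in levels:  # listed from finest to coarsest
            ET.SubElement(dim_el, "Level", name=level)
    return ET.tostring(root, encoding="unicode")


xml_text = model_to_xml(
    "Sales",
    ["amount", "quantity"],
    {"Time": ["day", "month", "year"], "Product": ["item", "category"]},
)
print(xml_text)

# Parsing the string back recovers the hierarchy, e.g. for exchange between tools.
parsed = ET.fromstring(xml_text)
print([lvl.get("name") for lvl in parsed.find("Dimension").findall("Level")])
```

A DTD or schema would then constrain which of these elements may appear and in what order, which is what allows different tools to exchange such models reliably.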
Abstract: In this study, rural poverty in Iran is investigated by applying a multidimensional approach, association rule mining, and Levene's, F, and Tukey tests to household data from 2008. The results indicate that multidimensional poverty is an epidemic problem in rural Iran. The results also reveal 11 patterns of poverty in rural areas: four main patterns with 99.62% coverage and seven sub-patterns with about 0.38% coverage. In these patterns, housing and household education are the most important dimensions of poverty, and income poverty is the least important. The government's income support policy for households, implemented under the law on targeting subsidies, cannot be regarded as a pro-poor policy; rather, it follows other political aims.
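The support and confidence measures behind association rule mining can be shown on a toy set of households, each represented by the set of dimensions in which it is deprived. The sketch enumerates itemsets by brute force rather than with an optimized algorithm such as Apriori, and the household data and function names are invented for illustration; they do not reproduce the study's results.

```python
from itertools import combinations


def support(transactions, itemset) -> float:
    """Fraction of transactions (sets) containing every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return support(transactions, a | frozenset(consequent)) / support(transactions, a)


def frequent_itemsets(transactions, min_support: float, max_size: int = 3) -> dict:
    """Brute-force enumeration of itemsets whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(transactions, combo)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


# Hypothetical households, each listed with the dimensions in which it is deprived.
households = [
    {"housing", "education"},
    {"housing", "education", "health"},
    {"housing"},
    {"education", "income"},
]
print(frequent_itemsets(households, min_support=0.5))
print(confidence(households, {"housing"}, {"education"}))
```

Support measures how common a combination of deprivations is (comparable to the coverage figures in the abstract), while confidence measures how often one deprivation accompanies another.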
Abstract: Knowledge discovery, an increasingly adopted information technology in biomedical science, has shown great promise in the field of Traditional Chinese Medicine (TCM). In this paper, we present a multidimensional table that is well suited to organizing and analyzing data from ancient Chinese books on Materia Medica. Moreover, we demonstrate its capability to facilitate further mining work in TCM through two illustrative studies that discover meaningful patterns in the three-dimensional table of Shennong's Classic of Materia Medica. This work may provide an appropriate data model for the development of knowledge discovery in TCM.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62271165, 62027802, 62201307), the Guangdong Basic and Applied Basic Research Foundation (No. 2023A1515030297), the Shenzhen Science and Technology Program ZDSYS20210623091808025, the Stable Support Plan Program GXWD20231129102638002, and the Major Key Project of PCL (No. PCL2024A01).
Abstract: Due to restricted satellite payloads in LEO mega-constellation networks (LMCNs), remote sensing image analysis, online learning, and other big data services need onboard distributed processing (OBDP). In existing technologies, the efficiency of big data applications (BDAs) in distributed systems hinges on stable, low-latency links between worker nodes. However, LMCNs, with highly dynamic nodes and long-distance links, cannot provide these conditions, which makes the performance of OBDP hard to measure intuitively. To bridge this gap, a multidimensional simulation platform is indispensable: one that can simulate the network environment of LMCNs and run BDAs in it for performance testing. Using STK's APIs and a parallel computing framework, we achieve real-time simulation of thousands of satellite nodes, which are mapped to application nodes through software-defined networking (SDN) and container technologies. We describe the architecture and mechanism of the simulation platform and use Starlink and Hadoop as realistic examples for simulation. The results indicate that LMCNs have dynamic end-to-end latency that fluctuates periodically with the movement of the constellation. Compared with ground data center networks (GDCNs), LMCNs reduce the throughput of computing and storage jobs, which can be alleviated by using erasure codes and data flow scheduling of worker nodes.
Funding: Supported by the Natural Science Foundation for Outstanding Young Scholars of Heilongjiang Province under Grant YQ2020F001, the National Key Research and Development Program of China under Grant 2021YFB2900500, and the Fundamental Research Funds for the Central Universities under Grant FRFCU 9803503821.
Abstract: In satellite-to-ground high-speed data transmission links, signal self-interference arises between symbols within the co-channel as well as between orthogonal and polarized channels. A multichannel adaptive filter is designed by constructing a multichannel Wiener-Hopf equation, and the influence of five nonideal channel factors is suppressed to improve BER performance. Experiments show that this method effectively suppresses signal self-interference, lowering the BER floor from 1E-3 to 1E-7.
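The core of a multichannel Wiener filter is solving the Wiener-Hopf equation R w = p, where R is the autocorrelation matrix of the stacked tap-delay inputs from all channels and p is their cross-correlation with the desired signal. The NumPy sketch below estimates R and p from a block of known training symbols and solves the equation once. It is a simplified batch illustration under assumed conditions (two real-valued BPSK channels with static cross-talk and Gaussian noise), not the paper's adaptive design. It does not model the five nonideal factors or compute BER, and the function names are hypothetical.

```python
import numpy as np


def tapped(x: np.ndarray, taps: int) -> np.ndarray:
    """Row n holds [x[n], x[n-1], ..., x[n-taps+1]], zero-padded before the start."""
    n = len(x)
    out = np.zeros((n, taps))
    for k in range(taps):
        out[k:, k] = x[: n - k]
    return out


def multichannel_wiener(channels, desired: np.ndarray, taps: int):
    """Solve the sample Wiener-Hopf equation R w = p over all channels' taps."""
    X = np.hstack([tapped(x, taps) for x in channels])  # stacked regressors
    R = X.T @ X / len(desired)                          # autocorrelation matrix
    p = X.T @ desired / len(desired)                    # cross-correlation vector
    w = np.linalg.solve(R, p)
    return w, X @ w


# Assumed scenario: two BPSK streams leaking into each other's channel.
rng = np.random.default_rng(0)
n = 5000
s1 = rng.choice([-1.0, 1.0], n)
s2 = rng.choice([-1.0, 1.0], n)
x1 = s1 + 0.4 * s2 + 0.05 * rng.standard_normal(n)
x2 = s2 + 0.3 * s1 + 0.05 * rng.standard_normal(n)

w, y = multichannel_wiener([x1, x2], desired=s1, taps=3)
mse_raw = np.mean((x1 - s1) ** 2)
mse_filtered = np.mean((y - s1) ** 2)
print(f"MSE before: {mse_raw:.4f}, after: {mse_filtered:.4f}")
```

Using taps from both channels is what lets the filter cancel the leaked component of the other stream; a single-channel filter on `x1` alone could not separate `s2` from `s1`. An adaptive version would update `w` recursively (for example with LMS or RLS) instead of solving once.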