Fine art authentication plays a significant role in protecting cultural heritage and ensuring the integrity of artworks.Traditional authentication methods require professionals to collect many reference materials and ...Fine art authentication plays a significant role in protecting cultural heritage and ensuring the integrity of artworks.Traditional authentication methods require professionals to collect many reference materials and conduct detailed analyses.To ease the difficulty,we collaborate with domain experts to develop a GPT-based agent,namely ArtEyer,that offers accurate attributions,determines the origin and authorship,and executes visual analytics.Despite the convenience of the conversational user interface,novice users may still face challenges due to the hallucination issue and the steep learning curve associated with prompting.To face these obstacles,we propose a novel solution that places interactive data visualizations into the conversations.We create contextual visualizations from an external domain-dependent database to ensure data trustworthiness and allow users to provide precise instructions to the agent by interacting directly with these visualizations,thus overcoming the vagueness inherent in natural language-based prompting.We evaluate ArtEyer through an in-lab user study and demonstrate its usage with a real-world case.展开更多
Many bioinformatics applications require determining the class of a newly sequenced Deoxyribonucleic acid(DNA)sequence,making DNA sequence classification an integral step in performing bioinformatics analysis,where la...Many bioinformatics applications require determining the class of a newly sequenced Deoxyribonucleic acid(DNA)sequence,making DNA sequence classification an integral step in performing bioinformatics analysis,where large biomedical datasets are transformed into valuable knowledge.Existing methods rely on a feature extraction step and suffer from high computational time requirements.In contrast,newer approaches leveraging deep learning have shown significant promise in enhancing accuracy and efficiency.In this paper,we investigate the performance of various deep learning architectures:Convolutional Neural Network(CNN),CNN-Long Short-Term Memory(CNNLSTM),CNN-Bidirectional Long Short-Term Memory(CNN-BiLSTM),Residual Network(ResNet),and InceptionV3 for DNA sequence classification.Various numerical and visual data representation techniques are utilized to represent the input datasets,including:label encoding,k-mer sentence encoding,k-mer one-hot vector,Frequency Chaos Game Representation(FCGR)and 5-Color Map(ColorSquare).Three datasets are used for the training of the models including H3,H4 and DNA Sequence Dataset(Yeast,Human,Arabidopsis Thaliana).Experiments are performed to determine which combination of DNA representation and deep learning architecture yields improved performance for the classification task.Our results indicate that using a hybrid CNN-LSTM neural network trained on DNA sequences represented as one-hot encoded k-mer sequences yields the best performance,achieving an accuracy of 92.1%.展开更多
A visualization tool was developed through a web browser based on Java applets embedded into HTML pages, in order to provide a world access to the EAST experimental data. It can display data from various trees in diff...A visualization tool was developed through a web browser based on Java applets embedded into HTML pages, in order to provide a world access to the EAST experimental data. It can display data from various trees in different servers in a single panel. With WebScope, it is easier to make a comparison between different data sources and perform a simple calculation over different data sources.展开更多
Cyber security has been thrust into the limelight in the modern technological era because of an array of attacks often bypassing tmtrained intrusion detection systems (IDSs). Therefore, greater attention has been di...Cyber security has been thrust into the limelight in the modern technological era because of an array of attacks often bypassing tmtrained intrusion detection systems (IDSs). Therefore, greater attention has been directed on being able deciphering better methods for identifying attack types to train IDSs more effectively. Keycyber-attack insights exist in big data; however, an efficient approach is required to determine strong attack types to train IDSs to become more effective in key areas. Despite the rising growth in IDS research, there is a lack of studies involving big data visualization, which is key. The KDD99 data set has served as a strong benchmark since 1999; therefore, we utilized this data set in our experiment. In this study, we utilized hash algorithm, a weight table, and sampling method to deal with the inherent problems caused by analyzing big data; volume, variety, and velocity. By utilizing a visualization algorithm, we were able to gain insights into the KDD99 data set with a clear iden- tification of "normal" clusters and described distinct clusters of effective attacks.展开更多
The availability and quantity of remotely sensed and terrestrial geospatial data sets are on the rise.Historically,these data sets have been analyzed and quarried on 2D desktop computers;however,immersive technologies...The availability and quantity of remotely sensed and terrestrial geospatial data sets are on the rise.Historically,these data sets have been analyzed and quarried on 2D desktop computers;however,immersive technologies and specifically immersive virtual reality(iVR)allow for the integration,visualization,analysis,and exploration of these 3D geospatial data sets.iVR can deliver remote and large-scale geospatial data sets to the laboratory,providing embodied experiences of field sites across the earth and beyond.We describe a workflow for the ingestion of geospatial data sets and the development of an iVR workbench,and present the application of these for an experience of Iceland’s Thrihnukar volcano where we:(1)combined satellite imagery with terrain elevation data to create a basic reconstruction of the physical site;(2)used terrestrial LiDAR data to provide a geo-referenced point cloud model of the magmatic-volcanic system,as well as the LiDAR intensity values for the identification of rock types;and(3)used Structure-from-Motion(SfM)to construct a photorealistic point cloud of the inside volcano.The workbench provides tools for the direct manipulation of the georeferenced data sets,including scaling,rotation,and translation,and a suite of geometric measurement tools,including length,area,and volume.Future developments will be inspired by an ongoing user study that formally evaluates the workbench’s mature components in the context of fieldwork and analyses activities.展开更多
This study focuses on meeting the challenges of big data visualization by using of data reduction methods based the feature selection methods.To reduce the volume of big data and minimize model training time(Tt)while ...This study focuses on meeting the challenges of big data visualization by using of data reduction methods based the feature selection methods.To reduce the volume of big data and minimize model training time(Tt)while maintaining data quality.We contributed to meeting the challenges of big data visualization using the embedded method based“Select from model(SFM)”method by using“Random forest Importance algorithm(RFI)”and comparing it with the filter method by using“Select percentile(SP)”method based chi square“Chi2”tool for selecting the most important features,which are then fed into a classification process using the logistic regression(LR)algorithm and the k-nearest neighbor(KNN)algorithm.Thus,the classification accuracy(AC)performance of LRis also compared to theKNN approach in python on eight data sets to see which method produces the best rating when feature selection methods are applied.Consequently,the study concluded that the feature selection methods have a significant impact on the analysis and visualization of the data after removing the repetitive data and the data that do not affect the goal.After making several comparisons,the study suggests(SFMLR)using SFM based on RFI algorithm for feature selection,with LR algorithm for data classify.The proposal proved its efficacy by comparing its results with recent literature.展开更多
In recent years,with the wide application of image data visual extraction technology in the field of industrial engineering,the development of industrial economy has reached a new situation.To explore the interaction ...In recent years,with the wide application of image data visual extraction technology in the field of industrial engineering,the development of industrial economy has reached a new situation.To explore the interaction between the pellet microstructure and compressive strength,firstly,the pellet microstructure needed for the experiment was obtained using a Leica DM4500P microscope.The area proportions of hematite,calcium ferrite,magnetite,calcium silicate and pore in pellet microstructure were extracted by visual extraction technology of image data.Moreover,the relationship between the area proportions of mineral components and compressive strength was established by backpropagation neural network(BPNN),generalized regression neural network(GRNN)and beetle antennae search-generalized regression neural network(BAS-GRNN)algorithms,which proves that the pellet microstructure can be used as the prediction standard of compressive strength.The errors of BPNN and BAS-GRNN are 5.13%and 3.37%,respectively,both of which are less than 5.5%.Therefore,through data visualization,we are able to discuss the connection between various components of pellet microstructure and compressive strength and provide new research ideas for improving the compressive strength and metallurgical performance of pellet.展开更多
A database system,known as the large PMT characterization and instrumentation database system(LPMT-CIDS),was designed and implemented for the Jiangmen Underground Neutrino Observatory(JUNO).The system is based on a Li...A database system,known as the large PMT characterization and instrumentation database system(LPMT-CIDS),was designed and implemented for the Jiangmen Underground Neutrino Observatory(JUNO).The system is based on a Linux+Apache+MySQL+PHP(LAMP)server and focuses on modularization and architecture separation.It covers all the testing stages for the 20-inch photomultiplier tubes(PMTs)at JUNO and provides its users with data storage,analysis,and visualization services.Based on the successful use of the system in the 20-inch PMT testing program,its design approach and construction elements can be extended to other projects.展开更多
Exploration of artworks is enjoyable but often time consuming.For example,it is not always easy to discover the favorite types of unknown painting works.It is not also always easy to explore unpopular painting works w...Exploration of artworks is enjoyable but often time consuming.For example,it is not always easy to discover the favorite types of unknown painting works.It is not also always easy to explore unpopular painting works which looks similar to painting works created by famous artists.This paper presents a painting image browser which assists the explorative discovery of user-interested painting works.The presented browser applies a new multidimensional data visualization technique that highlights particular ranges of particular numeric values based on association rules to suggest cues to find favorite painting images.This study assumes a large number of painting images are provided where categorical information(e.g.,names of artists,created year)is assigned to the images.The presented system firstly calculates the feature values of the images as a preprocessing step.Then the browser visualizes the multidimensional feature values as a heatmap and highlights association rules discovered from the relationships between the feature values and categorical information.This mechanism enables users to explore favorite painting images or painting images that look similar to famous painting works.Our case study and user evaluation demonstrates the effectiveness of the presented image browser.展开更多
Water resources are one of the basic resources for human survival,and water protection has been becoming a major problem for countries around the world.However,most of the traditional water quality monitoring research...Water resources are one of the basic resources for human survival,and water protection has been becoming a major problem for countries around the world.However,most of the traditional water quality monitoring research work is still concerned with the collection of water quality indicators,and ignored the analysis of water quality monitoring data and its value.In this paper,by adopting Laravel and AdminTE framework,we introduced how to design and implement a water quality data visualization platform based on Baidu ECharts.Through the deployed water quality sensor,the collected water quality indicator data is transmitted to the big data processing platform that deployed on Tencent Cloud in real time through the 4G network.The collected monitoring data is analyzed,and the processing result is visualized by Baidu ECharts.The test results showed that the designed system could run well and will provide decision support for water resource protection.展开更多
With the rapid development of the Internet,many enterprises have launched their network platforms.When users browse,search,and click the products of these platforms,most platforms will keep records of these network be...With the rapid development of the Internet,many enterprises have launched their network platforms.When users browse,search,and click the products of these platforms,most platforms will keep records of these network behaviors,these records are often heterogeneous,and it is called log data.To effectively to analyze and manage these heterogeneous log data,so that enterprises can grasp the behavior characteristics of their platform users in time,to realize targeted recommendation of users,increase the sales volume of enterprises’products,and accelerate the development of enterprises.Firstly,we follow the process of big data collection,storage,analysis,and visualization to design the system,then,we adopt HDFS storage technology,Yarn resource management technology,and gink load balancing technology to build a Hadoop cluster to process the log data,and adopt MapReduce processing technology and data warehouse hive technology analyze the log data to obtain the results.Finally,the obtained results are displayed visually,and a log data analysis system is successfully constructed.It has been proved by practice that the system effectively realizes the collection,analysis and visualization of log data,and can accurately realize the recommendation of products by enterprises.The system is stable and effective.展开更多
The study of marine data visualization is of great value. Marine data, due to its large scale, random variation and multiresolution in nature, are hard to be visualized and analyzed. Nowadays, constructing an ocean mo...The study of marine data visualization is of great value. Marine data, due to its large scale, random variation and multiresolution in nature, are hard to be visualized and analyzed. Nowadays, constructing an ocean model and visualizing model results have become some of the most important research topics of ‘Digital Ocean'. In this paper, a spherical ray casting method is developed to improve the traditional ray-casting algorithm and to make efficient use of GPUs. Aiming at the ocean current data, a 3D view-dependent line integral convolution method is used, in which the spatial frequency is adapted according to the distance from a camera. The study is based on a 3D virtual reality and visualization engine, namely the VV-Ocean. Some interactive operations are also provided to highlight the interesting structures and the characteristics of volumetric data. Finally, the marine data gathered in the East China Sea are displayed and analyzed. The results show that the method meets the requirements of real-time and interactive rendering.展开更多
One of the most indispensable needs of life is food and its worldwide availability endorsement has made agriculture an essential sector in recent years. As the technology evolved, the need to maintain a good and suita...One of the most indispensable needs of life is food and its worldwide availability endorsement has made agriculture an essential sector in recent years. As the technology evolved, the need to maintain a good and suitable climate in the greenhouse became imperative to ensure that the indoor plants are more productive hence the agriculture sector was not left behind. That notwithstanding, the introduction and deployment of IoT technology in agriculture solves many problems and increases crop production. This paper focuses mainly on the deployment of the Internet of Things (IoT) in acquiring real- time data of environmental parameters in the greenhouse. Various IoT technologies that can be applicable in greenhouse monitoring system was presented and in the proposed model, a method is developed to send the air temperature and humidity data obtained by the DHT11 sensor to the cloud using an ESP8266-based NodeMCU and firstly to the cloud platform Thing- Speak, and then to Adafruit.IO in which MQTT protocol was used for the reception of sensor data to the application layer referred as Human-Machine Interface. The system has been completely implemented in an actual prototype, allowing the acquiring of data and the publisher/subscriber concept used for communication. The data is published with a broker’s aid, which is responsible for transferring messages to the intended clients based on topic choice. Lastly, the functionality testing of MQTT was carried out and the results showed that the messages are successfully published.展开更多
This article discusses the current status and development strategies of computer science and technology in the context of big data.Firstly,it explains the relationship between big data and computer science and technol...This article discusses the current status and development strategies of computer science and technology in the context of big data.Firstly,it explains the relationship between big data and computer science and technology,focusing on analyzing the current application status of computer science and technology in big data,including data storage,data processing,and data analysis.Then,it proposes development strategies for big data processing.Computer science and technology play a vital role in big data processing by providing strong technical support.展开更多
The Growth Value Model(GVM)proposed theoretical closed form formulas consist-ing of Return on Equity(ROE)and the Price-to-Book value ratio(P/B)for fair stock prices and expected rates of return.Although regression ana...The Growth Value Model(GVM)proposed theoretical closed form formulas consist-ing of Return on Equity(ROE)and the Price-to-Book value ratio(P/B)for fair stock prices and expected rates of return.Although regression analysis can be employed to verify these theoretical closed form formulas,they cannot be explored by classical quintile or decile sorting approaches with intuition due to the essence of multi-factors and dynamical processes.This article uses visualization techniques to help intuitively explore GVM.The discerning findings and contributions of this paper is that we put forward the concept of the smart frontier,which can be regarded as the reasonable lower limit of P/B at a specific ROE by exploring fair P/B with ROE-P/B 2D dynamical process visualization.The coefficients in the formula can be determined by the quantile regression analysis with market data.The moving paths of the ROE and P/B in the cur-rent quarter and the subsequent quarters show that the portfolios at the lower right of the curve approaches this curve and stagnates here after the portfolios are formed.Furthermore,exploring expected rates of return with ROE-P/B-Return 3D dynamical process visualization,the results show that the data outside of the lower right edge of the“smart frontier”has positive quarterly return rates not only in the t+1 quarter but also in the t+2 quarter.The farther away the data in the t quarter is from the“smart frontier”,the larger the return rates in the t+1 and t+2 quarter.展开更多
Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making.However,imbalanced target variables within big data present technical challen...Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making.However,imbalanced target variables within big data present technical challenges that hinder the performance of supervised learning classifiers on key evaluation metrics,limiting their overall effectiveness.This study presents a comprehensive review of both common and recently developed Supervised Learning Classifiers(SLCs)and evaluates their performance in data-driven decision-making.The evaluation uses various metrics,with a particular focus on the Harmonic Mean Score(F-1 score)on an imbalanced real-world bank target marketing dataset.The findings indicate that grid-search random forest and random-search random forest excel in Precision and area under the curve,while Extreme Gradient Boosting(XGBoost)outperforms other traditional classifiers in terms of F-1 score.Employing oversampling methods to address the imbalanced data shows significant performance improvement in XGBoost,delivering superior results across all metrics,particularly when using the SMOTE variant known as the BorderlineSMOTE2 technique.The study concludes several key factors for effectively addressing the challenges of supervised learning with imbalanced datasets.These factors include the importance of selecting appropriate datasets for training and testing,choosing the right classifiers,employing effective techniques for processing and handling imbalanced datasets,and identifying suitable metrics for performance evaluation.Additionally,factors also entail the utilisation of effective exploratory data analysis in conjunction with visualisation techniques to yield insights conducive to data-driven decision-making.展开更多
Scholarly communication of knowledge is predominantly document-based in digital repositories,and researchers find it tedious to automatically capture and process the semantics among related articles.Despite the presen...Scholarly communication of knowledge is predominantly document-based in digital repositories,and researchers find it tedious to automatically capture and process the semantics among related articles.Despite the present digital era of big data,there is a lack of visual representations of the knowledge present in scholarly articles,and a time-saving approach for a literature search and visual navigation is warranted.The majority of knowledge display tools cannot cope with current big data trends and pose limitations in meeting the requirements of automatic knowledge representation,storage,and dynamic visualization.To address this limitation,the main aim of this paper is to model the visualization of unstructured data and explore the feasibility of achieving visual navigation for researchers to gain insight into the knowledge hidden in scientific articles of digital repositories.Contemporary topics of research and practice,including modifiable risk factors leading to a dramatic increase in Alzheimer’s disease and other forms of dementia,warrant deeper insight into the evidence-based knowledge available in the literature.The goal is to provide researchers with a visual-based easy traversal through a digital repository of research articles.This paper takes the first step in proposing a novel integrated model using knowledge maps and next-generation graph datastores to achieve a semantic visualization with domain-specific knowledge,such as dementia risk factors.The model facilitates a deep conceptual understanding of the literature by automatically establishing visual relationships among the extracted knowledge from the big data resources of research articles.It also serves as an automated tool for a visual navigation through the knowledge repository for faster identification of dementia risk factors reported in scholarly articles.Further,it facilitates a semantic visualization and domain-specific knowledge discovery from a large digital repository and their associations.In this study,the implementation of the proposed model in the Neo4j graph data repository,along with the results achieved,is presented as a proof of concept.Using scholarly research articles on dementia risk factors as a case study,automatic knowledge extraction,storage,intelligent search,and visual navigation are illustrated.The implementation of contextual knowledge and its relationship for a visual exploration by researchers show promising results in the knowledge discovery of dementia risk factors.Overall,this study demonstrates the significance of a semantic visualization with the effective use of knowledge maps and paves the way for extending visual modeling capabilities in the future.展开更多
Visual data mining is one of important approach of data mining techniques. Most of them are based on computer graphic techniques but few of them exploit image-processing techniques. This paper proposes an image proces...Visual data mining is one of important approach of data mining techniques. Most of them are based on computer graphic techniques but few of them exploit image-processing techniques. This paper proposes an image processing method, named RNAM (resemble neighborhood averaging method), to facilitate visual data mining, which is used to post-process the data mining result-image and help users to discover significant features and useful patterns effectively. The experiments show that the method is intuitive, easily-understanding and effectiveness. It provides a new approach for visual data mining.展开更多
The Dynamic Database of Solid-State Electrolyte(DDSE)is an advanced online platform offering a comprehensive suite of tools for solid-state battery research and development.Its key features include statistical analysi...The Dynamic Database of Solid-State Electrolyte(DDSE)is an advanced online platform offering a comprehensive suite of tools for solid-state battery research and development.Its key features include statistical analysis of both experimental and computational solid-state electrolyte(SSE)data,interactive visualization through dynamic charts,user data assessment,and literature analysis powered by a large language model.By facilitating the design and optimization of novel SSEs,DDSE serves as a critical resource for advancing solid-state battery technology.This Technical Report provides detailed tutorials and practical examples to guide users in effectively utilizing the platform.展开更多
Data visualization blends art and science to convey stories from data via graphical representations.Considering different problems,applications,requirements,and design goals,it is challenging to combine these two comp...Data visualization blends art and science to convey stories from data via graphical representations.Considering different problems,applications,requirements,and design goals,it is challenging to combine these two components at their full force.While the art component involves creating visually appealing and easily interpreted graphics for users,the science component requires accurate representations of a large amount of input data.With a lack of the science component,visualization cannot serve its role of creating correct representations of the actual data,thus leading to wrong perception,interpretation,and decision.It might be even worse if incorrect visual representations were intentionally produced to deceive the viewers.To address common pitfalls in graphical representations,this paper focuses on identifying and understanding the root causes of misinformation in graphical representations.We reviewed the misleading data visualization examples in the scientific publications collected from indexing databases and then projected them onto the fundamental units of visual communication such as color,shape,size,and spatial orientation.Moreover,a text mining technique was applied to extract practical insights from common visualization pitfalls.Cochran’s Q test and McNemar’s test were conducted to examine if there is any difference in the proportions of common errors among color,shape,size,and spatial orientation.The findings showed that the pie chart is the most misused graphical representation,and size is the most critical issue.It was also observed that there were statistically significant differences in the proportion of errors among color,shape,size,and spatial orientation.展开更多
基金This document contains the results of the research project funded by the National Social Science Fund of China (19ZDA046)NSF of China (62302440,U22A2032)+1 种基金China Postdoctoral Science Foundation (2023TQ0288)the Fundamental Research Funds for the Central Universities,China.
文摘Fine art authentication plays a significant role in protecting cultural heritage and ensuring the integrity of artworks.Traditional authentication methods require professionals to collect many reference materials and conduct detailed analyses.To ease the difficulty,we collaborate with domain experts to develop a GPT-based agent,namely ArtEyer,that offers accurate attributions,determines the origin and authorship,and executes visual analytics.Despite the convenience of the conversational user interface,novice users may still face challenges due to the hallucination issue and the steep learning curve associated with prompting.To face these obstacles,we propose a novel solution that places interactive data visualizations into the conversations.We create contextual visualizations from an external domain-dependent database to ensure data trustworthiness and allow users to provide precise instructions to the agent by interacting directly with these visualizations,thus overcoming the vagueness inherent in natural language-based prompting.We evaluate ArtEyer through an in-lab user study and demonstrate its usage with a real-world case.
基金funded by the Researchers Supporting Project number(RSPD2025R857),King Saud University,Riyadh,Saudi Arabia.
文摘Many bioinformatics applications require determining the class of a newly sequenced Deoxyribonucleic acid(DNA)sequence,making DNA sequence classification an integral step in performing bioinformatics analysis,where large biomedical datasets are transformed into valuable knowledge.Existing methods rely on a feature extraction step and suffer from high computational time requirements.In contrast,newer approaches leveraging deep learning have shown significant promise in enhancing accuracy and efficiency.In this paper,we investigate the performance of various deep learning architectures:Convolutional Neural Network(CNN),CNN-Long Short-Term Memory(CNNLSTM),CNN-Bidirectional Long Short-Term Memory(CNN-BiLSTM),Residual Network(ResNet),and InceptionV3 for DNA sequence classification.Various numerical and visual data representation techniques are utilized to represent the input datasets,including:label encoding,k-mer sentence encoding,k-mer one-hot vector,Frequency Chaos Game Representation(FCGR)and 5-Color Map(ColorSquare).Three datasets are used for the training of the models including H3,H4 and DNA Sequence Dataset(Yeast,Human,Arabidopsis Thaliana).Experiments are performed to determine which combination of DNA representation and deep learning architecture yields improved performance for the classification task.Our results indicate that using a hybrid CNN-LSTM neural network trained on DNA sequences represented as one-hot encoded k-mer sequences yields the best performance,achieving an accuracy of 92.1%.
基金supported by National Natural Science Foundation of China (No.10835009)Chinese Academy of Sciences for the Key Project of Knowledge Innovation Program (No.KJCX3.SYW.N4)Chinese Ministry of Sciences for the 973 project (No.2009GB103000)
文摘A visualization tool was developed through a web browser based on Java applets embedded into HTML pages, in order to provide a world access to the EAST experimental data. It can display data from various trees in different servers in a single panel. With WebScope, it is easier to make a comparison between different data sources and perform a simple calculation over different data sources.
文摘Cyber security has been thrust into the limelight in the modern technological era because of an array of attacks often bypassing tmtrained intrusion detection systems (IDSs). Therefore, greater attention has been directed on being able deciphering better methods for identifying attack types to train IDSs more effectively. Keycyber-attack insights exist in big data; however, an efficient approach is required to determine strong attack types to train IDSs to become more effective in key areas. Despite the rising growth in IDS research, there is a lack of studies involving big data visualization, which is key. The KDD99 data set has served as a strong benchmark since 1999; therefore, we utilized this data set in our experiment. In this study, we utilized hash algorithm, a weight table, and sampling method to deal with the inherent problems caused by analyzing big data; volume, variety, and velocity. By utilizing a visualization algorithm, we were able to gain insights into the KDD99 data set with a clear iden- tification of "normal" clusters and described distinct clusters of effective attacks.
基金This work was supported by the National Science Foundation[grant numbers 1526520 to AK and 0711456 to PL].
文摘The availability and quantity of remotely sensed and terrestrial geospatial data sets are on the rise.Historically,these data sets have been analyzed and quarried on 2D desktop computers;however,immersive technologies and specifically immersive virtual reality(iVR)allow for the integration,visualization,analysis,and exploration of these 3D geospatial data sets.iVR can deliver remote and large-scale geospatial data sets to the laboratory,providing embodied experiences of field sites across the earth and beyond.We describe a workflow for the ingestion of geospatial data sets and the development of an iVR workbench,and present the application of these for an experience of Iceland’s Thrihnukar volcano where we:(1)combined satellite imagery with terrain elevation data to create a basic reconstruction of the physical site;(2)used terrestrial LiDAR data to provide a geo-referenced point cloud model of the magmatic-volcanic system,as well as the LiDAR intensity values for the identification of rock types;and(3)used Structure-from-Motion(SfM)to construct a photorealistic point cloud of the inside volcano.The workbench provides tools for the direct manipulation of the georeferenced data sets,including scaling,rotation,and translation,and a suite of geometric measurement tools,including length,area,and volume.Future developments will be inspired by an ongoing user study that formally evaluates the workbench’s mature components in the context of fieldwork and analyses activities.
文摘This study focuses on meeting the challenges of big data visualization by using of data reduction methods based the feature selection methods.To reduce the volume of big data and minimize model training time(Tt)while maintaining data quality.We contributed to meeting the challenges of big data visualization using the embedded method based“Select from model(SFM)”method by using“Random forest Importance algorithm(RFI)”and comparing it with the filter method by using“Select percentile(SP)”method based chi square“Chi2”tool for selecting the most important features,which are then fed into a classification process using the logistic regression(LR)algorithm and the k-nearest neighbor(KNN)algorithm.Thus,the classification accuracy(AC)performance of LRis also compared to theKNN approach in python on eight data sets to see which method produces the best rating when feature selection methods are applied.Consequently,the study concluded that the feature selection methods have a significant impact on the analysis and visualization of the data after removing the repetitive data and the data that do not affect the goal.After making several comparisons,the study suggests(SFMLR)using SFM based on RFI algorithm for feature selection,with LR algorithm for data classify.The proposal proved its efficacy by comparing its results with recent literature.
基金supported by the National Natural Science Foundation of China(51674121)Fund for Distinguished Youth Scholars in North China University of Science and Technology(JQ201705).
文摘In recent years,with the wide application of image data visual extraction technology in the field of industrial engineering,the development of industrial economy has reached a new situation.To explore the interaction between the pellet microstructure and compressive strength,firstly,the pellet microstructure needed for the experiment was obtained using a Leica DM4500P microscope.The area proportions of hematite,calcium ferrite,magnetite,calcium silicate and pore in pellet microstructure were extracted by visual extraction technology of image data.Moreover,the relationship between the area proportions of mineral components and compressive strength was established by backpropagation neural network(BPNN),generalized regression neural network(GRNN)and beetle antennae search-generalized regression neural network(BAS-GRNN)algorithms,which proves that the pellet microstructure can be used as the prediction standard of compressive strength.The errors of BPNN and BAS-GRNN are 5.13%and 3.37%,respectively,both of which are less than 5.5%.Therefore,through data visualization,we are able to discuss the connection between various components of pellet microstructure and compressive strength and provide new research ideas for improving the compressive strength and metallurgical performance of pellet.
基金supported by the National Natural Science Foundation of China (No. 11675273)the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA10011102)
文摘A database system,known as the large PMT characterization and instrumentation database system(LPMT-CIDS),was designed and implemented for the Jiangmen Underground Neutrino Observatory(JUNO).The system is based on a Linux+Apache+MySQL+PHP(LAMP)server and focuses on modularization and architecture separation.It covers all the testing stages for the 20-inch photomultiplier tubes(PMTs)at JUNO and provides its users with data storage,analysis,and visualization services.Based on the successful use of the system in the 20-inch PMT testing program,its design approach and construction elements can be extended to other projects.
文摘Exploration of artworks is enjoyable but often time consuming.For example,it is not always easy to discover the favorite types of unknown painting works.It is not also always easy to explore unpopular painting works which looks similar to painting works created by famous artists.This paper presents a painting image browser which assists the explorative discovery of user-interested painting works.The presented browser applies a new multidimensional data visualization technique that highlights particular ranges of particular numeric values based on association rules to suggest cues to find favorite painting images.This study assumes a large number of painting images are provided where categorical information(e.g.,names of artists,created year)is assigned to the images.The presented system firstly calculates the feature values of the images as a preprocessing step.Then the browser visualizes the multidimensional feature values as a heatmap and highlights association rules discovered from the relationships between the feature values and categorical information.This mechanism enables users to explore favorite painting images or painting images that look similar to famous painting works.Our case study and user evaluation demonstrates the effectiveness of the presented image browser.
基金This work is supported by National Natural Science Foundation of China 61304208by the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property Open Fund Project 20181901CRP04+2 种基金by the Scientific Research Fund of Hunan Province Education Department 18C0003by the Research Project on Teaching Reform in General Colleges and Universities,Hunan Provincial Education Department 20190147by the Hunan Normal University Ungraduated Innovation and Entrepreneurship Training Plan Project 2019127.
文摘Water resources are one of the basic resources for human survival,and water protection has been becoming a major problem for countries around the world.However,most of the traditional water quality monitoring research work is still concerned with the collection of water quality indicators,and ignored the analysis of water quality monitoring data and its value.In this paper,by adopting Laravel and AdminTE framework,we introduced how to design and implement a water quality data visualization platform based on Baidu ECharts.Through the deployed water quality sensor,the collected water quality indicator data is transmitted to the big data processing platform that deployed on Tencent Cloud in real time through the 4G network.The collected monitoring data is analyzed,and the processing result is visualized by Baidu ECharts.The test results showed that the designed system could run well and will provide decision support for water resource protection.
基金supported by the Huaihua University Science Foundation under Grant HHUY2019-24.
文摘With the rapid development of the Internet,many enterprises have launched their network platforms.When users browse,search,and click the products of these platforms,most platforms will keep records of these network behaviors,these records are often heterogeneous,and it is called log data.To effectively to analyze and manage these heterogeneous log data,so that enterprises can grasp the behavior characteristics of their platform users in time,to realize targeted recommendation of users,increase the sales volume of enterprises’products,and accelerate the development of enterprises.Firstly,we follow the process of big data collection,storage,analysis,and visualization to design the system,then,we adopt HDFS storage technology,Yarn resource management technology,and gink load balancing technology to build a Hadoop cluster to process the log data,and adopt MapReduce processing technology and data warehouse hive technology analyze the log data to obtain the results.Finally,the obtained results are displayed visually,and a log data analysis system is successfully constructed.It has been proved by practice that the system effectively realizes the collection,analysis and visualization of log data,and can accurately realize the recommendation of products by enterprises.The system is stable and effective.
基金supported by the Natural Science Foundation of China under Project 41076115the Global Change Research Program of China under project 2012CB955603the Public Science and Technology Research Funds of the Ocean under project 201005019
文摘The study of marine data visualization is of great value. Marine data, due to its large scale, random variation and multiresolution in nature, are hard to be visualized and analyzed. Nowadays, constructing an ocean model and visualizing model results have become some of the most important research topics of ‘Digital Ocean'. In this paper, a spherical ray casting method is developed to improve the traditional ray-casting algorithm and to make efficient use of GPUs. Aiming at the ocean current data, a 3D view-dependent line integral convolution method is used, in which the spatial frequency is adapted according to the distance from a camera. The study is based on a 3D virtual reality and visualization engine, namely the VV-Ocean. Some interactive operations are also provided to highlight the interesting structures and the characteristics of volumetric data. Finally, the marine data gathered in the East China Sea are displayed and analyzed. The results show that the method meets the requirements of real-time and interactive rendering.
文摘One of the most indispensable needs of life is food and its worldwide availability endorsement has made agriculture an essential sector in recent years. As the technology evolved, the need to maintain a good and suitable climate in the greenhouse became imperative to ensure that the indoor plants are more productive hence the agriculture sector was not left behind. That notwithstanding, the introduction and deployment of IoT technology in agriculture solves many problems and increases crop production. This paper focuses mainly on the deployment of the Internet of Things (IoT) in acquiring real- time data of environmental parameters in the greenhouse. Various IoT technologies that can be applicable in greenhouse monitoring system was presented and in the proposed model, a method is developed to send the air temperature and humidity data obtained by the DHT11 sensor to the cloud using an ESP8266-based NodeMCU and firstly to the cloud platform Thing- Speak, and then to Adafruit.IO in which MQTT protocol was used for the reception of sensor data to the application layer referred as Human-Machine Interface. The system has been completely implemented in an actual prototype, allowing the acquiring of data and the publisher/subscriber concept used for communication. The data is published with a broker’s aid, which is responsible for transferring messages to the intended clients based on topic choice. Lastly, the functionality testing of MQTT was carried out and the results showed that the messages are successfully published.
文摘This article discusses the current status and development strategies of computer science and technology in the context of big data.Firstly,it explains the relationship between big data and computer science and technology,focusing on analyzing the current application status of computer science and technology in big data,including data storage,data processing,and data analysis.Then,it proposes development strategies for big data processing.Computer science and technology play a vital role in big data processing by providing strong technical support.
文摘The Growth Value Model(GVM)proposed theoretical closed form formulas consist-ing of Return on Equity(ROE)and the Price-to-Book value ratio(P/B)for fair stock prices and expected rates of return.Although regression analysis can be employed to verify these theoretical closed form formulas,they cannot be explored by classical quintile or decile sorting approaches with intuition due to the essence of multi-factors and dynamical processes.This article uses visualization techniques to help intuitively explore GVM.The discerning findings and contributions of this paper is that we put forward the concept of the smart frontier,which can be regarded as the reasonable lower limit of P/B at a specific ROE by exploring fair P/B with ROE-P/B 2D dynamical process visualization.The coefficients in the formula can be determined by the quantile regression analysis with market data.The moving paths of the ROE and P/B in the cur-rent quarter and the subsequent quarters show that the portfolios at the lower right of the curve approaches this curve and stagnates here after the portfolios are formed.Furthermore,exploring expected rates of return with ROE-P/B-Return 3D dynamical process visualization,the results show that the data outside of the lower right edge of the“smart frontier”has positive quarterly return rates not only in the t+1 quarter but also in the t+2 quarter.The farther away the data in the t quarter is from the“smart frontier”,the larger the return rates in the t+1 and t+2 quarter.
基金support from the Cyber Technology Institute(CTI)at the School of Computer Science and Informatics,De Montfort University,United Kingdom,along with financial assistance from Universiti Tun Hussein Onn Malaysia and the UTHM Publisher’s office through publication fund E15216.
文摘Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making.However,imbalanced target variables within big data present technical challenges that hinder the performance of supervised learning classifiers on key evaluation metrics,limiting their overall effectiveness.This study presents a comprehensive review of both common and recently developed Supervised Learning Classifiers(SLCs)and evaluates their performance in data-driven decision-making.The evaluation uses various metrics,with a particular focus on the Harmonic Mean Score(F-1 score)on an imbalanced real-world bank target marketing dataset.The findings indicate that grid-search random forest and random-search random forest excel in Precision and area under the curve,while Extreme Gradient Boosting(XGBoost)outperforms other traditional classifiers in terms of F-1 score.Employing oversampling methods to address the imbalanced data shows significant performance improvement in XGBoost,delivering superior results across all metrics,particularly when using the SMOTE variant known as the BorderlineSMOTE2 technique.The study concludes several key factors for effectively addressing the challenges of supervised learning with imbalanced datasets.These factors include the importance of selecting appropriate datasets for training and testing,choosing the right classifiers,employing effective techniques for processing and handling imbalanced datasets,and identifying suitable metrics for performance evaluation.Additionally,factors also entail the utilisation of effective exploratory data analysis in conjunction with visualisation techniques to yield insights conducive to data-driven decision-making.
文摘Scholarly communication of knowledge is predominantly document-based in digital repositories,and researchers find it tedious to automatically capture and process the semantics among related articles.Despite the present digital era of big data,there is a lack of visual representations of the knowledge present in scholarly articles,and a time-saving approach for a literature search and visual navigation is warranted.The majority of knowledge display tools cannot cope with current big data trends and pose limitations in meeting the requirements of automatic knowledge representation,storage,and dynamic visualization.To address this limitation,the main aim of this paper is to model the visualization of unstructured data and explore the feasibility of achieving visual navigation for researchers to gain insight into the knowledge hidden in scientific articles of digital repositories.Contemporary topics of research and practice,including modifiable risk factors leading to a dramatic increase in Alzheimer’s disease and other forms of dementia,warrant deeper insight into the evidence-based knowledge available in the literature.The goal is to provide researchers with a visual-based easy traversal through a digital repository of research articles.This paper takes the first step in proposing a novel integrated model using knowledge maps and next-generation graph datastores to achieve a semantic visualization with domain-specific knowledge,such as dementia risk factors.The model facilitates a deep conceptual understanding of the literature by automatically establishing visual relationships among the extracted knowledge from the big data resources of research articles.It also serves as an automated tool for a visual navigation through the knowledge repository for faster identification of dementia risk factors reported in scholarly articles.Further,it facilitates a semantic visualization and domain-specific knowledge discovery from a large digital repository and their associations.In this study,the implementation of the proposed model in the Neo4j graph data repository,along with the results achieved,is presented as a proof of concept.Using scholarly research articles on dementia risk factors as a case study,automatic knowledge extraction,storage,intelligent search,and visual navigation are illustrated.The implementation of contextual knowledge and its relationship for a visual exploration by researchers show promising results in the knowledge discovery of dementia risk factors.Overall,this study demonstrates the significance of a semantic visualization with the effective use of knowledge maps and paves the way for extending visual modeling capabilities in the future.
基金Supported by the National Natural Science Foun-dation of China (60173051) ,the Teaching and Research Award Pro-gramfor Outstanding Young Teachers in Higher Education Institu-tions of Ministry of Education of China ,and Liaoning Province HigherEducation Research Foundation (20040206)
文摘Visual data mining is one of important approach of data mining techniques. Most of them are based on computer graphic techniques but few of them exploit image-processing techniques. This paper proposes an image processing method, named RNAM (resemble neighborhood averaging method), to facilitate visual data mining, which is used to post-process the data mining result-image and help users to discover significant features and useful patterns effectively. The experiments show that the method is intuitive, easily-understanding and effectiveness. It provides a new approach for visual data mining.
文摘The Dynamic Database of Solid-State Electrolyte(DDSE)is an advanced online platform offering a comprehensive suite of tools for solid-state battery research and development.Its key features include statistical analysis of both experimental and computational solid-state electrolyte(SSE)data,interactive visualization through dynamic charts,user data assessment,and literature analysis powered by a large language model.By facilitating the design and optimization of novel SSEs,DDSE serves as a critical resource for advancing solid-state battery technology.This Technical Report provides detailed tutorials and practical examples to guide users in effectively utilizing the platform.
文摘Data visualization blends art and science to convey stories from data via graphical representations.Considering different problems,applications,requirements,and design goals,it is challenging to combine these two components at their full force.While the art component involves creating visually appealing and easily interpreted graphics for users,the science component requires accurate representations of a large amount of input data.With a lack of the science component,visualization cannot serve its role of creating correct representations of the actual data,thus leading to wrong perception,interpretation,and decision.It might be even worse if incorrect visual representations were intentionally produced to deceive the viewers.To address common pitfalls in graphical representations,this paper focuses on identifying and understanding the root causes of misinformation in graphical representations.We reviewed the misleading data visualization examples in the scientific publications collected from indexing databases and then projected them onto the fundamental units of visual communication such as color,shape,size,and spatial orientation.Moreover,a text mining technique was applied to extract practical insights from common visualization pitfalls.Cochran’s Q test and McNemar’s test were conducted to examine if there is any difference in the proportions of common errors among color,shape,size,and spatial orientation.The findings showed that the pie chart is the most misused graphical representation,and size is the most critical issue.It was also observed that there were statistically significant differences in the proportion of errors among color,shape,size,and spatial orientation.