The exponential expansion of the Internet of Things (IoT), Industrial Internet of Things (IIoT), and Transportation Management of Things (TMoT) produces vast amounts of real-time streaming data. Ensuring system dependability, operational efficiency, and security depends on identifying anomalies in these dynamic and resource-constrained systems. Because of their high computational requirements and their inability to process continuous data streams efficiently, traditional anomaly detection techniques often fail in IoT systems. This work presents a resource-efficient adaptive anomaly detection model for real-time streaming data in IoT systems. Extensive experiments on multiple real-world datasets achieved an average accuracy of 96.06% with an execution time close to 7.5 milliseconds per streaming data point, demonstrating the model's potential for real-time, resource-constrained applications. The model uses Principal Component Analysis (PCA) for dimensionality reduction and a Z-score technique for anomaly detection. It maintains a low computational footprint through a sliding window mechanism, which enables incremental data processing and the identification of both transient and sustained anomalies without storing historical data. The system uses a Multivariate Linear Regression (MLR) based imputation technique to estimate missing or corrupted sensor values, preserving data integrity prior to anomaly detection. The proposed solution suits many uses in smart cities, industrial automation, environmental monitoring, IoT security, and intelligent transportation systems, and is particularly well suited to resource-constrained edge devices.
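The sliding-window Z-score idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: it handles a single univariate stream, omits the PCA and MLR imputation stages, and the class name `SlidingZScoreDetector`, the window size, and the threshold of 3.0 are all assumptions chosen for illustration.

```python
from collections import deque
import math


class SlidingZScoreDetector:
    """Hypothetical univariate detector: flags a point whose Z-score,
    computed against the current window, exceeds a threshold."""

    def __init__(self, window=50, threshold=3.0):
        self.window = window          # assumed window length
        self.threshold = threshold    # assumed Z-score cutoff
        self.values = deque()         # only the last `window` points are kept
        self.total = 0.0              # running sum of the window
        self.total_sq = 0.0           # running sum of squares of the window

    def update(self, x):
        """Score x against the window, then add x to the window."""
        is_anomaly = False
        n = len(self.values)
        if n >= 2:
            mean = self.total / n
            var = max(self.total_sq / n - mean * mean, 0.0)
            std = math.sqrt(var)
            if std > 0 and abs(x - mean) / std > self.threshold:
                is_anomaly = True
        # Every point, anomalous or not, enters the window so that the
        # statistics adapt if the stream shifts to a new level.
        self.values.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.values) > self.window:
            old = self.values.popleft()
            self.total -= old
            self.total_sq -= old * old
        return is_anomaly
```

Each update costs constant time and memory is bounded by the window length, which is the property the abstract emphasizes. Whether flagged points should enter the window is a design choice; excluding them would keep the baseline stable but slow adaptation to sustained shifts.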
Continuously answering range queries over streaming data provides useful information for many practical applications, but it also carries a risk of privacy disclosure. Existing research on differentially private streaming data publication mostly concentrates on boosting query accuracy, pays less attention to query efficiency, and ignores the effect of timeliness on data weight. In this paper, we propose an effective algorithm for differentially private streaming data publication under the exponential decay mode. First, by introducing the Fenwick tree to divide and reorganize the data items in the stream, we achieve constant time complexity for inserting a new item and obtaining the prefix sum. We also achieve time complexity linear in the number of data items for building the tree. Next, we use the matrix mechanism to handle correlated queries and reduce the global sensitivity. In addition, we choose a proper diagonal matrix to further improve range query accuracy. Finally, to account for exponential decay, every data item is weighted by the decay factor. By combining the Fenwick tree with matrix optimization, we present a complete algorithm for differentially private real-time streaming data publication. The experiments compare the proposed algorithm with similar algorithms for streaming data release under exponential decay. The results show that the proposed algorithm effectively improves query efficiency while preserving query quality.
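For readers unfamiliar with the data structure named in the abstract above, the following is a sketch of a standard textbook Fenwick tree (binary indexed tree) with a linear-time build. It is not the paper's algorithm: updates and prefix queries here take O(log n) time, the constant-time bound stated in the abstract is not reproduced, and differential privacy noise and decay weighting are omitted entirely. The class and method names are illustrative.

```python
class FenwickTree:
    """Textbook Fenwick tree over a fixed-length list of numbers."""

    def __init__(self, values):
        self.n = len(values)
        # Index 0 is unused; node i covers the (i & -i) items ending at i.
        self.tree = [0] + list(values)
        # Linear-time build: push each node's partial sum to its parent once.
        for i in range(1, self.n + 1):
            parent = i + (i & -i)
            if parent <= self.n:
                self.tree[parent] += self.tree[i]

    def add(self, index, delta):
        """Add delta to the item at 0-based position index."""
        i = index + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, count):
        """Sum of the first `count` items."""
        total = 0
        i = count
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def range_sum(self, lo, hi):
        """Sum of items lo..hi, inclusive, 0-based."""
        return self.prefix_sum(hi + 1) - self.prefix_sum(lo)
```

A range query is answered as the difference of two prefix sums, which is why an efficient prefix sum translates directly into efficient range queries.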
In the era of big data, huge volumes of data are generated from online social networks, sensor networks, mobile devices, and organizations' enterprise systems. This phenomenon provides organizations with unprecedented opportunities to tap into big data to mine valuable business intelligence. However, traditional business analytics methods may not be able to cope with the flood of big data. The main contribution of this paper is the illustration of the development of a novel big data stream analytics framework named BDSASA that leverages a probabilistic language model to analyze the consumer sentiments embedded in hundreds of millions of online consumer reviews. In particular, an inference model is embedded into the classical language modeling framework to enhance the prediction of consumer sentiments. The practical implication of our research work is that organizations can apply our big data stream analytics framework to analyze consumers' product preferences, and hence develop more effective marketing and production strategies.
Opinion (sentiment) analysis on big data streams, from the constantly generated text streams on social media networks to hundreds of millions of online consumer reviews, gives organizations in every field opportunities to discover valuable intelligence in massive user-generated text. However, traditional content analysis frameworks cannot efficiently handle the unprecedented volume of unstructured text streams or the complexity of the text analysis tasks required for real-time opinion analysis. In this paper, we propose a parallel real-time sentiment analysis system, the Social Media Data Stream Sentiment Analysis Service (SMDSSAS), which performs multiple phases of sentiment analysis on social media text streams in real time, using two fully analytic opinion mining models to cope with the scale of the text streams and the complexity of sentiment analysis on unstructured text. We propose two aspect-based opinion mining models, a Deterministic and a Probabilistic sentiment model, for real-time sentiment analysis of data streams related to a user-given topic. Experiments on Twitter stream traffic captured during the pre-election weeks of the 2016 Presidential election, analyzing public opinion toward two presidential candidates in real time, showed that the proposed system correctly predicted Donald Trump as the winner of the 2016 Presidential election. The cross-validation results showed that the proposed sentiment models, together with the real-time streaming components of the framework, analyzed opinions on the two presidential candidates with an average accuracy of 81% for the Deterministic model and 80% for the Probabilistic model, an improvement of 1% to 22% over results reported in the existing literature.
The interleaving/multiplexing technique was used to realize a 200 MHz real-time data acquisition system. Two 100 MHz ADC modules work in parallel, and each ADC outputs data in ping-pong fashion. The design raised the system conversion rate to 200 MHz and reduced the speed of data transport and storage to 50 MHz. High-speed HDPLD and ECL logic parts were used to control the system timing and the memory address. A multilayer printed circuit board and shielding were used to reduce the interference produced by the high-speed circuit, and the system timing was designed carefully. The interleaving/multiplexing technique can greatly improve the system conversion rate while greatly reducing the speed of the external digital interfaces. The design effectively resolved the difficulties of a high-speed system. Experiments showed that the data acquisition system is stable and accurate.
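The data flow described in the abstract above can be modeled in software as a rough sketch. The hardware itself is not reproduced here; the function names `interleave` and `ping_pong_split` are hypothetical, and the sketch assumes the two converters sample on alternating clock phases so that their outputs alternate in time.

```python
def interleave(adc_a, adc_b):
    """Merge samples from two converters clocked half a period apart
    into one stream at twice the per-converter rate."""
    if len(adc_a) != len(adc_b):
        raise ValueError("both converters must supply the same number of samples")
    merged = []
    for a, b in zip(adc_a, adc_b):
        merged.append(a)  # sample taken on phase 0
        merged.append(b)  # sample taken on phase 1
    return merged


def ping_pong_split(samples):
    """Route one converter's output alternately to two banks, so each
    bank is written at half the converter's sample rate."""
    return samples[0::2], samples[1::2]
```

Under these assumptions, two 100 MHz converters yield a 200 MHz merged stream, and splitting each converter's output across two banks gives the 50 MHz storage rate the abstract mentions.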
Offshore waters provide resources for human beings but also threaten them through marine disasters. Ocean stations are part of offshore observation networks, and the quality of their data is of great significance for exploiting and protecting the ocean. We used hourly mean wave height, temperature, and pressure real-time observation data recorded at the Xiaomaidao station (in Qingdao, China) from June 1, 2017, to May 31, 2018, to explore data quality using eight quality control methods and to identify the most effective method for the Xiaomaidao station. After the eight quality control methods were applied, the percentages of mean wave height, temperature, and pressure data that passed the tests were 89.6%, 88.3%, and 98.6%, respectively. With reference to the marine disaster (wave alarm report) data, values failed the tests mainly because of aging observation equipment and missing data transmissions. The mean wave height is often affected by dynamic marine disasters, so the continuity test method is not effective for it. A correlation test against other related parameters would be more useful for the mean wave height.
In the era of Big Data, the typical architecture of distributed real-time stream processing systems is the combination of Flume, Kafka, and Storm. As a distributed messaging system, Kafka offers horizontal scalability and high throughput, and it is mainly deployed to address the speed mismatch between message producers and consumers. When using Kafka, we need to receive data from producers quickly and send data to consumers quickly. The performance of Kafka is therefore critical to the performance of the whole stream processing system. In this paper, we propose an improved design for real-time stream processing systems and focus on improving Kafka's data loading process. We use Kafkacat to transfer data from the source directly to the Kafka topic, which reduces network transmission. We also use a memory file system to accelerate data loading, which addresses the bottleneck and performance problems caused by disk I/O. Extensive experiments evaluate the performance and show the superiority of the improved design.
High-resolution vehicular emissions inventories are important for managing vehicular pollution and improving urban air quality. This study developed a vehicular emission inventory with high spatio-temporal resolution for the main urban area of Chongqing, based on real-time traffic data from 820 RFID detectors covering 454 roads, and analysed the differences in spatio-temporal emission characteristics between inner and outer districts. The results showed that the daily vehicular emission intensities of CO, hydrocarbons, PM2.5, PM10, and NOx were 30.24, 3.83, 0.18, 0.20, and 8.65 kg/km per day, respectively, in the study area during 2018. Pollutant emission intensities in the inner district were higher than those in the outer district. Light passenger cars (LPCs) were the main contributors to all-day CO emissions in both the inner and outer districts, whereas the main contributors to NOx emissions differed between the two. Diesel and natural gas buses were the major contributors to daytime NOx emissions in inner districts, accounting for 40.40%, while buses and heavy-duty trucks (HDTs) were the major contributors in outer districts. At nighttime, because truck restrictions are lifted and bus services are suspended, HDTs become the main NOx contributor in both inner and outer districts; three NOx emission peak hours were identified for HDTs, which differ from the peak hours of total NOx emissions from all vehicles. Unlike in most other cities, bridges and connecting channels are always emission hotspots because of prolonged traffic congestion. This knowledge helps in fully understanding vehicular emission characteristics and is useful for policymakers designing precise prevention and control measures.
The application and development of wide-area measurement systems (WAMS) have enabled many applications and led to several requirements based on dynamic measurement data. Such data are transmitted as a big data information flow. To ensure effective transmission of wide-frequency electrical information by the communication protocol of a WAMS, this study performs real-time traffic monitoring and analysis of the data network of a power information system and establishes corresponding network optimization strategies to solve existing transmission problems. The study uses the traffic analysis results obtained from the current real-time dynamic monitoring system to design an optimization strategy covering three progressive levels: the underlying communication protocol, the source data, and the transmission process. Tests validate that the optimization of the system structure and the scheduling optimization of data information are feasible and practical.
Predicting the mechanical behavior of structures and perceiving anomalies in advance are essential to ensuring the safe long-term operation of infrastructure. Existing studies consider influencing factors incompletely, and their prediction time scale is coarse. This study therefore develops a real-time prediction model, an autoencoder network (ATENet), that couples spatio-temporal correlation with external load based on structural health monitoring (SHM) data. An autoencoder mechanism acquires a high-level representation of the raw monitoring data at different spatial positions, and a recurrent neural network learns the temporal correlation from the time series. The resulting temporal-spatial information is then coupled with dynamic loads through a fully connected layer to predict structural performance over the next 12 h. As a case study, the proposed model is built on SHM data collected from a representative underwater shield tunnel. A robustness study verifies the reliability and prediction capability of the proposed model. Finally, the ATENet model is compared with several typical models, and the results indicate that it performs best. The ATENet model is of great value for predicting the real-time evolution trend of tunnel structures.
The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing Medical-Related Human Activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions critical to patient well-being. However, challenges such as high computational demands, low accuracy, and limited adaptability persist in Human Motion Recognition (HMR). While some studies have integrated HMR with IoT for real-time healthcare applications, limited research has focused on recognizing MRHA as essential for effective patient monitoring. This study proposes a novel HMR method tailored for MRHA detection, leveraging multi-stage deep learning techniques integrated with IoT. The approach employs EfficientNet to extract optimized spatial features from skeleton frame sequences using seven Mobile Inverted Bottleneck Convolution (MBConv) blocks, followed by Convolutional Long Short-Term Memory (ConvLSTM) to capture spatio-temporal patterns. A classification module with global average pooling, a fully connected layer, and a dropout layer generates the final predictions. The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets, focusing on MRHA such as sneezing, falling, walking, and sitting. It achieves 94.85% accuracy for cross-subject evaluations and 96.45% for cross-view evaluations on NTU RGB+D 120, along with 89.22% accuracy on HMDB51. Additionally, the system integrates IoT capabilities using a Raspberry Pi and a GSM module, delivering real-time alerts to caregivers and patients via Twilio's SMS service. This scalable and efficient solution bridges the gap between HMR and IoT, advancing patient monitoring, improving healthcare outcomes, and reducing costs.
Considering the increasing use of information technology with established standards, such as TCP/IP and XML, in modern industrial automation, we present a cost-effective FPGA (field programmable gate array) implementation of a novel, reliable real-time data transfer system based on the EPA (Ethernet for plant automation) protocol and the IEEE 1588 standard. This combination can provide more predictable, real-time communication between automation equipment and precise synchronization between devices. The designed EPA system has been verified on a Xilinx Spartan3 XC3S1500, where it consumed 75% of the total slices. The experimental results show that the novel industrial control system achieves high synchronization precision, with a 1.59-ps standard deviation between the master device and the slave devices. Such a real-time data transfer system is an excellent candidate for automation equipment that requires precise Ethernet-based synchronization at a comparatively low price.
With the continual growth in the variety and complexity of network crime, traditional packet feature matching cannot detect all kinds of intrusion behavior. It is urgent to reassemble network streams so that packets can be processed at a semantic level above the network layer. This paper presents an efficient TCP stream reassembly mechanism for real-time processing of high-speed network traffic. By analyzing the characteristics of network streams in high-speed networks and the TCP connection establishment process, several policies for designing the reassembly mechanism are established. The reassembly implementation is then elaborated in accordance with these policies. Finally, the reassembly mechanism is compared with the traditional reassembly mechanism using network traffic captured at a typical gigabit gateway. The experimental results show that the reassembly mechanism is efficient and can satisfy the real-time requirements of traffic analysis systems in high-speed networks.
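As background for the abstract above, the core of any TCP stream reassembler is buffering out-of-order segments and releasing bytes only once they are contiguous. The sketch below shows that idea in isolation. It is not the paper's mechanism or its policies: sequence-number wraparound, connection tracking, timeouts, and memory limits are all ignored, and the class name `StreamReassembler` is hypothetical.

```python
class StreamReassembler:
    """Hypothetical single-direction reassembler for one TCP stream."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq   # next byte expected in order
        self.pending = {}             # start sequence number -> payload bytes

    def add_segment(self, seq, payload):
        """Accept one segment and return any bytes now deliverable in order."""
        if not payload:
            return b""
        if seq + len(payload) <= self.next_seq:
            return b""                # entirely old data (retransmission)
        if seq < self.next_seq:
            # Trim the part that was already delivered.
            payload = payload[self.next_seq - seq:]
            seq = self.next_seq
        existing = self.pending.get(seq)
        if existing is None or len(payload) > len(existing):
            self.pending[seq] = payload
        out = bytearray()
        progressed = True
        while progressed:
            progressed = False
            for start in sorted(self.pending):
                if start > self.next_seq:
                    break             # gap: wait for the missing segment
                data = self.pending.pop(start)
                skip = self.next_seq - start
                if skip < len(data):
                    out += data[skip:]
                    self.next_seq += len(data) - skip
                progressed = True
                break                 # re-sort after mutating the buffer
        return bytes(out)
```

Sorting the buffer on every pass keeps the sketch short; a production implementation would more likely use an ordered structure and bound the buffer size per connection.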
This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database (HBase). The platform consists of four layers: an ETL (extraction, transformation, loading) tier, a data processing tier, a data storage tier, and a data display tier. It provides long-term storage, real-time analysis, and querying of massive data. Finally, a real dataset cluster of 39 nodes (2 master nodes and 37 data nodes) is simulated to run functional tests of the data import and real-time query modules, and performance tests of HDFS I/O, the MapReduce cluster, batch loading, and real-time querying of massive data. The test results indicate that the platform achieves high performance in terms of response time and linear scalability.
Recently, the use of mobile communication devices such as smartphones and cellular phones in field data collection has been increasing, owing to the emergence of embedded Global Positioning System (GPS) receivers and Wi-Fi Internet access. Accurate, timely, and convenient field data collection is required for disaster management and rapid emergency response. In this article, we introduce a web-based GIS system that collects field data from personal mobile phones through a Post Office Protocol (POP3) mail server. The main objective of this work is to demonstrate a real-time field data collection method to students, who use their mobile phones to collect field data in a timely and convenient manner, in either individual or group surveys, for local- or global-scale research.
This paper focuses on time efficiency in machine vision and intelligent photogrammetry, in particular on a high-accuracy, on-board, real-time cloud detection method. As technology develops, data acquisition capability grows continuously and the volume of raw data increases explosively. Meanwhile, higher requirements for data accuracy make the computational load heavier. This situation makes time efficiency extremely important. Moreover, the cloud cover rate of optical satellite imagery is up to approximately 50%, which seriously restricts the application of on-board intelligent photogrammetry services. To meet on-board cloud detection requirements and provide valid input data for subsequent processing, this paper presents a stream-computing solution for high-accuracy, on-board, real-time cloud detection. The solution follows the "bottom-up" understanding strategy of machine vision and uses multiple embedded GPUs, which have significant potential for on-board use. Without external memory, the data-parallel pipeline system, built from multiple processing modules, supports "stream-in, processing, stream-out" real-time stream computing. In the experiments, images from the GF-2 satellite are used to validate the accuracy and performance of the approach, and the results show that the solution not only improves cloud detection accuracy but also meets on-board real-time processing requirements.
A DMVOCC-MVDA (distributed multiversion optimistic concurrency control with multiversion dynamic adjustment) protocol is presented for processing mobile distributed real-time transactions in mobile broadcast environments. At the mobile hosts, all transactions perform local pre-validation against the transactions committed at the server in the last broadcast cycle. Transactions that survive local pre-validation must be submitted to the server for local final validation. The new protocol eliminates conflicts between mobile read-only and mobile update transactions, and it resolves data conflicts flexibly by using multiversion dynamic adjustment of the serialization order to avoid unnecessary transaction restarts. Mobile read-only transactions can be committed without blocking, and their response time is greatly shortened. The tolerance of mobile transactions to disconnections from the broadcast channel is increased. In global validation, mobile distributed transactions must be checked to ensure distributed serializability across all participants. The simulation results show that the proposed concurrency control protocol offers better performance than other protocols in terms of miss rate, restart rate, and commit rate. Under high workload (think time of 1 s), the miss rate of DMVOCC-MVDA is only 14.6%, significantly lower than that of other protocols. The restart rate of DMVOCC-MVDA is only 32.3%, showing that the protocol can effectively reduce the restart rate of mobile transactions. The commit rate of DMVOCC-MVDA is up to 61.2%, clearly higher than that of other protocols.
To evaluate and improve the real-time performance of Ethernet for plant automation (EPA) industrial Ethernet, the real-time performance of EPA periodic data transmission was studied theoretically and experimentally. By analyzing the regularity of information transmission and the EPA deterministic scheduling mechanism, periodic messages were categorized into different modes according to the time at which they enter the queue. The scheduling characteristics and delivery time of each mode, and the interactions between modes, were studied, and models of the real-time performance of periodic information transmission in EPA systems were established. On this basis, an experimental platform was developed to test the delivery time of periodic message transmission in an EPA system. From the analysis and the experiments, the main factors that limit the real-time performance of EPA periodic data transmission were identified, and methods for improvement were proposed.
Sign language datasets are essential for sign language recognition and translation (SLRT). Current public sign language datasets are small and lack diversity, so they do not meet the practical application requirements of SLRT. However, building a large-scale and diverse sign language dataset is difficult because sign language data on the Internet is scarce, and some of the data gathered in the process falls below the required quality. This paper proposes a two information streams transformer (TIST) model to judge whether the quality of sign language data is acceptable. To verify that TIST effectively improves sign language recognition (SLR), we build two datasets, a screened dataset and an unscreened dataset. The experiments use visual alignment constraint (VAC) as the baseline model. The results show that the screened dataset achieves a better word error rate (WER) than the unscreened dataset.
With the widespread application of Internet of Things (IoT) technology, the processing of massive real-time streaming data poses significant challenges to the computational and data-processing capabilities of systems. Although distributed streaming data processing frameworks such as Apache Flink and Apache Spark Streaming provide solutions, meeting stringent response time requirements while ensuring high throughput and resource utilization remains an urgent problem. To address this, the study proposes a formal modeling approach based on Performance Evaluation Process Algebra (PEPA), which abstracts the core components and interactions of cloud-based distributed streaming data processing systems. Additionally, a generic service flow generation algorithm is introduced, enabling the automatic extraction of service flows from the PEPA model and the computation of key performance metrics, including response time, throughput, and resource utilization. The novelty of this work lies in the integration of PEPA-based formal modeling with the service flow generation algorithm, bridging the gap between formal modeling and practical performance evaluation for IoT systems. Simulation experiments demonstrate that optimizing the execution efficiency of components can significantly improve system performance. For instance, increasing the task execution rate from 10 to 100 improves system performance by 9.53%, while further increasing it to 200 results in a 21.58% improvement. However, diminishing returns are observed when the execution rate reaches 500, with only a 0.42% gain. Similarly, increasing the number of TaskManagers from 10 to 20 improves response time by 18.49%, but the improvement slows to 6.06% when increasing from 20 to 50, highlighting the importance of co-optimizing component efficiency and resource management to achieve substantial performance gains. This study provides a systematic framework for analyzing and optimizing the performance of IoT systems for large-scale real-time streaming data processing. The proposed approach not only identifies performance bottlenecks but also offers insights into improving system efficiency under different configurations and workloads.
Funding for the resource-efficient IoT anomaly detection study: This work was funded by the Ongoing Research Funding Program (ORF-2025-890), King Saud University, Riyadh, Saudi Arabia, and was supported by the Competitive Research Fund of the University of Aizu, Japan.
Funding: This work is supported in part by the National Natural Science Foundation of China under grant number 61300026 and in part by the Natural Science Foundation of Fujian Province under grant numbers 2017J01754 and 2018J01797.
Abstract: Continuously answering range queries on streaming data provides useful information for many practical applications, but it also carries a risk of privacy disclosure. Existing research on differentially private streaming data publication mostly concentrates on boosting query accuracy, pays less attention to query efficiency, and ignores the effect of timeliness on data weight. In this paper, we propose an effective algorithm for differentially private streaming data publication under an exponential decay model. First, by introducing the Fenwick tree to divide and reorganize data items in the stream, we achieve constant time complexity for inserting a new item and obtaining the prefix sum. Meanwhile, building the tree takes time linear in the number of data items. After that, we use the matrix mechanism to handle correlated queries and reduce the global sensitivity. In addition, we choose a proper diagonal matrix to further improve range query accuracy. Finally, to account for exponential decay, every data item is weighted by the decay factor. By putting the Fenwick tree and matrix optimization together, we present a complete algorithm for differentially private real-time streaming data publication. The experiment compares the proposed algorithm with similar algorithms for streaming data release under exponential decay. Experimental results show that the proposed algorithm effectively improves query efficiency while preserving query quality.
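To make the combination of a Fenwick tree and exponential decay concrete, here is a small sketch under stated assumptions. It uses the textbook Fenwick tree, whose insert and prefix-sum operations are O(log n), not the constant-time variant the abstract reports, and it adds no differential privacy noise and no matrix mechanism. The class name `DecayedFenwick`, the 1-based time index, and the rescaling trick (store each value divided by decay^t, multiply by decay^now at query time) are illustrative choices, not taken from the paper.

```python
class DecayedFenwick:
    """Textbook Fenwick (binary indexed) tree over exponentially decayed items.

    Hypothetical sketch: an item with value v arriving at time t contributes
    v * decay ** (now - t) to a query issued at time `now`.
    """

    def __init__(self, n, decay):
        self.n = n              # maximum time index (1-based)
        self.decay = decay      # decay factor in (0, 1]
        self.tree = [0.0] * (n + 1)

    def add(self, t, value):
        """Insert `value` at time t in O(log n)."""
        # Store value / decay**t so one stored number serves every later query.
        scaled = value * self.decay ** (-t)
        i = t
        while i <= self.n:
            self.tree[i] += scaled
            i += i & (-i)       # move to the next node covering index i

    def _prefix(self, t):
        """Sum of stored (scaled) values at times 1..t in O(log n)."""
        total = 0.0
        i = t
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)       # strip the lowest set bit
        return total

    def decayed_range_sum(self, lo, hi, now):
        """Decayed sum of items with lo <= time <= hi, as seen at time `now`."""
        return (self._prefix(hi) - self._prefix(lo - 1)) * self.decay ** now


stream = DecayedFenwick(n=8, decay=0.5)
stream.add(1, 4.0)
stream.add(2, 2.0)
stream.add(3, 1.0)
print(stream.decayed_range_sum(1, 3, now=3))  # 4*0.25 + 2*0.5 + 1*1 = 3.0
```

The rescaling keeps updates local to one insertion, but `decay ** (-t)` grows without bound as t increases, so a long-running stream would need periodic renormalization.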
Abstract: In the era of big data, huge volumes of data are generated from online social networks, sensor networks, mobile devices, and organizations’ enterprise systems. This phenomenon gives organizations unprecedented opportunities to tap into big data and mine valuable business intelligence. However, traditional business analytics methods may not be able to cope with the flood of big data. The main contribution of this paper is to illustrate the development of a novel big data stream analytics framework named BDSASA, which leverages a probabilistic language model to analyze the consumer sentiments embedded in hundreds of millions of online consumer reviews. In particular, an inference model is embedded into the classical language modeling framework to enhance the prediction of consumer sentiments. The practical implication of our research is that organizations can apply our big data stream analytics framework to analyze consumers’ product preferences and hence develop more effective marketing and production strategies.
Abstract: Opinion (sentiment) analysis on big data streams, from the constantly generated text streams on social media networks to hundreds of millions of online consumer reviews, gives organizations in every field opportunities to discover valuable intelligence from massive user-generated text streams. However, traditional content analysis frameworks cannot efficiently handle the unprecedented volume of unstructured text streams or the complexity of text analysis tasks required for real-time opinion analysis on big data streams. In this paper, we propose a parallel real-time sentiment analysis system, Social Media Data Stream Sentiment Analysis Service (SMDSSAS), that performs multiple phases of sentiment analysis of social media text streams effectively in real time, with two fully analytic opinion mining models to cope with the scale of text data streams and the complexity of sentiment analysis on unstructured text streams. We propose two aspect-based opinion mining models, the Deterministic and Probabilistic sentiment models, for real-time sentiment analysis on data streams related to a user-given topic. Experiments on Twitter stream traffic captured during the pre-election weeks of the 2016 Presidential election, for real-time analysis of public opinion toward two presidential candidates, showed that the proposed system correctly predicted Donald Trump as the winner of the 2016 Presidential election. The cross-validation results showed that the proposed sentiment models, together with the real-time streaming components in our framework, analyzed opinions on the two presidential candidates effectively, with an average accuracy of 81% for the Deterministic model and 80% for the Probabilistic model, a 1% to 22% improvement over results in the existing literature.
Abstract: The interleaving/multiplexing technique was used to realize a 200 MHz real-time data acquisition system. Two 100 MHz ADC modules worked in parallel, and each ADC output data in ping-pong fashion. The design raised the system conversion rate to 200 MHz and reduced the speed of data transport and storage to 50 MHz. High-speed HDPLD and ECL logic parts were used to control system timing and memory addressing. A multi-layer printed circuit board and shielding were used to decrease the interference produced by the high-speed circuit. The system timing was designed carefully. The interleaving/multiplexing technique can greatly improve the system conversion rate while greatly reducing the speed of external digital interfaces. The design effectively resolved the difficulties of a high-speed system. Experiments proved that the data acquisition system is stable and accurate.
Funding: Supported by the National Key Research and Development Program of China (Nos. 2016YFC1402000, 2018YFC1407003, 2017YFC1405300).
Abstract: Offshore waters provide resources for human beings but also threaten them through marine disasters. Ocean stations are part of offshore observation networks, and the quality of their data is of great significance for exploiting and protecting the ocean. We used hourly real-time observations of mean wave height, temperature, and pressure taken at the Xiaomaidao station (Qingdao, China) from June 1, 2017, to May 31, 2018, to examine data quality using eight quality control methods and to identify the most effective method for the Xiaomaidao station. After applying the eight quality control methods, the percentages of mean wave height, temperature, and pressure data that passed the tests were 89.6%, 88.3%, and 98.6%, respectively. According to the marine disaster (wave alarm report) data, values failed the tests mainly because of aging observation equipment and missing data transmissions. The mean wave height is often affected by dynamic marine disasters, so the continuity test method is not effective for it. A correlation test against other related parameters would be more useful for the mean wave height.
Funding: Supported by the Research Fund of the National Key Laboratory of Computer Architecture under Grant No. CARCH201501 and the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant No. 2016A09.
Abstract: In the era of Big Data, the typical architecture of distributed real-time stream processing systems is the combination of Flume, Kafka, and Storm. As a distributed message system, Kafka offers horizontal scalability and high throughput, and it is mainly deployed to address the speed mismatch between message producers and consumers. When using Kafka, we need to receive data sent by producers quickly and to send data to consumers quickly. Therefore, the performance of Kafka is critical to the performance of the whole stream processing system. In this paper, we propose an improved design of real-time stream processing systems and focus on improving Kafka's data loading process. We use Kafka cat to transfer data from the source to the Kafka topic directly, which reduces network transmission. We also use a memory file system to accelerate data loading, which addresses the bottleneck and performance problems caused by disk I/O. Extensive experiments are conducted to evaluate the performance, and they show the superiority of our improved design.
Funding: Supported by the National Key Research Program (No. 2018YFB1601105, No. 2018YFB1601102), the Natural Science Foundation of China (No. 41975165, No. U1811463), and the Chongqing Science and Technology Project (No. cstc2019jscxfxydX0035).
Abstract: High-resolution vehicular emission inventories are important for managing vehicular pollution and improving urban air quality. This study developed a vehicular emission inventory with high spatio-temporal resolution for the main urban area of Chongqing, based on real-time traffic data from 820 RFID detectors covering 454 roads, and analysed the differences in spatio-temporal emission characteristics between inner and outer districts. The results showed that the daily vehicular emission intensities of CO, hydrocarbons, PM2.5, PM10, and NOx were 30.24, 3.83, 0.18, 0.20, and 8.65 kg/km per day, respectively, in the study area during 2018. Pollutant emission intensities in the inner district were higher than those in the outer district. Light passenger cars (LPCs) were the main contributors to all-day CO emissions in both the inner and outer districts, whereas the main contributors to NOx emissions differed between them. Diesel and natural gas buses were the major contributors to daytime NOx emissions in inner districts, accounting for 40.40%, but buses and heavy-duty trucks (HDTs) were the major contributors in outer districts. At nighttime, because truck restrictions are lifted and bus services are suspended, HDTs become the main NOx contributor in both inner and outer districts; three NOx emission peak hours were identified for HDTs, which differ from the peak hours of total NOx emissions from all vehicles. Unlike in most other cities, bridges and connecting channels are always emission hotspots because of long-lasting traffic congestion. This knowledge helps in fully understanding vehicular emission characteristics and is useful for policymakers designing precise prevention and control measures.
Abstract: The application and development of wide-area measurement systems (WAMS) has enabled many applications and led to several requirements based on dynamic measurement data. Such data are transmitted as a big data information flow. To ensure that the WAMS communication protocol transmits wide-frequency electrical information effectively, this study performs real-time traffic monitoring and analysis of the data network of a power information system and establishes corresponding network optimization strategies to solve existing transmission problems. The study uses the traffic analysis results obtained from the current real-time dynamic monitoring system to design an optimization strategy covering three progressive levels: the underlying communication protocol, the source data, and the transmission process. Tests validate that the optimization of the system structure and the scheduling optimization of data information are feasible and practical.
Funding: This work is supported by the National Natural Science Foundation of China (Grant No. 51991392), the Key Deployment Projects of the Chinese Academy of Sciences (Grant No. ZDRW-ZS-2021-3-3), and the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) (Grant No. 2019QZKK0904).
Abstract: Predicting the mechanical behavior of structures and perceiving anomalies in advance are essential to ensuring the long-term safe operation of infrastructure. Existing studies consider influencing factors incompletely and predict only at a coarse time scale. Therefore, this study develops a real-time prediction model that couples spatio-temporal correlation with external load through an autoencoder network (ATENet), based on structural health monitoring (SHM) data. An autoencoder mechanism acquires a high-level representation of the raw monitoring data at different spatial positions, and a recurrent neural network captures the temporal correlation in the time series. The resulting temporal-spatial information is then coupled with dynamic loads through a fully connected layer to predict structural performance over the next 12 h. As a case study, the proposed model is built on SHM data collected from a representative underwater shield tunnel. A robustness study is carried out to verify the reliability and prediction capability of the proposed model. Finally, the ATENet model is compared with some typical models, and the results indicate that it performs best. The ATENet model is of great value for predicting the real-time evolution trend of the tunnel structure.
Funding: This work was funded by the ICT Division of the Ministry of Posts, Telecommunications, and Information Technology of Bangladesh under Grant Number 56.00.0000.052.33.005.21-7 (Tracking No. 22FS15306), with support from the University of Rajshahi.
Abstract: The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing Medical-Related Human Activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions critical to patient well-being. However, challenges such as high computational demands, low accuracy, and limited adaptability persist in Human Motion Recognition (HMR). While some studies have integrated HMR with IoT for real-time healthcare applications, limited research has focused on recognizing MRHA as essential for effective patient monitoring. This study proposes a novel HMR method tailored for MRHA detection, leveraging multi-stage deep learning techniques integrated with IoT. The approach employs EfficientNet to extract optimized spatial features from skeleton frame sequences using seven Mobile Inverted Bottleneck Convolution (MBConv) blocks, followed by Convolutional Long Short-Term Memory (ConvLSTM) to capture spatio-temporal patterns. A classification module with global average pooling, a fully connected layer, and a dropout layer generates the final predictions. The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets, focusing on MRHA such as sneezing, falling, walking, and sitting. It achieves 94.85% accuracy for cross-subject evaluations and 96.45% for cross-view evaluations on NTU RGB+D 120, along with 89.22% accuracy on HMDB51. Additionally, the system integrates IoT capabilities using a Raspberry Pi and GSM module, delivering real-time alerts via Twilio's SMS service to caregivers and patients. This scalable and efficient solution bridges the gap between HMR and IoT, advancing patient monitoring, improving healthcare outcomes, and reducing costs.
Abstract: Considering the increasing use of information technology with established standards, such as TCP/IP and XML, in modern industrial automation, we present a cost-effective FPGA (field programmable gate array) implementation of a novel reliable real-time data transfer system based on the EPA (Ethernet for plant automation) protocol and the IEEE 1588 standard. This combination provides more predictable, real-time communication between automation equipment and precise synchronization between devices. The designed EPA system has been verified on a Xilinx Spartan3 XC3S1500, where it consumed 75% of the total slices. The experimental results show that the novel industrial control system achieves high synchronization precision, with a 1.59-ps standard deviation between the master device and the slave devices. Such a real-time data transfer system is an excellent candidate for automation equipment that requires precise Ethernet-based synchronization at a comparatively low price.
Funding: National High-Tech Research and Development Program of China (863 Program) (No. 2007AA01Z309).
Abstract: As the variety and complexity of network crime methods continue to grow, traditional packet feature matching cannot detect all kinds of intrusion behavior. It is therefore urgent to reassemble network streams so that packets can be processed at a semantic level above the network layer. This paper presents an efficient TCP stream reassembly mechanism for real-time processing of high-speed network traffic. By analyzing the characteristics of network streams in high-speed networks and the TCP connection establishment process, several policies for designing the reassembly mechanism are established. The reassembly implementation is then elaborated in accordance with these policies. Finally, the reassembly mechanism is compared with the traditional reassembly mechanism using network traffic captured at a typical gigabit gateway. Experimental results show that the reassembly mechanism is efficient and can satisfy the real-time requirements of a traffic analysis system in a high-speed network.
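The abstract does not spell out its reassembly policies, so the following is only a generic sketch of what in-order TCP stream reassembly involves: buffer segments that arrive ahead of the expected sequence number and release bytes once the gap is filled. The class name `StreamReassembler` is hypothetical, and the sketch deliberately ignores sequence-number wraparound, partial overlaps between buffered segments, and connection state tracking.

```python
class StreamReassembler:
    """Generic in-order reassembly of one direction of a TCP stream.

    Hypothetical sketch, not the mechanism from the paper.
    """

    def __init__(self, initial_seq):
        self.next_seq = initial_seq   # next byte offset we expect to deliver
        self.pending = {}             # seq -> payload for out-of-order segments

    def push(self, seq, payload):
        """Accept one segment and return any bytes that are now in order."""
        # Drop the part of a retransmitted segment that was already delivered.
        if seq < self.next_seq:
            payload = payload[self.next_seq - seq:]
            seq = self.next_seq
        # Keep the longest payload seen for a given starting sequence number.
        if payload and len(payload) > len(self.pending.get(seq, b"")):
            self.pending[seq] = payload
        # Release every buffered segment that continues the stream.
        out = bytearray()
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return bytes(out)


r = StreamReassembler(initial_seq=100)
print(r.push(105, b"world"))  # b'' because bytes 100..104 are still missing
print(r.push(100, b"hello"))  # b'helloworld'
```

A production reassembler would also bound the size of `pending` per connection, since an unbounded out-of-order buffer is an easy target for resource exhaustion on a high-speed link.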
Funding: Supported by the National Science and Technology Support Project (No. 2012BAH01F02) from the Ministry of Science and Technology of China and the Director Fund (No. IS201116002) from the Institute of Seismology, CEA.
Abstract: This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database (HBase). The platform consists of four layers: an ETL (extraction, transformation, loading) tier, a data processing tier, a data storage tier, and a data display tier. Together they provide long-term storage, real-time analysis, and querying of massive data. Finally, a real dataset cluster is simulated, made up of 39 nodes including 2 master nodes and 37 data nodes, on which we run function tests of the data importing module and the real-time query module, and performance tests of HDFS I/O, the MapReduce cluster, batch loading, and real-time querying of massive data. The test results indicate that the platform achieves high performance in terms of response time and linear scalability.
Abstract: Recently, the use of mobile communication devices such as smartphones and cellular phones in field data collection has been increasing, owing to the emergence of embedded Global Positioning System (GPS) receivers and Wi-Fi Internet access. Accurate, timely, and convenient field data collection is required for disaster management and quick emergency response. In this article, we introduce a web-based GIS system that collects field data from personal mobile phones through a Post Office Protocol (POP3) mail server. The main objective of this work is to demonstrate a real-time field data collection method to students, who use their mobile phones to collect field data in a timely and convenient manner, in either individual or group surveys, for local or global scale research.
Funding: The National Natural Science Foundation of China (91438203, 91638301, 91438111, 41601476).
Abstract: This paper focuses on time efficiency in machine vision and intelligent photogrammetry, especially a high-accuracy on-board real-time cloud detection method. With the development of technology, data acquisition capability is growing continuously and the volume of raw data is increasing explosively. Meanwhile, because of higher requirements for data accuracy, the computation load is also becoming heavier. This situation makes time efficiency extremely important. Moreover, the cloud cover rate of optical satellite imagery is up to approximately 50%, which seriously restricts the applications of on-board intelligent photogrammetry services. To meet on-board cloud detection requirements and offer valid input data to subsequent processing, this paper presents a stream-computing solution for high-accuracy on-board real-time cloud detection, which follows the “bottom-up” understanding strategy of machine vision and uses multiple embedded GPUs with significant potential for on-board application. Without external memory, the data-parallel pipeline system of this solution, built from multiple processing modules, supports “stream-in, processing, stream-out” real-time stream computing. In experiments, images from the GF-2 satellite are used to validate the accuracy and performance of this approach, and the experimental results show that the solution not only improves cloud detection accuracy but also meets on-board real-time processing requirements.
Funding: Project (20030533011) supported by the National Research Foundation for the Doctoral Program of Higher Education of China.
Abstract: A DMVOCC-MVDA (distributed multiversion optimistic concurrency control with multiversion dynamic adjustment) protocol was presented to process mobile distributed real-time transactions in mobile broadcast environments. At the mobile hosts, all transactions perform local pre-validation. The local pre-validation process is carried out against the transactions committed at the server in the last broadcast cycle. Transactions that survive local pre-validation must be submitted to the server for local final validation. The new protocol eliminates conflicts between mobile read-only and mobile update transactions, and resolves data conflicts flexibly by using multiversion dynamic adjustment of the serialization order to avoid unnecessary transaction restarts. Mobile read-only transactions can be committed without blocking, and their response time is greatly shortened. The tolerance of mobile transactions to disconnections from the broadcast channel is increased. In global validation, mobile distributed transactions must be checked to ensure distributed serializability across all participants. The simulation results show that the proposed concurrency control protocol offers better performance than other protocols in terms of miss rate, restart rate, and commit rate. Under high workload (think time of 1 s), the miss rate of DMVOCC-MVDA is only 14.6%, significantly lower than that of other protocols. The restart rate of DMVOCC-MVDA is only 32.3%, showing that DMVOCC-MVDA can effectively reduce the restart rate of mobile transactions. The commit rate of DMVOCC-MVDA is up to 61.2%, which is clearly higher than that of other protocols.
Funding: Supported by the National High Technology Research and Development Program of China (2006AA040301-4, 2007AA041301-6).
Abstract: To evaluate and improve the real-time performance of Ethernet for plant automation (EPA) industrial Ethernet, the real-time performance of EPA periodic data transmission was studied theoretically and experimentally. By analyzing the regularity of information transmission and the EPA deterministic scheduling mechanism, periodic messages were categorized into different modes according to the time at which they enter the queue. The scheduling characteristics and delivery time of each mode, and the interactions between modes, were studied, and models of the real-time performance of periodic information transmission in an EPA system were established. On this basis, an experimental platform was developed to test the delivery time of periodic message transmission in an EPA system. Based on the analysis and the experiment, the main factors that limit the real-time performance of EPA periodic data transmission were identified and improvement methods were proposed.
Funding: Supported by the National Language Commission for research on sign language data specifications for artificial intelligence applications and test standards for language service translation systems (No. ZDI145-70).
Abstract: Sign language datasets are essential for sign language recognition and translation (SLRT). Current public sign language datasets are small and lack diversity, so they do not meet the practical application requirements of SLRT. However, building a large-scale and diverse sign language dataset is difficult because sign language data on the Internet is scarce, and some of the data gathered in the process is not up to standard in quality. This paper proposes a two information streams transformer (TIST) model to judge whether the quality of sign language data is acceptable. To verify that TIST effectively improves sign language recognition (SLR), we build two datasets, a screened dataset and an unscreened dataset. In the experiment, visual alignment constraint (VAC) is used as the baseline model. The experimental results show that the screened dataset achieves a better word error rate (WER) than the unscreened dataset.
Funding: Funded by the Joint Project of Industry-University-Research of Jiangsu Province (Grant: BY20231146).
Abstract: With the widespread application of Internet of Things (IoT) technology, the processing of massive real-time streaming data poses significant challenges to the computational and data-processing capabilities of systems. Although distributed streaming data processing frameworks such as Apache Flink and Apache Spark Streaming provide solutions, meeting stringent response time requirements while ensuring high throughput and resource utilization remains an urgent problem. To address this, the study proposes a formal modeling approach based on Performance Evaluation Process Algebra (PEPA), which abstracts the core components and interactions of cloud-based distributed streaming data processing systems. Additionally, a generic service flow generation algorithm is introduced, enabling the automatic extraction of service flows from the PEPA model and the computation of key performance metrics, including response time, throughput, and resource utilization. The novelty of this work lies in the integration of PEPA-based formal modeling with the service flow generation algorithm, bridging the gap between formal modeling and practical performance evaluation for IoT systems. Simulation experiments demonstrate that optimizing the execution efficiency of components can significantly improve system performance. For instance, increasing the task execution rate from 10 to 100 improves system performance by 9.53%, while further increasing it to 200 results in a 21.58% improvement. However, diminishing returns are observed when the execution rate reaches 500, with only a 0.42% gain. Similarly, increasing the number of TaskManagers from 10 to 20 improves response time by 18.49%, but the improvement slows to 6.06% when increasing from 20 to 50, highlighting the importance of co-optimizing component efficiency and resource management to achieve substantial performance gains. This study provides a systematic framework for analyzing and optimizing the performance of IoT systems for large-scale real-time streaming data processing. The proposed approach not only identifies performance bottlenecks but also offers insights into improving system efficiency under different configurations and workloads.