In recent years, the rapid decline of Arctic sea ice area (SIA) and sea ice extent (SIE), especially of multiyear (MY) ice, has had a significant effect on climate change. Accurate retrieval of the MY ice concentration is important, and challenging, for understanding the ongoing changes. Three MY ice concentration retrieval algorithms were systematically evaluated. These algorithms yield similar total ice concentrations, while the retrieved MY sea ice concentrations differ from each other. The MY SIA derived from the NASA Team algorithm is relatively stable, whereas the other two algorithms produce seasonal fluctuations of MY SIA, particularly in autumn and winter. In this paper, we propose an ice concentration retrieval algorithm that extends the NASA Team algorithm by additionally using AMSR-E 6.9 GHz brightness temperature data and the sea ice concentration derived from 89.0 GHz data. Comparison with the reference MY SIA indicates that the mean difference and root mean square (rms) difference of the MY SIA derived from the proposed algorithm are 0.65×10^6 km^2 and 0.69×10^6 km^2 from January to March, and -0.06×10^6 km^2 and 0.14×10^6 km^2 from September to December, respectively. Comparison with the MY SIE obtained from weekly ice age data provided by the University of Colorado shows that the mean difference and rms difference are 0.69×10^6 km^2 and 0.84×10^6 km^2, respectively. The proposed algorithm shows smaller differences from the reference MY ice and from the MY SIE of the ice age data than Wang's, Lomax's, and the NASA Team algorithms.
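The core of a tie-point concentration retrieval can be sketched as a small linear-mixing problem. This is a simplified illustration only, not the operational NASA Team formulation (which works with polarization and gradient ratios, not raw brightness temperatures); the tie-point values below are invented for the example.

```python
# Simplified linear-mixing sketch in the spirit of tie-point sea ice
# concentration retrievals: the observed brightness temperature (Tb) in
# two channels is modelled as a linear mix of open-water (OW),
# first-year (FY) and multiyear (MY) ice signatures, with the three
# concentrations summing to one.  Tie points below are hypothetical.
TIE = {                      # illustrative Tb at (19V, 37V), in kelvin
    "OW": (177.0, 201.0),
    "FY": (258.0, 244.0),
    "MY": (223.0, 186.0),
}

def retrieve_concentrations(tb19v, tb37v):
    """Solve the 2x2 system left after substituting C_ow = 1 - C_fy - C_my."""
    a11 = TIE["FY"][0] - TIE["OW"][0]
    a12 = TIE["MY"][0] - TIE["OW"][0]
    a21 = TIE["FY"][1] - TIE["OW"][1]
    a22 = TIE["MY"][1] - TIE["OW"][1]
    r1 = tb19v - TIE["OW"][0]
    r2 = tb37v - TIE["OW"][1]
    det = a11 * a22 - a12 * a21
    c_fy = (r1 * a22 - a12 * r2) / det      # Cramer's rule
    c_my = (a11 * r2 - r1 * a21) / det
    return 1.0 - c_fy - c_my, c_fy, c_my
```

A pixel whose Tb is an even FY/MY blend comes back as 50% of each ice type and no open water, which is the sanity check such schemes are usually tested with.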
Based on atmospheric horizontal visibility data from forty-seven observational stations along the eastern coast of China near the Taiwan Strait, together with simultaneous NOAA/AVHRR multichannel satellite data from January 2001 to December 2002, the spectral characteristics associated with visibility were investigated. Visibility was successfully retrieved from the multichannel NOAA/AVHRR data using the principal component regression (PCR) method. A sample retrieved visibility distribution was discussed for a sea fog process. The correlation coefficient between the observed and retrieved visibility was about 0.82, far above the 99.9% confidence level by statistical test. The rate of successful retrieval was 94.98% of the 458 cases during 2001-2002. The error distribution showed that high visibilities were usually under-estimated and low visibilities over-estimated, and the relative error between the observed and retrieved visibilities was about 21.4%.
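Principal component regression itself is standard: project the predictors onto their leading principal components, then fit ordinary least squares in that reduced space. The sketch below shows the generic technique; the channel layout and component count are illustrative, not the paper's actual AVHRR configuration.

```python
import numpy as np

# Minimal principal component regression (PCR): centre the predictor
# matrix, take principal axes from its SVD, regress the target on the
# leading component scores, and predict by projecting new data the
# same way.
def pcr_fit(x, y, n_components):
    x_mean = x.mean(axis=0)
    xc = x - x_mean
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    v = vt[:n_components].T                # loadings, shape (p, k)
    scores = xc @ v                        # component scores
    coef, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    return x_mean, y.mean(), v, coef

def pcr_predict(model, x):
    x_mean, y_mean, v, coef = model
    return ((x - x_mean) @ v) @ coef + y_mean
```

With all components retained, PCR reduces to ordinary least squares, which gives an easy correctness check on synthetic linear data.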
This paper uses satellite channel brightness temperature simulation to study M-estimator variational retrieval, an approach that combines the advantages of classical variational inversion and robust M-estimators. Classical variational inversion depends on prior quality control to eliminate outliers, and its errors are assumed to follow a Gaussian distribution. We coupled M-estimators into the framework of classical variational inversion to obtain an M-estimator variational inversion: the cost function contains the M-estimator, which guarantees robustness to outliers and improves the retrieval results. The experimental evaluation uses Feng Yun-3A (FY-3A) simulated data with both Gaussian and non-Gaussian errors added. The variational inversion is used to obtain the inverted brightness temperature, and temperature and humidity data are used for validation. The preliminary results demonstrate the potential of M-estimator variational retrieval.
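The reason an M-estimator in the cost function resists outliers can be seen in a one-dimensional toy: estimating a sample's centre with Huber weights instead of the plain mean. This is a generic illustration of the robustness idea, not the paper's variational scheme; the threshold value is a conventional choice, not taken from the paper.

```python
# Huber M-estimate of location via iteratively reweighted least
# squares: points inside the threshold k get full weight, points
# outside are down-weighted by k/|residual|, so a gross outlier barely
# moves the estimate.
def huber_location(data, k=1.345, iters=50):
    mu = sorted(data)[len(data) // 2]      # start from (roughly) the median
    for _ in range(iters):
        weights = []
        for x in data:
            r = x - mu
            weights.append(1.0 if abs(r) <= k else k / abs(r))
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu
```

On a sample clustered near zero with one value of 100, the plain mean is dragged above 16 while the Huber estimate stays below 0.5, which is exactly the behaviour the variational cost function inherits.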
Operation control of power systems has become challenging with the increase in the scale and complexity of power distribution systems and the extensive access of renewable energy. Therefore, improving data-driven operation management, intelligent analysis, and mining capabilities is urgently required. To investigate similar regularities of historical operating sections of the power distribution system and to help the power grid systematically accumulate high-value historical operation and maintenance experience and knowledge, a neural information retrieval model with an attention mechanism is proposed based on graph data computing technology. Based on the processing flow of power distribution system operating data, a technical framework for neural information retrieval is established. Combined with the natural graph characteristics of the power distribution system, a unified graph data structure and a data fusion method covering data access, data completion, and multi-source data are constructed. Further, a graph node feature-embedding representation learning algorithm and a neural information retrieval model are constructed; the retrieval model is trained and tested on the generated set of graph node feature representation vectors. The model is verified on operating sections of the power distribution system of a provincial grid area. The results show that the proposed method achieves high accuracy in similarity matching of historical operation characteristics and effectively supports intelligent fault diagnosis and elimination in power distribution systems.
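Once operating sections are represented as embedding vectors, the retrieval step reduces to nearest-neighbour ranking. The sketch below shows that final step only, with cosine similarity; the embeddings are placeholders, since the paper learns them with a graph representation-learning model not reproduced here.

```python
import math

# Rank stored operating sections by cosine similarity of their
# embedding vectors to a query vector, returning the top-k section ids.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, index, top_k=2):
    """index: {section_id: embedding vector}."""
    ranked = sorted(index, key=lambda sid: cosine(query_vec, index[sid]),
                    reverse=True)
    return ranked[:top_k]
```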
Big data analytics in business intelligence often lacks effective data retrieval methods and job scheduling, which causes execution inefficiency and low system throughput. This paper aims to enhance data retrieval and job scheduling to speed up big data analytics and overcome these inefficiency and low-throughput problems. First, integrating a stacked sparse autoencoder with Elasticsearch indexing enables fast data searching and distributed indexing, which reduces the search scope of the database and dramatically speeds up data searching. Next, a deep neural network that predicts the approximate execution time of a job enables prioritized job scheduling based on shortest job first, which reduces the average waiting time of job execution. As a result, the proposed data retrieval approach outperforms the previous method using a deep autoencoder and Solr indexing, improving the speed of data retrieval by up to 53% and increasing system throughput by 53%. The proposed job scheduling algorithm also beats both the first-in-first-out and the memory-sensitive heterogeneous earliest finish time scheduling algorithms, shortening the average waiting time by up to 5% and the average weighted turnaround time by 19%, respectively.
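The scheduling half of the idea is classic shortest-job-first: dispatch jobs in order of predicted runtime, which minimizes average waiting time. A minimal sketch, assuming the neural network's runtime predictions are already given as a dict:

```python
# Shortest-job-first dispatch on predicted execution times: run jobs in
# ascending predicted-runtime order and record how long each one waited.
def sjf_waiting_times(predicted_runtimes):
    order = sorted(predicted_runtimes, key=predicted_runtimes.get)
    waits, clock = {}, 0
    for job in order:
        waits[job] = clock               # waits for the queue ahead to drain
        clock += predicted_runtimes[job]
    return waits

def average_wait(waits):
    return sum(waits.values()) / len(waits)
```

For runtimes {a: 6, b: 2, c: 4}, SJF yields waits of 0, 2, and 6 (average 8/3), versus 0, 6, and 8 (average 14/3) for FIFO order a, b, c, which is the effect the paper exploits.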
As part of the study of finance and economics information, we examine real-time financial news posted on authoritative sites in the world's major advanced economies. Analyzing the massive volume of financial news from different information sources and language origins, we develop a basic theoretical model and its algorithm for financial news, capable of intelligent collection, quick access, deduplication, correction, and integration with the background of each financial news item. Furthermore, we can find connections between financial news and readers' interests, enabling a real-time, on-demand financial news feed, as well as providing a theoretical basis and verification for the scientific problems of real-time processing of massive information. Finally, the simulation experiment shows that the multilingual financial news matching technique distinguishes similar financial news in different languages better than the traditional method.
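One generic way to flag near-duplicate news items across sources is word-shingle sets compared with the Jaccard coefficient. This is a common deduplication technique offered as an illustration, not the paper's multilingual matching model; the threshold and articles are invented.

```python
# Near-duplicate detection: articles whose n-word shingle sets overlap
# above a Jaccard threshold are reported as duplicate pairs.
def shingles(text, n=3):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def near_duplicates(articles, threshold=0.5):
    pairs, ids = [], list(articles)
    for i, x in enumerate(ids):
        for y in ids[i + 1:]:
            if jaccard(articles[x], articles[y]) >= threshold:
                pairs.append((x, y))
    return pairs
```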
With the development of information technology, the online retrieval of remote electronic data has become an important method for investigative agencies to collect evidence. In the current normative documents, the online retrieval of electronic data is positioned as a new type of arbitrary investigative measure. However, study of its actual operation has found that the online retrieval of electronic data does not fully conform to the characteristics of arbitrary investigative measures. The root causes are an inaccurately defined nature resulting from flawed analogies, an emphasis on the authenticity of electronic data at the cost of rights protection, the insufficient force of normative documents to break through the boundaries of law, and the superficial inconsistencies produced by mechanical comparison with the nature of existing investigative measures. The nature of electronic data retrieved online should therefore be defined according to the circumstances. Retrieving electronic data disclosed on the Internet is an arbitrary investigative measure, and following procedural specifications should be sufficient. When investigators conceal their true identities and enter the cyberspace of a suspected crime through a registered account to extract dynamic electronic data on criminal activities, this is essentially a covert investigation in cyberspace and should follow the normative requirements for covert investigations. Retrieving dynamic electronic data from private spaces is a technical investigative measure and should be implemented in accordance with technical investigative procedures. Retrieval of remote "non-public electronic data involving privacy" is a mandatory investigative measure and is essentially a search in virtual space; procedural specifications should therefore be set in accordance with the standards for searches.
This paper focuses on developing a system that allows presentation authors to effectively retrieve presentation slides for reuse from a large volume of existing presentation materials. We assume that the authors' episodic memories can be used as contextual keywords in query expressions to dig out the expected slides for reuse more efficiently than keyword queries based only on parts of the slide descriptions. As the system, a new slide repository is proposed, composed of slide material collections, slide content data, and pieces of information from authors' episodic memories related to each slide and presentation, together with a slide retrieval application that enables authors to use episodic memories as part of queries. The results of our experiment show that queries using episodic memories provide better discoverability than keyword-based queries. Additionally, an improvement to the slide retrieval model is discussed for further slide-finding efficiency, expanding the episodic memory model in the repository to take in links with author-and-slide-related data and events posted on private and social media sites.
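The query model can be pictured as content keywords plus episodic metadata filters applied together. The sketch below is a hypothetical schema (the field names `venue` and `year` and the repository layout are invented for illustration, not the system's actual data model).

```python
# Filter a slide repository by content keywords AND episodic-context
# fields (e.g. where or when the author remembers presenting it).
def search_slides(repo, keywords=(), context=None):
    context = context or {}
    hits = []
    for slide in repo:
        text = slide["text"].lower()
        if not all(k.lower() in text for k in keywords):
            continue
        # Every supplied episodic field must match the slide's metadata.
        if all(slide.get(field) == value for field, value in context.items()):
            hits.append(slide["id"])
    return hits
```

Adding one remembered context field narrows a keyword query that would otherwise match many slides, which is the discoverability gain the experiment measures.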
We develop a data-driven method (a probability model) to construct a composite shape descriptor by combining a pair of scale-based shape descriptors. The selection of a pair of scale-based shape descriptors is modeled as computing the probability of the union of two events, namely retrieving similar shapes using each single scale-based shape descriptor. The pair of scale-based shape descriptors with the highest probability forms the composite shape descriptor. Given a shape database, the composite shape descriptors of the shapes constitute a planar point set. A VoR-Tree of the planar point set is then used as an indexing structure for efficient query operations. Experiments and comparisons show the effectiveness and efficiency of the proposed composite shape descriptor.
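The pair-selection idea rests on the inclusion-exclusion identity P(A or B) = P(A) + P(B) - P(A and B). A minimal sketch, assuming per-descriptor hit probabilities and pairwise joint probabilities have already been estimated from data (the descriptor names and numbers below are invented):

```python
from itertools import combinations

# Score each descriptor pair by the probability that at least one of
# the two retrieves a relevant shape, and return the best pair.
def best_pair(hit_prob, joint_prob):
    def union(a, b):
        return hit_prob[a] + hit_prob[b] - joint_prob[frozenset((a, b))]
    return max(combinations(sorted(hit_prob), 2), key=lambda p: union(*p))
```

Note that the winning pair need not contain the two individually best descriptors: a pair whose hits overlap less can have a larger union.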
A kind of singly linked list named the aggregative chain is introduced into the algorithm, improving the architecture of the FP-tree. The new FP-tree is a one-way tree in which each node keeps only a pointer to its parent. Route information of different nodes for the same item is compressed into aggregative chains, so frequent patterns are produced from the aggregative chains without generating node links or conditional pattern bases. An example of Web keyword retrieval is given to analyze and verify the frequent pattern algorithm presented in this paper.
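The "one-way tree" structure can be sketched in a few lines: each node stores only its item and a parent pointer, and a transaction's path is recovered by walking upward from any node. This shows the structure only, not the aggregative-chain mining itself.

```python
# One-way (parent-pointer) tree node: no child links are kept, so the
# tree can only be traversed upward, from a node toward the root.
class Node:
    def __init__(self, item, parent=None):
        self.item = item          # None marks the root
        self.parent = parent
        self.count = 1

def path_to_root(node):
    """Recover the item path from the root down to this node."""
    items = []
    while node is not None and node.item is not None:
        items.append(node.item)
        node = node.parent
    return list(reversed(items))
```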
It is well known that the retrieval of parameters is usually ill-posed and highly nonlinear, so parameter retrieval problems are very difficult. Although great success has been achieved in data assimilation in meteorology and oceanography, many important theoretical issues remain under research. This paper reviews recent research on parameter retrieval, especially that of the authors. First, some concepts and issues of parameter retrieval are introduced and the state-of-the-art parameter retrieval technology in meteorology and oceanography is briefly reviewed. Then, atmospheric and oceanic parameters are retrieved using the variational data assimilation method combined with regularization techniques in four examples: retrieval of the vertical eddy diffusion coefficient, of the turbulivity of the atmospheric boundary layer, of wind from Doppler radar data, and of physical process parameters. Model parameter retrieval with global and local observations is also introduced.
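The role of regularization in an ill-posed retrieval can be shown with the simplest case, Tikhonov regularization of a linear problem: minimize ||Hx - y||² + λ||x||², whose closed form is x = (HᵀH + λI)⁻¹Hᵀy. This is a textbook sketch of the regularization idea, not the specific variational assimilation system the paper reviews; H and λ below are illustrative.

```python
import numpy as np

# Tikhonov-regularized least-squares retrieval: the lam * I term keeps
# the normal matrix well-conditioned even when H^T H is near-singular.
def regularized_retrieval(h, y, lam):
    h = np.asarray(h, dtype=float)
    y = np.asarray(y, dtype=float)
    n = h.shape[1]
    return np.linalg.solve(h.T @ h + lam * np.eye(n), h.T @ y)
```

For a well-posed toy problem a tiny λ recovers the exact parameters; for an ill-posed one, λ trades fidelity to the observations against the size of the solution.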
With the increasing popularity of cloud computing, privacy has become one of the key problems in cloud security. When data is outsourced to the cloud, data owners need to ensure the security of their private information; cloud service providers need some information about the data to provide high-QoS services; and authorized users need access to the true values of the data. Existing privacy-preserving methods cannot meet all the needs of the three parties at the same time. To address this issue, we propose a retrievable data perturbation method and apply it to privacy preservation in data outsourcing in cloud computing. Our scheme consists of four steps. First, an improved random generator is proposed to generate an accurate "noise". Next, a perturbation algorithm is introduced to add the noise to the original data; the privacy information is thereby hidden, but the mean and covariance of the data, which the service providers may need, remain unchanged. Then, a retrieval algorithm is proposed to recover the original data from the perturbed data. Finally, we combine the retrievable perturbation with an access control process to ensure that only authorized users can retrieve the original data. The experiments show that our scheme perturbs data correctly, efficiently, and securely.
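The retrievability requirement can be sketched with a keyed pseudo-random noise stream: anyone holding the key can regenerate the noise and subtract it exactly, and centering the noise preserves the sample mean. This illustrates retrievability and mean preservation only; it is not the paper's full scheme, which also preserves covariance and ties retrieval to access control.

```python
import random

# Keyed, zero-mean noise: deterministic given the key, so perturbation
# is exactly invertible for an authorized holder of the key.
def _noise(key, n, scale=10.0):
    rng = random.Random(key)
    raw = [rng.uniform(-scale, scale) for _ in range(n)]
    m = sum(raw) / n
    return [r - m for r in raw]          # zero-mean by construction

def perturb(data, key):
    return [x + e for x, e in zip(data, _noise(key, len(data)))]

def retrieve(perturbed, key):
    return [x - e for x, e in zip(perturbed, _noise(key, len(perturbed)))]
```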
In this paper we investigate the impact of Atmospheric Infra-Red Sounder (AIRS) temperature retrievals on data assimilation and the resulting forecasts, using the four-dimensional Local Ensemble Transform Kalman Filter (LETKF) data assimilation scheme and a reduced-resolution version of the NCEP Global Forecast System (GFS). Our results indicate that the AIRS temperature retrievals have a significant and consistent positive impact in the Southern Hemispheric extratropics on both analyses and forecasts, found not only in the temperature field but also in other variables. In the tropics and the Northern Hemispheric extratropics these impacts are smaller, but still generally positive or neutral.
The drastic growth of coastal observation sensors results in copious data that provide weather information. The intricacies of sensor-generated big data are heterogeneity and interpretation, driving the need for high-end Information Retrieval (IR) systems. The Semantic Web (SW) can address this by integrating data into a single platform for information exchange and knowledge retrieval. This paper focuses on exploiting an SW-based system to provide interoperability through ontologies, combining the data concepts with ontology classes. A four-phase weather data model is presented: data processing, ontology creation, SW processing, and a query engine. The developed Oceanographic Weather Ontology helps enhance data analysis, discovery, IR, and decision making, and it is evaluated against other state-of-the-art ontologies. The proposed ontology improves quality by 39.28% in terms of completeness, decreases structural complexity by 45.29%, and improves Precision and Accuracy by 11% and 37.7%, respectively. Ocean data from the Indian meteorological satellite INSAT-3D serves as a typical example for testing the proposed model. The experimental results show the effectiveness of the proposed data model and its advantages in machine understanding and IR.
A simple, fast method is given for sequentially retrieving all the records in a B tree. A file structure for databases is proposed: the records in its primary data file are sorted according to key order, and a B tree is used as its dense index. It is easy to insert, delete, or search for a record, and it is also convenient to retrieve records in the sequential order of the keys. The merits and efficiencies of these methods and structures are discussed in detail.
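The file structure can be sketched with a sorted primary file plus a dense index over every key; a sorted key list with binary search stands in here for the B tree index, and all names are illustrative. Point lookups go through the index, while sequential retrieval is just a scan of the key-ordered primary file.

```python
from bisect import bisect_left

# Key-ordered primary file with a dense index: every key appears in the
# index, so search is a binary search and sequential retrieval is a
# plain scan in key order.
class IndexedFile:
    def __init__(self, records):                  # records: [(key, value)]
        self.primary = sorted(records)            # key-ordered data file
        self.keys = [k for k, _ in self.primary]  # dense index of keys

    def search(self, key):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.primary[i][1]
        return None

    def scan(self):
        """All records in ascending key order."""
        return list(self.primary)
```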
In this paper, we present machine learning algorithms and systems for similar-video retrieval, where the query is itself a video. For the similarity measurement, exemplars, or representative frames of each video, are extracted by unsupervised learning; for this learning, we chose order-aware competitive learning. After obtaining a set of exemplars for each video, the similarity is computed. Because the number and positions of the exemplars differ in each video, we use a similarity computation method called M-distance, which generalizes existing global and local alignment methods using followers to the exemplars. To represent each frame of the video, this paper adopts the Frame Signature of the ISO/IEC standard so that the total system, along with its graphical user interface, is practical. Experiments on the detection of inserted plagiaristic scenes showed excellent precision-recall curves, with precision values very close to 1. Thus, the proposed system can work as a plagiarism detector for videos. In addition, this method can be regarded as the structuring of unstructured data via numerical labeling by exemplars. Finally, further sophistication of this labeling is discussed.
In this paper, we study a multimedia information retrieval algorithm based on information restructuring and image reconstruction. With the massive growth of information resources, people obtain far too much information through various retrieval tools, leading directly to information overload. Retrieval tools based on the vector space model and probabilistic retrieval models rarely consider the user's personalized information needs and characteristics, so a large proportion of retrieval results have little correlation with the user's actual information demand. To improve existing retrieval systems, scholars in recent years have studied contextual information retrieval, in which contextual factors stated or implied in the retrieval process, such as retrieval time, place, interaction history, task, and environment, need to be considered. At present, context research has become a hotspot in the fields of information behavior, the information search process, and information retrieval interaction.
Mining content from an information database poses challenging problems for industry experts and researchers because of the overcrowding of information in huge data. In web searching, the retrieved information is often not appropriate: it is ambiguous with respect to the user query, and the user cannot get relevant information within the stipulated time. To overcome these issues, we propose a new methodology for information retrieval, EPCRR, which provides the most exact information to the user by using a collaborative clustered automated filter that makes use of the collaborative data set and works on prediction, assigning the highest ranking to the exact data retrieved. Retrieval works on the basis of data recommendation, drawing a relevant data set with the highest priority from the cluster of data in high usage. In this work, we use an automated wrapper that works similarly to a meta-crawler and obtains content in the semantic usage data format. Information passed from the user to the agent is ranked based on the Enabled Pile clustered data with respect to the metadata from the agent and the end user. The top-ranked information is delivered to the end user within the stipulated time, and the remaining top information is moved to the data repository for future use. The collected data remains stable based on user preference, and the system follows an intelligence-system approach in which the user can choose any information under any instance and be provided with a suitably wide range of exact content. We find that the proposed algorithm produces better results than existing work at a lower online computation cost.
Decision Support Systems (DSS) are man-machine interaction systems that support decision-makers in solving unstructured and semi-structured decisions. This paper argues that, given the features of decision-supporting information, a problem-oriented information retrieval DSS can meet the needs of an enterprise's top management more effectively than other information retrieval functions. An architecture for such a system is presented, which decomposes a problem put forward or recognized by the user into problems recognized by the computer, forming retrieval tactics and searching for the data the user needs. A prototype system designed and developed according to this architecture is introduced: the CF Economic Environment Information Retrieval DSS.
Secure and automated sharing of medical information among different medical entities and stakeholders, such as patients, hospitals, doctors, law enforcement agencies, and health insurance companies, in a standard format has always been a challenging problem. Current methods for ensuring compliance with medical privacy laws require specialists who are deeply familiar with these laws' complex requirements to verify the lawful exchange of medical information. This article introduces a Smart Medical Data Exchange Engine (SMDEE) designed to automate the extraction of logical rules from medical privacy legislation using advanced techniques. These rules facilitate the secure extraction of information, safeguarding patient privacy and confidentiality. In addition, SMDEE can generate standardized clinical documents according to Health Level 7 (HL7) standards and standardize the nomenclature of requested medical data, enabling accurate decision making when accessing patient data. All access requests to patient information are processed through SMDEE to ensure authorized access. The proposed system's efficacy is evaluated using the Health Insurance Portability and Accountability Act (HIPAA), a fundamental privacy law in the United States; however, SMDEE's flexibility allows its application worldwide, accommodating various medical privacy laws. Beyond facilitating global information exchange, SMDEE aims to enhance international patients' timely and appropriate treatment.
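Checking an access request against extracted legal rules can be pictured as matching the request's attributes to a permitted-use table. The rule set below is a toy illustration (the roles, purposes, and field names are invented, and real HIPAA rules are far richer), not the engine's actual rule representation.

```python
# Toy rule check: a request is authorized only if some extracted rule
# permits that (role, purpose) combination without extra consent.
RULES = [
    {"role": "doctor", "purpose": "treatment"},
    {"role": "insurer", "purpose": "payment"},
]

def authorize(request):
    return any(request["role"] == r["role"] and
               request["purpose"] == r["purpose"] for r in RULES)
```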
Funding: The National Natural Science Foundation of China under contract Nos 41330960, 41276193, and 41206184.
Funding: This research is supported by the National High Technology Development Project (863) of China (Grant No. 2002AA639500), the Natural Science Foundation of Guangdong Province (Grant No. 032212), the National Basic Research Program of China (973 Program) (No. 2005CB422301), and the Program for New Century Excellent Talents in University (NCET-05-0591).
Funding: Supported by the Special Scientific Research Fund of the Meteorological Public Welfare Profession of China (GYHY201406028), the Meteorological Open Research Fund for the Huaihe River Basin (HRM201407), and the Anhui Meteorological Bureau Science and Technology Development Fund (RC201506).
Funding: Supported by the National Key R&D Program of China (2020YFB0905900).
Funding: Supported by the Ministry of Science and Technology, Taiwan (MOST110-2622-E-390-001 and MOST109-2622-E-390-002-CC3).
Abstract: Big data analytics in business intelligence lacks effective data retrieval methods and job scheduling, causing execution inefficiency and low system throughput. This paper aims to enhance data retrieval and job scheduling to speed up big data analytics and overcome these problems. First, integrating a stacked sparse autoencoder with Elasticsearch indexing enables fast data searching and distributed indexing, which reduces the search scope of the database and dramatically speeds up data searching. Next, a deep neural network predicts the approximate execution time of a job, enabling prioritized shortest-job-first scheduling, which reduces the average waiting time of job execution. As a result, the proposed data retrieval approach outperforms the previous method using a deep autoencoder and Solr indexing, improving the speed of data retrieval by up to 53% and increasing system throughput by 53%. The proposed job scheduling algorithm also beats both the first-in-first-out and memory-sensitive heterogeneous earliest-finish-time scheduling algorithms, shortening the average waiting time by up to 5% and the average weighted turnaround time by 19%, respectively.
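The shortest-job-first dispatch described above can be sketched with a priority queue. The job names and "predicted" run times below are invented stand-ins for the paper's DNN-predicted execution times.

```python
# Assumed example: shortest-job-first dispatch ordered by predicted run time
# (the paper predicts these times with a deep neural network; here they are given).
import heapq

def sjf_order(jobs):
    """jobs: list of (name, predicted_seconds); returns the dispatch order."""
    heap = [(t, name) for name, t in jobs]
    heapq.heapify(heap)  # min-heap keyed on predicted time
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

jobs = [("report", 30), ("etl", 5), ("train", 120), ("query", 1)]
print(sjf_order(jobs))  # ['query', 'etl', 'report', 'train']
```

Running short jobs first minimizes total waiting time across the queue, which is why the abstract reports reduced average waiting time.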
Funding: Supported by the National Social Science Foundation of China (Nos. 15CTQ028 and 14@ZH036), the Social Science Foundation of Beijing (No. 15SHA002), and the Young Faculty Research Fund of Beijing Foreign Studies University (No. 2015JT008).
Abstract: For the study of finance and economics information, we examine real-time financial news posted on authoritative sites in the world's major advanced economies. Analyzing massive amounts of financial news from different information sources and language origins, we propose a basic theoretical model and an algorithm for financial news capable of intelligent collection, quick access, deduplication, correction, and integration with the news' background. Furthermore, we can find connections between financial news and readers' interests, achieving a real-time, on-demand financial news feed as well as providing a theoretical basis and verification for the scientific problems of real-time processing of massive information. Finally, a simulation experiment shows that the multilingual financial news matching technique distinguishes similar financial news in different languages better than the traditional method.
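One simple way to flag near-duplicate news items across sources, offered only as a hedged stand-in for the paper's matching technology, is Jaccard similarity over word shingles (in a multilingual setting this would be applied after translation or cross-lingual embedding; the headlines below are invented).

```python
# Assumed example: near-duplicate detection via Jaccard similarity of
# word bigrams, a simplification of multilingual financial news matching.
def shingles(text, n=2):
    """Set of n-word shingles from a headline."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    return len(a & b) / len(a | b)

s1 = shingles("central bank raises interest rates by 25 basis points")
s2 = shingles("the central bank raises interest rates 25 basis points")
print(jaccard(s1, s2) > 0.4)  # True: likely the same story
```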
Funding: A phased research result of the Supreme People's Procuratorate's procuratorial theory research program "Research on the Governance Problems of the Crime of Aiding Information Network Criminal Activities" (Project Approval Number GJ2023D28).
Abstract: With the development of information technology, the online retrieval of remote electronic data has become an important method for investigative agencies to collect evidence. Current normative documents position the online retrieval of electronic data as a new type of arbitrary investigative measure. However, study of its actual operation shows that it does not fully match the characteristics of arbitrary investigative measures. The root causes are its inaccurately defined nature resulting from analogy errors, an emphasis on the authenticity of electronic data at the cost of rights protection, the insufficient effectiveness of normative documents in breaking through the boundaries of law, and the superficial inconsistency found in mechanical comparison with the nature of existing investigative measures. The nature of electronic data retrieved online should instead be defined according to the circumstances. Retrieval of electronic data disclosed on the Internet is an arbitrary investigative measure, and following procedural specifications should be sufficient. When investigators conceal their true identities and enter the cyberspace of the suspected crime through a registered account to extract dynamic electronic data on criminal activities, this is essentially a covert investigation in cyberspace and should follow the normative requirements for covert investigations. Retrieval of dynamic electronic data from private spaces is a technical investigative measure and should be implemented in accordance with technical investigative procedures. Retrieval of remote "non-public electronic data involving privacy" is a mandatory investigative measure and essentially a search of virtual space; procedural specifications should therefore be set in accordance with the standards for searches.
Abstract: This paper develops a system that allows presentation authors to effectively retrieve presentation slides for reuse from a large volume of existing materials. We assume the authors' episodic memories can be used as contextual keywords in query expressions to dig out the expected slides more efficiently than keyword queries based only on parts of the slide descriptions. The proposed system is a new slide repository composed of slide material collections, slide content data, and pieces of information from authors' episodic memories related to each slide and presentation, together with a slide retrieval application enabling authors to use episodic memories as part of queries. Our experiment shows that queries using episodic memories give better discoverability than keyword-based queries. An improvement model is also discussed for further slide-finding efficiency, expanding the episodic memory model in the repository to take in links with author- and slide-related data and events posted on private and social media sites.
Funding: Supported by the National Key R&D Plan of China (2016YFB1001501).
Abstract: We develop a data-driven method (a probability model) to construct a composite shape descriptor by combining a pair of scale-based shape descriptors. The selection of a pair of scale-based shape descriptors is modeled as the computation of the union of two events, i.e., retrieving similar shapes using a single scale-based shape descriptor. The pair with the highest union probability forms the composite shape descriptor. Given a shape database, the composite shape descriptors of its shapes constitute a planar point set, and a VoR-Tree of this point set is used as an indexing structure for efficient query operations. Experiments and comparisons show the effectiveness and efficiency of the proposed composite shape descriptor.
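The pair-selection rule can be sketched directly from inclusion-exclusion, P(A ∪ B) = P(A) + P(B) − P(A ∩ B): a pair of descriptors with high individual retrieval probability but low overlap scores best. The descriptor names and probability values below are invented for illustration.

```python
# Assumed example: choosing the descriptor pair with the highest union
# probability, per inclusion-exclusion (probabilities are invented).
from itertools import combinations

# single-descriptor retrieval probabilities and pairwise joint probabilities
p = {"d1": 0.60, "d2": 0.55, "d3": 0.40}
p_and = {("d1", "d2"): 0.45, ("d1", "d3"): 0.20, ("d2", "d3"): 0.25}

def p_union(a, b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    key = (a, b) if (a, b) in p_and else (b, a)
    return p[a] + p[b] - p_and[key]

best = max(combinations(p, 2), key=lambda ab: p_union(*ab))
print(best, p_union(*best))  # ('d1', 'd3') wins despite d3's low solo probability
```

Note that d1 and d2 individually retrieve best, yet their heavy overlap (0.45) makes the weaker but complementary d3 the better partner for d1.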
Funding: Supported by the Natural Science Foundation of Liaoning Province (20042020).
Abstract: A kind of singly linked list named the aggregative chain is introduced into the algorithm, improving the architecture of the FP-tree. The new FP-tree is a one-way tree in which each node keeps only a pointer to its parent. Route information of different nodes for the same item is compressed into aggregative chains, so frequent patterns are produced from the aggregative chains without generating node links or conditional pattern bases. An example of Web keyword retrieval is given to analyze and verify the frequent pattern algorithm presented in this paper.
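The one-way, parent-pointer-only tree described above can be sketched minimally: each node stores just its item and a parent link, and an item's path (its route information) is recovered by walking up to the root. The node items are invented web keywords.

```python
# Assumed example: a one-way tree node keeping only a parent pointer,
# so a node's route information is recovered by an upward walk.
class Node:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent  # the only link kept, per the one-way FP-tree idea

def path_to_root(node):
    """Collect items from a node up to the root, returned root-first."""
    path = []
    while node is not None:
        path.append(node.item)
        node = node.parent
    return path[::-1]

root = Node("root", None)
a = Node("web", root)
b = Node("search", a)
c = Node("ranking", b)
print(path_to_root(c))  # ['root', 'web', 'search', 'ranking']
```

Dropping child pointers and node links is what lets the aggregative chains carry the route information instead, saving the conditional pattern bases of the classic FP-growth algorithm.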
Funding: Supported by the National Natural Science Foundation of China (Grant No. 90411006) and the Shanghai Science and Technology Association (Grant No. 02DJ14032).
Abstract: It is well known that parameter retrieval is usually ill-posed and highly nonlinear, making retrieval problems very difficult. Although great success has been achieved in data assimilation in meteorology and oceanography, many important theoretical issues remain under research. This paper reviews recent research on parameter retrieval, especially that of the authors. First, concepts and issues of parameter retrieval are introduced and the state-of-the-art parameter retrieval technology in meteorology and oceanography is briefly reviewed; atmospheric and oceanic parameters are then retrieved using the variational data assimilation method combined with regularization techniques in four examples: retrieval of the vertical eddy diffusion coefficient, of the turbulivity of the atmospheric boundary layer, of wind from Doppler radar data, and of physical process parameters. Model parameter retrieval with global and local observations is also introduced.
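The variational-plus-regularization idea can be illustrated with a deliberately tiny example: a scalar parameter p, a linear forward model H(p) = h·p, and a Tikhonov-style penalty pulling p toward a prior p0. All values are invented, and the paper's problems are far larger and nonlinear; this only shows how the regularization term stabilizes the fit.

```python
# Assumed example: minimizing the regularized variational cost
#   J(p) = sum_i (h*p - y_i)^2 + lam*(p - p0)^2
# Setting dJ/dp = 0 gives the closed-form minimizer below.
def retrieve(h, y_obs, p0, lam):
    """Closed-form minimizer: p = (h*sum(y) + lam*p0) / (n*h^2 + lam)."""
    n = len(y_obs)
    return (h * sum(y_obs) + lam * p0) / (n * h * h + lam)

y_obs = [2.1, 1.9, 2.05]  # noisy observations of h*p with true p near 2, h = 1
print(retrieve(h=1.0, y_obs=y_obs, p0=0.0, lam=0.0))   # unregularized least squares
print(retrieve(h=1.0, y_obs=y_obs, p0=2.0, lam=10.0))  # pulled toward the prior p0
```

With lam = 0 the solution is the pure least-squares fit; increasing lam trades fidelity to the (possibly noisy) observations for closeness to prior knowledge, which is exactly the ill-posedness cure the review discusses.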
Funding: Supported in part by NSFC under Grant No. 61172090, the National Science and Technology Major Project under Grant 2012ZX03002001, the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20120201110013, Scientific and Technological Projects in Shaanxi Province under Grants No. 2012K06-30 and No. 2014JQ8322, the Basic Science Research Fund of Xi'an Jiaotong University (No. XJJ2014049, No. XKJC2014008), and the Shaanxi Science and Technology Innovation Project (2013SZS16-Z01/P01/K01).
Abstract: With the increasing popularity of cloud computing, privacy has become one of the key problems in cloud security. When data is outsourced to the cloud, data owners need to ensure the security of their privacy; cloud service providers need some information about the data to provide high-QoS services; and authorized users need access to the true values of the data. Existing privacy-preserving methods cannot meet all the needs of the three parties at the same time. To address this issue, we propose a retrievable data perturbation method and apply it to privacy preservation in data outsourcing in cloud computing. Our scheme has four steps. First, an improved random generator is proposed to generate an accurate "noise". Next, a perturbation algorithm adds the noise to the original data; the privacy information is thereby hidden, but the mean and covariance of the data, which the service providers may need, remain unchanged. Then, a retrieval algorithm recovers the original data from the perturbed data. Finally, we combine the retrievable perturbation with an access control process to ensure that only authorized users can retrieve the original data. Experiments show that our scheme perturbs data correctly, efficiently, and securely.
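The "retrievable" property, i.e. that an authorized party can exactly undo the perturbation, can be sketched with seeded pseudo-random noise: whoever holds the seed can regenerate the identical noise sequence and subtract it. This is only an illustration of retrievability, not the paper's improved generator (which additionally preserves the sample mean and covariance exactly); the data values and seed are invented.

```python
# Assumed example: seeded-noise perturbation where an authorized user who
# knows the seed regenerates the same noise stream to recover the data.
import random

def perturb(data, seed):
    """Hide data by adding zero-mean Gaussian noise from a seeded generator."""
    rng = random.Random(seed)
    return [d + rng.gauss(0, 1) for d in data]

def retrieve(perturbed, seed):
    """Regenerate the identical noise stream and subtract it."""
    rng = random.Random(seed)
    return [p - rng.gauss(0, 1) for p in perturbed]

data = [42.0, 17.5, 99.9]
masked = perturb(data, seed=1234)
print(retrieve(masked, seed=1234))  # original values recovered exactly
```

Access control then reduces to controlling who may learn the seed, mirroring the paper's final step.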
Funding: Supported by the National Natural Science Foundation of China (40975067), the 973 Program (2009CB421500), CMA Grant GYHY200806029, and NASA grant NNX07AM97G in the U.S.A.
Abstract: In this paper we investigate the impact of Atmospheric Infra-Red Sounder (AIRS) temperature retrievals on data assimilation and the resulting forecasts, using the four-dimensional Local Ensemble Transform Kalman Filter (LETKF) data assimilation scheme and a reduced-resolution version of the NCEP Global Forecast System (GFS). Our results indicate that the AIRS temperature retrievals have a significant and consistent positive impact in the Southern Hemispheric extratropics on both analyses and forecasts, found not only in the temperature field but also in other variables. In the tropics and the Northern Hemispheric extratropics these impacts are smaller, but still generally positive or neutral.
Funding: Supported by the Ministry of Earth Science (MoES), Government of India (Grant No. MoES/36/OOIS/Extra/45/2015), URL: https://www.moes.gov.in.
Abstract: The drastic growth of coastal observation sensors results in copious data that provide weather information. The intricacies of sensor-generated big data are heterogeneity and interpretation, driving high-end Information Retrieval (IR) systems. The Semantic Web (SW) can solve this issue by integrating data into a single platform for information exchange and knowledge retrieval. This paper focuses on exploiting an SW-based system to provide interoperability through ontologies by combining data concepts with ontology classes. It presents a four-phase weather data model: data processing, ontology creation, SW processing, and a query engine. The developed Oceanographic Weather Ontology helps to enhance data analysis, discovery, IR, and decision making, and is evaluated against other state-of-the-art ontologies. The proposed ontology improves completeness by 39.28% and decreases structural complexity by 45.29%, with gains of 11% and 37.7% in precision and accuracy. Ocean data from the Indian Meteorological Satellite INSAT-3D is used as a typical example to test the proposed model. The experimental results show the effectiveness of the proposed data model and its advantages for machine understanding and IR.
Abstract: A simple fast method is given for sequentially retrieving all the records in a B-tree. A file structure for databases is proposed in which the records of the primary data file are sorted in key order and a B-tree serves as a dense index. It is easy to insert, delete, or search a record, and it is also convenient to retrieve records in the sequential order of the keys. The merits and efficiency of these methods and structures are discussed in detail.
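The proposed file structure, a key-sorted primary file with a dense index, can be sketched as follows. A sorted key list stands in for the B-tree index (both support logarithmic point lookup), and the record keys and values are invented; the point is that once the primary file is in key order, sequential retrieval is a plain forward scan that never consults the index.

```python
# Assumed example: key-sorted primary file + dense index. Point lookup
# uses the index; sequential retrieval is a simple in-order scan.
import bisect

records = [(3, "rec-c"), (7, "rec-g"), (12, "rec-l")]  # primary file, sorted by key
keys = [k for k, _ in records]                          # dense index: one entry per record

def lookup(key):
    """Index-assisted point lookup, O(log n) like a B-tree descent."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i][1]
    return None

print(lookup(7))                # rec-g
print([v for _, v in records])  # sequential retrieval in key order, no index needed
```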
Abstract: In this paper, we present machine learning algorithms and systems for similar-video retrieval, where the query is itself a video. For the similarity measurement, exemplars, or representative frames of each video, are extracted by unsupervised learning; for this learning we chose order-aware competitive learning. After obtaining a set of exemplars for each video, the similarity is computed. Because the numbers and positions of the exemplars differ between videos, we use a similarity computation method called M-distance, which generalizes existing global and local alignment methods using followers to the exemplars. To represent each frame of the video, this paper adopts the Frame Signature of the ISO/IEC standard so that the total system, along with its graphical user interface, becomes practical. Experiments on the detection of inserted plagiaristic scenes showed excellent precision-recall curves, with precision values very close to 1; the proposed system can thus work as a plagiarism detector for videos. In addition, this method can be regarded as the structuring of unstructured data via numerical labeling by exemplars. Finally, further sophistication of this labeling is discussed.
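A crude, hedged stand-in for the exemplar-based comparison: 1-D "frame features" (invented) and a symmetric mean nearest-exemplar distance used in place of the paper's M-distance, which additionally handles alignment and followers. It only illustrates why exemplar sets of different sizes and positions can still be compared.

```python
# Assumed example: comparing videos by their exemplar sets with a symmetric
# mean nearest-exemplar distance (a crude simplification of M-distance).
def nearest(x, exemplars):
    """Distance from one exemplar to its closest counterpart."""
    return min(abs(x - e) for e in exemplars)

def exemplar_distance(ex_a, ex_b):
    """Symmetrized mean nearest-exemplar distance between two exemplar sets."""
    d_ab = sum(nearest(a, ex_b) for a in ex_a) / len(ex_a)
    d_ba = sum(nearest(b, ex_a) for b in ex_b) / len(ex_b)
    return (d_ab + d_ba) / 2

video_a = [0.1, 0.5, 0.9]     # invented 1-D "frame features"
video_b = [0.12, 0.48, 0.95]  # near-duplicate of video_a
video_c = [0.7, 0.7, 0.7]     # unrelated content
print(exemplar_distance(video_a, video_b) < exemplar_distance(video_a, video_c))  # True
```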
Abstract: In this paper, we study a multimedia information retrieval algorithm based on information restructuring and image reconstruction. With the massive growth of information resources, users face far more information than retrieval tools can usefully filter, leading directly to information overload. Retrieval tools based on the vector space model and probabilistic retrieval models rarely consider users' personalized information needs and characteristics, so a large share of retrieval results has little relevance to the user's actual information demand. To improve existing retrieval systems, scholars in recent years have argued that contextual information retrieval must consider context factors stated or implied in the retrieval process, such as retrieval time, place, interaction history, task, and environment. Context research has thus become a hotspot in the fields of information behavior, the information search process, and information retrieval interaction.
Abstract: Mining content from an information database poses challenging problems for industry experts and researchers due to the overcrowding of information in huge datasets. In web searching, the retrieved information is often not appropriate: it is ambiguous with respect to the user query, and the user cannot get relevant information within the stipulated time. To overcome these issues, we propose a new information retrieval methodology, EPCRR, which provides the most exact information to the user by using a collaborative clustered automated filter that works on the collaborative dataset and, by prediction, ranks the exact retrieved data highest. Retrieval works on the basis of data recommendation, using the relevant dataset with the highest priority from the cluster of data in high usage. We make use of an automated wrapper that works similarly to a meta-crawler and obtains content in a semantic usage-data format. Information passed from the user to the agent is ranked based on the Enabled Pile clustered data with respect to metadata from the agent and the end user. The top-ranked information is delivered to the end user within the stipulated time, and the remaining top information is moved to the data repository for future use. The collected data remains stable based on user preference, and the approach works as an intelligent system in which the user can choose any information in any instance and be provided with a suitably wide range of exact content. Our experiments show that the proposed algorithm produces better results than existing work at a lower online computation cost.
Abstract: Decision Support Systems (DSS) are man-machine interaction systems that support decision-makers in solving unstructured and semi-structured decisions. This paper argues that, in accordance with the features of decision-supporting information, a problem-oriented information retrieval DSS can meet the needs of an enterprise's top management more effectively than other information retrieval functions. An architecture of this system is presented, which decomposes a problem put forward or recognized by the user into problems recognized by the computer, forming retrieval tactics and searching for the data the user needs. A prototype system designed and developed according to this architecture, the CF Economic Environment Information Retrieval DSS, is introduced.
Funding: Fully supported by the University of Vaasa and the VTT Technical Research Centre of Finland.
Abstract: Secure and automated sharing of medical information in a standard format among different medical entities and stakeholders, such as patients, hospitals, doctors, law enforcement agencies, and health insurance companies, has always been a challenging problem. Current methods for ensuring compliance with medical privacy laws require specialists deeply familiar with these laws' complex requirements to verify the lawful exchange of medical information. This article introduces a Smart Medical Data Exchange Engine (SMDEE) designed to automate the extraction of logical rules from medical privacy legislation using advanced techniques. These rules facilitate the secure extraction of information, safeguarding patient privacy and confidentiality. In addition, SMDEE can generate standardized clinical documents according to Health Level 7 (HL7) standards and standardize the nomenclature of requested medical data, enabling accurate decision-making when accessing patient data. All access requests to patient information are processed through SMDEE to ensure authorized access. The proposed system's efficacy is evaluated using the Health Insurance Portability and Accountability Act (HIPAA), a fundamental privacy law in the United States; however, SMDEE's flexibility allows its application worldwide, accommodating various medical privacy laws. Beyond facilitating global information exchange, SMDEE aims to improve the timely and appropriate treatment of international patients.